
How-To Tutorials - Data


Getting Started with Sphinx Search

Packt
31 Mar 2011
6 min read
Sphinx is a full-text search engine. So, before going any further, we need to understand what full-text search is and how it improves on traditional searching.

What is full-text search?

Full-text search is one of the techniques for searching a document or database stored on a computer. While searching, the search engine goes through and examines all of the words stored in the document and tries to match the search query against those words. A complete examination of all the words (text) stored in the document is undertaken, and hence it is called a full-text search. Full-text search excels in searching large volumes of unstructured text quickly and effectively. It returns pages based on how well they match the user's query.

Traditional search

To understand the difference between a normal search and full-text search, let's take the example of a MySQL database table and perform searches on it. It is assumed that MySQL Server and phpMyAdmin are already installed on your system.

Time for action – normal search in MySQL

Open phpMyAdmin in your browser and create a new database called myblog. Select the myblog database.

Create a table by executing the following query:

    CREATE TABLE `posts` (
      `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      `title` VARCHAR( 255 ) NOT NULL,
      `description` TEXT NOT NULL,
      `created` DATETIME NOT NULL,
      `modified` DATETIME NOT NULL
    ) ENGINE = MYISAM;

Queries can be executed from the SQL page in phpMyAdmin. You can find the link to that page in the top menu.

Populate the table with some records:

    INSERT INTO `posts` (`id`, `title`, `description`, `created`, `modified`) VALUES
    (1, 'PHP scripting language', 'PHP is a web scripting language originally created by Rasmus Lerdorf', NOW(), NOW()),
    (2, 'Programming Languages', 'There are many languages available to cater any kind of programming need', NOW(), NOW()),
    (3, 'My Life', 'This post is about my life which in a sense is beautiful', NOW(), NOW()),
    (4, 'Life on Mars', 'Is there any life on mars?', NOW(), NOW());

Next, run the following queries against the table:

    SELECT * FROM posts WHERE title LIKE 'programming%';

The above query returns row 2.

    SELECT * FROM posts WHERE description LIKE '%life%';

The above query returns rows 3 and 4.

    SELECT * FROM posts WHERE description LIKE '%scripting language%';

The above query returns row 1.

    SELECT * FROM posts WHERE description LIKE '%beautiful%' OR description LIKE '%programming%';

The above query returns rows 2 and 3.

phpMyAdmin: To administer a MySQL database, I highly recommend using a GUI tool such as phpMyAdmin (http://www.phpmyadmin.net). All of the above-mentioned queries can easily be executed from its SQL page.

What just happened?

We first created a table, posts, to hold some data. Each post has a title and a description. We then populated the table with some records.

With the first SELECT query we tried to find all posts whose title starts with the word programming. This correctly gave us row 2. But what if you want to search for the word anywhere in the field and not just at the start? For this we fired the second query, wherein we searched for the word life anywhere in the description of the post. Again this worked pretty well for us and, as expected, we got the result in the form of rows 3 and 4. Now what if we wanted to search for multiple words? For this we fired the third query, where we searched for the words scripting language. As row 1 has those words in its description, it was returned correctly.
Until now everything looked fine and we were able to perform searches without any hassle. The query gets complex when we want to search for multiple words and those words are not necessarily placed consecutively in a field, that is, side by side. One such example is shown in the form of our fourth query, where we tried to search for the words programming and beautiful in the description of the posts. As the number of words we need to search for increases, this query gets more complicated and, moreover, slower in execution, since it needs to match each word individually.

The previous SELECT queries and their output also don't give us any information about the relevance of the search terms to the results found. Relevance can be defined as a measure of how closely the returned database records match the user's search query. In other words, how pertinent the result set is to the search query. Relevance is very important in the search world because users want to see the items with the highest relevance at the top of their search results. One of the major reasons for the success of Google is that their search results are always sorted by relevance.

MySQL full-text search

This is where full-text search comes to the rescue. MySQL has inbuilt support for full-text search and you only need to add a FULLTEXT index to the field against which you want to perform your search. Continuing the earlier example of the posts table, let's add a full-text index to the description field of the table. Run the following query:

    ALTER TABLE `posts` ADD FULLTEXT ( `description` );

The query will add an INDEX of type FULLTEXT to the description field of the posts table. Only the MyISAM engine in MySQL supports full-text indexes.

Now, to search for all the records which contain the words programming or beautiful anywhere in their description, the query would be:

    SELECT * FROM posts WHERE MATCH (description) AGAINST ('beautiful programming');

This query will return rows 2 and 3, and the returned results are sorted by relevance. One more thing to note is that this query takes less time than the earlier query, which used LIKE for matching. By default, the MATCH() function performs a natural language search: it attempts to use natural language processing to understand the nature of the query and then search accordingly.

Full-text search in MySQL is a big topic in itself and we have only seen the tip of the iceberg. For a complete reference, please refer to the MySQL manual at http://dev.mysql.com/doc/.

Advantages of full-text search

The following points are some of the major advantages of full-text search:

- It is quicker than traditional searches as it benefits from an index of words that is used to look up records instead of doing a full table scan
- It gives results that can be sorted by relevance to the searched phrase or term, with sophisticated ranking capabilities to find the best documents or records
- It performs very well on huge databases with millions of records
- It skips the common words such as the, an, for, and so on

When to use a full-text search?

- When there is a high volume of free-form text data to be searched
- When there is a need for highly optimized search results
- When there is a demand for flexible search querying
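MySQL can also expose the relevance score it uses for sorting. As a small sketch that is not part of the original example, the same MATCH() ... AGAINST() expression can be repeated in the SELECT list to return the score alongside each row:

    -- Returns each matching post together with its relevance score.
    SELECT id,
           title,
           MATCH (description) AGAINST ('beautiful programming') AS relevance
    FROM posts
    WHERE MATCH (description) AGAINST ('beautiful programming')
    ORDER BY relevance DESC;

In natural language mode the WHERE clause already excludes rows with zero relevance, so only matching posts are returned, highest score first; because the two MATCH() expressions are identical, MySQL performs the full-text search only once.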


Sphinx: Index Searching

Packt
16 Mar 2011
9 min read
Client API implementations for Sphinx

Sphinx comes with a number of native searchd client API implementations. Some third-party open source implementations for Perl, Ruby, and C++ are also available. All APIs provide the same set of methods and they implement the same network protocol. As a result, they all work in a similar fashion.

All examples in this article are for the PHP implementation of the Sphinx API. However, you can just as easily use other programming languages. Sphinx is used with PHP more widely than any other language.

Search using client API

Let's see how we can use the native PHP implementation of the Sphinx API to search. We will add the configuration related to searchd and then create a PHP file to search the index using the Sphinx client API implementation for PHP.

Time for action – creating a basic search script

Add the searchd config section to /usr/local/sphinx/etc/sphinx-blog.conf:

    source blog
    {
      # source options
    }

    index posts
    {
      # index options
    }

    indexer
    {
      # indexer options
    }

    # searchd options (used by search daemon)
    searchd
    {
      listen       = 9312
      log          = /usr/local/sphinx/var/log/searchd.log
      query_log    = /usr/local/sphinx/var/log/query.log
      max_children = 30
      pid_file     = /usr/local/sphinx/var/log/searchd.pid
    }

Start the searchd daemon (as the root user):

    $ sudo /usr/local/sphinx/bin/searchd -c /usr/local/sphinx/etc/sphinx-blog.conf

Copy the sphinxapi.php file (the class with the PHP implementation of the Sphinx API) from the Sphinx source directory to your working directory:

    $ mkdir /path/to/your/webroot/sphinx
    $ cd /path/to/your/webroot/sphinx
    $ cp /path/to/sphinx-0.9.9/api/sphinxapi.php ./

Create a simple_search.php script that uses the PHP client API class to search the sphinx-blog index, and execute it in the browser:

    <?php
    require_once('sphinxapi.php');

    // Instantiate the sphinx client
    $client = new SphinxClient();

    // Set search options
    $client->SetServer('localhost', 9312);
    $client->SetConnectTimeout(1);
    $client->SetArrayResult(true);

    // Query the index
    $results = $client->Query('php');

    // Output the matched results in raw format
    print_r($results['matches']);

The output of the given code, as seen in a browser, will be similar to what's shown in the following screenshot:

What just happened?

Firstly, we added the searchd configuration section to our sphinx-blog.conf file. The following options were added to the searchd section:

- listen: This option specifies the IP address and port that searchd will listen on. It can also specify the Unix-domain socket path. This option was introduced in v0.9.9 and should be used instead of the port (deprecated) option. If the port part is omitted, then the default port used is 9312. Examples:

    listen = localhost
    listen = 9312
    listen = localhost:9898
    listen = 192.168.1.25:4000
    listen = /var/run/sphinx.s

- log: Name of the file where all searchd runtime events will be logged. This is an optional setting and the default value is "searchd.log".
- query_log: Name of the file where all search queries will be logged. This is an optional setting and the default value is empty, that is, do not log queries.
- max_children: The maximum number of concurrent searches to run in parallel. This is an optional setting and the default value is 0 (unlimited).
- pid_file: Filename of the searchd process ID. This is a mandatory setting. The file is created on startup and contains the head daemon process ID while the daemon is running. The pid_file is unlinked when the daemon is stopped.
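As a side note not covered in the original steps: a Sphinx source install also places a small command-line search utility in its bin directory, which queries an index directly from the configuration file without going through searchd or any API code. The exact option names can vary between versions, so treat the following as an assumed sketch and check the usage output the utility prints when run without arguments:

    $ /usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx-blog.conf --index posts php

This can be handy for quickly confirming that an index returns hits before wiring up any PHP.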
Once we were done with adding the searchd configuration options, we started the searchd daemon as the root user. We passed the path of the configuration file as an argument to searchd. The default configuration file used is /usr/local/sphinx/etc/sphinx.conf.

After a successful startup, searchd listens on all network interfaces, including all the configured network cards on the server, at port 9312. If we want searchd to listen on a specific interface then we can specify the hostname or IP address in the value of the listen option:

    listen = 192.168.1.25:9312

The listen setting defined in the configuration file can be overridden on the command line while starting searchd by using the -l command line argument. There are other (optional) arguments that can be passed to searchd, as seen in the following screenshot:

searchd needs to be running all the time when we are using the client API. The first thing you should always check is whether searchd is running or not, and start it if it is not running.

We then created a PHP script to search the sphinx-blog index. To search the Sphinx index, we need to use the Sphinx client API. As we are working with a PHP script, we copied the PHP client implementation class (sphinxapi.php), which comes along with the Sphinx source, to our working directory so that we can include it in our script. However, you can keep this file anywhere on the file system as long as you can include it in your PHP script. Throughout this article we will be using /path/to/webroot/sphinx as the working directory and we will create all PHP scripts in that directory. We will refer to this directory simply as webroot.

We initialized the SphinxClient class and then used the following class methods to set up the Sphinx client API:

- SphinxClient::SetServer($host, $port)—This method sets the searchd hostname and port. All subsequent requests use these settings unless this method is called again with some different parameters. The default host is localhost and the default port is 9312.
- SphinxClient::SetConnectTimeout($timeout)—This is the maximum time allowed to spend trying to connect to the server before giving up.
- SphinxClient::SetArrayResult($arrayresult)—This is a PHP client API-specific method. It specifies whether the matches should be returned as an array or a hash. The default value is false, which means that matches will be returned in a PHP hash format, where document IDs will be the keys, and other information (attributes, weight) will be the values. If $arrayresult is true, then the matches will be returned in plain arrays with complete per-match information.

After that, the actual querying of the index was pretty straightforward using the SphinxClient::Query($query) method. It returned an array with matched results, as well as other information such as error, fields in the index, attributes in the index, total records found, time taken for search, and so on. The actual results are in the $results['matches'] variable. We can run a loop on the results, and it is a straightforward job to get the actual document's content from the document ID and display it.

Matching modes

When a full-text search is performed on the Sphinx index, different matching modes can be used by Sphinx to find the results. The following matching modes are supported by Sphinx:

- SPH_MATCH_ALL—This is the default mode and it matches all query words, that is, only records that match all of the queried words will be returned.
- SPH_MATCH_ANY—This matches any of the query words.
- SPH_MATCH_PHRASE—This matches the query as a phrase and requires a perfect match.
- SPH_MATCH_BOOLEAN—This matches the query as a Boolean expression.
- SPH_MATCH_EXTENDED—This matches the query as an expression in the Sphinx internal query language.
- SPH_MATCH_EXTENDED2—This matches the query using the second version of the Extended matching mode. This supersedes SPH_MATCH_EXTENDED as of v0.9.9.
- SPH_MATCH_FULLSCAN—In this mode the query terms are ignored and no text-matching is done, but filters and grouping are still applied.

Time for action – searching with different matching modes

Create a PHP script display_results.php in your webroot with the following code:

    <?php
    // Database connection credentials
    $dsn  = 'mysql:dbname=myblog;host=localhost';
    $user = 'root';
    $pass = '';

    // Instantiate the PDO (PHP 5 specific) class
    try {
        $dbh = new PDO($dsn, $user, $pass);
    } catch (PDOException $e) {
        echo 'Connection failed: ' . $e->getMessage();
    }

    // PDO statement to fetch the post data
    $query = "SELECT p.*, a.name FROM posts AS p " .
             "LEFT JOIN authors AS a ON p.author_id = a.id " .
             "WHERE p.id = :post_id";
    $post_stmt = $dbh->prepare($query);

    // PDO statement to fetch the post's categories
    $query = "SELECT c.name FROM posts_categories AS pc " .
             "LEFT JOIN categories AS c ON pc.category_id = c.id " .
             "WHERE pc.post_id = :post_id";
    $cat_stmt = $dbh->prepare($query);

    // Function to display the results in a nice format
    function display_results($results, $message = null)
    {
        global $post_stmt, $cat_stmt;

        if ($message) {
            print "<h3>$message</h3>";
        }

        if (!isset($results['matches'])) {
            print "No results found<hr />";
            return;
        }

        foreach ($results['matches'] as $result) {
            // Get the data for this document (post) from db
            $post_stmt->bindParam(':post_id', $result['id'], PDO::PARAM_INT);
            $post_stmt->execute();
            $post = $post_stmt->fetch(PDO::FETCH_ASSOC);

            // Get the categories of this post
            $cat_stmt->bindParam(':post_id', $result['id'], PDO::PARAM_INT);
            $cat_stmt->execute();
            $categories = $cat_stmt->fetchAll(PDO::FETCH_ASSOC);

            // Output title, author and categories
            print "Id: {$post['id']}<br />" .
                  "Title: {$post['title']}<br />" .
                  "Author: {$post['name']}";

            $cats = array();
            foreach ($categories as $category) {
                $cats[] = $category['name'];
            }

            if (count($cats)) {
                print "<br />Categories: " . implode(', ', $cats);
            }

            print "<hr />";
        }
    }

Create a PHP script search_matching_modes.php in your webroot with the following code:

    <?php
    // Include the api class
    require('sphinxapi.php');

    // Include the file which contains the function to display results
    require_once('display_results.php');

    $client = new SphinxClient();

    // Set search options
    $client->SetServer('localhost', 9312);
    $client->SetConnectTimeout(1);
    $client->SetArrayResult(true);

    // SPH_MATCH_ALL mode will be used by default
    // and we need not set it explicitly
    display_results(
        $client->Query('php'),
        '"php" with SPH_MATCH_ALL');
    display_results(
        $client->Query('programming'),
        '"programming" with SPH_MATCH_ALL');
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_ALL');

    // Set the mode to SPH_MATCH_ANY
    $client->SetMatchMode(SPH_MATCH_ANY);
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_ANY');

    // Set the mode to SPH_MATCH_PHRASE
    $client->SetMatchMode(SPH_MATCH_PHRASE);
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_PHRASE');
    display_results(
        $client->Query('scripting language'),
        '"scripting language" with SPH_MATCH_PHRASE');

    // Set the mode to SPH_MATCH_FULLSCAN
    $client->SetMatchMode(SPH_MATCH_FULLSCAN);
    display_results(
        $client->Query('php'),
        '"php programming" with SPH_MATCH_FULLSCAN');

Execute search_matching_modes.php in a browser (http://localhost/sphinx/search_matching_modes.php).
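The scripts above assume that the connection to searchd always succeeds. As a small defensive sketch, not part of the original listings and assuming the GetLastError(), GetLastWarning(), and IsConnectError() methods of the bundled PHP API, a query can be wrapped with basic error checking:

    <?php
    require_once('sphinxapi.php');

    $client = new SphinxClient();
    $client->SetServer('localhost', 9312);
    $client->SetConnectTimeout(1);
    $client->SetArrayResult(true);

    $results = $client->Query('php');

    if ($results === false) {
        // Query failed: distinguish an unreachable searchd from other errors
        if ($client->IsConnectError()) {
            die('Connection to searchd failed: ' . $client->GetLastError());
        }
        die('Search error: ' . $client->GetLastError());
    }

    // Surface any non-fatal warning reported by searchd
    $warning = $client->GetLastWarning();
    if ($warning != '') {
        echo "Warning: $warning<br />";
    }

    print_r($results['matches']);

This keeps a dead or misconfigured searchd from silently producing an empty page.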


Oracle GoldenGate 11g: Performance Tuning

Packt
01 Mar 2011
12 min read
Oracle GoldenGate 11g Implementer's Guide: design, install, and configure high-performance data replication solutions using Oracle GoldenGate.

- The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
- Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
- Migrate your data replication solution from Oracle Streams to GoldenGate
- Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
- Written in a simple illustrative manner, providing step-by-step guidance with discussion points
- Goes way beyond the manual, appealing to Solution Architects, System Administrators, and Database Administrators

Oracle states that GoldenGate can achieve near real-time data replication. However, out of the box, GoldenGate may not meet your performance requirements. Here we focus on the main areas that lend themselves to tuning, especially parallel processing and load balancing, enabling high data throughput and very low latency. Let's start by taking a look at some of the considerations before we start tuning Oracle GoldenGate.

Before tuning GoldenGate

There are a number of considerations we need to be aware of before we start the tuning process. For one, we must consider the underlying system and its ability to perform. Let's start by looking at the source of data that GoldenGate needs for replication to work: the online redo logs.

Online redo

Before we start tuning GoldenGate, we must look at both the source and target databases and their ability to read/write data. Data replication is I/O intensive, so fast disks are important, particularly for the online redo logs. Redo logs play an important role in GoldenGate: they are constantly being written to by the database and concurrently being read by the Extract process. Furthermore, adding supplemental logging to a database can increase their size by a factor of 4!

Firstly, ensure that only the necessary amount of supplemental logging is enabled on the database. In the case of GoldenGate, the logging of the primary key is all that is required.

Next, take a look at the database wait events, in particular the ones that relate to redo. For example, if you are seeing "Log File Sync" waits, this is an indicator that your disk writes are too slow, your application is committing too frequently, or a combination of both. RAID5 is another common problem for redo log writes. Ideally, these files should be placed on their own mirrored storage such as RAID1+0 (mirrored striped sets) or Flash disks. Many argue this to be a misconception with modern high-speed disk arrays, but some production systems are still known to be suffering from redo I/O contention on RAID5.

An adequate number (and size) of redo groups must be configured to prevent "checkpoint not complete" or "cannot allocate new log" warnings appearing in the database instance alert log. This occurs when Oracle attempts to reuse a log file but the checkpoint that would flush the associated dirty blocks in the DB buffer cache to disk has not yet completed, so the redo is still required for crash recovery. The database must wait until that checkpoint completes before the online redo log file can be reused, effectively stalling the database and any redo generation.

Large objects (LOBs)

Know your data. LOBs can be a problem in data replication by virtue of their size and the ability to extract, transmit, and deliver the data from source to target.
Tables containing LOB datatypes should be isolated from regular data and given a dedicated Extract, Data Pump, and Replicat process group to enhance throughput. Also ensure that the target table has a primary key to avoid full table scans (FTS), an Oracle GoldenGate best practice. LOB INSERT operations can insert an empty (null) LOB into a row before updating it with the data. This is because a LOB (depending on its size) can spread its data across multiple Logical Change Records, resulting in multiple DML operations being required at the target database.

Baselining

Before we can start tuning, we must record our baseline. This will provide a reference point to tune from. We can later look back at our baseline and calculate the percentage improvement made from deploying new configurations. An ideal baseline is to find the "breaking point" of your application requirements. For example, the following questions must be answered:

- What is the maximum acceptable end-to-end latency?
- What are the maximum application transactions per second we must accommodate?

To answer these questions we must start with a single-threaded data replication configuration having just one Extract, one Data Pump, and one Replicat process. This will provide us with a worst-case scenario from which to build improvements. Ideally, our data source should be the application itself, inserting, deleting, and updating "real data" in the source database. However, simulated data with the ability to provide throughput profiles will allow us to gauge performance accurately. Application vendors can normally provide SQL injector utilities that simulate the user activity on the system.

Balancing the load across parallel process groups

The GoldenGate documentation states: "The most basic thing you can do to improve GoldenGate's performance is to divide a large number of tables among parallel processes and trails. For example, you can divide the load by schema." This statement is true, as the bottleneck is largely due to the serial nature of the Replicat process, having to "replay" transactions in commit order. Although this can be a constraining factor due to transaction dependency, increasing the number of Replicat processes increases performance significantly. However, it is highly recommended to group tables with referential constraints together per Replicat.

The number of parallel processes is typically greater on the target system compared to the source. The number and ratio of processes will vary across applications and environments. Each configuration should be thoroughly tested to determine the optimal balance, but be careful not to over-allocate, as each parallel process will consume up to 55MB. Increasing the number of processes to an arbitrary value will not necessarily improve performance; in fact, it may be worse, and you will waste CPU and memory resources. The following data flow diagram shows a load balancing configuration including two Extract processes, three Data Pumps, and five Replicats:

Considerations for using parallel process groups

To maintain data integrity, ensure that tables with referential constraints between one another are included in the same parallel process group (a simple schema-based split is sketched below). It's also worth considering disabling referential constraints on the target database schema to allow child records to be populated before their parents, thus increasing throughput. GoldenGate will always commit transactions in the same order as the source, so data integrity is maintained.
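To make the idea of dividing the load by schema concrete, the following is a minimal sketch of two Replicat parameter files, each applying one schema. The group names (RHR01, RFIN01) and schema names (HR, FINANCE) are assumptions for illustration only and are not part of the original configuration:

    -- Replicat RHR01: applies only the HR schema
    REPLICAT RHR01
    USERID ggs_admin, PASSWORD ggs_admin
    MAP HR.*, TARGET HR.*;

    -- Replicat RFIN01: applies only the FINANCE schema
    REPLICAT RFIN01
    USERID ggs_admin, PASSWORD ggs_admin
    MAP FINANCE.*, TARGET FINANCE.*;

Each group can then be paired with its own trail and started independently, so the two schemas are applied in parallel while referential integrity within each schema is preserved by the usual commit ordering.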
Oracle best practice states that no more than three Replicat processes should read the same remote trail file. To avoid contention on trail files, pair each Replicat with its own trail files and Extract process. Also, remember that it is easier to tune an Extract process than a Replicat process, so concentrate on your source before moving your focus to the target.

Splitting large tables into row ranges across process groups

What if you have some large tables with a high data change rate within a source schema and you cannot logically separate them from the remaining tables due to referential constraints? GoldenGate provides a solution to this problem by "splitting" the data within the same schema via the @RANGE function. The @RANGE function can be used in the Data Pump and Replicat configuration to "split" the transaction data across a number of parallel processes.

The Replicat process is typically the source of performance bottlenecks because, in its normal mode of operation, it is a single-threaded process that applies operations one at a time by using regular DML. Therefore, to leverage parallel operation and enhance throughput, the more Replicats the better (dependent on the number of CPUs and memory available on the target system).

The RANGE function

The way the @RANGE function works is that it computes a hash value of the columns specified in the input. If no columns are specified, it uses the table's primary key. GoldenGate adjusts the total number of ranges to optimize the even distribution across the number of ranges specified. This concept can be compared to hash partitioning in Oracle tables as a means of dividing data.

With any division of data during replication, integrity is paramount and will have an effect on performance. Therefore, tables having a relationship with other tables in the source schema must be included in the configuration. If all your source schema tables are related, you must include all the tables!

Adding Replicats with the @RANGE function

The @RANGE function accepts two numeric arguments, separated by a comma:

- Range: The number assigned to a process group, where the first is 1, the second is 2, and so on, up to the total number of ranges.
- Total number of ranges: The total number of process groups you wish to divide using the @RANGE function.

The following example includes three related tables in the source schema and walks through the complete configuration from start to finish. For this example, we have an existing Replicat process on the target machine (dbserver2) named ROLAP01 that includes the following three tables:

- ORDERS
- ORDER_ITEMS
- PRODUCTS

We are going to divide the rows of the tables across two Replicat groups. The source database schema name is SRC and the target schema is TGT. The following steps add a new Replicat named ROLAP02 with the relevant configuration and adjust the ROLAP01 parameters to suit. Note that before making any changes, stop the existing Replicat processes and determine their Relative Byte Address (RBA) and trail file log sequence number. This is important information that we will use to tell the new Replicat process from which point to start.

First check if the existing Replicat process is running:

    GGSCI (dbserver2) 1> info all

    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    RUNNING     ROLAP01     00:00:00      00:00:02

Stop the existing Replicat process:

    GGSCI (dbserver2) 2> stop REPLICAT ROLAP01

    Sending STOP request to REPLICAT ROLAP01...
    Request processed.

Add the new Replicat process, using the existing trail file.
    GGSCI (dbserver2) 3> add REPLICAT ROLAP02, exttrail ./dirdat/tb
    REPLICAT added.

Now add the configuration by creating a new parameter file for ROLAP02.

    GGSCI (dbserver2) 4> edit params ROLAP02

    --
    -- Example Replicator parameter file to apply changes
    -- to target tables
    --
    REPLICAT ROLAP02
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap02.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (1,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (1,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (1,2));

Now edit the configuration of the existing Replicat process, and add the @RANGE function to the FILTER clause of the MAP statement. Note the inclusion of the GROUPTRANSOPS parameter to enhance performance by increasing the number of operations allowed in a Replicat transaction.

    GGSCI (dbserver2) 5> edit params ROLAP01

    --
    -- Example Replicator parameter file to apply changes
    -- to target tables
    --
    REPLICAT ROLAP01
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap01.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (2,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (2,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (2,2));

Check that both Replicat processes exist.

    GGSCI (dbserver2) 6> info all

    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    STOPPED     ROLAP01     00:00:00      00:10:35
    REPLICAT    STOPPED     ROLAP02     00:00:00      00:12:25

Before starting both Replicat processes, obtain the log sequence number (SEQNO) and Relative Byte Address (RBA) from the original trail file.

    GGSCI (dbserver2) 7> info REPLICAT ROLAP01, detail

    REPLICAT   ROLAP01   Last Started 2010-04-01 15:35   Status STOPPED
    Checkpoint Lag       00:00:00 (updated 00:12:43 ago)
    Log Read Checkpoint  File ./dirdat/tb000279                      <- SEQNO
                         2010-04-08 12:27:00.001016  RBA 43750979    <- RBA

    Extract Source          Begin              End
    ./dirdat/tb000279       2010-04-01 12:47   2010-04-08 12:27
    ./dirdat/tb000257       2010-04-01 04:30   2010-04-01 12:47
    ./dirdat/tb000255       2010-03-30 13:50   2010-04-01 04:30
    ./dirdat/tb000206       2010-03-30 13:50   First Record
    ./dirdat/tb000206       2010-03-30 04:30   2010-03-30 13:50
    ./dirdat/tb000184       2010-03-30 04:30   First Record
    ./dirdat/tb000184       2010-03-30 00:00   2010-03-30 04:30
    ./dirdat/tb000000       *Initialized*      2010-03-30 00:00
    ./dirdat/tb000000       *Initialized*      First Record

Adjust the new Replicat process ROLAP02 to adopt these values, so that the process knows where to start from on startup.

    GGSCI (dbserver2) 8> alter replicat ROLAP02, extseqno 279
    REPLICAT altered.

    GGSCI (dbserver2) 9> alter replicat ROLAP02, extrba 43750979
    REPLICAT altered.

Failure to complete this step will result in either duplicate data or ORA-00001 errors against the target schema, because GoldenGate will attempt to replicate the data from the beginning of the initial trail file (./dirdat/tb000000) if it exists; otherwise the process will abend.

Start both Replicat processes. Note the use of the wildcard (*).

    GGSCI (dbserver2) 10> start replicat ROLAP*

    Sending START request to MANAGER ...
    REPLICAT ROLAP01 starting

    Sending START request to MANAGER ...
    REPLICAT ROLAP02 starting

Check if both Replicat processes are running.
    GGSCI (dbserver2) 11> info all

    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    RUNNING     ROLAP01     00:00:00      00:00:22
    REPLICAT    RUNNING     ROLAP02     00:00:00      00:00:14

Check the detail of the new Replicat process.

    GGSCI (dbserver2) 12> info REPLICAT ROLAP02, detail

    REPLICAT   ROLAP02   Last Started 2010-04-08 14:18   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:06 ago)
    Log Read Checkpoint  File ./dirdat/tb000279
                         First Record  RBA 43750979

    Extract Source          Begin              End
    ./dirdat/tb000279       * Initialized *    First Record
    ./dirdat/tb000279       * Initialized *    First Record
    ./dirdat/tb000279       * Initialized *    2010-04-08 12:26
    ./dirdat/tb000279       * Initialized *    First Record

Generate a report for the new Replicat process ROLAP02.

    GGSCI (dbserver2) 13> send REPLICAT ROLAP02, report

    Sending REPORT request to REPLICAT ROLAP02 ...
    Request processed.

Now view the report to confirm that the new Replicat process has started from the specified start point (RBA 43750979 and SEQNO 279). The following is an extract from the report:

    GGSCI (dbserver2) 14> view report ROLAP02

    2010-04-08 14:20:18  GGS INFO  379  Positioning with begin time: Apr 08, 2010 14:18:19 PM,
    starting record time: Apr 08, 2010 14:17:25 PM at extseqno 279, extrba 43750979.
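Once both groups have been running for a while, their throughput can be compared against the baseline recorded earlier. A brief illustration using the standard GGSCI STATS command (output omitted here, as it varies by environment):

    GGSCI (dbserver2) 15> stats replicat ROLAP01
    GGSCI (dbserver2) 16> stats replicat ROLAP02

Broadly similar operation counts across the two groups would suggest that the @RANGE split is distributing rows evenly.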


Oracle GoldenGate: Considerations for Designing a Solution

Packt
24 Feb 2011
8 min read
Oracle GoldenGate 11g Implementer's Guide: design, install, and configure high-performance data replication solutions using Oracle GoldenGate.

- The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
- Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
- Migrate your data replication solution from Oracle Streams to GoldenGate
- Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
- Written in a simple illustrative manner, providing step-by-step guidance with discussion points
- Goes way beyond the manual, appealing to Solution Architects, System Administrators, and Database Administrators

At a high level, the design must include the following generic requirements:

- Hardware
- Software
- Network
- Storage
- Performance

All of the above must be factored into the overall system architecture. So let's take a look at some of the options and the key design issues.

Replication methods

So you have a fast, reliable network between your source and target sites. You also have a schema design that is scalable and logically split. You now need to choose the replication architecture: one-to-one, one-to-many, active-active, active-passive, and so on. This consideration may already be answered for you by the sheer fact of what the system has to achieve. Let's take a look at some configuration options.

Active-active

Let's assume a multi-national computer hardware company has offices in London and New York. Data entry clerks are employed at both sites, inputting orders into an Order Management System. There is also a procurement department that updates the system inventory with volumes of stock and new products related to the US or European market. European countries are managed by London, and the US states are managed by New York. A requirement exists where the underlying database systems must be kept in synchronisation. Should one of the systems fail, London users can connect to New York and vice versa, allowing business to continue and orders to be taken.

Oracle GoldenGate's active-active architecture provides the best solution to this requirement, ensuring that the database systems on both sides of the pond are kept synchronised in case of failure. Another feature the active-active configuration has to offer is the ability to load balance operations. Rather than have effectively a DR site in both locations, the European users could be allowed access to the New York and London systems and vice versa. Should a site fail, then the DR solution could be quickly implemented.

Active-passive

The active-passive bi-directional configuration replicates data from an active primary database to a full replica database. Sticking with the earlier example, the business would need to decide which site is the primary, where all users connect. For example, in the event of a failure in London, the application could be configured to fail over to New York. Depending on the failure scenario, another option is to start up the passive configuration, effectively turning the active-passive configuration into active-active.

Cascading

The cascading GoldenGate topology offers a number of "drop-off" points that are intermediate targets being populated from a single source. The question here is "what data do I drop at which site?"
Once this question has been answered by the business, it is then a case of configuring filters in Replicat parameter files, allowing just the selected data to be replicated. All of the data is passed on to the next target, where it is filtered and applied again. This type of configuration lends itself to a head office system updating its satellite office systems in a round-robin fashion. In this case, only the relevant data is replicated at each target site.

Another design is the hub-and-spoke solution, where all target sites are updated simultaneously. This is a typical head office topology, but additional configuration and resources would be required at the source site to ship the data in a timely manner. The CPU, network, and file storage requirements must be sufficient to accommodate and send the data to multiple targets.

Physical Standby

A Physical Standby database is a robust Oracle DR solution managed by the Oracle Data Guard product. The Physical Standby database is essentially a mirror copy of its primary, which lends itself perfectly to failover scenarios. However, it is not easy to replicate data from the Physical Standby database, because it does not generate any of its own redo. That said, it is possible to configure GoldenGate to read the archived standby logs in Archive Log Only (ALO) mode. Despite being potentially slower, it may be prudent to feed a downstream system on the DR site using this mechanism, rather than having two data streams configured from the primary database. This reduces network bandwidth utilization, as shown in the following diagram:

Reducing network traffic is particularly important when there is considerable distance between the primary and the DR site.

Networking

The network should not be taken for granted. It is a fundamental component in data replication and must be considered in the design process. Not only must it be fast, it must be reliable. In the following paragraphs, we look at ways to make our network resilient to faults and subsequent outages, in an effort to maintain zero downtime.

Surviving network outages

Probably one of your biggest fears in a replication environment is network failure. Should the network fail, the source trail will fill as the transactions continue on the source database, ultimately filling the filesystem to 100% utilization and causing the Extract process to abend. Depending on the length of the outage, data in the database's redo logs may be overwritten, leaving you with the additional task of configuring GoldenGate to extract data from the database's archived logs. This is not ideal, as you already have the backlog of data in the trail files to ship to the target site once the network is restored. Therefore, ensure there is sufficient disk space available to accommodate data for the longest network outage during the busiest period. Disks are relatively cheap nowadays. Providing ample space for your trail files will help to reduce the recovery time from the network outage.

Redundant networks

One of the key components in your GoldenGate implementation is the network. Without the ability to transfer data from the source to the target, it is rendered useless. So, you not only need a fast network but one that will always be available. This is where redundant networks come into play, offering speed and reliability.

NIC teaming

One method of achieving redundancy is Network Interface Card (NIC) teaming or bonding. Here two or more Ethernet ports can be "coupled" to form a bonded network supporting one IP address.
The main goal of NIC teaming is to use two or more Ethernet ports connected to two or more different access network switches, thus avoiding a single point of failure. The following diagram illustrates the redundant features of NIC teaming:

Linux (OEL/RHEL 4 and above) supports NIC teaming with no additional software requirements. It is purely a matter of network configuration stored in text files in the /etc/sysconfig/network-scripts directory. The following steps show how to configure a server for NIC teaming:

First, you need to log on as the root user and create a bond0 config file using the vi text editor.

    # vi /etc/sysconfig/network-scripts/ifcfg-bond0

Append the following lines to it, replacing the IP address with your actual IP address, then save the file and exit to the shell prompt:

    DEVICE=bond0
    IPADDR=192.168.1.20
    NETWORK=192.168.1.0
    NETMASK=255.255.255.0
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes

Choose the Ethernet ports you wish to bond, and then open both configurations in turn using the vi text editor, replacing ethn with the respective port number.

    # vi /etc/sysconfig/network-scripts/ifcfg-eth2
    # vi /etc/sysconfig/network-scripts/ifcfg-eth4

Modify the configuration as follows:

    DEVICE=ethn
    USERCTL=no
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none

Save the files and exit to the shell prompt. To make sure the bonding module is loaded when the bonding interface (bond0) is brought up, you need to modify the kernel modules configuration file:

    # vi /etc/modprobe.conf

Append the following two lines to the file:

    alias bond0 bonding
    options bond0 mode=balance-alb miimon=100

Finally, load the bonding module and restart the network services:

    # modprobe bonding
    # service network restart

You now have a bonded network that will load balance when both physical networks are available, providing additional bandwidth and enhanced performance. Should one network fail, the available bandwidth will be halved, but the network will still be available.

Non-functional requirements (NFRs)

Irrespective of the functional requirements, the design must also include the non-functional requirements (NFRs) in order to achieve the overall goal of delivering a robust, high-performance, and stable system.

Latency

One of the main NFRs is performance. How long does it take to replicate a transaction from the source database to the target? This is known as end-to-end latency, which typically has a threshold that must not be breached in order to satisfy the specified NFR. GoldenGate refers to latency as lag, which can be measured at different intervals in the replication process. These are:

- Source to Extract: The time taken for a record to be processed by the Extract compared to the commit timestamp on the database
- Replicat to Target: The time taken for the last record to be processed by the Replicat compared to the record creation time in the trail file

A well-designed system may encounter spikes in latency, but it should never be continuous or growing. Trying to tune GoldenGate when the design is poor is a difficult situation to be in. For the system to perform well you may need to revisit the design.
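Lag at these points can be inspected from GGSCI using standard GoldenGate commands; the group names below are assumptions for illustration:

    GGSCI> lag extract EOLTP01
    GGSCI> lag replicat ROLAP01
    GGSCI> send extract EOLTP01, getlag
    GGSCI> info all

LAG and SEND ... GETLAG query the running process directly, while INFO ALL gives a one-line summary per group whose lag figure is based on the last checkpoint. Continuously growing values at any of these points indicate the sustained latency problem described above rather than a transient spike.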


Oracle GoldenGate 11g: Configuration for High Availability

Packt
23 Feb 2011
10 min read
Oracle GoldenGate 11g Implementer's Guide: design, install, and configure high-performance data replication solutions using Oracle GoldenGate.

- The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
- Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
- Migrate your data replication solution from Oracle Streams to GoldenGate
- Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
- Written in a simple illustrative manner, providing step-by-step guidance with discussion points
- Goes way beyond the manual, appealing to Solution Architects, System Administrators, and Database Administrators

This includes the following discussion points:

- Shared storage options
- Configuring Clusterware for GoldenGate
- GoldenGate on Exadata
- Failover

We also touch upon the new features available in Oracle 11g Release 2, including the Database Machine, which provides an "HA solution in a box".

GoldenGate on RAC

A number of architectural options are available to Oracle RAC, particularly surrounding storage. Since Oracle 11g Release 2, these options have grown, making it possible to configure the whole RAC environment using Oracle software, whereas in earlier versions, third-party clusterware and storage solutions had to be used. Let's start by looking at the importance of shared storage.

Shared storage

The secret to RAC is "share everything", and this also applies to GoldenGate. RAC relies on shared storage in order to support a single database having multiple instances, residing on individual nodes. Therefore, as a minimum, the GoldenGate checkpoint and trail files must be on the shared storage so all Oracle instances can "see" them. Should a node fail, a surviving node can "take the reins" and continue the data replication without interruption. Since Oracle 11g Release 2, in addition to ASM, the shared storage can be an ACFS or a DBFS.

Automatic Storage Management Cluster File System (ACFS)

ACFS is Oracle's multi-platform, scalable file system and storage management technology that extends ASM functionality to support files maintained outside of the Oracle Database. This lends itself perfectly to supporting the required GoldenGate files. However, any Oracle files that could be stored in regular ASM diskgroups are not supported by ACFS. This includes the OCR and Voting files that are fundamental to RAC.

Database File System (DBFS)

Another Oracle solution to the shared filesystem is DBFS, which creates a standard file system interface on top of files and directories that are actually stored as SecureFile LOBs in database tables. DBFS is similar to Network File System (NFS) in that it provides a shared network file system that "looks like" a local file system. On Linux, you need a DBFS client that has a mount interface that utilizes the Filesystem in Userspace (FUSE) kernel module, providing a file-system mount point to access the files stored in the database.

This mechanism is also ideal for sharing GoldenGate files among the RAC nodes. It also supports the Oracle Cluster Registry (OCR) and Voting files, plus Oracle homes. DBFS requires an Oracle Database 11gR2 (or higher) database. You can use DBFS to store GoldenGate recovery related files for lower releases of the Oracle Database, but you will need to create a separate Oracle Database 11gR2 (or higher) database to host the file system.
Configuring Clusterware for GoldenGate

Oracle Clusterware will ensure that GoldenGate can tolerate server failures by moving processing to another available server in the cluster. It can support the management of a third-party application in a clustered environment. This capability will be used to register and relocate the GoldenGate Manager process.

Once the GoldenGate software has been installed across the cluster and a script to start, check, and stop GoldenGate has been written and placed on the shared storage (so it is accessible to all nodes), the GoldenGate Manager process can be registered in the cluster. Standard Oracle Clusterware commands can then be used to create, register, and set privileges on the virtual IP address (VIP) and the GoldenGate application.

The Virtual IP

The VIP is a key component of Oracle Clusterware that can dynamically relocate the IP address to another server in the cluster, allowing connections to fail over to a surviving node. The VIP provides faster failovers compared to the TCP/IP timeout-based failovers on a server's actual IP address. On Linux this can take up to 30 minutes using the default kernel settings!

The prerequisites are as follows:

- The VIP must be a fixed IP address on the public subnet.
- The interconnect must use a private non-routable IP address, ideally over Gigabit Ethernet.
- Use a VIP to access the GoldenGate Manager process, to isolate access to the Manager process from the physical server.
- Remote data pump processes must also be configured to use the VIP to contact the GoldenGate Manager.

The following diagram illustrates the RAC architecture for two nodes (rac1 and rac2) supporting two Oracle instances (oltp1 and oltp2). The VIPs are 11.12.1.6 and 11.12.1.8 respectively, in this example:

The user community or application servers connect to either instance via the VIP and a load balancing database service that has been configured on the database and in the client's SQL*Net tnsnames.ora file or JDBC connect string. The following example shows a typical tnsnames entry for a load balancing service. Load balancing is the default and does not need to be explicitly configured. Hostnames can replace the IP addresses in the tnsnames.ora file as long as they are mapped to the relevant VIP in the client's system hosts file.

    OLTP =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = 11.12.1.6)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = 11.12.1.8)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = oltp)
        )
      )

This is the recommended approach for scalability and performance and is known as active-active. Another HA solution is the active-passive configuration, where users connect to one instance only, leaving the passive instance available for node failover. The term active-active or active-passive in this context relates to 2-node RAC environments and is not to be confused with the GoldenGate topology of the same name.

On Linux systems, the database server hostname will typically have the following format in the /etc/hosts file:

- For the public VIP: <hostname>-vip
- For the private interconnect: <hostname>-pri

The following is an example hosts file for a RAC node:

    127.0.0.1       localhost.localdomain localhost
    ::1             localhost6.localdomain6 localhost6
    # Virtual IP Public Address
    11.12.1.6       rac1-vip rac1-vip
    11.12.1.8       rac2-vip rac2-vip
    # Private Address
    192.168.1.33    rac1-pri rac1-pri
    192.168.1.34    rac2-pri rac2-pri

Creating a GoldenGate application

The following steps guide you through the process of configuring GoldenGate on RAC.
This example is for an Oracle 11g Release 1 RAC environment.

Install GoldenGate as the Oracle user on each node in the cluster, or on a shared mount point that is visible from all nodes. If installing the GoldenGate home on each node, ensure the checkpoint and trail files are on the shared filesystem. Ensure the GoldenGate Manager process is configured to use the AUTOSTART and AUTORESTART parameters, allowing GoldenGate to start the Extract and Replicat processes as soon as the Manager starts.

Configure a VIP for the GoldenGate application as the Oracle user from one node:

    <CLUSTERWARE_HOME>/bin/crs_profile -create ggsvip -t application -a <CLUSTERWARE_HOME>/bin/usrvip -o oi=bond1,ov=11.12.1.6,on=255.255.255.0

- CLUSTERWARE_HOME is the Oracle home in which Oracle Clusterware is installed, for example /u01/app/oracle/product/11.1.0/crs.
- ggsvip is the name of the application VIP that you will create.
- oi=bond1 is the public interface in this example.
- ov=11.12.1.6 is the virtual IP address in this example.
- on=255.255.255.0 is the subnet mask. This should be the same subnet mask as the public IP address.

Next, register the VIP in the Oracle Cluster Registry (OCR) as the Oracle user:

    <CLUSTERWARE_HOME>/bin/crs_register ggsvip

Set the ownership of the VIP to the root user, who assigns the IP address. Execute the following command as the root user:

    <CLUSTERWARE_HOME>/bin/crs_setperm ggsvip -o root

Set read and execute permissions for the Oracle user. Execute the following command as the root user:

    <CLUSTERWARE_HOME>/bin/crs_setperm ggsvip -u user:oracle:r-x

As the Oracle user, start the VIP:

    <CLUSTERWARE_HOME>/bin/crs_start ggsvip

To verify that the VIP is running, execute the following command, then ping the IP address from a different node in the cluster.

    <CLUSTERWARE_HOME>/bin/crs_stat ggsvip -t
    Name           Type            Target     State      Host
    ------         -------         ------     -------    ------
    ggsvip         application     ONLINE     ONLINE     rac1

    ping -c3 11.12.1.6
    64 bytes from 11.12.1.6: icmp_seq=1 ttl=64 time=0.096 ms
    64 bytes from 11.12.1.6: icmp_seq=2 ttl=64 time=0.122 ms
    64 bytes from 11.12.1.6: icmp_seq=3 ttl=64 time=0.141 ms

    --- 11.12.1.6 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2000ms
    rtt min/avg/max/mdev = 0.082/0.114/0.144/0.025 ms

Oracle Clusterware supports the use of "Action" scripts within its configuration, allowing bespoke scripts to be executed automatically during failover. Create a Linux shell script named ggs_action.sh that accepts three arguments: start, stop, or check. Place the script in the <CLUSTERWARE_HOME>/crs/public directory on each node, or if you have installed GoldenGate on a shared mount point, copy it there. Ensure that:

- start and stop return 0 if successful, 1 if unsuccessful.
- check returns 0 if GoldenGate is running, 1 if it is not running.

As the Oracle user, make sure the script is executable:

    chmod 754 ggs_action.sh

To check that the GoldenGate processes are running, ensure the action script has the following commands. The following example can be expanded to include checks for Extract and Replicat processes. First, check the Linux process ID (PID) the GoldenGate Manager process is configured to use:

    GGS_HOME=/mnt/oracle/ggs   # Oracle GoldenGate home
    pid=`cut -f8 ${GGS_HOME}/dirpcs/MGR.pcm`

Then, compare this value (in variable $pid) with the actual PID the Manager process is using. The following example will return the correct PID of the Manager process if it is running:

    ps -e |grep ${pid} |grep mgr |cut -d " " -f2

The code to start and stop a GoldenGate process is simply a call to ggsci.
    ggsci_command=$1
    ggsci_output=`${GGS_HOME}/ggsci << EOF
    ${ggsci_command}
    exit
    EOF`

Create a profile for the GoldenGate application as the Oracle user from one node:

    <CLUSTERWARE_HOME>/bin/crs_profile -create goldengate_app -t application -r ggsvip -a <CLUSTERWARE_HOME>/crs/public/ggs_action.sh -o ci=10

- CLUSTERWARE_HOME is the Oracle home in which Oracle Clusterware is installed, for example /u01/app/oracle/product/11.1.0/crs.
- -create goldengate_app: the application name is goldengate_app.
- -r specifies the required resources that must be running for the application to start. In this example, the dependency is that the VIP ggsvip must be running before Oracle GoldenGate starts.
- -a specifies the action script, for example <CLUSTERWARE_HOME>/crs/public/ggs_action.sh.
- -o specifies options. In this example the only option is the check interval, which is set to 10 seconds.

Next, register the application in the Oracle Cluster Registry (OCR) as the oracle user:

    <CLUSTERWARE_HOME>/bin/crs_register goldengate_app

Now start the GoldenGate application as the Oracle user:

    <CLUSTERWARE_HOME>/bin/crs_start goldengate_app

Check that the application is running:

    <CLUSTERWARE_HOME>/bin/crs_stat goldengate_app -t
    Name             Type            Target     State     Host
    ------           ------          --------   -----     ----
    goldengate_app   application     ONLINE     ONLINE    rac1

You can also stop GoldenGate from Oracle Clusterware by executing the following command as the oracle user:

    <CLUSTERWARE_HOME>/bin/crs_stop goldengate_app

Oracle has published a White Paper on "Oracle GoldenGate high availability with Oracle Clusterware". To view the Action script mentioned in this article, refer to the document, which can be downloaded in PDF format from the Oracle website at the following URL: http://www.oracle.com/technetwork/middleware/goldengate/overview/ha-goldengate-whitepaper-128197.pdf
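The article describes the contract for ggs_action.sh (three arguments, specific return codes) but defers the full listing to Oracle's White Paper. Purely as an assumed sketch of how such a script might be laid out, reusing the GGS_HOME path and check logic from the fragments above (paths, sleep interval, and structure are illustrative assumptions, not the supported script):

    #!/bin/bash
    # Hypothetical skeleton of a Clusterware action script for GoldenGate.
    # The full, supported version is in Oracle's White Paper referenced above.

    GGS_HOME=/mnt/oracle/ggs    # Oracle GoldenGate home (assumed path)

    call_ggsci () {
        # Run a single GGSCI command
        ${GGS_HOME}/ggsci << EOF
    $1
    exit
    EOF
    }

    check_manager () {
        # Compare the PID recorded in the Manager checkpoint file
        # with the PID of the running mgr process
        pid=`cut -f8 ${GGS_HOME}/dirpcs/MGR.pcm 2>/dev/null`
        [ -n "$pid" ] && ps -e | grep "${pid}" | grep -q mgr
    }

    case "$1" in
        start)
            call_ggsci "start manager" > /dev/null
            sleep 5
            check_manager && exit 0 || exit 1
            ;;
        stop)
            call_ggsci "stop manager!" > /dev/null
            sleep 5
            check_manager && exit 1 || exit 0
            ;;
        check)
            check_manager && exit 0 || exit 1
            ;;
        *)
            echo "Usage: $0 {start|stop|check}"
            exit 1
            ;;
    esac

With AUTOSTART and AUTORESTART set in the Manager parameter file, starting the Manager from this script is enough for the Extract and Replicat groups to follow.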


OpenAM: Oracle DSEE and Multiple Data Stores

Packt
03 Feb 2011
6 min read
OpenAM: written and tested with OpenAM Snapshot 9—the Single Sign-On (SSO) tool for securing your web applications in a fast and easy way.

- The first and the only book that focuses on implementing Single Sign-On using OpenAM
- Learn how to use OpenAM quickly and efficiently to protect your web applications with the help of this easy-to-grasp guide
- Written by Indira Thangasamy, core team member of the OpenSSO project from which OpenAM is derived
- Real-world examples for integrating OpenAM with various applications

Oracle Directory Server Enterprise Edition

The Oracle Directory Server Enterprise Edition type of identity store is predominantly used by customers and natively supported by the OpenSSO server. It is the only data store where all the user management features offered by OpenSSO are supported with no exceptions. In the console it is labeled as Sun DS with OpenSSO schema. After Oracle acquired Sun Microsystems Inc., the brand name for the Sun Directory Server Enterprise Edition was changed to Oracle Directory Server Enterprise Edition (ODSEE). You need to have ODSEE configured prior to creating the data store, either from the console or the CLI, as shown in the following section.

Creating a data store for Oracle DSEE

Creating a data store from the OpenSSO console is the easiest way to achieve this task. However, if you want to automate this process in a repeatable fashion, then using the ssoadm tool is the right choice. The problem here is obtaining the list of attributes and their corresponding values, which is not documented anywhere in the publicly available documentation for OpenSSO. Let me show you the easy way out of this by providing the required options and their values for ssoadm:

    ./ssoadm create-datastore -m odsee_datastore -t LDAPv3ForAMDS -u amadmin -f /tmp/.passwd_of_amadmin -D data_store_odsee.txt -e /

This command will create the data store that talks to an Oracle DSEE store. The key options in the preceding command are LDAPv3ForAMDS (instructing the server to create a data store of type ODSEE) and the properties that have been included in data_store_odsee.txt. You can find the complete contents of this file as part of the code bundle provided by Packt Publishers. Some excerpts from the file are given as follows:

    sun-idrepo-ldapv3-config-ldap-server=odsee.packt-services.net:4355
    sun-idrepo-ldapv3-config-authid=cn=directory manager
    sun-idrepo-ldapv3-config-authpw=dssecret12
    com.iplanet.am.ldap.connection.delay.between.retries=1000
    sun-idrepo-ldapv3-config-auth-naming-attr=uid
    <contents removed to save paper space>
    sun-idrepo-ldapv3-config-users-search-attribute=uid
    sun-idrepo-ldapv3-config-users-search-filter=(objectclass=inetorgperson)
    sunIdRepoAttributeMapping=
    sunIdRepoClass=com.sun.identity.idm.plugins.ldapv3.LDAPv3Repo
    sunIdRepoSupportedOperations=filteredrole=read,create,edit,delete
    sunIdRepoSupportedOperations=group=read,create,edit,delete
    sunIdRepoSupportedOperations=realm=read,create,edit,delete,service
    sunIdRepoSupportedOperations=role=read,create,edit,delete
    sunIdRepoSupportedOperations=user=read,create,edit,delete,service

Among these properties, the first three provide critical information for the whole thing to work. As is evident from the property names, they denote the ODSEE server name and port, the bind DN, and the password. The password is in plain text, so for security reasons you need to remove it from the input file after creating the data store.
Updating the data store

As discussed in the previous section, one can invoke the update-datastore subcommand with the appropriate properties and their values, along with the -a switch to the ssoadm tool. If you have more properties to be updated, then put them in a text file and use it with the -D option, as with the create-datastore subcommand.

Deleting the data store

A data store can be deleted by just selecting its specific name from the console. No warnings are issued while deleting data stores, and you could eventually delete all of the data stores in a realm. Be cautious about this behavior; you might end up deleting all of them unintentionally. Deleting a data store does not remove any existing data in the underlying LDAP server; it only removes the configuration from the OpenSSO server:

./ssoadm delete-datastores -e / -m odsee_datastore -f /tmp/.passwd_of_amadmin -u amadmin

Data store for OpenDS

OpenDS is one of the popular LDAP servers that is completely written in Java and available freely under an open source license. As a matter of fact, the embedded configuration store that is built into the OpenSSO server is the embedded version of OpenDS. It has been fully tested with the standalone OpenDS version for identity store usage. The data store creation and management are pretty much similar to the steps described in the foregoing section, except for the type of store and the corresponding properties' values. The properties and their values are given in the file data_store_opends.txt (available as part of the code bundle). Invoke the ssoadm tool with this property file after making appropriate changes to fit your deployment. Here is the sample command that creates the data store for OpenDS:

./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "OpenDS-store" -t LDAPv3ForOpenDS -D data_store_opends.txt

Data store for Tivoli DS

The IBM Tivoli Directory Server 6.2 is one of the supported LDAP servers for the OpenSSO server to provide authentication and authorization services. A specific sub configuration, LDAPv3ForTivoli, is available out of the box to support this server. You can use data_store_tivoli.txt to create a new data store by supplying the -t LDAPv3ForTivoli option to the ssoadm tool. The sample file (data_store_tivoli.txt, found as part of the code bundle) contains the entries, including the default group, that are shipped with Tivoli DS for easy understanding. You can customize it to any valid values. Here is the sample command that creates the data store for Tivoli DS:

./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "Tivoli-store" -t LDAPv3ForTivoli -D data_store_tivoli.txt

Data store for Active Directory

Microsoft Active Directory provides most of the LDAPv3 features, including support for persistent search notifications. Creating a data store for this is also a straightforward process and is available out of the box. You can use data_store_ad.txt to create a new data store by supplying the -t LDAPv3ForAD option to the ssoadm tool. Here is the sample command that creates the data store for AD:

./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "AD-store" -t LDAPv3ForAD -D data_store_ad.txt

Data store for Active Directory Application Mode

Microsoft Active Directory Application Mode (ADAM) is the lightweight version of Active Directory with a simplified schema.
In an ADAM instance it is possible to set a user password over LDAP, unlike Active Directory, where password-related operations must happen over LDAPS. Creating a data store for this is also a straightforward process and is available out of the box. You can use data_store_adam.txt to create a new data store by supplying the -t LDAPv3ForADAM option to the ssoadm tool:

./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "ADAM-store" -t LDAPv3ForADAM -D data_store_adam.txt
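Once the stores are in place, two follow-up operations tend to come up in practice: confirming what was created and tweaking a single property without editing the whole property file. The sketch below is hedged: it reuses the realm, data store names, and the update-datastore -a usage described earlier in this article, and it assumes the list-datastores subcommand is available in your build of ssoadm, so check the tool's help output before relying on it.

# List the data stores configured under the root realm (subcommand availability may vary by build)
./ssoadm list-datastores -e / -u amadmin -f /tmp/.passwd_of_amadmin

# Update a single property of the ODSEE data store in place using the -a switch
./ssoadm update-datastore -e / -m odsee_datastore -u amadmin -f /tmp/.passwd_of_amadmin \
  -a "sun-idrepo-ldapv3-config-users-search-filter=(objectclass=inetorgperson)"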

OpenAM: Backup, Recovery, and Logging

Packt
03 Feb 2011
9 min read
OpenAM
Written and tested with OpenAM Snapshot 9—the Single Sign-On (SSO) tool for securing your web applications in a fast and easy way
The first and the only book that focuses on implementing Single Sign-On using OpenAM
Learn how to use OpenAM quickly and efficiently to protect your web applications with the help of this easy-to-grasp guide
Written by Indira Thangasamy, core team member of the OpenSSO project from which OpenAM is derived
Real-world examples for integrating OpenAM with various applications

OpenSSO provides utilities that can be invoked, with the proper procedure, to back up the server configuration data. When a crash or data corruption occurs, the server administrator must initiate a recovery operation. Recovering a backup involves the following two distinct operations:

Restoring the configuration files
Restoring the XML configuration data that was backed up earlier

In general, recovery refers to the various operations involved in restoring, rolling forward, and rolling back a backup. Backup and recovery refers to the various strategies and operations involved in protecting the configuration database against data loss, and reconstructing the database should a loss occur. In this article, you will learn how to use the tools provided by OpenSSO to perform the following:

OpenSSO configuration backup and recovery
Test to production
Trace and debug level logging for troubleshooting
Audit log configuration using flat file
Audit log configuration using RDBMS
Securing the audit logs from intrusion

OpenSSO deals only with backing up the configuration data of the server; backup and recovery of identity data such as users, groups, and roles is handled by the enterprise-level identity management suite of products.

Backing up configuration data

I am sure you are familiar by now with the different configuration stores supported by OpenSSO: an embedded store (based on OpenDS) and a highly scalable Directory Server Enterprise Edition. Regardless of the underlying configuration type, there are certain files that are created in the local file system on the host where the server is deployed. These files (such as the bootstrap file) contain critical pieces of information that help the application initialize; any corruption in these files could cause the server application not to start. Hence it becomes necessary to back up the configuration data stored in the file system. As a result, we can treat backup and recovery as a two-step process:

Back up the configuration files in the local file system
Back up the OpenSSO configuration data in the LDAP configuration

Let us briefly discuss what each option means and when to apply it.

Backing up the OpenSSO configuration files

Typically the server can fail to come up for two reasons: either because it could not find the right configuration file that locates the configuration data store, or because the data contained in the configuration store is corrupted. This is the simplest case I took up for this discussion, because there could be umpteen reasons that could cause a server to fail to start up. OpenSSO provides a subcommand, export-svc-cfg, as part of the ssoadm command line interface. Using this, the customer can back up only the configuration data that is stored in the configuration store, provided the configuration store is up and running.
This backup will not help if the disk that holds the configuration files, such as the bootstrap and OpenDS schema files, crashes, because the backup obtained using export-svc-cfg will not contain these schema and bootstrap files. This jeopardizes the backup and recovery process for OpenSSO. This is why backing up the configuration directory becomes inevitable and vital for the recovery process. To back up the OpenSSO configuration directory, you can use any file archive tool of your choice. To perform this backup, log on to the OpenSSO host as the OpenSSO configuration user, and execute the following command:

zip -r /tmp/opensso_config.zip /export/ssouser/opensso1/

This will back up all the contents of the configuration directory (in this case /export/ssouser/opensso1). Though this is the recommended way to back up, there may be server audit and debug logs that could fill up the disk space as you perform a periodic backup of this directory. The critical files and directories that need to be backed up are as follows:

bootstrap
opends (whole directory, if present)
.version
.configParam
certificate stores

The rest of the content of this directory can be restored from your staging area. If you have customized any of the service schema under the <opensso-config>/config/xml directory, then make sure you back them up. This backup is in itself enough to bring up a corrupted OpenSSO server. When you back up the opends directory, all the OpenSSO configuration information also gets backed up, so you really do not need the backup file that you would generate using export-svc-cfg. This kind of backup is extremely useful and is the only way to perform crash recovery. If OpenDS is itself corrupted due to its internal database or index corruption, it will not start. Hence one cannot access the OpenSSO server or the ssoadm command line tool to restore the XML backup. So, it is a must to back up your configuration directory from the file system periodically.

Backing up the OpenSSO configuration data

The process of backing up the OpenSSO service configuration data is slightly different from the complete backup of the overall system deployment. When the subcommand export-svc-cfg is invoked, the underlying code exports all the nodes under ou=services,ROOT_SUFFIX of the configuration directory server:

./ssoadm export-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e secretkeytoencryptpassword -o /tmp/svc-config-bkup.xml

To perform this, you need to have the ssoadm command line tool configured. The options supplied to this command are self-explanatory, except maybe -e. This takes a random string that will be used as the encryption secret to encrypt the password entries in the service configuration data, for example, the RADIUS server's shared secret value. You need this key to restore the data back to the server. OpenSSO and its configuration directory server must be running in good condition for this export operation to succeed. This backup will be useful in the following cases:

An administrator accidentally deleted some of the authentication configurations
An administrator accidentally changed some of the configuration properties
The agent profiles have somehow lost their configuration data
You want to reset to factory defaults

In any case, one should be able to authenticate to OpenSSO as an admin to restore the configuration data. If the server is not in that state, then crash recovery is the only option.
In the embedded store configuration case, this means unzipping the file system configuration backup obtained as described in the Backing up the OpenSSO configuration files section. For configuration data that is stored in the Directory Server Enterprise Edition, the customer should use the tools that are bundled with the Oracle Directory Server Enterprise Edition to back up and restore.

Crash recovery and restore

In the previous section, we briefly covered the crash recovery part. When a crash occurs in the embedded or remote configuration store, the server will not come up again unless it is restored to a valid state. This may involve restoring the proper database state and indexes using a backup of a known valid state. This backup may have been obtained by using the ODSEE backup tools or by simply zipping up the configuration file system of OpenSSO, as described in the Backing up the OpenSSO configuration files section. You need to bring the OpenSSO server back to a state where the administrator can log in to access the console. At this point the configuration exported to XML, as described in the Backing up the OpenSSO configuration data section, can be used. Here is a sample execution of the import-svc-cfg subcommand. It is recommended to back up your vanilla configuration data from the file system periodically, to use it in the crash recovery case (where the embedded store itself is corrupted). Backup of configuration data using export-svc-cfg should be done frequently:

./ssoadm import-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e mysecretenckey -X /tmp/svc-config-bkup.xml

This will throw an error (because we have intentionally provided a wrong key) claiming that the secret key provided was wrong. Actually it will show a string such as the following, which is a known bug:

import-service-configuration-secret-key

This is the key name that is supposed to map to a corresponding localizable error string. If you provide the correct encryption key, then it will import successfully:

./ssoadm import-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e secretkeytoencryptpassword -X /tmp/svc-config-bkup.xml
Directory Service contains existing data. Do you want to delete it? [y|N] y
Please wait while we import the service configuration...
Service Configuration was imported.

Note that it prompts before overwriting the existing data, to make sure that the current configuration is not overwritten accidentally. There is no incremental restore, so be cautious while performing this operation. An import with a wrong version of the restore file could damage a working configuration. It is always recommended to back up the existing configuration before importing an existing configuration backup file. If you do not want to import the current file, just enter N and the command will terminate without harming your data.

Well, what happens if customers do not have the configuration files backed up? Suppose customers do not have a copy of the configuration files to restore; they can reconfigure the OpenSSO web application by accessing the configurator (after cleaning up the existing configuration). Once they configure the server, they should be able to restore the XML backup. Nevertheless, the new configuration must match all the configuration parameters that were provided earlier, including the hostname and port details. This information can be found in the .configParam file.
If you are planning to export the configuration to a different server than the original one, then you should refer to the Test to production section, which covers how customers can migrate a test configuration to a production server. It requires more steps than simply restoring the configuration data.
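Pulling the two backup steps together, the following is a minimal sketch of a periodic backup job. The paths, encryption key, and file names are the illustrative values used earlier in this article, and the script assumes ssoadm is already configured on the host; adjust them to your deployment before scheduling it.

#!/bin/sh
# Hypothetical nightly backup sketch combining the two backups described above.
STAMP=`date +%Y%m%d`

# 1. File system backup of the configuration directory (bootstrap, opends, and so on)
zip -r /backup/opensso_config_$STAMP.zip /export/ssouser/opensso1/

# 2. XML export of the service configuration (requires OpenSSO and its config store to be up)
./ssoadm export-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin \
  -e secretkeytoencryptpassword -o /backup/svc-config-bkup_$STAMP.xml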

Creating Time Series Charts in R

Packt
01 Feb 2011
5 min read
Formatting time series data for plotting

Time series or trend charts are the most common form of line graphs. There are many ways to plot such data in R; however, it is important to first get the data into a format that R can understand. In this recipe, we will look at some ways of formatting time series data using the base functions and some additional packages.

Getting ready

In addition to the basic R functions, we will also be using the zoo package in this recipe. So first we need to install it:

install.packages("zoo")

How to do it...

Let's use the dailysales.csv example dataset and format its date column:

sales<-read.csv("dailysales.csv")
d1<-as.Date(sales$date,"%d/%m/%y")
d2<-strptime(sales$date,"%d/%m/%y")
data.class(d1)
[1] "Date"
data.class(d2)
[1] "POSIXt"

How it works...

We have seen two different functions to convert a character vector into dates. If we did not convert the date column, R would not automatically recognize the values in the column as dates. Instead, the column would be treated as a character vector or a factor.

The as.Date() function takes at least two arguments: the character vector to be converted to dates and the format to which we want it converted. It returns an object of the Date class, represented as the number of days since 1970-01-01, with negative values for earlier dates. The values in the date column are in a DD/MM/YYYY format (you can verify this by typing sales$date at the R prompt). So, we specify the format argument as "%d/%m/%y". Please note that this order is important: if we instead use "%m/%d/%y", then our days will be read as months and vice versa. The quotes around the value are also necessary.

The strptime() function is another way to convert character vectors into dates. However, strptime() returns a different kind of object, of class POSIXlt, which is a named list of vectors representing the different components of a date and time, such as year, month, day, hours, minutes, seconds, and a few more. POSIXlt is one of the two basic classes of date/times in R. The other class, POSIXct, represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. POSIXct is more convenient for including in data frames, and POSIXlt is closer to human-readable forms. Both classes inherit from the virtual class POSIXt, which is why, when we ran the data.class() function on d2 earlier, we got POSIXt as the result. strptime() also takes a character vector to be converted and the format as arguments.

There's more...

The zoo package is handy for dealing with time series data. The zoo() function takes an argument x, which can be a numeric vector, matrix, or factor. It also takes an order.by argument, which has to be an index vector with unique entries by which the observations in x are ordered:

library(zoo)
d3<-zoo(sales$units,as.Date(sales$date,"%d/%m/%y"))
data.class(d3)
[1] "zoo"

See the help on DateTimeClasses to find out more details about the ways dates can be represented in R.
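To make the POSIXlt description above concrete, here is a small sketch (assuming the sales data frame loaded earlier) that pulls individual date components out of the object returned by strptime(). Note that the month field is zero-based and the year is counted from 1900.

d2 <- strptime(sales$date, "%d/%m/%y")

# Individual components of a POSIXlt object
d2$mday[1]         # day of the month for the first record
d2$mon[1] + 1      # months are stored 0-11, so add 1 for the calendar month
d2$year[1] + 1900  # years are stored as an offset from 1900

unclass(d2[1])     # view all components of the first element at once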
Plotting date and time on the X axis

In this recipe, we will learn how to plot formatted date or time values on the X axis.

Getting ready

For the first example, we only need to use the base graphics function plot().

How to do it...

We will use the dailysales.csv example dataset to plot the number of units of a product sold daily in a month:

sales<-read.csv("dailysales.csv")
plot(sales$units~as.Date(sales$date,"%d/%m/%y"),type="l",
     xlab="Date",ylab="Units Sold")

How it works...

Once we have formatted the series of dates using as.Date(), we can simply pass it to the plot() function as the x variable in either the plot(x,y) or plot(y~x) format. We can also use strptime() instead of as.Date(). However, we cannot pass the object returned by strptime() to plot() in the plot(y~x) format; we must use the plot(x,y) format as follows:

plot(strptime(sales$date,"%d/%m/%Y"),sales$units,type="l",
     xlab="Date",ylab="Units Sold")

There's more...

We can plot the example using the zoo() function as follows (assuming zoo is already installed):

library(zoo)
plot(zoo(sales$units,as.Date(sales$date,"%d/%m/%y")))

Note that we don't need to specify x and y separately when plotting using zoo; we can just pass the object returned by zoo() to plot(). We also need not specify the type as "l".

Let's look at another example, which has full date and time values on the X axis instead of just dates. We will use the openair.csv example dataset for this example:

air<-read.csv("openair.csv")
plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",
     xlab="Time", ylab="Concentration (ppb)",
     main="Time trend of Oxides of Nitrogen")

The same graph can be made using zoo as follows:

plot(zoo(air$nox,as.Date(air$date,"%d/%m/%Y %H:%M")),
     xlab="Time", ylab="Concentration (ppb)",
     main="Time trend of Oxides of Nitrogen")
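If the default tick labels on the date axis are not what you want, one possible approach (a sketch that is not part of the original recipe) is to suppress the x axis and draw it yourself with axis() and format(), reusing the dailysales data from above:

d <- as.Date(sales$date, "%d/%m/%y")

# Draw the plot without an x axis, then add custom date labels
plot(sales$units ~ d, type = "l", xaxt = "n",
     xlab = "Date", ylab = "Units Sold")
axis(1, at = d, labels = format(d, "%d %b"))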

FAQs on Microsoft SQL Server 2008 High Availability

Packt
28 Jan 2011
6 min read
Microsoft SQL Server 2008 High Availability
Minimize downtime, speed up recovery, and achieve the highest level of availability and reliability for SQL Server applications by mastering the concepts of database mirroring, log shipping, clustering, and replication
Install various SQL Server High Availability options in a step-by-step manner
A guide to SQL Server High Availability for DBA aspirants, proficient developers and system administrators
Learn the pre and post installation concepts and common issues you come across while working on SQL Server High Availability
Tips to enhance performance with SQL Server High Availability
External references for further study

Q: What is Clustering?
A: Clustering is usually deployed when there is a critical business application running that needs to be available 24 x 7, or in terminology, High Availability. These clusters are known as Failover clusters because the primary goal of setting up the cluster is to make business-critical services or processes available 24 x 7 with 99.99% uptime.

Q: How do the MS Windows Server Enterprise and Datacenter editions support failover clustering?
A: MS Windows Server Enterprise and Datacenter editions support failover clustering. This is achieved by having two or more identical nodes connected to each other by means of a private network and commonly used resources. In case of failure of any common resource or service, the first node (Active) passes ownership to another node (Passive).

Q: What is MSDTC?
A: Microsoft Distributed Transaction Coordinator (MSDTC) is a service used by SQL Server when it is required to have distributed transactions between more than one machine. In a clustered environment, the SQL Server service can be hosted on any of the available nodes if the active node fails, and in this case MSDTC comes into the picture when we have distributed queries and for replication; hence the MSDTC service should be running. Following are a couple of questions with regard to MSDTC.

Q: What will happen to the data that is being accessed?
A: The data is taken care of by the shared disk arrays: the storage is shared and every node that is part of the cluster can access it; however, only one node at a time can access and own it.

Q: What about clients that were connected previously? Does the failover mean that developers will have to modify the connection string?
A: Nothing like this happens. SQL Server is installed as a virtual server and it has a virtual IP address, which is shared by every cluster node. So, the client actually knows only one SQL Server or its IP address. Here are the steps that explain how failover works:

Node 1 owns the resources as of now, and is the active node.
The network adapter driver gets corrupted or suffers physical damage.
The heartbeat between Node 1 and Node 2 is broken.
Node 2 initiates the process to take ownership of the resources owned by Node 1. It takes approximately two to five minutes to complete the process.

Q: What is Hyper-V? What are its uses?
A: Let's see what Hyper-V is:

It is a hypervisor-based technology that allows multiple operating systems to run on a host operating system at the same time.
There are advantages to using SQL Server 2008 R2 on Windows Server 2008 R2 with Hyper-V; one such example is the ability to migrate a live server, thereby increasing high availability without incurring downtime.
Hyper-V now supports up to 64 logical processors.
It can host up to four VMs on a single licensed host server.
SQL Server 2008 R2 allows an unrestricted number of virtual servers, thus making consolidation easy.
It has the ability to manage multiple SQL Servers centrally using Utility Control Point (UCP).
The Sysprep utility can be used to create preconfigured VMs so that SQL Server deployment becomes easier.

Q: What are the hardware, software, and operating system requirements for installing SQL Server 2008 R2?
A: The following are the hardware requirements:

Processor: Intel Pentium 3 or higher
Processor speed: 1 GHz or higher
RAM: 512 MB, but 2 GB is recommended
Display: VGA or higher

The following are the software requirements:

Operating system: Windows 7 Ultimate, Windows Server 2003 (x86 or x64), Windows Server 2008 (x86 or x64)
Disk space: Minimum 1 GB
.NET Framework 3.5
Windows Installer 4.5 or later
MDAC 2.8 SP1 or later

The following are the operating system requirements for clustering: To install SQL Server 2008 clustering, it is essential to have Windows Server 2008 Enterprise or Datacenter Edition installed on our host system with Full Installation, so that we don't have to go back and forth installing the required components and restarting the system.

Q: What is to be done when we see the network binding warning coming up?
A: In this scenario, we will have to go to Network and Sharing Center | Change Adapter Settings. Once there, pressing Alt + F, we will select Advanced Settings. Select Public Network and move it up if it is not at the top, and repeat this process on the second node.

Q: What is the difference between Active/Passive and Active/Active Failover Clusters?
A: In reality, there is only one difference between Single-instance (Active/Passive Failover Cluster) and Multi-instance (Active/Active Failover Cluster). As its name suggests, in a Multi-instance cluster there will be two or more active SQL Server instances running in the cluster, compared to one instance running in a Single-instance cluster. Also, to configure a Multi-instance cluster, we may need to procure additional disks, IP addresses, and network names for the SQL Server.

Q: What is the benefit of having Multi-instance, that is, Active/Active configuration?
A: Depending on the business requirement and the capability of our hardware, we may have one or more instances running in our cluster environment. The main goal is to have better uptime and better High Availability by having multiple SQL Server instances running in an environment. Should anything go wrong with one SQL Server instance, another instance can easily take over control and keep the business-critical application up and running!

Q: What will be the difference in the prerequisites for the Multi-instance Failover Cluster as compared to the Single-instance Failover Cluster?
A: There will be no difference compared to a Single-instance Failover Cluster, except that we need to procure additional disk(s), network names, and IP addresses. We need to make sure that our hardware is capable of handling requests that come from client machines for both instances. Installing a Multi-instance cluster is almost similar to adding a Single-instance cluster, except for the need to add a few resources along with a couple of extra steps here and there.

OpenAM Identity Stores: Types, Supported Types, Caching and Notification

Packt
27 Jan 2011
11 min read
Like any other service, the Identity Repository service is also defined using an XML file, named idRepoService.xml, that can be found in <conf-dir>/config/xml. In this file one can define as many subschema as needed. By default, the following subschema names are defined:

LDAPv3
LDAPv3ForAMDS
LDAPv3ForOpenDS
LDAPv3ForTivoli
LDAPv3ForAD
LDAPv3ForADAM
Files
Database

However, not all of them are supported in the version that has been tested while writing this article. For instance, the Files, LDAPv3, and Database subschema are meant to be sample implementations; one can extend them for other databases, using these as examples. The rest of the sub configurations are all well tested and supported. One of the Identity Repository types, the Access Manager Repository, is missing from this definition, as adding it to the OpenSSO server is a manual process. That is something which will be detailed later in this article. It is also called a legacy SDK plugin for OpenSSO. The Identity Repository framework requires support for the logging service and session management to deliver its overall functionality.

Identity store types

In OpenSSO, multiple types of Identity Repository plugins are implemented, including the following:

LDAPv3Repo
AgentsRepo
InternalRepo/SpecialRepo
FilesRepo
AMSDKRepo

Unlike the Access Manager Repository plugin, these are available in a vanilla OpenSSO server, so customers can readily use them without having to perform any additional configuration steps.

LDAPv3Repo: This is the plugin that will be used by customers and administrators quite frequently, as the other types of plugin implementations are mostly meant to be used by OpenSSO internal services. This plugin forms the basis for building the configuration for supporting various LDAP servers, including Microsoft Active Directory, Active Directory Application Mode (ADAM/LDS), IBM Tivoli Directory, OpenDS, and Oracle Directory Server Enterprise Edition. There are subschema defined for each of these LDAP servers in the IdRepo service schema, as described in the beginning of this section.

AgentsRepo: This is a plugin that is used to manage the OpenSSO policy agents' profiles. Unlike LDAPv3Repo, AgentsRepo uses the configuration repository to store the agents' configuration data, including authentication information. Prior to the Agents 3.0 version, all agents accessing earlier versions of OpenSSO, such as Access Manager 7.1, had most of their configuration data stored locally in the file system as plain text files. This imposed huge management problems for customers wanting to upgrade or change any configuration parameters, as it required them to log in to each host where the agents are installed. Besides, the configuration of all agents prior to 3.0 was stored in the user identity store. In OpenSSO, the agents' profiles and configurations are stored as part of the configuration Directory Information Tree (DIT). AgentsRepo is a hidden internal repository plugin, and at no point should it be visible to end users or administrators for modification.

SpecialRepo: In the predecessor of OpenSSO, the administrative users were stored as part of the user identity store. So even when the configuration store is up and running, administrators still cannot log in to the system unless the user identity store is also up and running. This limits the customer experience, especially during pilot testing and troubleshooting scenarios.
To overcome this, OpenSSO introduced a feature wherein all the core administrative users are stored as part of the configuration store in the IdRepo service. All administrative and special user authentication by default uses this SpecialRepo framework. It may be possible to override this behavior by invoking module-based authentication. SpecialRepo is used as a fallback repository to get authenticated to the OpenSSO server. SpecialRepo is also a hidden internal repository plugin; at no point should it be visible to end users or administrators for modification.

FilesRepo: This is no longer supported in the OpenSSO product. You can see references to it in the source code, but it cannot be configured to use a flat-file store for either configuration data or user identity data.

AMSDKRepo: This plugin has been made available to maintain compatibility with the Sun Java System Access Manager versions. When this plugin is enabled, the identity schema is defined using the DAI service as described in ums.xml. This plugin will not be available in the vanilla OpenSSO server; the administrator has to perform certain manual steps to make it available for use. In this plugin, identity management is tightly coupled with the Oracle Directory Server Enterprise Edition. It is generally useful in the co-existence scenario where OpenSSO needs to co-exist with Sun Access Manager. In this article, wherever we refer to the "Access Manager Repository plugin", it means AMSDKRepo.

Besides this, there is a sample implementation of a MySQL-based database repository available as part of the server default configuration. It works; however, it is not extensively tested for all the OpenSSO features. You can also refer to another discussion on the custom database repository implementation at this link: http://www.badgers-in-foil.co.uk/notes/installing_a_custom_opensso_identity_repository/.

Caching and notification

For LDAPv3Repo, the key feature that enables it to perform and scale is the caching of result sets for each client query, without which it would be impossible to achieve the required performance and scalability. When caching is employed, there is a possibility that clients could get stale information about identities. This can be avoided by cleaning up the cache periodically or by having an external event dirty the cache so new values can be cached. OpenSSO provides more than one way to tackle this caching and notification. There are a couple of ways in which the cache can be invalidated and refreshed. The Identity Repository design relies broadly on two types of mechanisms to refresh the IdRepo cache. They are:

Persistent search-based event notification
Time-to-live (TTL) based refresh

Both methods have their own merits and can be enabled simultaneously, and enabling both is recommended. This is to handle the scenario where a network glitch (which could cause packet loss) might have caused the OpenSSO server to miss some change notifications. The value of the TTL purely depends on the deployment environment and the end user experience.

Persistent search-based notification

The OpenSSO Identity Repository plugin cache can be invalidated and refreshed by registering a persistent search connection to the backend LDAP server, provided the LDAP server supports the persistent search control.
The persistent search (http://www.mozilla.org/directory/ietf-docs/draft-smith-psearch-ldap-01.txt) control 2.16.840.1.113730.3.4.3 is implemented by many of the commercial LDAP servers, including:

IBM (Tivoli Directory)
Novell (eDirectory)
Oracle Directory Server Enterprise Edition (ODSEE)
OpenDS (OpenDS Directory Server 1.0.0-build007)
Fedora-Directory/1.0.4 B2006.312.1539

In order to determine whether your LDAP vendor supports a persistent search, perform the following search for the persistent search control 2.16.840.1.113730.3.4.3:

ldapsearch -p 389 -h ds_host -s base -b '' "objectclass=*" supportedControl | grep 2.16.840.1.113730.3.4.3

Microsoft Active Directory implements it in a different form, using the LDAP control 1.2.840.113556.1.4.528.

Persistent searches are governed by the max-psearch-count property in the Sun Java Directory Server, which defines the maximum number of persistent searches that can be performed on the directory server. The persistent search mechanism provides an active channel through which entries that change (and information about the changes that occur) can be communicated. As each persistent search operation uses one thread, limiting the number of simultaneous persistent searches prevents certain kinds of denial-of-service attacks. A client implementation that generates a large number of persistent connections to a single directory server may indicate that LDAP was not the correct transport for the job. However, horizontal scaling using Directory Proxy Servers, or an LDAP consumer tier, may help spread the load. The best solution, from an LDAP implementation standpoint, is to limit persistent searches.

If you have created a user data store against an LDAP server which supports the persistent search control, then a persistent search connection will be created with the base DN configured in the LDAPv3 configuration. The search filter for this connection is obtained from the data store configuration properties. Though it is possible to listen to a specific type of change event, OpenSSO registers the persistent search connections to receive all kinds of change events. The IdRepo framework has the logic to determine whether the underlying directory server supports persistent searches or not; if not supported, it does not try to submit the persistent search. In this case customers may resort to TTL-based notification, as described in the next section.

Each active persistent search request requires that an open TCP connection be maintained between an LDAP client (in this case OpenSSO) and an LDAP server (the backend user store) that might not otherwise be kept open. The OpenSSO server, acting as an LDAP client, closes idle LDAP connections to the backend LDAP server in order to maximize resource utilization. If the OpenSSO servers are behind a load balancer or a firewall, you need to tune the value of "com.sun.am.event.connection.idle.timeout". If the persistent search connections are made through a Load Balancer (LB) or firewall, then these connections are subject to the TCP timeout value of the respective LB and/or firewall. In such a scenario, once the firewall closes the persistent search connection due to an idle TCP timeout, change notifications cannot reach OpenSSO unless the persistent search connection is re-established.
Customers can avoid this scenario by configuring the idle timeout for the persistent search connection so that the persistent search TCP connection is restarted before the LB/firewall idle timeout; that way the LB/firewall will never see an idle persistent search connection. The advanced server configuration property "com.sun.am.event.connection.idle.timeout" specifies the timeout value in minutes after which the persistent searches will be restarted. Ideally, this value should be lower than the LB/firewall TCP timeout, to make sure that the persistent searches are restarted before the connections are dropped. A value of "0" indicates that these searches will not be restarted; by default the value is "0". Only the connections that have timed out will be reset. You should never set this value higher than the LB/firewall timeout, and the delta should not be more than five minutes. For example, if your LB's idle connection timeout is 50 minutes, then set this property value to 45 minutes. If for some reason you want to prevent the persistent search from being submitted to the backend LDAP server, just leave the persistent search base (sun-idrepo-ldapv3-config-psearchbase) empty; this will cause the IdRepo to disable the persistent search connection.

Time-to-live based notification

There may be deployment scenarios where persistent search-based notifications are not possible, or where the underlying LDAP server does not support the persistent search control. In such scenarios customers can employ the TTL or time-to-live based notification mechanism. It is a feature that involves a proprietary implementation by the OpenSSO server. This feature works in a fashion that is similar to the polling mechanism in the OpenSSO clients, where the client periodically polls the OpenSSO server for changes, often called the "pull" model, whereas persistent search-based notifications are termed the "push" model (the LDAP server pushes the changes to the clients). Regardless of persistent search-based change notifications, the OpenSSO server polls the underlying directory server and gets the data to refresh its Identity Repository cache.

TTL-specific properties for Identity Repository cache

When the OpenSSO deployment is configured for TTL-based cache refresh, there are certain server-side properties that need to be configured to enable the Identity Repository framework to refresh the cache. The following are the core properties that are relevant in the TTL context:

com.sun.identity.idm.cache.entry.expire.enabled=true
com.sun.identity.idm.cache.entry.user.expire.time=1 (in minutes)
com.sun.identity.idm.cache.entry.default.expire.time=1 (in minutes)

The properties com.sun.identity.idm.cache.entry.user.expire.time and com.sun.identity.idm.cache.entry.default.expire.time specify the time in minutes for which the user entries and non-user entries (such as roles and groups), respectively, remain valid after their last modification. In other words, after this specified period of time elapses, the cached data for the entry will expire. At that instant, new requests for these entries will result in a fresh read from the underlying Identity Repository plugins. When the property com.sun.identity.idm.cache.entry.expire.enabled is set to true, the non-user object cache entries will expire based on the time specified in the com.sun.identity.idm.cache.entry.default.expire.time property. The user entry objects will be cleaned up based on the value set in the property com.sun.identity.idm.cache.entry.user.expire.time.
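As a quick deployment-time check that ties back to the persistent search discussion above, the following sketch reuses the ldapsearch pattern shown earlier to look for either the standard persistent search control or the Active Directory control mentioned in this article. The host and port are placeholders for your user store.

# Check whether the user store advertises a persistent-search style control
# 2.16.840.1.113730.3.4.3 -> standard persistent search control
# 1.2.840.113556.1.4.528  -> the Active Directory control referenced above
ldapsearch -p 389 -h ds_host -s base -b '' "objectclass=*" supportedControl \
  | egrep "2.16.840.1.113730.3.4.3|1.2.840.113556.1.4.528"

If neither OID is returned, plan on relying on the TTL-based refresh described in the Time-to-live based notification section.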

Choosing Styles of Various Graph Elements in R

Packt
25 Jan 2011
4 min read
R Graph Cookbook
Detailed hands-on recipes for creating the most useful types of graphs in R – starting from the simplest versions to more advanced applications
Learn to draw any type of graph or visual data representation in R
Filled with practical tips and techniques for creating any type of graph you need; not just theoretical explanations
All examples are accompanied with the corresponding graph images, so you know what the results look like
Each recipe is independent and contains the complete explanation and code to perform the task as efficiently as possible

Choosing plotting point symbol styles and sizes

In this recipe, we will see how we can adjust the styling of plotting symbols, which is useful and necessary when we plot more than one set of points representing different groups of data on the same graph.

Getting ready

All you need to try out this recipe is to run R and type the recipe at the command prompt. You can also choose to save the recipe as a script so that you can use it again later on. We will also use the cityrain.csv example data file. Please read the file into R as follows:

rain<-read.csv("cityrain.csv")

The code file can be downloaded from here.

How to do it...

The plotting symbol and size can be set using the pch and cex arguments:

plot(rnorm(100),pch=19,cex=2)

How it works...

The pch argument stands for plotting character (symbol). It can take numerical values (usually between 0 and 25) as well as single character values. Each numerical value represents a different symbol. For example, 1 represents circles, 2 represents triangles, 3 represents plus signs, and so on. If we set the value of pch to a character such as "*" or "£" in inverted commas, then the data points are drawn as that character instead of the default circles.

The size of the plotting symbol is controlled by the cex argument, which takes numerical values starting at 0, giving the amount by which plotting symbols should be magnified relative to the default. Note that cex takes relative values (the default is 1), so the absolute size may vary depending on the defaults of the graphics device in use. For example, the size of plotting symbols with the same cex value may be different for a graph saved as a PNG file versus a graph saved as a PDF.

There’s more...

The most common use of pch and cex is when we don’t want to use color to distinguish between different groups of data points. This is often the case in scientific journals which do not accept color images. For example, let’s plot the city rainfall data as a set of points instead of lines:

plot(rain$Tokyo, ylim=c(0,250),
     main="Monthly Rainfall in major cities",
     xlab="Month of Year", ylab="Rainfall (mm)", pch=1)
points(rain$NewYork,pch=2)
points(rain$London,pch=3)
points(rain$Berlin,pch=4)
legend("top", legend=c("Tokyo","New York","London","Berlin"),
       ncol=4, cex=0.8, bty="n", pch=1:4)
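As a quick illustration of the character values for pch mentioned above, the following sketch (using made-up random data rather than the cityrain dataset) draws the points as asterisks and enlarges them with cex:

# Plot points using a character symbol instead of a numbered symbol
plot(rnorm(50), pch = "*", cex = 2,
     main = "Character plotting symbol", xlab = "Index", ylab = "Value")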
Choosing line styles and width

Similar to plotting point symbols, R provides simple ways to adjust the style of lines in graphs.

Getting ready

All you need to try out this recipe is to run R and type the recipe at the command prompt. You can also choose to save the recipe as a script so that you can use it again later on. We will again use the cityrain.csv data file.

How to do it...

Line styles can be set by using the lty and lwd arguments (for line type and width respectively) in the plot(), lines(), and par() commands. Let’s take our rainfall example and apply different line styles, keeping the color the same:

plot(rain$Tokyo, ylim=c(0,250),
     main="Monthly Rainfall in major cities",
     xlab="Month of Year", ylab="Rainfall (mm)",
     type="l", lty=1, lwd=2)
lines(rain$NewYork,lty=2,lwd=2)
lines(rain$London,lty=3,lwd=2)
lines(rain$Berlin,lty=4,lwd=2)
legend("top", legend=c("Tokyo","New York","London","Berlin"),
       ncol=4, cex=0.8, bty="n", lty=1:4, lwd=2)

How it works...

Both line type and width can be set with numerical values, as shown in the previous example. Line type number values correspond to types of lines:

0: blank
1: solid (default)
2: dashed
3: dotted
4: dotdash
5: longdash
6: twodash

We can also use character strings instead of numbers, for example, lty="dashed" instead of lty=2. The line width argument lwd takes positive numerical values; the default value is 1. In the example we used a value of 2, thus making the lines thicker than the default.
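To show the named line types in action, here is a small sketch (again with simple synthetic data rather than the cityrain file) that sets lty by name and uses a heavier lwd:

# Two lines distinguished by named line types rather than numeric codes
plot(1:10, type = "l", lty = "dashed", lwd = 2,
     xlab = "x", ylab = "y", main = "Named line types")
lines(10:1, lty = "dotted", lwd = 3)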

Adjusting Key Parameters in R

Packt
24 Jan 2011
6 min read
R Graph Cookbook
Detailed hands-on recipes for creating the most useful types of graphs in R – starting from the simplest versions to more advanced applications
Learn to draw any type of graph or visual data representation in R
Filled with practical tips and techniques for creating any type of graph you need; not just theoretical explanations
All examples are accompanied with the corresponding graph images, so you know what the results look like
Each recipe is independent and contains the complete explanation and code to perform the task as efficiently as possible

Setting colors of points, lines, and bars

In this recipe we will learn the simplest way to change the colors of points, lines, and bars in scatter plots, line plots, histograms, and bar plots.

Getting ready

All you need to try out this recipe is to run R and type the recipe at the command prompt. You can also choose to save the recipe as a script so that you can use it again later on.

How to do it...

The simplest way to change the color of any graph element is by using the col argument. For example, the plot() function takes the col argument:

plot(rnorm(1000), col="red")

The code file can be downloaded from here.

If we choose the plot type as line, then the color is applied to the plotted line. Let’s use the dailysales.csv example dataset. First, we need to load it:

sales<-read.csv("dailysales.csv",header=TRUE)
plot(sales$units~as.Date(sales$date,"%d/%m/%y"),
     type="l", # Specify type of plot as l for line
     col="blue")

Similarly, the points() and lines() functions apply the col argument's value to the plotted points and lines respectively. barplot() and hist() also take the col argument and apply it to the bars. So the following code would produce a bar plot with blue bars:

barplot(sales$ProductA~sales$City, col="blue")

The col argument for boxplot() is applied to the color of the boxes plotted.

How it works...

The col argument automatically applies the specified color to the elements being plotted, based on the plot type. So, if we do not specify a plot type or choose points, then the color is applied to points. Similarly, if we choose the plot type as line then the color is applied to the plotted line, and if we use the col argument in the barplot() or hist() commands, then the color is applied to the bars.

col accepts names of colors such as red, blue, and black. The colors() (or colours()) function lists all the built-in colors (more than 650) available in R. We can also specify colors as hexadecimal codes such as #FF0000 (for red), #0000FF (for blue), and #000000 (for black). If you have ever made any web pages, you would know that these hex codes are used in HTML to represent colors.

col can also take numeric values. When it is set to a numeric value, the color corresponding to that index in the current color palette is used. For example, in the default color palette the first color is black and the second color is red, so col=1 and col=2 refer to black and red respectively. Index 0 corresponds to the background color.

There's more...

In many settings, col can also take a vector of multiple colors, instead of a single color. This is useful if you wish to use more than one color in a graph. The heat.colors() function takes a number as an argument and returns a vector of that many colors. So heat.colors(5) produces a vector of five colors.
Type the following at the R prompt:

heat.colors(5)

You should get the following output:

[1] "#FF0000FF" "#FF5500FF" "#FFAA00FF" "#FFFF00FF" "#FFFF80FF"

Those are five colors in hexadecimal format. Another way of specifying a vector of colors is to construct one:

barplot(as.matrix(sales[,2:4]), beside=T, legend=sales$City,
        col=c("red","blue","green","orange","pink"),
        border="white")

In the example, we set the value of col to c("red","blue","green","orange","pink"), which is a vector of five colors. We have to take care to make a vector whose length matches the number of elements (in this case, bars) we are plotting. If the two numbers don't match, R will "recycle" values by repeating colors from the beginning of the vector. For example, if we had fewer colors in the vector than the number of elements, say four colors in the previous plot, then R would apply the four colors to the first four bars and then apply the first color to the fifth bar. This is called recycling in R:

barplot(as.matrix(sales[,2:4]), beside=T, legend=sales$City,
        col=c("red","blue","green","orange"),
        border="white")

In this example, the bars for the first and last data rows (Seattle and Mumbai) would be of the same color (red), making it difficult to distinguish one from the other. One good way to ensure that you always have the correct number of colors is to find out the number of elements first and pass that as an argument to one of the color palette functions. For example, if we did not know the number of cities in the example we have just seen, we could do the following to make sure the number of colors matches the number of bars plotted:

barplot(as.matrix(sales[,2:4]), beside=T, legend=sales$City,
        col=heat.colors(length(sales$City)),
        border="white")

We used the length() function to find out the number of elements in the vector sales$City and passed that as the argument to heat.colors(). So, regardless of the number of cities, we will always have the right number of colors.

See also

In the next four recipes, we will see how to change the colors of other elements. The fourth recipe is especially useful, where we look at color combinations and palettes.
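Tying back to the numeric col values described above, this small sketch prints the current palette and plots one point in each palette color; it uses synthetic positions only, not an example dataset.

# Inspect the current color palette and use numeric col indices
palette()                       # prints the palette colors in index order
n <- length(palette())
plot(1:n, rep(1, n), col = 1:n, pch = 19, cex = 3,
     xlab = "Palette index", ylab = "", yaxt = "n",
     main = "Colors by numeric index")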

Microsoft SQL Server 2008 High Availability: Understanding Domains, Users, and Security

Packt
21 Jan 2011
14 min read
Microsoft SQL Server 2008 High Availability
Minimize downtime, speed up recovery, and achieve the highest level of availability and reliability for SQL Server applications by mastering the concepts of database mirroring, log shipping, clustering, and replication
Install various SQL Server High Availability options in a step-by-step manner
A guide to SQL Server High Availability for DBA aspirants, proficient developers and system administrators
Learn the pre and post installation concepts and common issues you come across while working on SQL Server High Availability
Tips to enhance performance with SQL Server High Availability
External references for further study

Windows domains and domain users

In the early era of Windows, operating system users were created standalone, until the Windows NT operating system hit the market. Windows NT, that is, Windows New Technology, introduced some great features to the world, including domains. A domain is a group of computers that run on Windows operating systems. Among them is a computer that holds all the information related to user authentication and the user database; it is called the domain controller (server), and every user who is part of this user database on the domain controller is called a domain user. Domain users have access to any resource across the domain and its subdomains with the privileges they have, unlike a standalone user, who has access only to the resources available on a specific system. With the release of Windows Server 2000, Microsoft released Active Directory (AD), which is now widely used in Windows operating system networks to store, authenticate, and control users who are part of the domain.

A Windows domain uses various modes to authenticate users (encrypted passwords and various handshake methods such as PKI, Kerberos, EAP, SSL certificates, NAP, LDAP, and IPSec policy), which makes authentication robust. One can choose the authentication method that suits the business needs and the environment. Let's now see the various authentication methods in detail.

Public Key Infrastructure (PKI): This is the most common method used to transmit data over insecure channels such as the Internet, using digital certificates. It generally has two parts: the public and private keys. These keys are generated by a Certificate Authority, such as Thawte. Public keys are stored in a directory where they are accessible by all parties. The public key is used by the message sender to send encrypted messages, which can then be decrypted using the private key.

Kerberos: This is an authentication method used in client-server architectures to authorize the client to use service(s) on a server in a network. In this method, when a client sends a request to use a service on a server, the request goes to the authentication server, which generates a session key and a random value based on the username. This session key and random value are then passed to the server, which grants or rejects the request. These sessions are valid for a certain time period, which means that for that particular amount of time the client can use the service without having to re-authenticate itself.

Extensible Authentication Protocol (EAP): This is an authentication protocol generally used in wireless and point-to-point connections.

SSL Certificates: A Secure Socket Layer (SSL) certificate is a digital certificate that is used to identify a website or server that provides a service to clients and sends the data in an encrypted form.
SSL certificates are typically used by websites such as GMAIL. When we type a URL and press Enter, the web browser sends a request to the web server to identify itself. The web server then sends a copy of its SSL certificate, which is checked by the browser. If the browser trusts the certificate (this is generally done on the basis of the CA, the Registration Authority, and directory verification), it will send a message back to the server, and in reply the web server sends an acknowledgement to the browser to start an encrypted session.

Network Access Protection (NAP): This is a new platform introduced by Microsoft with the release of Windows Server 2008. It provides access to the client based on the identity of the client, the group it belongs to, and the level of compliance it has with the policy defined by the network administrators. If the client doesn't have the required compliance level, NAP has mechanisms to bring the client to the compliance level dynamically and then allow it to access the network.

Lightweight Directory Access Protocol (LDAP): This is a protocol that runs directly over TCP/IP. It operates on a directory made up of objects such as organizational units, printers, groups, and so on. When the client sends a request for a service, it queries the LDAP server to search for the relevant information, and based on that information and the level of access, it provides access to the client.

IP Security (IPSec): IP Security is a set of protocols that provides security at the network layer. IPSec provides two choices:

Authentication Header: Here it encapsulates the authentication of the sender in a header of the network packet.
Encapsulating Security Payload: Here it supports encryption of both the header and data.

Now that we know basic information on Windows domains, domain users, and the various authentication methods used with Windows servers, I will walk you through some of the basic and preliminary aspects of SQL Server security!

Understanding SQL Server Security

Security! Nowadays we store various kinds of information in databases, and we want to be sure that it is secured. Security is the most important word to the IT administrator and is vital for everybody who has stored information in a database, as they need to make sure that not a single piece of data is made available to someone who shouldn't have access. Because all the information stored in databases is vital, everyone wants to prevent unauthorized access to highly confidential data, and this is where security implementation in SQL Server comes into the picture.

With the release of SQL Server 2000, Microsoft (MS) introduced some great security features such as authentication roles (fixed server roles and fixed database roles), application roles, various permission levels, forcing protocol encryption, and so on, which are widely used by administrators to tighten SQL Server security. Basically, SQL Server security has two paradigms: one is SQL Server's own set of security measures, and the other is integrating it with the domain. SQL Server has two methods of authentication.

Windows authentication

In Windows authentication mode, we can integrate domain accounts to authenticate users, and based on the group they are members of and the level of access they have, DBAs can provide them access to the particular SQL Server box.
Whenever a user tries to access SQL Server, his/her account is validated by the domain controller first; based on the permissions the account has, the domain controller then allows or rejects the request, and no separate login ID and password are required for authentication. Once the user is authenticated, SQL Server grants access based on the permissions the user has. These permissions come in the form of roles, including fixed server roles, fixed database (DB) roles, and application roles.

Fixed Server Roles: These are security principals that have server-wide scope. Fixed server roles are meant to manage permissions at the server level. We can add SQL logins, domain accounts, and domain groups to these roles. The roles that can be assigned to a login, domain account, or group include sysadmin, serveradmin, securityadmin, processadmin, setupadmin, bulkadmin, diskadmin, dbcreator, and public.

Fixed DB Roles: These are roles assigned to a particular login for a particular database; their scope is limited to the database they have permissions on. The fixed database roles include db_owner, db_accessadmin, db_securityadmin, db_ddladmin, db_backupoperator, db_datareader, db_datawriter, db_denydatareader, and db_denydatawriter.

Application Role: An application role is a database principal that is widely used to assign user permissions for an application. For example, in a home-grown ERP some users only need to view the data; we can create a role with db_datareader permissions and then add to it all the users who require read-only access.

Mixed authentication

In Mixed authentication mode, logins can be authenticated by the Windows domain controller or by SQL Server itself. DBAs can create logins with passwords in SQL Server. With the release of SQL Server 2005, MS introduced password policies for SQL Server logins. Mixed mode authentication is used when a legacy application has to be run and it is not on the domain network. In my opinion, Windows authentication is preferable because we can use handshake methods such as PKI, Kerberos, EAP, SSL, NAP, LDAP, or IPSec to tighten security.

SQL Server 2005 brought enhancements to its security mechanisms. The most important among them are the password policy, native encryption, the separation of users and schemas, and no longer needing system administrator (sa) rights just to run Profiler. These are good things, because sa is the super user, and with the power this account has a user can do anything on the SQL box, including the following:

The user can grant ALTER TRACE permission to users who need to run Profiler
The user can create logins and users
The user can grant or revoke permissions for other users

A schema is an object container that is owned by a user and is transferable to other users. In earlier versions, objects were owned by users, so if a user left the company we could not delete his/her account from the SQL box simply because there were objects he/she had created. We first had to change the owner of those objects, and only then could we delete the account. With schemas, on the other hand, we can drop the user account because the objects are owned by the schema.

SQL Server 2008 gives you even more control over the configuration of security mechanisms. It allows you to configure metadata access, execution context, and the auditing of events using DDL triggers, the most powerful feature for auditing any DDL event.
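To make these ideas concrete, here is a minimal T-SQL sketch that strings the pieces together: a Windows login, a read-only role membership, the ALTER TRACE grant mentioned above, an application role, and a schema whose ownership is transferred so that the user can be dropped. The domain account CORP\jsmith, the database SalesDB, the SalesApp role, and the Reports schema are hypothetical names used only for illustration; the role membership uses the sp_addrolemember syntax of SQL Server 2008.

-- Server-level work: create the login and grant the server permission (run in master)
USE master;
GO
CREATE LOGIN [CORP\jsmith] FROM WINDOWS;           -- Windows-authenticated login
GRANT ALTER TRACE TO [CORP\jsmith];                -- lets the login run Profiler without sa rights
GO
-- Database-level work (hypothetical user database)
USE SalesDB;
GO
CREATE USER jsmith FOR LOGIN [CORP\jsmith];        -- map the login to a database user
EXEC sp_addrolemember 'db_datareader', 'jsmith';   -- read-only access via a fixed DB role
GO
-- An application role for read-only application screens
CREATE APPLICATION ROLE SalesApp WITH PASSWORD = 'Str0ng!AppP@ss1';
GRANT SELECT ON SCHEMA::dbo TO SalesApp;
GO
-- Schemas decouple object ownership from users
CREATE SCHEMA Reports AUTHORIZATION jsmith;
GO
ALTER AUTHORIZATION ON SCHEMA::Reports TO dbo;     -- reassign ownership first...
DROP USER jsmith;                                  -- ...then the user can be dropped

This is only a sketch of the workflow, not a hardening checklist; in practice the permissions granted would follow the principle of least privilege for the environment in question.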
If you wish to know more about what we have covered so far, you can go through the following links:

http://www.microsoft.com/sqlserver/2008/en/us/Security.aspx
http://www.microsoft.com/sqlserver/2005/en/us/security-features.aspx
http://technet.microsoft.com/en-us/library/cc966507.aspx

In this section, we covered the basics of domains, domain users, and SQL Server security. We also learned why security is given so much emphasis these days. In the next section, we will look at the basics of clustering and its components.

What is clustering?

Before starting with SQL Server clustering, let's have a look at clustering in general and at Windows clusters. The word cluster itself is self-descriptive: a bunch or group. When two or more computers are connected to each other by means of a network and share some common resources to provide redundancy or performance improvement, they are known as a cluster of computers. Clustering is usually deployed when a business-critical application needs to be available 24x7, or in other words needs High Availability. Such clusters are known as failover clusters, because the primary goal of setting up the cluster is to keep business-critical services or processes available 24x7. The Enterprise and Datacenter editions of MS Windows Server support failover clustering. Failover is achieved by having two identical nodes connected to each other by means of a private network and commonly used resources. In case of a failure of any common resource or service, the first node (active) passes ownership to the other node (passive).

SQL Server clustering is built on top of Windows clustering, which means that before we go about installing SQL Server clustering, we should have Windows clustering installed. Before we start, let's understand the commonly used shared resources of a cluster server. Clusters with 2, 4, 8, 12, or 32 nodes can be built with Windows Server 2008 R2. Clusters are categorized in the following manner:

High-Availability Clusters: This type of cluster is also known as a failover cluster. High-availability clusters are implemented when the purpose is to provide highly available services. To implement a failover or high-availability cluster, one may have up to 16 nodes in a Microsoft cluster. Clustering in Windows operating systems was first introduced with the release of Windows NT 4.0 Enterprise Edition and has been enhanced gradually since. Even though we can have non-identical hardware, we should use identical hardware; if the node that the cluster fails over to has a lower configuration, we might face degraded performance.

Load Balancing: This is the second form of cluster that can be configured. It is built by linking multiple computers to each other and making use of the resources each of them offers. From the user's point of view, the linked servers/nodes appear to be separate machines; collectively, however, they work virtually as a single system, with the main goal of balancing the work by sharing CPU, disk, and every other possible resource among the linked nodes, which is why it is known as a load-balancing cluster. SQL Server doesn't support this form of clustering.

Compute Clusters: When computers are linked together for the purpose of simulation, such as aircraft simulation, they are known as a compute cluster. A well-known example is the Beowulf cluster.

Grid Computing: This is one kind of clustering solution, but it is more often used when the nodes are in dispersed locations.
This kind of cluster is also called a supercomputer or HPC (high-performance computing) cluster. Its main applications are scientific research, academic and mathematical workloads, and weather forecasting, where lots of CPUs and disks are required; SETI@home is a well-known example.

As far as SQL Server clusters are concerned, some cool new features have been added in the latest release, SQL Server 2008, although with the limitation that these features are available only if SQL Server 2008 is used with Windows Server 2008. So, let's have a glance at these features:

Service SID: Service SIDs were introduced with Windows Vista and Windows Server 2008. They enable us to bind permissions directly to Windows services. With SQL Server 2005 we needed a SQL Server service account that was a member of a domain group so that it would have all the required permissions. This is not the case with SQL Server 2008, where we may choose service SIDs to bypass the need to provision domain groups.

Support for 16 nodes: We may add up to 16 nodes to our SQL Server 2008 cluster with the SQL Server 2008 Enterprise 64-bit edition.

New cluster validation: A new cluster validation step is now part of the installation. Prior to adding a node to an existing cluster, this validation step checks whether or not the cluster environment is compatible.

Mount Drives: Drive letters are no longer essential. If the cluster we have configured has a limited number of drive letters available, we may mount a drive and use it in our cluster environment, provided it is associated with a base drive letter.

Geographically dispersed cluster nodes: This is a super-cool feature introduced with SQL Server 2008. It uses a VLAN to establish connectivity with the other cluster nodes, and it removes the limitation of having all clustered nodes in a single location.

IPv6 Support: SQL Server 2008 natively supports IPv6, which increases the network IP address size to 128 bits from 32 bits.

DHCP Support: Windows Server 2008 clustering introduced the use of DHCP-assigned IP addresses by the cluster services, and this is supported by SQL Server 2008 clusters. It is still recommended to use static IP addresses, because if an application depends on a fixed IP address, or if the renewal of an IP address from the DHCP server fails, the IP address resource will fail.

iSCSI Support: Windows Server 2008 supports iSCSI as a storage connection; in earlier versions, only SAN and fibre channel were supported.
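As a quick way to relate a running instance to the cluster features described above, the following query sketch checks whether the current instance is clustered, which physical node it is running on, and which nodes belong to the cluster. It is illustrative only; SERVERPROPERTY and the sys.dm_os_cluster_nodes DMV are available from SQL Server 2005 onwards, and the DMV simply returns no rows on a non-clustered instance.

-- Is this instance a clustered (failover) instance, and on which node is it currently running?
SELECT SERVERPROPERTY('IsClustered')                 AS is_clustered,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_node;

-- List the nodes that are part of the failover cluster (empty result if not clustered)
SELECT NodeName
FROM   sys.dm_os_cluster_nodes;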
Introduction to Database Mirroring in Microsoft SQL Server 2008 High Availability

Packt
19 Jan 2011
5 min read
Introduction

Mr. Young, the Principal DBA at XY Incorporation, a large manufacturing company, was asked to come up with a solution that could serve his company's need to make a database server highly available with minimal or no manual intervention. He was also asked to keep in mind the limited budget the company has for the financial year. After careful research, he came up with the idea of going with Database Mirroring, as it provides an option of automatic failover and is a cost-effective solution. Based on tests he performed on virtual test servers, he prepared a technical document to help management and his peers understand how it works.

What is Database Mirroring?

Database Mirroring is an option that can be used to cater to the business need of increasing the availability of a SQL Server database by keeping a standby copy that can be used as an alternate production server in case of any emergency. As its name suggests, mirroring stands for making an exact copy of the data. Mirroring can be done onto a disk, a website, or somewhere else. Similarly, Microsoft introduced Database Mirroring with the launch of SQL Server 2005 (post SP1), and it performs the same function: making an exact copy of a database between two physically separate database servers. As mirroring is a database-wide feature, it is implemented per database rather than server-wide.

Disk mirroring is a technology wherein data is stored on two physically separate but identical hard disks at the same time; it is also known as RAID 1.

Different components of Database Mirroring

Three components are required to set up Database Mirroring. They are as follows:

Principal Server: This is the database server that sends out the data to the participant server (we'll call it the secondary/standby server) in the form of transactions.
Secondary Server: This is the database server that receives all the transactions sent by the Principal Server in order to keep its copy of the database identical.
Witness Server (optional): This server continuously monitors the Principal and Secondary Servers and enables automatic failover. To have the automatic failover feature, the Principal Server should be running in synchronous mode.

How Database Mirroring works

In Database Mirroring, each server is a known partner and they complement each other as Principal and Mirror. There will be only one Principal and only one Mirror at any given time. In essence, the DML operations performed on the Principal Server are all re-performed on the Mirror server. As we know, data is written into the log buffer before it is written into the data pages. Database Mirroring sends the data that is written into the Principal Server's log buffer to the Mirror database at the same time.
All these transactions are sent sequentially and as quickly as possible. Database Mirroring operates in two different modes: asynchronous and synchronous.

Asynchronous, a.k.a. High Performance mode

Transactions are sent to the Secondary Server as soon as they are written into the log buffer. In this mode of operation, the data is committed at the Principal Server before it is actually written into the log buffer of the Secondary Server. Hence this mode is called High Performance mode, but at the same time it introduces a chance of data loss.

Synchronous, a.k.a. High Safety mode

Transactions are sent to the Secondary Server as soon as they are written to the log buffer, and they are then committed simultaneously at both ends.

Prerequisites

Let's now have a look at the prerequisites for putting Database Mirroring in place. Please be careful with these prerequisites, as a single missed requisite will result in a failed installation. The sketch after this list shows how the main steps translate into T-SQL.

Recovery mode: To implement Database Mirroring, the database should be in the Full recovery mode.
Initialization of the database: The database on which to set up mirroring should be present on the mirror server. To achieve this, we can restore the most recent full backup, followed by the transaction log backups, with the NORECOVERY option.
Database name: Ensure that the database name is the same for both the principal and the mirror database.
Compatibility: Ensure that the partner servers are on the same edition of SQL Server. Database Mirroring is supported by the Standard and Enterprise editions; synchronous mode with automatic failover is an Enterprise-only feature.
Disk space: Ensure that we have ample space available for the mirror database.
Service accounts: Ensure that the service accounts used are domain accounts and that they have the required permission, that is, CONNECT permission on the endpoints.
Listener port: This is the TCP port on which a Database Mirroring session is established between the Principal and Mirror servers.
Endpoints: These are the objects dedicated to Database Mirroring that enable SQL Server to communicate over the network.
Authentication: When the Principal and Mirror servers talk to each other, they must authenticate each other. For this, the accounts we use, whether local or domain accounts, must have login and send-message permissions. If local logins are used as service accounts, we must use certificates to authenticate the connection request.
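The following T-SQL is a minimal sketch of those prerequisites and of starting a mirroring session. The database name SalesDB, the server names PRINCIPAL1, MIRROR1, and WITNESS1, port 5022, and the backup paths are hypothetical placeholders; the witness and the safety setting are optional, as described above.

-- On the principal: full recovery model and base backups for initializing the mirror
ALTER DATABASE SalesDB SET RECOVERY FULL;
BACKUP DATABASE SalesDB TO DISK = 'C:\Backup\SalesDB.bak';
BACKUP LOG SalesDB TO DISK = 'C:\Backup\SalesDB.trn';

-- On the mirror: restore WITH NORECOVERY so the database stays in a restoring state
RESTORE DATABASE SalesDB FROM DISK = 'C:\Backup\SalesDB.bak' WITH NORECOVERY;
RESTORE LOG SalesDB FROM DISK = 'C:\Backup\SalesDB.trn' WITH NORECOVERY;

-- On each partner (and the witness): a mirroring endpoint listening on a TCP port
CREATE ENDPOINT Mirroring_EP
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = ALL);

-- Point the partners at each other: first on the mirror, then on the principal
ALTER DATABASE SalesDB SET PARTNER = 'TCP://PRINCIPAL1.corp.local:5022';  -- run on the mirror
ALTER DATABASE SalesDB SET PARTNER = 'TCP://MIRROR1.corp.local:5022';     -- run on the principal

-- Optional: witness for automatic failover, running in synchronous (high safety) mode
ALTER DATABASE SalesDB SET WITNESS = 'TCP://WITNESS1.corp.local:5022';
ALTER DATABASE SalesDB SET PARTNER SAFETY FULL;   -- use SAFETY OFF for high performance mode

The service accounts involved would also need CONNECT permission on the endpoints, as noted in the prerequisites list.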
Creating Line Graphs in R

Packt
17 Jan 2011
7 min read
Adding customized legends for multiple line graphs

Line graphs with more than one line, representing more than one variable, are quite common in any kind of data analysis. In this recipe we will learn how to create and customize legends for such graphs.

Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

How to do it...

First we need to load the cityrain.csv example data file, which contains monthly rainfall data for four major cities across the world. You can download this file from here. We will use the cityrain.csv example dataset:

rain<-read.csv("cityrain.csv")

plot(rain$Tokyo,type="b",lwd=2,
  xaxt="n",ylim=c(0,300),col="black",
  xlab="Month",ylab="Rainfall (mm)",
  main="Monthly Rainfall in major cities")

axis(1,at=1:length(rain$Month),labels=rain$Month)

lines(rain$Berlin,col="red",type="b",lwd=2)
lines(rain$NewYork,col="orange",type="b",lwd=2)
lines(rain$London,col="purple",type="b",lwd=2)

legend("topright",legend=c("Tokyo","Berlin","New York","London"),
  lty=1,lwd=2,pch=21,col=c("black","red","orange","purple"),
  ncol=2,bty="n",cex=0.8,
  text.col=c("black","red","orange","purple"),
  inset=0.01)

How it works...

We used the legend() function. It is quite a flexible function and allows us to adjust the placement and styling of the legend in many ways.

The first argument we passed to legend() specifies the position of the legend within the plot region. We used "topright"; other possible values are "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "right", and "center". We can also specify the location of the legend with x and y co-ordinates, as we will soon see.

The other important arguments specific to lines are lwd and lty, which specify the line width and type drawn in the legend box respectively. It is important to keep these the same as the corresponding values in the plot() and lines() commands. We also set pch to 21 to replicate the type="b" argument in the plot() command. cex and text.col set the size and colors of the legend text. Note that we set the text colors to the same colors as the lines they represent. Setting bty (box type) to "n" ensures that no box is drawn around the legend. This is good practice as it keeps the look of the graph clean. ncol sets the number of columns over which the legend labels are spread, and inset sets the inset distance from the margins as a fraction of the plot region.

There's more...

Let's experiment by changing some of the arguments discussed:

legend(1,300,legend=c("Tokyo","Berlin","New York","London"),
  lty=1,lwd=2,pch=21,col=c("black","red","orange","purple"),
  horiz=TRUE,bty="n",bg="yellow",cex=1,
  text.col=c("black","red","orange","purple"))

This time we used x and y co-ordinates instead of a keyword to position the legend. We also set the horiz argument to TRUE. As the name suggests, horiz makes the legend labels horizontal instead of the default vertical. Specifying horiz overrides the ncol argument. Finally, we made the legend text bigger by setting cex to 1 and did not use the inset argument.

An alternative way of creating the previous plot, without having to call plot() and lines() multiple times, is to use the matplot() function. To see details on how to use this function, please see the help file by running ?matplot or help(matplot) at the R prompt.
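In case it helps, here is a rough sketch of that matplot() alternative, assuming the same cityrain.csv file and column names used above; the legend still has to be added separately with legend():

rain <- read.csv("cityrain.csv")
cities <- c("Tokyo","Berlin","NewYork","London")
cols <- c("black","red","orange","purple")

# matplot() draws each column of the data frame as a separate series in one call
matplot(rain[,cities],type="b",lty=1,lwd=2,pch=21,col=cols,
  xaxt="n",ylim=c(0,300),xlab="Month",ylab="Rainfall (mm)",
  main="Monthly Rainfall in major cities")
axis(1,at=1:length(rain$Month),labels=rain$Month)
legend("topright",legend=c("Tokyo","Berlin","New York","London"),
  lty=1,lwd=2,pch=21,col=cols,ncol=2,bty="n",cex=0.8,
  text.col=cols,inset=0.01)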
Using margin labels instead of legends for multiple line graphs

While legends are the most commonly used method of providing a key for reading multiple-variable graphs, they are often not the easiest to read. Labelling lines directly is one way of getting around that problem.

Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

How to do it...

Let's use the gdp.txt example dataset to look at the trends in the annual GDP of five countries:

gdp<-read.table("gdp_long.txt",header=T)

library(RColorBrewer)
pal<-brewer.pal(5,"Set1")

par(mar=par()$mar+c(0,0,0,2),bty="l")

plot(Canada~Year,data=gdp,type="l",lwd=2,lty=1,ylim=c(30,60),
  col=pal[1],main="Percentage change in GDP",ylab="")
mtext(side=4,at=gdp$Canada[length(gdp$Canada)],text="Canada",
  col=pal[1],line=0.3,las=2)

lines(gdp$France~gdp$Year,col=pal[2],lwd=2)
mtext(side=4,at=gdp$France[length(gdp$France)],text="France",
  col=pal[2],line=0.3,las=2)

lines(gdp$Germany~gdp$Year,col=pal[3],lwd=2)
mtext(side=4,at=gdp$Germany[length(gdp$Germany)],text="Germany",
  col=pal[3],line=0.3,las=2)

lines(gdp$Britain~gdp$Year,col=pal[4],lwd=2)
mtext(side=4,at=gdp$Britain[length(gdp$Britain)],text="Britain",
  col=pal[4],line=0.3,las=2)

lines(gdp$USA~gdp$Year,col=pal[5],lwd=2)
mtext(side=4,at=gdp$USA[length(gdp$USA)]-2,
  text="USA",col=pal[5],line=0.3,las=2)

How it works...

We first read the gdp.txt data file using the read.table() function. Next we loaded the RColorBrewer color palette library and set our color palette pal to "Set1" (with five colors).

Before drawing the graph, we used the par() command to add extra space to the right margin, so that we have enough space for the labels. Depending on the size of the text labels, you may have to experiment with this margin until you get it right. We also set the box type (bty) to an L-shape ("l") so that there is no line on the right margin. We can instead set it to "c" if we want to keep the top line.

We used the mtext() function to label each of the lines individually in the right margin. The first argument we passed to the function is the side where we want the label to be placed. Sides (margins) are numbered starting from 1 for the bottom side and going round clockwise, so that 2 is the left, 3 is the top, and 4 is the right side.

The at argument was used to specify the Y co-ordinate of the label. This is a bit tricky because we have to make sure we place the label as close to the corresponding line as possible. Here we have used the last value of each line; for example, gdp$France[length(gdp$France)] picks the last value in the France vector by using its length as the index. Note that we had to adjust the value for USA by subtracting 2 from its last value so that it doesn't overlap the label for Canada.

We used the text argument to set the text of the labels as the country names, and we set the col argument to the appropriate element of the pal vector by using a numeric index. The line argument sets an offset in terms of margin lines, starting at 0 and counting outwards. Finally, setting las to 2 rotates the labels to be perpendicular to the axis, instead of the default value of 1, which makes them parallel to the axis.

Sometimes, simply using the last value of a set of values may not work because the value may be missing. In that case we can use the second-to-last value, or visually choose a value that places the label closest to the line.
Also, the size of the plot window and the proximity of the final values may cause labels to overlap, so we may need to iterate a few times before we get the placement right. We can write functions to automate this process, but it is still good to visually inspect the outcome.