
How-To Tutorials - Data

1229 Articles
Packt
02 Jun 2011
7 min read

Exporting SAP BusinessObjects Dashboards into Different Environments

SAP BusinessObjects Dashboards 4.0 Cookbook

Introduction

First, the visual model is compiled to a SWF file format. Compiling to a SWF file format ensures that the dashboard plays smoothly on different screen sizes and across different platforms. It also ensures that the users aren't given huge 10+ megabyte files. After compilation of the visual model to a SWF file, developers can then publish it to a format of their choice. The following are the available choices: Flash (SWF), AIR, SAP BusinessObjects Platform, HTML, PDF, PPT, Outlook, and Word. Once publishing is complete, the dashboard is ready to share!

Exporting to a standard SWF, PPT, PDF, and so on

After developing a Visual Model on Dashboard Design, we need to share it with users in a format that everyone can see on their machines. The simplest way is to export to a standard SWF file.

One of the great features of Dashboard Design is the ability to embed dashboards into different office file formats. For example, a presenter could have a PowerPoint deck, and in the middle of the presentation, have a working dashboard that presents an important set of data values to the audience. Another example could be an executive level user who is viewing a Word document created by an analyst. The analyst could create a written document in Word and then embed a working dashboard with the most updated data to present important data values to the executive level user.

You can choose to embed a dashboard in the following file types:

• PowerPoint
• Word
• PDF
• Outlook
• HTML

Getting ready

Make sure your visual model is complete and ready for sharing.

How to do it...

1. In the menu toolbar, go to File | Export | Flash (SWF).
2. Select the directory in which you want to save the SWF and enter the name of your SWF file.

How it works...

Xcelsius compiles the visual model into an SWF file that everyone is able to see. Once the SWF file has been compiled, the dashboard will then be ready for sharing.
It is mandatory that anyone viewing the dashboard has Adobe Flash installed. If not, they can download and install it from http://www.adobe.com/products/flashplayer/.

If we export to PPT, we can then edit the PowerPoint file however we desire. If you have an existing PowerPoint presentation deck and want to append the dashboard to it, the easiest way is to first embed the dashboard SWF in a temporary PowerPoint file and then copy that slide to your existing PowerPoint file.

There's more...

Exporting to an SWF file makes distribution very easy, which makes it well suited to presenting mockups at a business level. Developers are able to work very closely with the business and iteratively come up with a visual model closest to the business goals. When distributing SWF files, however, it is important that everyone viewing the dashboards has the same version, otherwise confusion may occur. As a best practice, version every SWF that is distributed.

When the much anticipated Adobe Flash 10.1 was released, there were problems with embedding Dashboard Design dashboards in DOC, PPT, PDF, and so on. This was fixed in the 10.1.82.76 Adobe Flash Player update. If users have Adobe Flash Player 10.1+ installed, make sure the version is 10.1.82.76 or higher.

When exporting to PDF, please take the following into account:

• In Dashboard Design 2008, the default format for exporting to PDF is Acrobat 9.0 (PDF 1.8).
• If Acrobat Reader 8.0 is installed, the default exported PDF cannot be opened. If using Acrobat Reader 8.0 or older, change the format to "Acrobat 6.0 (PDF 1.5)" before exporting to PDF.

Exporting to SAP BusinessObjects Enterprise

After Dashboard Design became a part of BusinessObjects, it was important to be able to export dashboards into the BusinessObjects Enterprise system. Once a dashboard is exported to BusinessObjects Enterprise, users can then easily access their dashboards through InfoView (now BI launch pad). On top of that, administrators are able to control dashboard security.

Getting ready

Make sure your visual model is complete and ready for sharing.

How to do it...

1. From the menu toolbar, go to File | Export | Export to SAP BusinessObjects Platform.
2. Enter your BusinessObjects login credentials and then select the location in the SAP BusinessObjects Enterprise system where you want to store the SWF file, as shown in the following screenshot:
3. Log into BI launch pad (formerly known as InfoView) and verify that you can access the dashboard.

How it works...

When we export a dashboard to SAP BusinessObjects Enterprise, we place it in the SAP BusinessObjects Enterprise content management system. From there, we can control accessibility to the dashboard and make sure that we have one source of truth, instead of sending out multiple dashboards through e-mail and possibly losing track of which is the latest version.

When we log into BI launch pad (formerly known as InfoView), it also passes the login token to the dashboard, so we don't have to enter our credentials again when connecting to SAP BusinessObjects Enterprise data. This is important because we don't have to manually create and pass any additional tokens once we have logged in.

There's more...

To give a true website feel, developers can house their dashboards in a website-style format using Dashboard Builder. This in turn provides a better experience for users, as they don't have to navigate through many folders in order to access the dashboard that they are looking for.

Publishing to SAP BW

This recipe shows you how to publish Dashboard Design dashboards to a SAP BW system. Once a dashboard is saved to the SAP BW system, it can be published within a SAP Enterprise Portal iView and made available for the users.
Getting ready

For this recipe, you will need a Dashboard Design dashboard model. This dashboard does not necessarily have to include a data connection to SAP BW.

How to do it...

1. Select Publish in the SAP menu. If you want to save the Xcelsius model with a different name, select the Publish As... option.
2. If you are not yet connected to the SAP BW system, a pop up will appear. Select the appropriate system and fill in your username and password in the dialog box. If you want to disconnect from the SAP BW system and connect to a different system, select the Disconnect option from the SAP menu.
3. Enter the Description and Technical Name of the dashboard.
4. Select the location you want to save the dashboard to and click on Save. The dashboard is now published to the SAP BW system.
5. To launch the dashboard and view it from the SAP BW environment, select the Launch option from the SAP menu. You will be asked to log in to the SAP BW system before you are able to view the dashboard.

How it works...

As we have seen in this recipe, publishing a Dashboard Design dashboard to SAP BW is quite straightforward. As the dashboard is part of the SAP BW environment after publishing, the model can be transported between SAP BW systems like all other SAP BW objects.

There's more...

After launching in step 5, the Dashboard Design dashboard will load in your browser from the SAP BW server. You can add the displayed URL to an SAP Enterprise Portal iView to make the dashboard accessible for portal users.

Packt
27 May 2011
4 min read

SAP BusinessObjects: Customizing the Dashboard

SAP BusinessObjects Dashboards 4.0 Cookbook
Over 90 simple and incredibly effective recipes for transforming your business data into exciting dashboards with SAP BusinessObjects Dashboards 4.0 Xcelsius

Introduction

In this article, we will go through techniques for using the different cosmetic features Dashboard Design provides to improve the look of your dashboard.

Compared with other dashboard tools, Dashboard Design provides a powerful way to capture the audience. It allows developers to build dashboards with the important 'wow' factor that other tools lack. Take, for example, two dashboards that have exactly the same functionality, placement of charts, and so on, where one looks much more attractive than the other. In general, people looking at the nicer looking dashboard will be more interested and thus get more value out of the data it presents. So not only does Dashboard Design provide a powerful and flexible way of presenting data, it also provides the 'wow' factor to capture a user's interest.

Changing the look of a chart

This recipe runs through changing the look of a chart. In particular, it goes through each tab in the appearance icon of the chart properties. We will then make modifications and see the resulting changes.

Getting ready

Insert a chart object onto the canvas. Prepare some data and bind it to the chart.

How to do it...

1. Double-click or right-click the chart object on the canvas or in the object properties window to go into Chart Properties.
2. In the Layout tab, uncheck Show Chart Background.
3. In the Series tab, click on the colored square box circled in the next screenshot to change the color of the bar to your desired color. Then change the width of each bar: click on the Marker Size area and change it to 35.
4. Click on the colored boxes circled in red in the Axes tab and choose dark blue to modify the horizontal and vertical axes separately. Uncheck Show Minor Gridlines at the bottom to remove all the horizontal lines in between the major gridlines.
5. Next, go to the Text and Color tabs, where you can make changes to all the different text areas of the chart.

How it works...

As you can see, the default chart looks plain and the bars are skinny, so it is harder to visualize things. It is a good idea to remove the chart background if there is an underlying background, so that the chart blends in better. In addition, the changes to the chart colors and text provide additional aesthetics that help improve the look of the chart.

Adding a background to your dashboard

This recipe shows the usefulness of backgrounds in the dashboard. It shows how backgrounds can provide additional depth to objects and help to group certain areas together for better visualization.

Getting ready

Make sure you have all your objects, such as charts and selectors, ready on the canvas. Here's an example of the two charts before the makeover. Bind some data to the charts if you want to change the coloring of the series.

How to do it...

1. Choose Background4 from the Art and Backgrounds tab of the Components window.
2. Stretch the background so that it fills the size of the canvas.
3. Make sure that the background is ordered before the charts. To change the ordering of the background, go to the object browser, select the background object, and then press the "-" key until the background object is behind the chart.
4. Select Background1 from the Art and Backgrounds tab and put two of them under the charts, as shown in the following screenshot:
5. When the backgrounds are in the proper place, open the properties window for the backgrounds and set the background color to your desired color. In this example we picked turquoise blue for each background.

How it works...

As you can see from the before and after pictures, having backgrounds can make a huge difference in terms of aesthetics. The objects are much more pleasant to look at now and the charts have a lot more depth.

The best way to choose the right backgrounds for your dashboard is to play around with the different background objects and their colors. If you are not very artistic, you can come up with a handful of examples and show them to the business user to see which one they prefer.

Packt
25 May 2011
6 min read

OpenCV: Image Processing using Morphological Filters

OpenCV 2 Computer Vision Application Programming Cookbook
Over 50 recipes to master this library of programming functions for real-time computer vision

Morphological filtering is a theory developed in the 1960s for the analysis and processing of discrete images. It defines a series of operators which transform an image by probing it with a predefined shape element. The way this shape element intersects the neighborhood of a pixel determines the result of the operation. This article presents the most important morphological operators. It also explores the problem of image segmentation using algorithms working on the image morphology.

Eroding and dilating images using morphological filters

Erosion and dilation are the most fundamental morphological operators. Therefore, we will present them in this first recipe.

The fundamental instrument in mathematical morphology is the structuring element. A structuring element is simply defined as a configuration of pixels (a shape) on which an origin is defined (also called anchor point). Applying a morphological filter consists of probing each pixel of the image using this structuring element. When the origin of the structuring element is aligned with a given pixel, its intersection with the image defines a set of pixels on which a particular morphological operation is applied. In principle, the structuring element can be of any shape, but most often, a simple shape such as a square, circle, or diamond with the origin at the center is used (mainly for efficiency reasons).

Getting ready

As morphological filters usually work on binary images, we will use a binary image produced through thresholding. However, since the convention in morphology is to have foreground objects represented by high (white) pixel values and background by low (black) pixel values, we have negated the image.

How to do it...

Erosion and dilation are implemented in OpenCV as simple functions, cv::erode and cv::dilate. Their use is straightforward:

    // Read input image
    cv::Mat image= cv::imread("binary.bmp");
    // Erode the image
    cv::Mat eroded; // the destination image
    cv::erode(image,eroded,cv::Mat());
    // Display the eroded image
    cv::namedWindow("Eroded Image");
    cv::imshow("Eroded Image",eroded);
    // Dilate the image
    cv::Mat dilated; // the destination image
    cv::dilate(image,dilated,cv::Mat());
    // Display the dilated image
    cv::namedWindow("Dilated Image");
    cv::imshow("Dilated Image",dilated);

The two images produced by these function calls are seen in the following screenshot. Erosion is shown first:

Followed by the dilation result:

How it works...

As with all other morphological filters, the two filters of this recipe operate on the set of pixels (or neighborhood) around each pixel, as defined by the structuring element. Recall that when applied to a given pixel, the anchor point of the structuring element is aligned with this pixel location, and all pixels intersecting the structuring element are included in the current set. Erosion replaces the current pixel with the minimum pixel value found in the defined pixel set. Dilation is the complementary operator, and it replaces the current pixel with the maximum pixel value found in the defined pixel set. Since the input binary image contains only black (0) and white (255) pixels, each pixel is replaced by either a white or black pixel.

A good way to picture the effect of these two operators is to think in terms of background (black) and foreground (white) objects. With erosion, if the structuring element when placed at a given pixel location touches the background (that is, one of the pixels in the intersecting set is black), then this pixel will be sent to background. In the case of dilation, if the structuring element on a background pixel touches a foreground object, then this pixel will be assigned a white value. This explains why the size of the objects has been reduced in the eroded image. Observe how some of the very small objects (that can be considered as "noisy" background pixels) have also been completely eliminated. Similarly, the dilated objects are now larger and some of the "holes" inside of them have been filled.

By default, OpenCV uses a 3x3 square structuring element. This default structuring element is obtained when an empty matrix (that is, cv::Mat()) is specified as the third argument in the function call, as was done in the preceding example. You can also specify a structuring element of the size (and shape) you want by providing a matrix in which the non-zero elements define the structuring element. In the following example, a 7x7 structuring element is applied:

    cv::Mat element(7,7,CV_8U,cv::Scalar(1));
    cv::erode(image,eroded,element);

The effect is obviously much more destructive in this case, as seen here:

Another way to obtain the same result is to repetitively apply the same structuring element on an image. The two functions have an optional parameter to specify the number of repetitions:

    // Erode the image 3 times.
    cv::erode(image,eroded,cv::Mat(),cv::Point(-1,-1),3);

The origin argument cv::Point(-1,-1) means that the origin is at the center of the matrix (default), and it can be defined anywhere on the structuring element. The image obtained will be identical to the one we obtained with the 7x7 structuring element. Indeed, eroding an image twice is like eroding an image with a structuring element dilated with itself. This also applies to dilation.

Finally, since the notion of background/foreground is arbitrary, we can make the following observation (which is a fundamental property of the erosion/dilation operators). Eroding foreground objects with a structuring element can be seen as a dilation of the background part of the image. Or more formally:

• The erosion of an image is equivalent to the complement of the dilation of the complement image.
• The dilation of an image is equivalent to the complement of the erosion of the complement image.

There's more...

It is important to note that even if we applied our morphological filters on binary images here, these can also be applied on gray-level images with the same definitions.

Also note that the OpenCV morphological functions support in-place processing. This means you can use the input image as the destination image. So you can write:

    cv::erode(image,image,cv::Mat());

OpenCV creates the required temporary image for you for this to work properly.
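To make the minimum/maximum definitions above concrete, here is a minimal sketch in plain C++ that does not use OpenCV at all. The names Image, morph, erode, dilate, and complement are hypothetical helpers written for this illustration, and the border handling (neighbors outside the image are simply skipped) is an assumption of the sketch rather than a statement about OpenCV's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A tiny grayscale image: rows of 8-bit pixels (0 = black, 255 = white).
using Image = std::vector<std::vector<std::uint8_t>>;

// Apply a square structuring element of odd side `size`, anchored at its
// center. With takeMin == true the result is an erosion (neighborhood
// minimum); otherwise it is a dilation (neighborhood maximum).
// Assumption of this sketch: neighbors outside the image are skipped.
Image morph(const Image& src, int size, bool takeMin) {
    assert(size > 0 && size % 2 == 1);
    const int rows = static_cast<int>(src.size());
    const int cols = rows ? static_cast<int>(src[0].size()) : 0;
    const int r = size / 2;
    Image dst = src;
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            std::uint8_t best = src[y][x];
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= rows || nx < 0 || nx >= cols) continue;
                    best = takeMin ? std::min(best, src[ny][nx])
                                   : std::max(best, src[ny][nx]);
                }
            }
            dst[y][x] = best;
        }
    }
    return dst;
}

Image erode(const Image& src, int size = 3)  { return morph(src, size, true); }
Image dilate(const Image& src, int size = 3) { return morph(src, size, false); }

// Pixel-wise complement (negative) of an image.
Image complement(Image img) {
    for (auto& row : img)
        for (auto& px : row) px = static_cast<std::uint8_t>(255 - px);
    return img;
}
```

With these helpers, erode(erode(erode(img))) gives the same image as erode(img, 7), and erode(img) equals complement(dilate(complement(img))), which mirrors the two properties stated in the recipe.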

Packt
23 May 2011
9 min read

Sphinx Search FAQs

Got questions on Sphinx, the open source search engine? Not sure if it's the right tool for you? You're in the right place - we've put together an FAQ on Sphinx. It should help you make the right decision about the software that powers your search. If you've got questions on the other kind of Sphinx, we recommend you look here instead.

What is Sphinx?

Sphinx is a full-text search engine (generally standalone) which provides fast, relevant, efficient full-text search functionality to third-party applications. It was especially created to facilitate searches on SQL databases and integrates very well with languages such as PHP, Python, Perl, Ruby, and Java.

What are the major features of Sphinx?

Some of the major features of Sphinx include:

• High indexing speed (up to 10 MB/sec on modern CPUs)
• High search speed (average query is under 0.1 sec on 2 to 4 GB of text collection)
• High scalability (up to 100 GB of text, up to 100 million documents on a single CPU)
• Supports distributed searching (since v.0.9.6)
• Supports MySQL (MyISAM and InnoDB tables are both supported) and PostgreSQL natively
• Supports phrase searching
• Supports phrase proximity ranking, providing good relevance
• Supports English and Russian stemming
• Supports any number of document fields (weights can be changed on the fly)
• Supports document groups
• Supports stopwords, that is, it indexes only what's most relevant from a given list of words
• Supports different search modes ("match extended", "match all", "match phrase" and "match any" as of v.0.9.5)
• Generic XML interface which greatly simplifies custom integration
• Pure-PHP (that is, NO module compiling and so on) search client API

Which operating systems does Sphinx run on?

Sphinx was developed and tested mostly on UNIX based systems. All modern UNIX based operating systems with an ANSI compliant compiler should be able to compile and run Sphinx without any issues. Sphinx has also been found running on the following operating systems without any issues:

• Linux (Kernel 2.4.x and 2.6.x of various distributions)
• Microsoft Windows 2000 and XP
• FreeBSD 4.x, 5.x, 6.x
• NetBSD 1.6, 3.0
• Solaris 9, 11
• Mac OS X

What does the configure command do?

The configure command gets the details of our machine and also checks for all dependencies. If any dependency is missing, it will throw an error.

Which are the various options for the configure command?

There are many options that can be passed to the configure command, but we will take a look at a few important ones:

• --prefix=/path: This option specifies the path to install the Sphinx binaries.
• --with-mysql=/path: Sphinx needs to know where to find MySQL's include and library files. It auto-detects this most of the time, but if for any reason it fails, you can supply the path here.
• --with-pgsql=/path: Same as --with-mysql but for PostgreSQL.

What is full-text search?

Full-text search is one of the techniques for searching a document or database stored on a computer. While searching, the search engine goes through and examines all of the words stored in the document and tries to match the search query against those words. A complete examination of all the words (text) stored in the document is undertaken and hence it is called a full-text search. Full-text search excels in searching large volumes of unstructured text quickly and effectively. It returns pages based on how well they match the user's query.

What are the advantages of full-text search?

The following points are some of the major advantages of full-text search:

• It is quicker than traditional searches as it benefits from an index of words that is used to look up records instead of doing a full table scan
• It gives results that can be sorted by relevance to the searched phrase or term, with sophisticated ranking capabilities to find the best documents or records
• It performs very well on huge databases with millions of records
• It skips the common words such as the, an, for, and so on

When should you use full-text search?

You should use full-text search when:

• There is a high volume of free-form text data to be searched
• There is a need for highly optimized search results
• There is a demand for flexible search querying

Why use Sphinx for full-text search?

If you're looking for a good Database Management System (DBMS), there are plenty of options available with support for full-text indexing and searches, such as MySQL, PostgreSQL, and SQL Server. There are also external full-text search engines, such as Lucene and Solr. Let's see the advantages of using Sphinx over the DBMS's full-text searching capabilities and other external search engines:

• It has a higher indexing speed. It is 50 to 100 times faster than MySQL FULLTEXT and 4 to 10 times faster than other external search engines.
• It also has a higher searching speed, although this depends heavily on the mode (Boolean vs. phrase) and additional processing. It is up to 500 times faster than MySQL FULLTEXT in cases involving a large result set with GROUP BY. It is more than two times faster in searching than other external search engines available.
• Relevancy is among the key features one expects when using a search engine, and Sphinx performs very well in this area. It has phrase-based ranking in addition to classic statistical BM25 ranking.
• Last but not least, Sphinx has better scalability. It can be scaled vertically (utilizing many CPUs, many HDDs) or horizontally (utilizing many servers), and this comes out of the box with Sphinx. One of the biggest known Sphinx clusters has over 3 billion records and is more than 2 terabytes in size.

What are indexes?

Indexes in Sphinx are a bit different from the indexes we have in databases. The data that Sphinx indexes is a set of structured documents, and each document has the same set of fields. This is very similar to SQL, where each row in the table corresponds to a document and each column to a field.

Sphinx builds a special data structure that is optimized for answering full-text search queries. This structure is called an index, and the process of creating an index from the data is called indexing.

The indexes in Sphinx can also contain attributes that are highly optimized for filtering. These attributes are not full-text indexed and do not contribute to matching. However, they are very useful for filtering the results we want based on attribute values.

There can be different types of indexes suited for different tasks. The index type that has been implemented in Sphinx is designed for maximum indexing and searching speed.

What are multi-value attributes (MVA)?

MVAs are a special type of attribute in Sphinx that make it possible to attach multiple values to every document. These attributes are especially useful in cases where each document can have multiple values for the same property (field).

How does weighting help?

Weighting decides which document gets priority over other documents and appears at the top. In Sphinx, weighting depends on the search mode. Weight can also be referred to as ranking. There are two major parts which are used in weighting functions:

• Phrase rank: This is based on the length of the Longest Common Subsequence (LCS) of search words between the document body and the query phrase. This means that the documents in which the queried phrase matches perfectly will have a higher phrase rank, and the weight would be equal to the query word count.
• Statistical rank: This is based on the BM25 function, which takes only the frequency of the queried words into account. So, if a word appears only one time in the whole document then its weight will be low. On the other hand, if a word appears a lot in the document then its weight will be higher. The BM25 weight is a floating point number between 0 and 1.

What is index merging, exactly?

Index merging is more efficient than indexing the data from scratch, that is, all over again. In this technique we define a delta index in the Sphinx configuration file. The delta index always gets the new data to be indexed, while the main index acts as an archive and holds data that never changes.

What is SphinxQL?

Programmers normally issue search queries using one or more client libraries that relate to the database on which the search is to be performed. Some programmers may also find it easier to write an SQL query than to use the Sphinx Client API library. SphinxQL is used to issue search queries in the form of SQL queries. These queries can be fired from any client of the database in question, and return the results in the way that a normal query would. Currently the MySQL binary network protocol is supported, and this enables Sphinx to be accessed with the regular MySQL API.

What do you mean by Geo-distance search?

In a Geo-distance search, you can find geo coordinates near a base anchor point. You can use this technique to find places near a given location. It can be useful in many applications, such as hotel search, property search, restaurant search, tourist destination search, and so on.

Sphinx makes it very easy to perform a geo-distance search by providing an API method wherein you can set the anchor point (if you have latitude and longitude in your index), and all searches performed thereafter will return the results with a magic attribute "@geodist" holding the distance from the anchor point. You can then filter or sort your results based on this attribute.
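The idea behind a geo-distance search (attach a distance from the anchor point to every result, then filter or sort on it) can be sketched in a few lines of plain C++. This is only an illustration, not Sphinx's API or its internal formula: the Place struct, the geoDistance and nearby functions, the haversine formula, and the 6,371 km mean Earth radius are all assumptions chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record type: a named location plus a computed distance.
struct Place {
    std::string name;
    double latDeg;
    double lonDeg;
    double geodist;  // meters from the anchor, filled in by nearby()
};

// Great-circle distance in meters using the haversine formula.
// Assumes a spherical Earth with a mean radius of 6,371 km.
double geoDistance(double lat1Deg, double lon1Deg,
                   double lat2Deg, double lon2Deg) {
    const double kPi = std::acos(-1.0);
    const double kEarthRadiusM = 6371000.0;
    const double toRad = kPi / 180.0;
    const double dLat = (lat2Deg - lat1Deg) * toRad;
    const double dLon = (lon2Deg - lon1Deg) * toRad;
    const double a =
        std::sin(dLat / 2) * std::sin(dLat / 2) +
        std::cos(lat1Deg * toRad) * std::cos(lat2Deg * toRad) *
            std::sin(dLon / 2) * std::sin(dLon / 2);
    return 2.0 * kEarthRadiusM * std::asin(std::sqrt(std::min(1.0, a)));
}

// Compute the distance of every place from the anchor, keep the ones
// within maxMeters, and return them sorted nearest first.
std::vector<Place> nearby(std::vector<Place> places, double anchorLat,
                          double anchorLon, double maxMeters) {
    assert(maxMeters >= 0.0);
    std::vector<Place> hits;
    for (auto& p : places) {
        p.geodist = geoDistance(anchorLat, anchorLon, p.latDeg, p.lonDeg);
        if (p.geodist <= maxMeters) hits.push_back(p);
    }
    std::sort(hits.begin(), hits.end(),
              [](const Place& a, const Place& b) {
                  return a.geodist < b.geodist;
              });
    return hits;
}
```

Here the geodist member plays the role that the "@geodist" attribute plays in a Sphinx result set: once it is populated, filtering by radius and sorting by distance are ordinary operations on that one value.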

Packt
31 Mar 2011
6 min read

Getting Started with Sphinx Search

Sphinx is a full-text search engine. So, before going any further, we need to understand what full-text search is and how it excels over the traditional searching. What is full-text search? Full-text search is one of the techniques for searching a document or database stored on a computer. While searching, the search engine goes through and examines all of the words stored in the document and tries to match the search query against those words. A complete examination of all the words (text) stored in the document is undertaken and hence it is called a full-text search. Full-text search excels in searching large volumes of unstructured text quickly and effectively. It returns pages based on how well they match the user's query. Traditional search To understand the difference between a normal search and full-text search, let's take an example of a MySQL database table and perform searches on it. It is assumed that MySQL Server and phpMyAdmin are already installed on your system. Time for action – normal search in MySQL Open phpMyAdmin in your browser and create a new database called myblog. Select the myblog database: Create a table by executing the following query: CREATE TABLE `posts` ( `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY , `title` VARCHAR( 255 ) NOT NULL , `description` TEXT NOT NULL , `created` DATETIME NOT NULL , `modified` DATETIME NOT NULL ) ENGINE = MYISAM; Queries can be executed from the SQL page in phpMyAdmin. You can find the link to that page in the top menu. 
Populate the table with some records: INSERT INTO `posts`(`id`, `title`, `description`, `created`, `modified`) VALUES (1, 'PHP scripting language', 'PHP is a web scripting language originally created by Rasmus Lerdorf', NOW(), NOW()), (2, 'Programming Languages', 'There are many languages available to cater any kind of programming need', NOW(), NOW()), (3, 'My Life', 'This post is about my life which in a sense is beautiful', NOW(), NOW()), (4, 'Life on Mars', 'Is there any life on mars?', NOW(), NOW()); Next, run the following queries against the table: SELECT * FROM posts WHERE title LIKE 'programming%'; The above query returns row 2. SELECT * FROM posts WHERE description LIKE '%life%'; The above query return rows 3 and 4. SELECT * FROM posts WHERE description LIKE '%scripting language%'; The above query returns row 1. SELECT * FROM posts WHERE description LIKE '%beautiful%' OR description LIKE '%programming%'; The above query returns rows 2 and 3. phpMyAdmin To administer MySQL database, I highly recommend using a GUI interface tool like phpMyAdmin (http://www.phpmyadmin.net). All the above mentioned queries can easily be executed What just happened? We first created a table posts to hold some data. Each post has a title and a description. We then populated the table with some records. With the first SELECT query we tried to find all posts where the title starts with the word programming. This correctly gave us the row number 2. But what if you want to search for the word anywhere in the field and not just at that start? For this we fired the second query, wherein we searched for the word life anywhere in the description of the post. Again this worked pretty well for us and as expected we got the result in the form of row numbers 3 and 4. Now what if we wanted to search for multiple words? For this we fired the third query where we searched for the words scripting language. As row 1 has those words in its description, it was returned correctly. 
Until now everything looked fine and we were able to perform searches without any hassle. The query gets complex when we want to search for multiple words that are not necessarily placed consecutively in a field, that is, side by side. One such example is our fourth query, where we searched for the words programming and beautiful in the description of the posts. As the number of words we need to search for increases, the query gets more complicated and, moreover, slower to execute, since it needs to match each word individually.

The previous SELECT queries and their output also don't give us any information about the relevance of the search terms to the results found. Relevance can be defined as a measure of how closely the returned database records match the user's search query. In other words, how pertinent the result set is to the search query.

Relevance is very important in the search world because users want to see the items with the highest relevance at the top of their search results. One of the major reasons for the success of Google is that their search results are always sorted by relevance.

MySQL full-text search

This is where full-text search comes to the rescue. MySQL has inbuilt support for full-text search and you only need to add a FULLTEXT INDEX to the field against which you want to perform your search.

Continuing the earlier example of the posts table, let's add a full-text index to the description field of the table. Run the following query:

    ALTER TABLE `posts` ADD FULLTEXT ( `description` );

The query will add an INDEX of type FULLTEXT to the description field of the posts table.

Only the MyISAM engine supported full-text indexes in MySQL when this was written; InnoDB gained them in MySQL 5.6.
Now to search for all the records which contain the words programming or beautiful anywhere in their description, the query would be:

    SELECT * FROM posts WHERE MATCH (description) AGAINST ('beautiful programming');

This query will return rows 2 and 3, and the returned results are sorted by relevance. One more thing to note is that this query takes less time than the earlier query, which used LIKE for matching.

By default, the MATCH() function performs a natural language search: the search string is treated as free text with no special operators, and the matching rows are ranked by relevance.

Full-text search in MySQL is a big topic in itself and we have only seen the tip of the iceberg. For a complete reference, please refer to the MySQL manual at http://dev.mysql.com/doc/.

Advantages of full-text search

The following points are some of the major advantages of full-text search:

It is quicker than traditional searches, as it benefits from an index of words that is used to look up records instead of doing a full table scan
It gives results that can be sorted by relevance to the searched phrase or term, with sophisticated ranking capabilities to find the best documents or records
It performs very well on huge databases with millions of records
It skips common words such as the, an, for, and so on

When to use a full-text search?

When there is a high volume of free-form text data to be searched
When there is a need for highly optimized search results
When there is a demand for flexible search querying
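To make the "index of words" idea above more concrete, here is a toy inverted index in Python. It is only a sketch under stated assumptions: the scoring (a plain count of matching words) and the tiny stop-word list are invented for illustration and are not the ranking or stop-word handling that MySQL actually uses, and the names build_index and search are hypothetical.

```python
import re
from collections import Counter, defaultdict

# The four post descriptions from the earlier example.
POSTS = {
    1: "PHP is a web scripting language originally created by Rasmus Lerdorf",
    2: "There are many languages available to cater any kind of programming need",
    3: "This post is about my life which in a sense is beautiful",
    4: "Is there any life on mars?",
}

# Assumption: a tiny stop-word list, just the examples named in the text.
STOP_WORDS = {"the", "an", "for"}


def tokenize(text):
    """Lower-case the text and split it into words, skipping stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_index(docs):
    """Map every word to the documents containing it, with occurrence counts."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] += 1
    return index


def search(index, query):
    """Look each query word up in the index and rank documents by a toy score."""
    scores = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_index(POSTS)
print(search(INDEX, "beautiful programming"))
print(search(INDEX, "life beautiful"))
```

The search only touches the index entries for the query words rather than scanning every description, and the score gives a natural sort order, which is the essence of the first two advantages listed above.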
Packt
16 Mar 2011
9 min read

Sphinx: Index Searching

Client API implementations for Sphinx

Sphinx comes with a number of native searchd client API implementations. Some third-party open source implementations for Perl, Ruby, and C++ are also available. All APIs provide the same set of methods and implement the same network protocol. As a result, they all work in a similar fashion.

All examples in this article are for the PHP implementation of the Sphinx API. However, you can just as easily use other programming languages. Sphinx is used with PHP more widely than with any other language.

Search using client API

Let's see how we can use the native PHP implementation of the Sphinx API to search. We will add a configuration related to searchd and then create a PHP file to search the index using the Sphinx client API implementation for PHP.

Time for action – creating a basic search script

Add the searchd config section to /usr/local/sphinx/etc/sphinx-blog.conf:

    source blog {
      # source options
    }
    index posts {
      # index options
    }
    indexer {
      # indexer options
    }
    # searchd options (used by search daemon)
    searchd {
      listen = 9312
      log = /usr/local/sphinx/var/log/searchd.log
      query_log = /usr/local/sphinx/var/log/query.log
      max_children = 30
      pid_file = /usr/local/sphinx/var/log/searchd.pid
    }

Start the searchd daemon (as root user):

    $ sudo /usr/local/sphinx/bin/searchd -c /usr/local/sphinx/etc/sphinx-blog.conf

Copy the sphinxapi.php file (the class with the PHP implementation of the Sphinx API) from the sphinx source directory to your working directory:

    $ mkdir /path/to/your/webroot/sphinx
    $ cd /path/to/your/webroot/sphinx
    $ cp /path/to/sphinx-0.9.9/api/sphinxapi.php ./

Create a simple_search.php script that uses the PHP client API class to search the sphinx-blog index, and execute it in the browser:

    <?php
    require_once('sphinxapi.php');
    // Instantiate the sphinx client
    $client = new SphinxClient();
    // Set search options
    $client->SetServer('localhost', 9312);
    $client->SetConnectTimeout(1);
    $client->SetArrayResult(true);
    // Query the index
    $results = $client->Query('php');
    // Output the matched results in raw format
    print_r($results['matches']);

The output of the given code, as seen in a browser, will be similar to what's shown in the following screenshot:

What just happened?

Firstly, we added the searchd configuration section to our sphinx-blog.conf file. The following options were added to the searchd section:

listen: This option specifies the IP address and port that searchd will listen on. It can also specify the Unix-domain socket path. This option was introduced in v0.9.9 and should be used instead of the port (deprecated) option. If the port part is omitted, then the default port used is 9312. Examples:

    listen = localhost
    listen = 9312
    listen = localhost:9898
    listen = 192.168.1.25:4000
    listen = /var/run/sphinx.s

log: Name of the file where all searchd runtime events will be logged. This is an optional setting and the default value is "searchd.log".

query_log: Name of the file where all search queries will be logged. This is an optional setting and the default value is empty, that is, do not log queries.

max_children: The maximum number of concurrent searches to run in parallel. This is an optional setting and the default value is 0 (unlimited).

pid_file: Filename of the searchd process ID. This is a mandatory setting. The file is created on startup and it contains the head daemon process ID while the daemon is running. The pid_file is unlinked when the daemon is stopped.

Once we were done with adding the searchd configuration options, we started the searchd daemon as the root user. We passed the path of the configuration file as an argument to searchd. The default configuration file used is /usr/local/sphinx/etc/sphinx.conf.

After a successful startup, searchd listens on all network interfaces, including all the configured network cards on the server, at port 9312.
If we want searchd to listen on a specific interface, then we can specify the hostname or IP address in the value of the listen option:

    listen = 192.168.1.25:9312

The listen setting defined in the configuration file can be overridden on the command line while starting searchd by using the -l command line argument.

There are other (optional) arguments that can be passed to searchd, as seen in the following screenshot:

searchd needs to be running all the time when we are using the client API. The first thing you should always check is whether searchd is running or not, and start it if it is not running.

We then created a PHP script to search the sphinx-blog index. To search the Sphinx index, we need to use the Sphinx client API. As we are working with a PHP script, we copied the PHP client implementation class (sphinxapi.php), which comes along with the Sphinx source, to our working directory so that we can include it in our script. However, you can keep this file anywhere on the file system as long as you can include it in your PHP script.

Throughout this article we will be using /path/to/webroot/sphinx as the working directory and we will create all PHP scripts in that directory. We will refer to this directory simply as webroot.

We initialized the SphinxClient class and then used the following class methods to set up the Sphinx client API:

SphinxClient::SetServer($host, $port): This method sets the searchd hostname and port. All subsequent requests use these settings unless this method is called again with different parameters. The default host is localhost and the default port is 9312.

SphinxClient::SetConnectTimeout($timeout): This is the maximum time allowed to spend trying to connect to the server before giving up.

SphinxClient::SetArrayResult($arrayresult): This is a PHP client API-specific method. It specifies whether the matches should be returned as an array or a hash.
The default value is false, which means that matches will be returned in a PHP hash format, where document IDs will be the keys, and other information (attributes, weight) will be the values. If $arrayresult is true, then the matches will be returned in plain arrays with complete per-match information.

After that, the actual querying of the index was pretty straightforward using the SphinxClient::Query($query) method. It returned an array with matched results, as well as other information such as error, fields in index, attributes in index, total records found, time taken for search, and so on.

The actual results are in the $results['matches'] variable. We can run a loop on the results, and it is a straightforward job to get the actual document's content from the document ID and display it.

Matching modes

When a full-text search is performed on the Sphinx index, different matching modes can be used by Sphinx to find the results. The following matching modes are supported by Sphinx:

SPH_MATCH_ALL: This is the default mode and it matches all query words, that is, only records that match all of the queried words will be returned.
SPH_MATCH_ANY: This matches any of the query words.
SPH_MATCH_PHRASE: This matches the query as a phrase and requires a perfect match.
SPH_MATCH_BOOLEAN: This matches the query as a Boolean expression.
SPH_MATCH_EXTENDED: This matches the query as an expression in the Sphinx internal query language.
SPH_MATCH_EXTENDED2: This matches the query using the second version of the Extended matching mode. This supersedes SPH_MATCH_EXTENDED as of v0.9.9.
SPH_MATCH_FULLSCAN: In this mode the query terms are ignored and no text-matching is done, but filters and grouping are still applied.
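The difference between the first three matching modes can be illustrated with a few lines of plain Python. This is a toy sketch of the semantics only, under loud assumptions: it does not talk to searchd, it is not how Sphinx implements matching, and the three sample documents and the function names (match_all, match_any, match_phrase, search) are hypothetical.

```python
import re

# Hypothetical sample documents, keyed by document ID.
DOCS = {
    1: "PHP is a web scripting language",
    2: "Programming languages for every need",
    3: "Learn PHP programming today",
}


def words(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_all(query, text):
    """Like SPH_MATCH_ALL: every query word must occur in the document."""
    doc = set(words(text))
    return all(w in doc for w in words(query))


def match_any(query, text):
    """Like SPH_MATCH_ANY: at least one query word must occur in the document."""
    doc = set(words(text))
    return any(w in doc for w in words(query))


def match_phrase(query, text):
    """Like SPH_MATCH_PHRASE: the query words must occur consecutively, in order."""
    q, doc = words(query), words(text)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


def search(mode, query):
    """Return the IDs of the documents that the given mode function accepts."""
    return [doc_id for doc_id, text in sorted(DOCS.items()) if mode(query, text)]


print(search(match_all, "php programming"))
print(search(match_any, "php programming"))
print(search(match_phrase, "php programming"))
print(search(match_phrase, "scripting language"))
```

With these sample documents, the same two-word query matches one document in the all-words and phrase modes but all three in the any-word mode, which is the kind of difference the next script demonstrates against a real index.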
Time for action – searching with different matching modes

Create a PHP script display_results.php in your webroot with the following code:

    <?php
    // Database connection credentials
    $dsn = 'mysql:dbname=myblog;host=localhost';
    $user = 'root';
    $pass = '';

    // Instantiate the PDO (PHP 5 specific) class
    try {
        $dbh = new PDO($dsn, $user, $pass);
    } catch (PDOException $e) {
        echo 'Connection failed: ' . $e->getMessage();
    }

    // PDO statement to fetch the post data
    $query = "SELECT p.*, a.name FROM posts AS p " .
        "LEFT JOIN authors AS a ON p.author_id = a.id " .
        "WHERE p.id = :post_id";
    $post_stmt = $dbh->prepare($query);

    // PDO statement to fetch the post's categories
    $query = "SELECT c.name FROM posts_categories AS pc " .
        "LEFT JOIN categories AS c ON pc.category_id = c.id " .
        "WHERE pc.post_id = :post_id";
    $cat_stmt = $dbh->prepare($query);

    // Function to display the results in a nice format
    function display_results($results, $message = null)
    {
        global $post_stmt, $cat_stmt;
        if ($message) {
            print "<h3>$message</h3>";
        }
        if (!isset($results['matches'])) {
            print "No results found<hr />";
            return;
        }
        foreach ($results['matches'] as $result) {
            // Get the data for this document (post) from db
            $post_stmt->bindParam(':post_id', $result['id'], PDO::PARAM_INT);
            $post_stmt->execute();
            $post = $post_stmt->fetch(PDO::FETCH_ASSOC);
            // Get the categories of this post
            $cat_stmt->bindParam(':post_id', $result['id'], PDO::PARAM_INT);
            $cat_stmt->execute();
            $categories = $cat_stmt->fetchAll(PDO::FETCH_ASSOC);
            // Output title, author and categories
            print "Id: {$post['id']}<br />" .
                "Title: {$post['title']}<br />" .
                "Author: {$post['name']}";
            $cats = array();
            foreach ($categories as $category) {
                $cats[] = $category['name'];
            }
            if (count($cats)) {
                print "<br />Categories: " . implode(', ', $cats);
            }
            print "<hr />";
        }
    }

Create a PHP script search_matching_modes.php in your webroot with the following code:

    <?php
    // Include the api class
    require_once('sphinxapi.php');
    // Include the file which contains the function to display results
    require_once('display_results.php');

    $client = new SphinxClient();
    // Set search options
    $client->SetServer('localhost', 9312);
    $client->SetConnectTimeout(1);
    $client->SetArrayResult(true);

    // SPH_MATCH_ALL mode will be used by default
    // and we need not set it explicitly
    display_results(
        $client->Query('php'),
        '"php" with SPH_MATCH_ALL');
    display_results(
        $client->Query('programming'),
        '"programming" with SPH_MATCH_ALL');
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_ALL');

    // Set the mode to SPH_MATCH_ANY
    $client->SetMatchMode(SPH_MATCH_ANY);
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_ANY');

    // Set the mode to SPH_MATCH_PHRASE
    $client->SetMatchMode(SPH_MATCH_PHRASE);
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_PHRASE');
    display_results(
        $client->Query('scripting language'),
        '"scripting language" with SPH_MATCH_PHRASE');

    // Set the mode to SPH_MATCH_FULLSCAN
    // (the query terms are ignored in this mode)
    $client->SetMatchMode(SPH_MATCH_FULLSCAN);
    display_results(
        $client->Query('php'),
        '"php" with SPH_MATCH_FULLSCAN');

Execute search_matching_modes.php in a browser (http://localhost/sphinx/search_matching_modes.php).
Packt
01 Mar 2011
12 min read

Oracle GoldenGate 11g: Performance Tuning

Oracle GoldenGate 11g Implementer's guide

Design, install, and configure high-performance data replication solutions using Oracle GoldenGate

The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
Migrate your data replication solution from Oracle Streams to GoldenGate
Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
Written in a simple illustrative manner, providing step-by-step guidance with discussion points
Goes way beyond the manual, appealing to Solution Architects, System Administrators and Database Administrators

Oracle states that GoldenGate can achieve near real-time data replication. However, out of the box, GoldenGate may not meet your performance requirements. Here we focus on the main areas that lend themselves to tuning, especially parallel processing and load balancing, enabling high data throughput and very low latency.

Let's start by taking a look at some of the considerations before we start tuning Oracle GoldenGate.

Before tuning GoldenGate

There are a number of considerations we need to be aware of before we start the tuning process. For one, we must consider the underlying system and its ability to perform. Let's start by looking at the source of data that GoldenGate needs for replication to work: the online redo logs.

Online redo

Before we start tuning GoldenGate, we must look at both the source and target databases and their ability to read and write data. Data replication is I/O intensive, so fast disks are important, particularly for the online redo logs. Redo logs play an important role in GoldenGate: they are constantly being written to by the database and concurrently being read by the Extract process.
Furthermore, adding supplemental logging to a database can increase their size by a factor of 4!

Firstly, ensure that only the necessary amount of supplemental logging is enabled on the database. In the case of GoldenGate, the logging of the Primary Key is all that is required.

Next, take a look at the database wait events, in particular the ones that relate to redo. For example, if you are seeing "Log File Sync" waits, this is an indicator that either your disk writes are too slow or your application is committing too frequently, or a combination of both.

RAID5 is another common problem for redo log writes. Ideally, these files should be placed on their own mirrored storage such as RAID1+0 (mirrored striped sets) or Flash disks. Many argue this to be a misconception with modern high speed disk arrays, but some production systems are still known to be suffering from redo I/O contention on RAID5.

An adequate number (and size) of redo groups must be configured to prevent "checkpoint not complete" or "cannot allocate new log" warnings appearing in the database instance alert log. This occurs when Oracle attempts to reuse a log file but the checkpoint that would flush the blocks in the DB buffer cache to disk has not yet completed, so the contents of that log file are still required for crash recovery. The database must wait until that checkpoint completes before the online redo log file can be reused, effectively stalling the database and any redo generation.

Large objects (LOBs)

Know your data. LOBs can be a problem in data replication by virtue of their size and the effort needed to extract, transmit, and deliver the data from source to target. Tables containing LOB datatypes should be isolated from regular data and given a dedicated Extract, Data Pump, and Replicat process group to enhance throughput. Also ensure that the target table has a primary key to avoid Full Table Scans (FTS), an Oracle GoldenGate best practice. LOB INSERT operations can insert an empty (null) LOB into a row before updating it with the data.
This is because a LOB (depending on its size) can spread its data across multiple Logical Change Records, resulting in multiple DML operations required at the target database.

Baselining

Before we can start tuning, we must record our baseline. This will provide a reference point to tune from. We can later look back at our baseline and calculate the percentage improvement made from deploying new configurations.

An ideal baseline is to find the "breaking point" of your application requirements. For example, the following questions must be answered:

What is the maximum acceptable end-to-end latency?
What is the maximum number of application transactions per second we must accommodate?

To answer these questions we must start with a single-threaded data replication configuration having just one Extract, one Data Pump, and one Replicat process. This will provide us with a worst case scenario on which to build improvements.

Ideally, our data source should be the application itself, inserting, deleting, and updating "real data" in the source database. However, simulated data with the ability to provide throughput profiles will allow us to gauge performance accurately. Application vendors can normally provide SQL injector utilities that simulate the user activity on the system.

Balancing the load across parallel process groups

The GoldenGate documentation states "The most basic thing you can do to improve GoldenGate's performance is to divide a large number of tables among parallel processes and trails. For example, you can divide the load by schema". This statement is true, as the bottleneck is largely due to the serial nature of the Replicat process, having to "replay" transactions in commit order. Although this can be a constraining factor due to transaction dependency, increasing the number of Replicat processes increases performance significantly. However, it is highly recommended to group tables with referential constraints together per Replicat.
The number of parallel processes is typically greater on the target system compared to the source. The number and ratio of processes will vary across applications and environments. Each configuration should be thoroughly tested to determine the optimal balance, but be careful not to over-allocate, as each parallel process will consume up to 55MB. Increasing the number of processes to an arbitrary value will not necessarily improve performance; in fact, it may make it worse, and you will waste CPU and memory resources.

The following data flow diagram shows a load balancing configuration including two Extract processes, three Data Pumps, and five Replicats:

Considerations for using parallel process groups

To maintain data integrity, make sure that tables with referential constraints between one another are included in the same parallel process group. It's also worth considering disabling referential constraints on the target database schema to allow child records to be populated before their parents, thus increasing throughput. GoldenGate will always commit transactions in the same order as the source, so data integrity is maintained.

Oracle best practice states that no more than 3 Replicat processes should read the same remote trail file. To avoid contention on Trail files, pair each Replicat with its own Trail files and Extract process. Also, remember that it is easier to tune an Extract process than a Replicat process, so concentrate on your source before moving your focus to the target.

Splitting large tables into row ranges across process groups

What if you have some large tables with a high data change rate within a source schema and you cannot logically separate them from the remaining tables due to referential constraints? GoldenGate provides a solution to this problem by "splitting" the data within the same schema via the @RANGE function.
The @RANGE function can be used in the Data Pump and Replicat configuration to "split" the transaction data across a number of parallel processes.

The Replicat process is typically the source of performance bottlenecks because, in its normal mode of operation, it is a single-threaded process that applies operations one at a time by using regular DML. Therefore, to leverage parallel operation and enhance throughput, the more Replicats the better (dependent on the number of CPUs and memory available on the target system).

The RANGE function

The @RANGE function works by computing a hash value of the columns specified in the input. If no columns are specified, it uses the table's primary key. GoldenGate adjusts the total number of ranges to optimize the even distribution across the number of ranges specified. This concept can be compared to Hash Partitioning in Oracle tables as a means of dividing data.

With any division of data during replication, integrity is paramount and will have an effect on performance. Therefore, tables having a relationship with other tables in the source schema must be included in the configuration. If all your source schema tables are related, you must include all the tables!

Adding Replicats with @RANGE function

The @RANGE function accepts two numeric arguments, separated by a comma:

Range: The number assigned to a process group, where the first is 1 and the second 2 and so on, up to the total number of ranges.
Total number of ranges: The total number of process groups you wish to divide the data across using the @RANGE function.

The following example includes three related tables in the source schema and walks through the complete configuration from start to finish.

For this example, we have an existing Replicat process on the target machine (dbserver2) named ROLAP01 that includes the following three tables:

ORDERS
ORDER_ITEMS
PRODUCTS

We are going to divide the rows of the tables across two Replicat groups.
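The hash-based split described above can be sketched in a few lines of Python. This is an illustration of the general idea only: GoldenGate's real hash function is not described here, so zlib.crc32 is swapped in as an assumption, and the function names range_for and rows_for_range as well as the sample order IDs are hypothetical.

```python
import zlib


def range_for(key, total_ranges):
    """Map a key to a range number in 1..total_ranges using a stable hash.

    Assumption: CRC32 stands in for whatever hash GoldenGate uses internally.
    """
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % total_ranges + 1


def rows_for_range(keys, range_number, total_ranges):
    """Keep only the keys that a filter like @RANGE(range_number, total_ranges) would pass."""
    return [k for k in keys if range_for(k, total_ranges) == range_number]


# Hypothetical primary key values of rows arriving in the trail.
order_ids = list(range(1, 21))

first_group = rows_for_range(order_ids, 1, 2)   # plays the role of FILTER (@RANGE (1,2))
second_group = rows_for_range(order_ids, 2, 2)  # plays the role of FILTER (@RANGE (2,2))

print(first_group)
print(second_group)
```

The property that matters is that every key lands in exactly one range and always in the same one, so two process groups reading the same trail never apply the same row twice and never skip a row.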
The source database schema name is SRC and the target schema is TGT. The following steps add a new Replicat named ROLAP02 with the relevant configuration and adjust the Replicat ROLAP01 parameters to suit. Note that before making any changes you must stop the existing Replicat processes and determine their Relative Byte Address (RBA) and Trail file log sequence number. This is important information that we will use to tell the new Replicat process from which point to start.

First check if the existing Replicat process is running:

    GGSCI (dbserver2) 1> info all
    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    RUNNING     ROLAP01     00:00:00      00:00:02

Stop the existing Replicat process:

    GGSCI (dbserver2) 2> stop REPLICAT ROLAP01
    Sending STOP request to REPLICAT ROLAP01...
    Request processed.

Add the new Replicat process, using the existing trail file.

    GGSCI (dbserver2) 3> add REPLICAT ROLAP02, exttrail ./dirdat/tb
    REPLICAT added.

Now add the configuration by creating a new parameter file for ROLAP02.

    GGSCI (dbserver2) 4> edit params ROLAP02
    --
    -- Example Replicator parameter file to apply changes
    -- to target tables
    --
    REPLICAT ROLAP02
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap02.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (1,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (1,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (1,2));

Now edit the configuration of the existing Replicat process, and add the @RANGE function to the FILTER clause of the MAP statement. Note the inclusion of the GROUPTRANSOPS parameter to enhance performance by increasing the number of operations allowed in a Replicat transaction.
    GGSCI (dbserver2) 5> edit params ROLAP01
    --
    -- Example Replicator parameter file to apply changes
    -- to target tables
    --
    REPLICAT ROLAP01
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap01.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (2,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (2,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (2,2));

Check that both the Replicat processes exist.

    GGSCI (dbserver2) 6> info all
    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    STOPPED     ROLAP01     00:00:00      00:10:35
    REPLICAT    STOPPED     ROLAP02     00:00:00      00:12:25

Before starting both Replicat processes, obtain the log Sequence Number (SEQNO) and Relative Byte Address (RBA) from the original trail file.

    GGSCI (dbserver2) 7> info REPLICAT ROLAP01, detail
    REPLICAT   ROLAP01   Last Started 2010-04-01 15:35   Status STOPPED
    Checkpoint Lag       00:00:00 (updated 00:12:43 ago)
    Log Read Checkpoint  File ./dirdat/tb000279 <- SEQNO
                         2010-04-08 12:27:00.001016  RBA 43750979 <- RBA
      Extract Source        Begin               End
      ./dirdat/tb000279     2010-04-01 12:47    2010-04-08 12:27
      ./dirdat/tb000257     2010-04-01 04:30    2010-04-01 12:47
      ./dirdat/tb000255     2010-03-30 13:50    2010-04-01 04:30
      ./dirdat/tb000206     2010-03-30 13:50    First Record
      ./dirdat/tb000206     2010-03-30 04:30    2010-03-30 13:50
      ./dirdat/tb000184     2010-03-30 04:30    First Record
      ./dirdat/tb000184     2010-03-30 00:00    2010-03-30 04:30
      ./dirdat/tb000000     *Initialized*       2010-03-30 00:00
      ./dirdat/tb000000     *Initialized*       First Record

Adjust the new Replicat process ROLAP02 to adopt these values, so that the process knows where to start from on startup.

    GGSCI (dbserver2) 8> alter replicat ROLAP02, extseqno 279
    REPLICAT altered.
    GGSCI (dbserver2) 9> alter replicat ROLAP02, extrba 43750979
    REPLICAT altered.
Failure to complete this step will result in either duplicate data or ORA-00001 against the target schema, because GoldenGate will attempt to replicate the data from the beginning of the initial trail file (./dirdat/tb000000) if it exists; otherwise the process will abend.

Start both Replicat processes. Note the use of the wildcard (*).

    GGSCI (dbserver2) 10> start replicat ROLAP*
    Sending START request to MANAGER ...
    REPLICAT ROLAP01 starting
    Sending START request to MANAGER ...
    REPLICAT ROLAP02 starting

Check if both Replicat processes are running.

    GGSCI (dbserver2) 11> info all
    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    RUNNING     ROLAP01     00:00:00      00:00:22
    REPLICAT    RUNNING     ROLAP02     00:00:00      00:00:14

Check the detail of the new Replicat process.

    GGSCI (dbserver2) 12> info REPLICAT ROLAP02, detail
    REPLICAT   ROLAP02   Last Started 2010-04-08 14:18   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:06 ago)
    Log Read Checkpoint  File ./dirdat/tb000279
                         First Record  RBA 43750979
      Extract Source        Begin               End
      ./dirdat/tb000279     * Initialized *     First Record
      ./dirdat/tb000279     * Initialized *     First Record
      ./dirdat/tb000279     * Initialized *     2010-04-08 12:26
      ./dirdat/tb000279     * Initialized *     First Record

Generate a report for the new Replicat process ROLAP02.

    GGSCI (dbserver2) 13> send REPLICAT ROLAP02, report
    Sending REPORT request to REPLICAT ROLAP02 ...
    Request processed.

Now view the report to confirm the new Replicat process has started from the specified start point (RBA 43750979 and SEQNO 279). The following is an extract from the report:

    GGSCI (dbserver2) 14> view report ROLAP02
    2010-04-08 14:20:18 GGS INFO 379 Positioning with begin time: Apr 08, 2010 14:18:19 PM, starting record time: Apr 08, 2010 14:17:25 PM at extseqno 279, extrba 43750979.
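The repositioning step in the walkthrough above depends on reading the sequence number out of the trail file name and pairing it with the RBA. The following Python sketch shows one way to derive the two alter commands from the checkpoint values. It assumes, based only on the file names shown in this walkthrough, that a trail file name is a two-letter prefix followed by a six-digit sequence number; the function names parse_trail_file and reposition_commands are hypothetical helpers, not part of GoldenGate.

```python
import re

# Assumption: trail file names look like "<path>/<two letters><six digits>",
# as in ./dirdat/tb000279 from the walkthrough.
TRAIL_NAME = re.compile(r"^(?P<prefix>.*[A-Za-z]{2})(?P<seqno>\d{6})$")


def parse_trail_file(path):
    """Split a trail file path into its trail prefix and integer sequence number."""
    match = TRAIL_NAME.match(path)
    if match is None:
        raise ValueError(f"not a recognised trail file name: {path!r}")
    return match.group("prefix"), int(match.group("seqno"))


def reposition_commands(group, trail_file, rba):
    """Build the two GGSCI commands used above to position a Replicat group."""
    _, seqno = parse_trail_file(trail_file)
    return [
        f"alter replicat {group}, extseqno {seqno}",
        f"alter replicat {group}, extrba {rba}",
    ]


# Values taken from the "info REPLICAT ROLAP01, detail" output in the walkthrough.
commands = reposition_commands("ROLAP02", "./dirdat/tb000279", 43750979)
for command in commands:
    print(command)
```

Generating the commands from the checkpoint output, rather than retyping the numbers, reduces the chance of starting the new Replicat from the wrong position.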
Packt
24 Feb 2011
8 min read

Oracle GoldenGate: Considerations for Designing a Solution

At a high level, the design must include the following generic requirements:

Hardware
Software
Network
Storage
Performance

All the above must be factored into the overall system architecture. So let's take a look at some of the options and the key design issues.

Replication methods

So you have a fast, reliable network between your source and target sites. You also have a schema design that is scalable and logically split. You now need to choose the replication architecture: One to One, One to Many, active-active, active-passive, and so on. This choice may already be made for you by what the system has to achieve. Let's take a look at some configuration options.

Active-active

Let's assume a multi-national computer hardware company has offices in London and New York. Data entry clerks are employed at both sites, inputting orders into an Order Management System. There is also a procurement department that updates the system inventory with volumes of stock and new products related to a US or European market. European countries are managed by London, and the US States are managed by New York.
A requirement exists where the underlying database systems must be kept in synchronisation. Should one of the systems fail, London users can connect to New York and vice versa, allowing business to continue and orders to be taken. Oracle GoldenGate's active-active architecture provides the best solution to this requirement, ensuring that the database systems on both sides of the pond are kept synchronised in case of failure.

Another feature the active-active configuration has to offer is the ability to load balance operations. Rather than having what is effectively a DR site in both locations, the European users could be allowed access to the New York and London systems and vice versa. Should a site fail, then the DR solution could be quickly implemented.

Active-passive

The active-passive bi-directional configuration replicates data from an active primary database to a full replica database. Sticking with the earlier example, the business would need to decide which site is the primary, where all users connect. For example, in the event of a failure in London, the application could be configured to fail over to New York.

Depending on the failure scenario, another option is to start up the passive configuration, effectively turning the active-passive configuration into active-active.

Cascading

The Cascading GoldenGate topology offers a number of "drop-off" points that are intermediate targets being populated from a single source. The question here is "what data do I drop at which site?" Once this question has been answered by the business, it is then a case of configuring filters in Replicat parameter files, allowing just the selected data to be replicated. All of the data is passed on to the next target, where it is filtered and applied again.

This type of configuration lends itself to a head office system updating its satellite office systems in a round robin fashion. In this case, only the relevant data is replicated at each target site.
Another design is the Hub and Spoke solution, where all target sites are updated simultaneously. This is a typical head office topology, but additional configuration and resources would be required at the source site to ship the data in a timely manner. The CPU, network, and file storage requirements must be sufficient to accommodate and send the data to multiple targets.

Physical Standby

A Physical Standby database is a robust Oracle DR solution managed by the Oracle Data Guard product. The Physical Standby database is essentially a mirror copy of its Primary, which lends itself perfectly to failover scenarios. However, it is not easy to replicate data from the Physical Standby database, because it does not generate any of its own redo. That said, it is possible to configure GoldenGate to read the archived standby logs in Archive Log Only (ALO) mode. Despite being potentially slower, it may be prudent to feed a downstream system on the DR site using this mechanism, rather than having two data streams configured from the Primary database. This reduces network bandwidth utilization, as shown in the following diagram:

Reducing network traffic is particularly important when there is considerable distance between the primary and the DR site.

Networking

The network should not be taken for granted. It is a fundamental component in data replication and must be considered in the design process. Not only must it be fast, it must be reliable. In the following paragraphs, we look at ways to make our network resilient to faults and subsequent outages, in an effort to maintain zero downtime.

Surviving network outages

Probably one of your biggest fears in a replication environment is network failure. Should the network fail, the source trail will fill as the transactions continue on the source database, ultimately filling the filesystem to 100% utilization, causing the Extract process to abend.
Depending on the length of the outage, data in the database's redo logs may be overwritten, leaving you with the additional task of configuring GoldenGate to extract data from the database's archived logs. This is not ideal as you already have the backlog of data in the trail files to ship to the target site once the network is restored. Therefore, ensure there is sufficient disk space available to accommodate data for the longest network outage during the busiest period. Disks are relatively cheap nowadays. Providing ample space for your trail files will help to reduce the recovery time from the network outage.

Redundant networks

One of the key components in your GoldenGate implementation is the network. Without the ability to transfer data from the source to the target, GoldenGate is rendered useless. So, you not only need a fast network but one that will always be available. This is where redundant networks come into play, offering speed and reliability.

NIC teaming

One method of achieving redundancy is Network Interface Card (NIC) teaming or bonding. Here two or more Ethernet ports can be "coupled" to form a bonded network supporting one IP address. The main goal of NIC teaming is to use two or more Ethernet ports connected to two or more different access network switches, thus avoiding a single point of failure. The following diagram illustrates the redundant features of NIC teaming:

Linux (OEL/RHEL 4 and above) supports NIC teaming with no additional software requirements. It is purely a matter of network configuration stored in text files in the /etc/sysconfig/network-scripts directory. The following steps show how to configure a server for NIC teaming:

First, you need to log on as the root user and create a bond0 config file using the vi text editor.
    # vi /etc/sysconfig/network-scripts/ifcfg-bond0

Append the following lines to it, replacing the IP address with your actual IP address, then save the file and exit to the shell prompt:

    DEVICE=bond0
    IPADDR=192.168.1.20
    NETWORK=192.168.1.0
    NETMASK=255.255.255.0
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes

Choose the Ethernet ports you wish to bond (eth2 and eth4 in this example), and then open both configurations in turn using the vi text editor.

    # vi /etc/sysconfig/network-scripts/ifcfg-eth2
    # vi /etc/sysconfig/network-scripts/ifcfg-eth4

Modify the configuration as follows, replacing ethn with the respective port name:

    DEVICE=ethn
    USERCTL=no
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none

Save the files and exit to the shell prompt.

To make sure the bonding module is loaded when the bonding interface (bond0) is brought up, you need to modify the kernel modules configuration file:

    # vi /etc/modprobe.conf

Append the following two lines to the file:

    alias bond0 bonding
    options bond0 mode=balance-alb miimon=100

Finally, load the bonding module and restart the network services:

    # modprobe bonding
    # service network restart

You now have a bonded network that will load balance when both physical networks are available, providing additional bandwidth and enhanced performance. Should one network fail, the available bandwidth will be halved, but the network will still be available.

Non-functional requirements (NFRs)

Irrespective of the functional requirements, the design must also include the non-functional requirements (NFRs) in order to achieve the overall goal of delivering a robust, high performance, and stable system.

Latency

One of the main NFRs is performance. How long does it take to replicate a transaction from the source database to the target? This is known as end-to-end latency, which typically has a threshold that must not be breached in order to satisfy the specified NFR.

GoldenGate refers to latency as lag, which can be measured at different intervals in the replication process.
These are:

• Source to Extract: The time taken for a record to be processed by the Extract compared to the commit timestamp on the database
• Replicat to Target: The time taken for the last record to be processed by the Replicat compared to the record creation time in the trail file

A well designed system may encounter spikes in latency, but the latency should never be continuous or growing. Trying to tune GoldenGate when the design is poor is a difficult situation to be in. For the system to perform well you may need to revisit the design.
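The distinction between a latency spike and lag that keeps growing can be made concrete with a small check. The following is only a sketch under stated assumptions: lag_is_growing is a hypothetical helper (it is not part of GoldenGate), and it assumes you have already collected a series of lag readings in whole seconds, oldest first. How those readings are gathered is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of GoldenGate.
# Usage: lag_is_growing <lag_seconds>...   (oldest sample first)
# Exit status 0: every sample is larger than the one before it (sustained growth).
# Exit status 1: fewer than two samples, or lag dropped or held steady at some point.
lag_is_growing() {
    local prev="" sample
    [ "$#" -ge 2 ] || return 1
    for sample in "$@"; do
        if [ -n "$prev" ] && [ "$sample" -le "$prev" ]; then
            return 1
        fi
        prev=$sample
    done
    return 0
}

# Example: four readings that climb without ever recovering.
if lag_is_growing 4 9 15 31; then
    echo "WARNING: lag is growing, revisit the design"
fi
```

A series such as 4 60 5 6 (a spike that recovers) does not trigger the warning, which matches the idea that occasional spikes are acceptable while continuous growth is not. A stricter or more tolerant rule may suit your NFR threshold better.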

Packt
23 Feb 2011
10 min read

Oracle GoldenGate 11g: Configuration for High Availability

  Oracle GoldenGate 11g Implementer's guide Design, install, and configure high-performance data replication solutions using Oracle GoldenGate The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration Migrate your data replication solution from Oracle Streams to GoldenGate Design a GoldenGate solution that meets all the functional and non-functional requirements of your system Written in a simple illustrative manner, providing step-by-step guidance with discussion points Goes way beyond the manual, appealing to Solution Architects, System Administrators and Database Administrators       This includes the following discussion points: Shared storage options Configuring clusterware for GoldenGate GoldenGate on Exadata Failover We also touch upon the new features available in Oracle 11g Release 2, including the Database Machine, that provides a "HA solution in a box". GoldenGate on RAC A number of architectural options are available to Oracle RAC, particularly surrounding storage. Since Oracle 11g Release 2, these options have grown, making it possible to configure the whole RAC environment using Oracle software, whereas in earlier versions, third party clusterware and storage solutions had to be used. Let's start by looking at the importance of shared storage. Shared storage The secret to RAC is "share everything" and this also applies to GoldenGate. RAC relies on shared storage in order to support a single database having multiple instances, residing on individual nodes. Therefore, as a minimum the GoldenGate checkpoint and trail files must be on the shared storage so all Oracle instances can "see" them. Should a node fail, a surviving node can "take the reins" and continue the data replication without interruption. 
Since Oracle 11g Release 2, in addition to ASM, the shared storage can be an ACFS or a DBFS. Automatic Storage Management Cluster File System (ACFS) ACFS is Oracle's multi-platform, scalable file system, and storage management technology that extends ASM functionality to support files maintained outside of the Oracle Database. This lends itself perfectly to supporting the required GoldenGate files. However, any Oracle files that could be stored in regular ASM diskgroups are not supported by ACFS. This includes the OCR and Voting files that are fundamental to RAC. Database File System (DBFS) Another Oracle solution to the shared filesystem is DBFS, which creates a standard file system interface on top of files and directories that are actually stored as SecureFile LOBs in database tables. DBFS is similar to Network File System (NFS) in that it provides a shared network file system that "looks like" a local file system. On Linux, you need a DBFS client that has a mount interface that utilizes the Filesystem in User Space (FUSE) kernel module, providing a file-system mount point to access the files stored in the database. This mechanism is also ideal for sharing GoldenGate files among the RAC nodes. It also supports the Oracle Cluster Registry (OCR) and Voting files, plus Oracle homes. DBFS requires an Oracle Database 11gR2 (or higher) database. You can use DBFS to store GoldenGate recovery related files for lower releases of the Oracle Database, but you will need to create a separate Oracle Database 11gR2 (or higher) database to host the file system. Configuring Clusterware for GoldenGate Oracle Clusterware will ensure that GoldenGate can tolerate server failures by moving processing to another available server in the cluster. It can support the management of a third party application in a clustered environment. This capability will be used to register and relocate the GoldenGate Manager process. 
Once the GoldenGate software has been installed across the cluster and a script to start, check, and stop GoldenGate has been written and placed on the shared storage (so it is accessible to all nodes), the GoldenGate Manager process can be registered in the cluster. Clusterware commands can then be used to create, register and set privileges on the virtual IP address (VIP) and the GoldenGate application using standard Oracle Clusterware commands. The Virtual IP The VIP is a key component of Oracle Clusterware that can dynamically relocate the IP address to another server in the cluster, allowing connections to failover to a surviving node. The VIP provides faster failovers compared to the TCP/IP timeout based failovers on a server's actual IP address. On Linux this can take up to 30 minutes using the default kernel settings! The prerequisites are as follows: The VIP must be a fixed IP address on the public subnet. The interconnect must use a private non-routable IP address, ideally over Gigabit Ethernet. Use a VIP to access the GoldenGate Manager process to isolate access to the Manager process from the physical server. Remote data pump processes must also be configured to use the VIP to contact the GoldenGate Manager. The following diagram illustrates the RAC architecture for 2 nodes (rac1 and rac2) supporting 2 Oracle instances (oltp1 and oltp2). The VIPs are 11.12.1.6 and 11.12.1.8 respectively, in this example: The user community or application servers connect to either instance via the VIP and a load balancing database service, that has been configured on the database and in the client's SQL*Net tnsnames.ora file or JDBC connect string. The following example shows a typical tnsnames entry for a load balancing service. Load balancing is the default and does not need to be explicitly configured. Hostnames can replace the IP addresses in the tnsnames.ora file as long as they are mapped to the relevant VIP in the client's system hosts file. 
    OLTP =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = 11.12.1.6)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = 11.12.1.8)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = oltp)
        )
      )

This is the recommended approach for scalability and performance and is known as active-active. Another HA solution is the active-passive configuration, where users connect to one instance only, leaving the passive instance available for node failover. The term active-active or active-passive in this context relates to 2-node RAC environments and is not to be confused with the GoldenGate topology of the same name.

On Linux systems, the database server hostname will typically have the following format in the /etc/hosts file.

For Public VIP: <hostname>-vip
For Private Interconnect: <hostname>-pri

The following is an example hosts file for a RAC node:

    127.0.0.1      localhost.localdomain localhost
    ::1            localhost6.localdomain6 localhost6
    #Virtual IP Public Address
    11.12.1.6      rac1-vip rac1-vip
    11.12.1.8      rac2-vip rac2-vip
    #Private Address
    192.168.1.33   rac1-pri rac1-pri
    192.168.1.34   rac2-pri rac2-pri

Creating a GoldenGate application

The following steps guide you through the process of configuring GoldenGate on RAC. This example is for an Oracle 11g Release 1 RAC environment:

Install GoldenGate as the Oracle user on each node in the cluster or on a shared mount point that is visible from all nodes. If installing the GoldenGate home on each node, ensure the checkpoint and trail files are on the shared filesystem.

Ensure the GoldenGate Manager process is configured to use the AUTOSTART and AUTORESTART parameters, allowing GoldenGate to start the Extract and Replicat processes as soon as the Manager starts.

Configure a VIP for the GoldenGate application as the Oracle user from 1 node.
    <CLUSTERWARE_HOME>/bin/crs_profile -create ggsvip -t application -a <CLUSTERWARE_HOME>/bin/usrvip -o oi=bond1,ov=11.12.1.6,on=255.255.255.0

• CLUSTERWARE_HOME is the Oracle home in which Oracle Clusterware is installed, for example /u01/app/oracle/product/11.1.0/crs.
• ggsvip is the name of the application VIP that you will create.
• oi=bond1 is the public interface in this example.
• ov=11.12.1.6 is the virtual IP address in this example.
• on=255.255.255.0 is the subnet mask. This should be the same subnet mask as for the public IP address.

Next, register the VIP in the Oracle Cluster Registry (OCR) as the Oracle user.

    <CLUSTERWARE_HOME>/bin/crs_register ggsvip

Set the ownership of the VIP to the root user who assigns the IP address. Execute the following command as the root user:

    <CLUSTERWARE_HOME>/bin/crs_setperm ggsvip -o root

Set read and execute permissions for the Oracle user. Execute the following command as the root user:

    <CLUSTERWARE_HOME>/bin/crs_setperm ggsvip -u user:oracle:r-x

As the Oracle user, start the VIP.

    <CLUSTERWARE_HOME>/bin/crs_start ggsvip

To verify that the VIP is running, execute the following command, then ping the IP address from a different node in the cluster.

    <CLUSTERWARE_HOME>/bin/crs_stat ggsvip -t

    Name     Type          Target   State    Host
    ------   -------       ------   -------  ------
    ggsvip   application   ONLINE   ONLINE   rac1

    ping -c3 11.12.1.6

    64 bytes from 11.12.1.6: icmp_seq=1 ttl=64 time=0.096 ms
    64 bytes from 11.12.1.6: icmp_seq=2 ttl=64 time=0.122 ms
    64 bytes from 11.12.1.6: icmp_seq=3 ttl=64 time=0.141 ms
    --- 11.12.1.6 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2000ms
    rtt min/avg/max/mdev = 0.082/0.114/0.144/0.025 ms

Oracle Clusterware supports the use of "Action" scripts within its configuration, allowing bespoke scripts to be executed automatically during failover. Create a Linux shell script named ggs_action.sh that accepts one of 3 arguments: start, stop or check.
Place the script in the <CLUSTERWARE_HOME>/crs/public directory on each node or, if you have installed GoldenGate on a shared mount point, copy it there. Ensure that:

• start and stop return 0 if successful, 1 if unsuccessful.
• check returns 0 if GoldenGate is running, 1 if it is not running.

As the Oracle user, make sure the script is executable.

    chmod 754 ggs_action.sh

To check that the GoldenGate processes are running, ensure the action script has the following commands. The following example can be expanded to include checks for Extract and Replicat processes:

First, check the Linux process ID (PID) the GoldenGate Manager process is configured to use.

    GGS_HOME=/mnt/oracle/ggs # Oracle GoldenGate home
    pid=`cut -f8 ${GGS_HOME}/dirpcs/MGR.pcm`

Then, compare this value (in variable $pid) with the actual PID the Manager process is using. The following example will return the correct PID of the Manager process if it is running.

    ps -e |grep ${pid} |grep mgr |cut -d " " -f2

The code to start and stop a GoldenGate process is simply a call to ggsci.

    ggsci_command=$1
    ggsci_output=`${GGS_HOME}/ggsci << EOF
    ${ggsci_command}
    exit
    EOF`

Create a profile for the GoldenGate application as the Oracle user from 1 node.

    <CLUSTERWARE_HOME>/bin/crs_profile -create goldengate_app -t application -r ggsvip -a <CLUSTERWARE_HOME>/crs/public/ggs_action.sh -o ci=10

• CLUSTERWARE_HOME is the Oracle home in which Oracle Clusterware is installed. For example: /u01/app/oracle/product/11.1.0/crs
• -create goldengate_app sets the application name to goldengate_app.
• -r specifies the required resources that must be running for the application to start. In this example, the dependency is that the VIP ggsvip must be running before Oracle GoldenGate starts.
• -a specifies the action script. For example: <CLUSTERWARE_HOME>/crs/public/ggs_action.sh
• -o specifies options. In this example the only option is the Check Interval, which is set to 10 seconds.
Next, register the application in the Oracle Cluster Registry (OCR) as the Oracle user.

    <CLUSTERWARE_HOME>/bin/crs_register goldengate_app

Now start the GoldenGate application as the Oracle user.

    <CLUSTERWARE_HOME>/bin/crs_start goldengate_app

Check that the application is running.

    <CLUSTERWARE_HOME>/bin/crs_stat goldengate_app -t

    Name             Type          Target   State    Host
    ------           ------        -------- -----    ----
    goldengate_app   application   ONLINE   ONLINE   rac1

You can also stop GoldenGate from Oracle Clusterware by executing the following command as the Oracle user:

    <CLUSTERWARE_HOME>/bin/crs_stop goldengate_app

Oracle has published a White Paper on "Oracle GoldenGate high availability with Oracle Clusterware". To view the Action script mentioned in this article, refer to the document, which can be downloaded in PDF format from the Oracle Website at the following URL: http://www.oracle.com/technetwork/middleware/goldengate/overview/ha-goldengate-whitepaper-128197.pdf
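To show how the action script fragments from this article might fit together, here is a minimal sketch. It is not the script from Oracle's White Paper. The function names call_ggsci, manager_running and ggs_action, the GGS_WAIT grace period, and the use of "stop manager!" to skip the confirmation prompt are assumptions made for illustration, and the position of the PID in MGR.pcm (field 8) is simply taken from the article as-is.

```bash
#!/usr/bin/env bash
# Sketch of ggs_action.sh assembled from the fragments in this article.
# Illustrative only; refer to Oracle's White Paper for the published script.
GGS_HOME=${GGS_HOME:-/mnt/oracle/ggs}   # Oracle GoldenGate home (example path from the article)

# Feed a single command to ggsci and pass through whatever it prints.
call_ggsci() {
    "${GGS_HOME}/ggsci" <<EOF
$1
exit
EOF
}

# Succeed only if the PID recorded for the Manager belongs to a live mgr process.
manager_running() {
    local pcm="${GGS_HOME}/dirpcs/MGR.pcm" pid
    [ -r "$pcm" ] || return 1
    pid=$(cut -f8 "$pcm")              # field position as given in the article
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # missing or non-numeric PID
    esac
    ps -p "$pid" -o comm= 2>/dev/null | grep -q mgr
}

# start/stop: return 0 if successful, 1 if not.
# check: return 0 if the Manager is running, 1 if it is not.
ggs_action() {
    case "$1" in
        start)
            call_ggsci "start manager" >/dev/null 2>&1
            sleep "${GGS_WAIT:-5}"     # assumed grace period before checking
            manager_running ;;
        stop)
            # Assumption: the trailing "!" skips ggsci's confirmation prompt.
            call_ggsci "stop manager!" >/dev/null 2>&1
            sleep "${GGS_WAIT:-5}"
            ! manager_running ;;
        check)
            manager_running ;;
        *)
            echo "usage: $0 {start|stop|check}" >&2
            return 1 ;;
    esac
}

# Clusterware calls the script with one argument; do nothing if none is given.
if [ "$#" -gt 0 ]; then
    ggs_action "$1"
    exit $?
fi
```

Keeping the check in its own function means start and stop can reuse it to decide their return codes, which is what Clusterware relies on when deciding whether to relocate the application.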

Packt
03 Feb 2011
6 min read

OpenAM: Oracle DSEE and Multiple Data Stores

OpenAM

• Written and tested with OpenAM Snapshot 9, the Single Sign-On (SSO) tool for securing your web applications in a fast and easy way
• The first and the only book that focuses on implementing Single Sign-On using OpenAM
• Learn how to use OpenAM quickly and efficiently to protect your web applications with the help of this easy-to-grasp guide
• Written by Indira Thangasamy, core team member of the OpenSSO project from which OpenAM is derived
• Real-world examples for integrating OpenAM with various applications

Oracle Directory Server Enterprise Edition

The Oracle Directory Server Enterprise Edition type of identity store is predominantly used by customers and natively supported by the OpenSSO server. It is the only data store where all the user management features offered by OpenSSO are supported with no exceptions. In the console it is labeled as Sun DS with OpenSSO schema. After Oracle acquired Sun Microsystems Inc., the brand name for the Sun Directory Server Enterprise Edition was changed to Oracle Directory Server Enterprise Edition (ODSEE). You need to have the ODSEE configured prior to creating the data store either from the console or CLI as shown in the following section.

Creating a data store for Oracle DSEE

Creating a data store from the OpenSSO console is the easiest way to achieve this task. However, if you want to automate this process in a repeatable fashion, then using the ssoadm tool is the right choice. The problem here is obtaining the list of the attributes and their corresponding values, which is not documented anywhere in the publicly available documentation for OpenSSO. Let me show you the easy way out of this by providing the required options and their values for the ssoadm:

    ./ssoadm create-datastore -m odsee_datastore -t LDAPv3ForAMDS -u amadmin -f /tmp/.passwd_of_amadmin -D data_store_odsee.txt -e /

This command will create the data store that talks to an Oracle DSEE store.
The key options in the preceding command are LDAPv3ForAMDS (instructing the server to create a datastore of type ODSEE) and the properties that have been included in the data_store_odsee.txt. You can find the complete contents of this file as part of the code bundle provided by Packt Publishers. Some excerpts from the file are given as follows:

    sun-idrepo-ldapv3-config-ldap-server=odsee.packt-services.net:4355
    sun-idrepo-ldapv3-config-authid=cn=directory manager
    sun-idrepo-ldapv3-config-authpw=dssecret12
    com.iplanet.am.ldap.connection.delay.between.retries=1000
    sun-idrepo-ldapv3-config-auth-naming-attr=uid
    <contents removed to save paper space >
    sun-idrepo-ldapv3-config-users-search-attribute=uid
    sun-idrepo-ldapv3-config-users-search-filter=(objectclass=inetorgperson)
    sunIdRepoAttributeMapping=
    sunIdRepoClass=com.sun.identity.idm.plugins.ldapv3.LDAPv3Repo
    sunIdRepoSupportedOperations=filteredrole=read,create,edit,delete
    sunIdRepoSupportedOperations=group=read,create,edit,delete
    sunIdRepoSupportedOperations=realm=read,create,edit,delete,service
    sunIdRepoSupportedOperations=role=read,create,edit,delete
    sunIdRepoSupportedOperations=user=read,create,edit,delete,service

Among these properties, the first three provide critical information for the whole thing to work. As is evident from the names of the properties, they denote the ODSEE server name and port, the bind DN, and the password. The password is in plain text, so for security reasons you need to remove it from the input file after creating the data store.

Updating the data store

As we discussed in the previous section, one can invoke the update-datastore subcommand of the ssoadm tool with the appropriate properties and their values along with the -a switch. If you have more properties to be updated, then put them in a text file and use it with the -D option, as with the create-datastore subcommand.

Deleting the data store

A data store can be deleted by just selecting its specific name from the console.
There will be no warnings issued while deleting the data stores, and you could eventually delete all of the data stores in a realm. Be cautious about this behavior; you might end up deleting all of them unintentionally. Deleting a data store does not remove any existing data in the underlying LDAP server; it only removes the configuration from the OpenSSO server:

    ./ssoadm delete-datastores -e / -m odsee_datastore -f /tmp/.passwd_of_amadmin -u amadmin

Data store for OpenDS

OpenDS is one of the popular LDAP servers that is completely written in Java and available freely under an open source license. As a matter of fact, the embedded configuration store that is built into the OpenSSO server is the embedded version of OpenDS. OpenSSO has been fully tested with the OpenDS standalone version for identity store usage. The data store creation and management are pretty much similar to the steps described in the foregoing section, except for the type of store and the corresponding property values. The properties and their values are given in the file data_store_opends.txt (available as part of the code bundle). Invoke the ssoadm tool with this property file after making appropriate changes to fit your deployment. Here is the sample command that creates the datastore for OpenDS:

    ./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "OpenDS-store" -t LDAPv3ForOpenDS -D data_store_opends.txt

Data store for Tivoli DS

The IBM Tivoli Directory Server 6.2 is one of the supported LDAP servers for the OpenSSO server to provide authentication and authorization services. A specific sub configuration, LDAPv3ForTivoli, is available out of the box to support this server. You can use the data_store_tivoli.txt file to create a new data store by supplying the -t LDAPv3ForTivoli option to the ssoadm tool. The sample file, data_store_tivoli.txt (available as part of the code bundle), contains the entries, including the default group, that are shipped with the Tivoli DS, for easy understanding. You can customize it to any valid values. Here is the sample command that creates the datastore for Tivoli DS:

    ./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "Tivoli-store" -t LDAPv3ForTivoli -D data_store_tivoli.txt

Data store for Active Directory

Microsoft Active Directory provides most of the LDAPv3 features including support for persistent search notifications. Creating a data store for this is also a straightforward process and is available out-of-the-box. You can use the data_store_ad.txt file to create a new data store by supplying the -t LDAPv3ForAD option to the ssoadm tool. Here is the sample command that creates the datastore for AD:

    ./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "AD-store" -t LDAPv3ForAD -D data_store_ad.txt

Data store for Active Directory Application Mode

Microsoft Active Directory Application Mode (ADAM) is the lightweight version of Active Directory with a simplified schema. In an ADAM instance it is possible to set the user password over LDAP, unlike Active Directory where password-related operations must happen over LDAPS. Creating a data store for this is also a straightforward process and is available out-of-the-box. You can use the data_store_adam.txt file to create a new data store by supplying the -t LDAPv3ForADAM option to the ssoadm tool:

    ./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "ADAM-store" -t LDAPv3ForADAM -D data_store_adam.txt
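Each of the property files used above carries the bind password in plain text, and the ODSEE section advises removing it once the data store has been created. The following sketch shows one way to do that. The function name scrub_datastore_password is hypothetical, and the sketch assumes the password sits on a single line starting with sun-idrepo-ldapv3-config-authpw=, as in the ODSEE excerpt; check your own file before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: blank out the bind password in a data store properties file.
# Assumes the property name shown in the ODSEE excerpt of this article.
scrub_datastore_password() {
    local file=$1 tmp status
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Keep the property name, drop everything after "=".
    sed 's/^\(sun-idrepo-ldapv3-config-authpw=\).*$/\1/' "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Typical use, after a successful create-datastore run:
#   scrub_datastore_password data_store_odsee.txt
```

The file is rewritten in place through a temporary copy rather than with sed -i, because the in-place flag behaves differently between sed implementations. Deleting the file altogether is a simpler alternative if you do not need to keep it.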
Packt
03 Feb 2011
9 min read

OpenAM: Backup, Recovery, and Logging

  OpenAM Written and tested with OpenAM Snapshot 9—the Single Sign-On (SSO) tool for securing your web applications in a fast and easy way The first and the only book that focuses on implementing Single Sign-On using OpenAM Learn how to use OpenAM quickly and efficiently to protect your web applications with the help of this easy-to-grasp guide Written by Indira Thangasamy, core team member of the OpenSSO project from which OpenAM is derived Real-world examples for integrating OpenAM with various applications OpenSSO provides utilities that can be invoked with proper procedure to backup the server configuration data. When a crash or data corruption occurs, the server administrator must initiate a recovery operation. Recovering a backup involves the following two distinct operations: Restoring the configuration files Restoring the XML configuration data that was backed up earlier In general, recovery refers to the various operations involved in restoring, rolling forward, and rolling back a backup. Backup and recovery refers to the various strategies and operations involved in protecting the configuration database against data loss, and reconstructing the database should a loss occur. In this article, you should be able to learn about how to use the tools provided by OpenSSO to perform the following: OpenSSO configuration backup and recovery Test to production Trace and debug level logging for troubleshooting Audit log configuration using flat file Audit log configuration using RDBMS Securing the audit logs from intrusion OpenSSO deals with only backing up the configuration data of the server as the identity such as users, groups, and roles data backup and recovery will be handled by the enterprise level identity management suite of products. Backing up configuration data I am sure you are familiar now with the different configuration stores supported by the OpenSSO, an embedded store (based on OpenDS), and a highly scalable Directory Server Enterprise Edition. 
Regardless of the underlying configuration type, there are certain files that are created in the local file system on the host where the server is deployed. These files (such as the bootstrap file) contain critical pieces of information that help the application to initialize; any corruption in these files could prevent the server application from starting. Hence it becomes necessary to back up the configuration data stored in the file system. As a result, we could term the backup and recovery as a two step process:

• Back up the configuration files in the local file system
• Back up the OpenSSO configuration data in the LDAP configuration

Let us briefly discuss what each option means and when to apply them.

Backing up the OpenSSO configuration files

Typically the server can fail to come up for two reasons: either because it could not find the right configuration file that will locate the configuration data store, or because the data contained in the configuration store is corrupted. This is the simplest case, which I took up for this discussion because there could be umpteen reasons that could cause a server to fail to start up.

OpenSSO provides a subcommand export-svc-cfg as part of the ssoadm command line interface. Using this, the customer can only back up the configuration data that is stored in the configuration store, provided the configuration store is up and running. This backup will not help if the disk that holds the configuration files, such as the bootstrap and OpenDS schema files, has crashed, because the backup obtained using the export-svc-cfg will not contain these schema and bootstrap files. This jeopardizes the backup and recovery process for OpenSSO. This is why backing up the configuration directory becomes inevitable and vital for the recovery process. To back up the OpenSSO configuration directory, you can just use any file archive tool of your choice.
To perform this backup, log on to the OpenSSO host as the OpenSSO configuration user, and execute the following command:

    zip -r /tmp/opensso_config.zip /export/ssouser/opensso1/

This will back up all the contents of the configuration directory (in this case /export/ssouser/opensso1). Though this is the recommended way to back up, there may be server audit and debug logs that could fill up the disk space as you perform a periodic backup of this directory. The critical files and directories that need to be backed up are as follows:

• bootstrap
• opends (whole directory if present)
• .version
• .configParam
• certificate stores

The rest of the content of this directory can be restored from your staging area. If you have customized any of the service schema under the <opensso-config>/config/xml directory, then make sure you back them up.

This backup is in itself enough to bring up the corrupted OpenSSO server. When you back up the opends directory, all the OpenSSO configuration information will also get backed up, so you really do not need to have the backup file that you would generate using the export-svc-cfg. This kind of backup will be extremely useful and is the only way to perform the crash recovery. If the OpenDS is itself corrupted due to its internal database or index corruption, it will not start. Hence one cannot access the OpenSSO server or the ssoadm command line tool to restore the XML backup. So, it is a must to back up your configuration directory from the file system periodically.

Backing up the OpenSSO configuration data

The process of backing up the OpenSSO service configuration data is slightly different from the complete backup of the overall system deployment.
When the subcommand export-svc-cfg is invoked, the underlying code exports all the nodes under the ou=services,ROOT_SUFFIX of the configuration directory server: ./ssoadm export-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e secretkeytoencryptpassword -o /tmp/svc-config-bkup.xml To perform this, you need to have the ssoadm command line tool configured. The options supplied to this command are self-explanatory except maybe the -e . This takes a random string that will be used as the encryption secret to encrypt the password entries in the service configuration data. For example, the RADIUS server's share secret value. You need this key to restore the data back to the server. The OpenSSO and its configuration directory server must be running in good condition in order to be successful with this export operation. This backup will be useful in the following cases: Administrator accidentally deleted some of the authentication configurations Administrator accidentally changed some of the configuration properties Somehow the agent profiles have lost their configuration data Want to reset to factory defaults In any case, one should be able to authenticate to OpenSSO as an admin to restore the configuration data. If the server is not in that state, then crash recovery is the only option. In the embedded store configuration case this means unzipping the file system configuration backup obtained as described in the Backing up the OpenSSO configuration files section. For the configuration data that is stored in the Directory Server Enterprise Edition, the customer should use the tools that are bundled with the Oracle Directory Server Enterprise Edition to backup and restore. Crash recovery and restore In the previous section, we briefly covered the crash recovery part of it. When a crash occurs in the embedded or remote configuration store, the server will not come up again unless it is restored back to a valid state. 
This may involve restoring the proper database state and indexes using a known valid state backup. This backup may have been obtained by using the ODSEE backup tools or simply by zipping up the configuration file system of OpenSSO, as described in the Backing up the OpenSSO configuration files section. You need to bring the OpenSSO server back to a state where the administrator can log in to access the console. At this point the configuration exported to XML, as described in the Backing up the OpenSSO configuration data section, can be used. It is recommended to back up your vanilla configuration data from the file system periodically so that it can be used in the crash recovery case (where the embedded store itself is corrupted). Backup of configuration data using export-svc-cfg should be done frequently. Here is a sample execution of the import-svc-cfg subcommand: ./ssoadm import-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e mysecretenckey -X /tmp/svc-config-bkup.xml This will throw an error because we have intentionally provided a wrong key. Instead of a message saying that the secret key was wrong, it shows a string such as the following (this is a known bug): import-service-configuration-secret-key This is the key name that is supposed to map to a corresponding localizable error string. If you provide the correct encryption key, then the import succeeds: ./ssoadm import-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e secretkeytoencryptpassword -X /tmp/svc-config-bkup.xml Directory Service contains existing data. Do you want to delete it? [y|N] y Please wait while we import the service configuration... Service Configuration was imported. Note that it prompts before overwriting the existing data to make sure that the current configuration is not overwritten accidentally. There is no incremental restore, so be cautious while performing this operation. An import with a wrong version of the restore file could damage a working configuration.
It is always recommended to backup the existing configuration before importing an existing configuration backup file. If you do not want to import the current file just enter N and the command will terminate without harming your data. Well, what happens if customers do not have the configuration files backed up? Suppose customers do not have the copy of the configuration files to restore, they can reconfigure the OpenSSO web application by accessing the configurator (after cleaning up existing configuration). Once they configure the server, they should be able to restore the XML backup. Nevertheless, the new configuration must match all the configuration parameters that were provided earlier including the hostname and port details. This information can be found in the .configParam file. If you are planning to export the configuration to a different server than the original server, then you should be referring to the Test to production section, that covers the details on how customers can migrate the test configuration to a production server. It requires more steps than simply restoring the configuration data.
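The selective file-level backup described in the Backing up the OpenSSO configuration files section can be sketched as a small shell function. This is only an illustration under stated assumptions: backup_opensso_config is a hypothetical helper name, not an OpenSSO tool, and the names of certificate stores or customized schema files are not given in the text, so they are passed as extra arguments.

```shell
#!/bin/sh
# Sketch: archive only the critical OpenSSO configuration entries.
# backup_opensso_config is a hypothetical helper, not part of OpenSSO.
# Usage: backup_opensso_config CONFIG_DIR ARCHIVE [EXTRA_PATH...]
# EXTRA_PATH entries are relative to CONFIG_DIR (for example certificate
# stores or customized service schema files; their names are assumptions).
backup_opensso_config() {
    config_dir=$1
    archive=$2
    shift 2
    items=""
    # Critical entries named in the text, followed by any extra paths.
    for entry in bootstrap opends .version .configParam "$@"; do
        if [ -e "$config_dir/$entry" ]; then
            items="$items $entry"
        else
            echo "skipping missing entry: $entry" >&2
        fi
    done
    if [ -z "$items" ]; then
        echo "nothing to back up in $config_dir" >&2
        return 1
    fi
    # -C stores paths relative to the configuration directory.
    # $items is left unquoted on purpose so it splits into separate names
    # (this sketch does not handle entry names containing spaces).
    tar -czf "$archive" -C "$config_dir" $items
}
```

Called as backup_opensso_config /export/ssouser/opensso1 /tmp/opensso_critical.tar.gz, it leaves audit and debug logs out of the archive, which keeps periodic backups small.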

Packt
01 Feb 2011
5 min read

Creating Time Series Charts in R

Formatting time series data for plotting Time series or trend charts are the most common form of line graphs. There are a lot of ways in R to plot such data, however it is important to first format the data in a suitable format that R can understand. In this recipe, we will look at some ways of formatting time series data using the base and some additional packages. Getting ready In addition to the basic R functions, we will also be using the zoo package in this recipe. So first we need to install it: install.packages("zoo") How to do it... Let's use the dailysales.csv example dataset and format its date column: sales<-read.csv("dailysales.csv") d1<-as.Date(sales$date,"%d/%m/%y") d2<-strptime(sales$date,"%d/%m/%y") data.class(d1) [1] "Date" data.class(d2) [1] "POSIXt" How it works... We have seen two different functions to convert a character vector into dates. If we did not convert the date column, R would not automatically recognize the values in the column as dates. Instead, the column would be treated as a character vector or a factor. The as.Date() function takes at least two arguments: the character vector to be converted to dates and the format to which we want it converted. It returns an object of the Date class, represented as the number of days since 1970-01-01, with negative values for earlier dates. The values in the date column are in a DD/MM/YYYY format (you can verify this by typing sales$date at the R prompt). So, we specify the format argument as "%d/%m/%y". Please note that this order is important. If we instead use "%m/%d/%y", then our days will be read as months and vice-versa. The quotes around the value are also necessary. The strptime() function is another way to convert character vectors into dates. However, strptime() returns a different kind of object of class POSIXlt, which is a named list of vectors representing the different components of a date and time, such as year, month, day, hour, seconds, minutes, and a few more. 
POSIXlt is one of the two basic classes of date/times in R. The other class, POSIXct, represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. POSIXct is more convenient for including in data frames, and POSIXlt is closer to human readable forms. Both classes inherit from the virtual class POSIXt. That's why, when we ran the data.class() function on d2 earlier, we got POSIXt as the result (newer versions of R list the concrete class first, so they may report POSIXlt instead). strptime() also takes a character vector to be converted and the format as arguments. There's more... The zoo package is handy for dealing with time series data. The zoo() function takes an argument x, which can be a numeric vector, matrix, or factor. It also takes an order.by argument, which has to be an index vector with unique entries by which the observations in x are ordered: library(zoo) d3<-zoo(sales$units,as.Date(sales$date,"%d/%m/%y")) data.class(d3) [1] "zoo" See the help on DateTimeClasses to find out more details about the ways dates can be represented in R. Plotting date and time on the X axis In this recipe, we will learn how to plot formatted date or time values on the X axis. Getting ready For the first example, we only need to use the base graphics function plot(). How to do it... We will use the dailysales.csv example dataset to plot the number of units of a product sold daily in a month: sales<-read.csv("dailysales.csv") plot(sales$units~as.Date(sales$date,"%d/%m/%y"),type="l", xlab="Date",ylab="Units Sold") How it works... Once we have formatted the series of dates using as.Date(), we can simply pass it to the plot() function as the x variable in either the plot(x,y) or plot(y~x) format. We can also use strptime() instead of as.Date(). However, we cannot pass the object returned by strptime() to plot() in the plot(y~x) format. We must use the plot(x,y) format as follows: plot(strptime(sales$date,"%d/%m/%Y"),sales$units,type="l", xlab="Date",ylab="Units Sold") There's more...
We can plot the example using the zoo() function as follows (assuming zoo is already installed): library(zoo) plot(zoo(sales$units,as.Date(sales$date,"%d/%m/%y"))) Note that we don't need to specify x and y separately when plotting using zoo; we can just pass the object returned by zoo() to plot(). We also need not specify the type as "l". Let's look at another example which has full date and time values on the X axis, instead of just dates. We will use the openair.csv example dataset for this example. Because as.Date() keeps only the date and discards the time of day, we convert the column with as.POSIXct() instead: air<-read.csv("openair.csv") plot(air$nox~as.POSIXct(air$date,format="%d/%m/%Y %H:%M"),type="l", xlab="Time", ylab="Concentration (ppb)", main="Time trend of Oxides of Nitrogen") The same graph can be made using zoo as follows: plot(zoo(air$nox,as.POSIXct(air$date,format="%d/%m/%Y %H:%M")), xlab="Time", ylab="Concentration (ppb)", main="Time trend of Oxides of Nitrogen")

Packt
28 Jan 2011
6 min read

FAQs on Microsoft SQL Server 2008 High Availability

Microsoft SQL Server 2008 High Availability Minimize downtime, speed up recovery, and achieve the highest level of availability and reliability for SQL server applications by mastering the concepts of database mirroring,log shipping,clustering, and replication  Install various SQL Server High Availability options in a step-by-step manner  A guide to SQL Server High Availability for DBA aspirants, proficient developers and system administrators  Learn the pre and post installation concepts and common issues you come across while working on SQL Server High Availability  Tips to enhance performance with SQL Server High Availability  External references for further study Q: What is Clustering? A: Clustering is usually deployed when there is a critical business application running that needs to be available 24 X 7 or in terminology—High Availability. These clusters are known as Failover clusters because the primary goal to set up the cluster is to make services or business processes that are critical for business and should be available 24 X 7 with 99.99% up time. Q: How does MS Windows server Enterprise and Datacenter edition support failover clustering? A: MS Windows server Enterprise and Datacenter edition supports failover clustering. This is achieved by having two or more identical nodes connected to each other by means of private network and commonly used resources. In case of failure of any common resource or services, the first node (Active) passes the ownership to another node (Passive). Q: What is MSDTC? A: Microsoft Distributed Transaction Coordinator (MSDTC) is a service used by the SQL Server when it is required to have distributed transactions between more than one machine. In a clustered environment, SQL Server service can be hosted on any of the available nodes if the active node fails, and in this case MSDTC comes into the picture in case we have distributed queries and for replication, and hence the MSDTC service should be running. 
Following are a couple of questions with regard to MSDTC. Q: What will happen to the data that is being accessed? A: The data is taken care of, by shared disk arrays as it is shared and every node that is part of the cluster can access it; however, one node at a time can access and own it. Q: What about clients that were connected previously? Does the failover mean that developers will have to modify the connection string? A: Nothing like this happens. SQL Server is installed as a virtual server and it has a virtual IP address and that too is shared by every cluster node. So, the client actually knows only one SQL Server or its IP address. Here are the steps that explain how Failover will work: Node 1 owns the resources as of now, and is active node. The network adapter driver gets corrupted or suffers a physical damage. Heartbeat between Node1 and Node 2 is broken. Node 2 initiates the process to take ownership of the resources owned by the Node 1. It would approximately take two to five minutes to complete the process. Q: What is Hyper-V? What are their uses? A: Let's see what the Hyper-V is: It is a hypervisor-based technology that allows multiple operating systems to run on a host operating system at the same time. It has advantages of using SQL Server 2008 R2 on Windows Server 2008 R2 with Hyper-V. One such example could be the ability to migrate a live server, thereby increasing high availability without incurring downtime, among others. Hyper-V now supports up to 64 logical processors. It can host up to four VMs on a single licensed host server. SQL Server 2008 R2 allows an unrestricted number of virtual servers, thus making consolidation easy. It has the ability to manage multiple SQL Servers centrally using Utility Control Point (UCP). Sysprep utility can be used to create preconfigured VMs so that SQL Server deployment becomes easier. Q: What are the Hardware, Software and Operating system requirements for installing SQL Server 2008 R2? 
A: The following are the hardware requirements: Processor: Intel Pentium 3 or Higher Processor Speed: 1 GHZ or Higher RAM: 512 MB of RAM but 2 GB is recommended Display : VGA or Higher The following are the software requirements: Operating system: Windows 7 Ultimate, Windows Server 2003 (x86 or x64), Windows Server 2008 (x86 or x64) Disk space: Minimum 1 GB .Net Framework 3.5 Windows Installer 4.5 or later MDAC 2.8 SP1 or later The following are the operating system requirements for clustering: To install SQL Server 2008 clustering, it's essential to have Windows Server 2008 Enterprise or Data Center Edition installed on our host system with Full Installation, so that we don't have to go back and forth and install the required components and restart the system. Q: What is to be done when we see the network binding warning coming up? A: In this scenario, we will have to go to Network and Sharing Center | Change Adapter Settings. Once there, pressing Alt + F, we will select Advanced Settings. Select Public Network and move it up if it is not and repeat this process on the second node. Q: What is the difference between Active/Passive and Active/Active Failover Cluster? A: In reality, there is only one difference between Single-instance (Active/Passive Failover Cluster) and Multi-instance (Active/Active Failover Cluster). As its name suggests, in a Multi-instance cluster, there will be two or more SQL Server active instances running in a cluster, compared to one instance running in Single-instance. Also, to configure a multi-instance Cluster, we may need to procure additional disks, IP addresses, and network names for the SQL Server. Q: What is the benefit of having Multi-instance, that is, Active/Active configuration? A: Depending on the business requirement and the capability of our hardware, we may have one or more instances running in our cluster environment. 
The main goal is to have a better uptime and better High Availability by having multiple SQL Server instances running in an environment. Should anything go wrong with the one SQL Server instance, another instance can easily take over the control and keep the business-critical application up and running! Q: What will be the difference in the prerequisites for the Multi-instance Failover Cluster as compared to the Single-instance Failover Cluster? A: There will be no difference compared to a Single-instance Failover Cluster, except that we need to procure additional disk(s), network name, and IP addresses. We need to make sure that our hardware is capable of handling requests that come from client machines for both the instances. Installing a Multi-instance cluster is almost similar to adding a Single-instance cluster, except for the need to add a few resources along with a couple of steps here and there.
Packt
27 Jan 2011
11 min read

OpenAM Identity Stores: Types, Supported Types, Caching and Notification

Like any other service, the Identity Repository service is also defined using an XML file named idRepoService.xml that can be found in <conf-dir>/config/xml. In this file one can define as many subschema as needed. By default, the following subschema names are defined: LDAPv3 LDAPv3ForAMDS LDAPv3ForOpenDS LDAPv3ForTivoli LDAPv3ForAD LDAPv3ForADAM Files Database However, not all of them are supported in the version that has been tested while writing this article. For instance the files, LDAPv3, and Database subschema are meant to be sample implementations. One can extend it for other databases, keeping this as an example. The rest of the sub configurations are all well tested and supported. One of the Identity Repository types Access Manager Repository is missing from this definition, as it is a manual process to add it into the OpenSSO server. That is something which will be detailed later in this article. It is also called a legacy SDK plugin for OpenSSO. The Identity Repository framework requires support for logging service and session management to deliver its overall functionality. Identity store types In OpenSSO, multiple types of Identity Repository plugins are implemented including the following: LDAPv3Repo AgentsRepo InternalRepo/SpecialRepo FilesRepo AMSDKRepo Unlike the Access Manager Repository plugin, these are available in a vanilla OpenSSO server. So customers can readily use it without requiring to perform any additional configuration steps. LDAPv3Repo: Is the plugin that will be used by customers and the administrators quite frequently as the other types of plugin implementations are mostly meant to be used by OpenSSO internal services. This plugin forms the basis for building the configuration for supporting various LDAP servers including Microsoft Active Directory, Active Directory Application Mode (ADAM/LDS), IBM Tivoli Directory, OpenDS, and Oracle Directory Server Enterprise Edition. 
There are subschema defined for each of the recently mentioned LDAP servers in the IdRepo service schema as described in the beginning of this section. AgentsRepo: Is a plugin that is used to manage the OpenSSO policy agents' profiles. Unlike the LDAPv3Repo, AgentsRepo uses the configuration repository to store the agent's configuration data including authentication information. Prior to the Agents 3.0 version, all agents accessing earlier versions of OpenSSO such as Access Manager 7.1, had most of the configuration data of the agents stored locally in the file system as plain text files. This imposed huge management problems for the customers to upgrade or change any configuration parameters as it required them to log in to each host where the agents are installed. Besides, the configuration of all agents prior to 3.0 was stored in the user identity store. In OpenSSO the agent's profiles and configurations are stored as part of the configuration Directory Information Tree (DIT). The AgentsRepo is a hidden internal repository plugin, and at no point should it be visible to end users or administrators for modification. SpecialRepo: In the predecessor of OpenSSO the administrative users were stored as part of the user identity store. So even when the configuration store is up and running administrators still cannot log in to the system unless the user identity store is up and running. This kind of limits the customer experience especially during pilot testing and troubleshooting scenarios. To overcome this, OpenSSO introduced a feature wherein all the core administrative users are stored as part of the configuration store in the IdRepo service. All the administrative and special user authentication by default uses this specialrepo framework. It may be possible to override this behavior by invoking module based authentication. SpecialRepo is used as a fallback repository to get authenticated to the OpenSSO server. 
SpecialRepo is also a hidden internal repository plugin. At no point should it be visible to end users or administrators for modification. FilesRepo: Is no longer supported in the OpenSSO product. You can see references to it in the source code, but it cannot be configured to use a flat file store for either configuration data or user identity data. AMSDKRepo: This plugin has been made available to maintain compatibility with the Sun Java System Access Manager versions. When this plugin is enabled, the identity schema is defined using the DAI service as described in ums.xml. This plugin is not available in the vanilla OpenSSO server; the administrator has to perform certain manual steps to make it available for use. In this plugin, identity management is tightly coupled with the Oracle Directory Server Enterprise Edition. It is generally useful in the co-existence scenario where OpenSSO needs to co-exist with Sun Access Manager. Wherever this article refers to the "Access Manager Repository plugin", it means AMSDKRepo. Besides this, there is a sample implementation of a MySQL-based database repository available as part of the server default configuration. It works; however, it is not extensively tested for all the OpenSSO features. You can also refer to another discussion on the custom database repository implementation at this link: http://www.badgers-in-foil.co.uk/notes/installing_a_custom_opensso_identity_repository/. Caching and notification For the LDAPv3Repo, the key feature that enables it to perform and scale is the caching of the result set for each client query, without which it would be impossible to achieve the required performance and scalability. When caching is employed, there is a possibility that clients could get stale information about identities. This can be avoided by cleaning up the cache periodically or by having an external event mark the cache as dirty so that new values can be cached.
OpenSSO provides more than one way to tackle this caching and notification. The Identity Repository design relies broadly on two types of mechanisms to invalidate and refresh the IdRepo cache: Persistent search-based event notification Time-to-live (TTL) based refresh Both methods have their own merits and can be enabled simultaneously, which is the recommended setup. This handles the scenario where a network glitch (which could cause a packet loss) might have caused the OpenSSO server to miss some change notifications. The value of TTL purely depends on the deployment environment and end user experience. Persistent search-based notification The OpenSSO Identity Repository plugin cache can be invalidated and refreshed by registering a persistent search connection to the backend LDAP server, provided the LDAP server supports the persistent search control. The persistent search control 2.16.840.1.113730.3.4.3 (http://www.mozilla.org/directory/ietf-docs/draft-smith-psearch-ldap-01.txt) is implemented by many of the commercial LDAP servers including: IBM (Tivoli Directory) Novell (eDirectory) Oracle Directory Server Enterprise Edition (ODSEE) OpenDS (OpenDS Directory Server 1.0.0-build007) Fedora-Directory/1.0.4 B2006.312.1539 In order to determine whether your LDAP vendor supports a persistent search, perform the following search for the persistent search control 2.16.840.1.113730.3.4.3: ldapsearch -p 389 -h ds_host -s base -b '' "objectclass=*" supportedControl | grep 2.16.840.1.113730.3.4.3 Microsoft Active Directory implements change notification in a different form using the LDAP control 1.2.840.113556.1.4.528. In the Sun Java Directory Server, the max-psearch-count property defines the maximum number of persistent searches that can be performed on the directory server.
The persistent search mechanism provides an active channel through which entries that change (and information about the changes that occur) can be communicated. As each persistent search operation uses one thread, limiting the number of simultaneous persistent searches prevents certain kinds of denial of service attacks. A client implementation that generates a large number of persistent connections to a single directory server may indicate that LDAP is not the correct transport. However, horizontal scaling using Directory Proxy Servers, or an LDAP consumer tier, may help to spread the load. The best solution, from an LDAP implementation point of view, would be to limit persistent searches. If you have created a user data store against an LDAP server which supports the persistent search control, then a persistent search connection will be created with the base DN configured in the LDAPv3 configuration. The search filter for this connection is obtained from the data store configuration properties. Though it is possible to listen to a specific type of change event, OpenSSO registers the persistent search connections to receive all kinds of change events. The IdRepo framework has the logic to determine whether the underlying directory server supports persistent searches or not. If they are not supported, it does not try to submit the persistent search. In this case customers may resort to a TTL-based notification as described in the next section. Each active persistent search request requires that an open TCP connection be maintained between an LDAP client (in this case OpenSSO) and an LDAP server (the backend user store) that might not otherwise be kept open. The OpenSSO server, acting as an LDAP client, closes idle LDAP connections to the backend LDAP server in order to maximize resource utilization. If the OpenSSO servers are behind a load balancer or a firewall, you need to tune the value of "com.sun.am.event.connection.idle.timeout".
If the persistent search connections are made through a Load Balancer (LB) or firewall, then these connections are subject to the TCP timeout value of the respective LB and/or firewall. In such a scenario, once the firewall closes the persistent search connection due to an idle TCP timeout, change notifications cannot reach OpenSSO unless the persistent search connection is re-established. Customers can avoid this scenario by configuring the idle timeout for the persistent search connection so that OpenSSO restarts the persistent search TCP connection before the LB/firewall idle timeout expires; that way the LB/firewall never sees an idle persistent search connection. The advanced server configuration property "com.sun.am.event.connection.idle.timeout" specifies the timeout value in minutes after which the persistent searches will be restarted. Ideally, this value should be lower than the LB/firewall TCP timeout, to make sure that the persistent searches are restarted before the connections are dropped. A value of "0" indicates that these searches will not be restarted. By default the value is "0". Only the connections that are timed out will be reset. You should never set this property to a value higher than the LB/firewall timeout, and the delta should not be more than five minutes. If your LB's idle connection timeout is "50" minutes, then set this property value to "45" minutes. If for some reason you want to prevent the persistent search from being submitted to the backend LDAP server, just leave the persistent search base (sun-idrepo-ldapv3-config-psearchbase) empty; this will cause the IdRepo to disable the persistent search connection. Time-to-live based notification There may be deployment scenarios where persistent search-based notifications are not possible or the underlying LDAP server does not support the persistent search control. In such scenarios customers can employ the TTL or time-to-live based notification mechanism.
It is a feature that involves a proprietary implementation by the OpenSSO server. This feature works in a fashion similar to the polling mechanism in the OpenSSO clients, where the client periodically polls the OpenSSO server for changes, often called the "pull" model. Persistent search-based notifications, by contrast, are termed the "push" model (the LDAP server pushes the changes to the clients). Regardless of the persistent search-based change notifications, the OpenSSO server polls the underlying directory server and gets the data to refresh its Identity Repository cache. TTL-specific properties for Identity Repository cache When the OpenSSO deployment is configured for TTL-based cache refresh, there are certain server-side properties that need to be configured to enable the Identity Repository framework to refresh the cache. The following are the core properties that are relevant in the TTL context: com.sun.identity.idm.cache.entry.expire.enabled=true com.sun.identity.idm.cache.entry.user.expire.time=1 (in minutes) com.sun.identity.idm.cache.entry.default.expire.time=1 (in minutes) The properties com.sun.identity.idm.cache.entry.user.expire.time and com.sun.identity.idm.cache.entry.default.expire.time specify the time in minutes for which the user entries and the non-user entries (such as roles and groups) respectively remain valid after their last modification. In other words, after this specified period of time elapses, the cached data for the entry will expire. From that instant, new requests for these entries will result in a fresh read from the underlying Identity Repository plugins. When the property com.sun.identity.idm.cache.entry.expire.enabled is set to true, the cache entries for non-user objects will expire based on the time specified in the com.sun.identity.idm.cache.entry.default.expire.time property. The user entries will be cleaned up based on the value set in the property com.sun.identity.idm.cache.entry.user.expire.time.
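The idle timeout rule from the persistent search section and the TTL properties above can be put together in a short shell sketch. The helper name print_idrepo_cache_props and its arguments are hypothetical; only the property names come from the text, and the default five-minute margin is an assumption taken from the 50/45 example. The function only prints candidate settings for review and does not change any server configuration.

```shell
#!/bin/sh
# Sketch only: print_idrepo_cache_props is a hypothetical helper, not an
# OpenSSO tool. It prints candidate property settings for review.
# Usage: print_idrepo_cache_props LB_IDLE_TIMEOUT_MIN TTL_MIN [MARGIN_MIN]
print_idrepo_cache_props() {
    lb_timeout=$1
    ttl=$2
    margin=${3:-5}    # assumed default, following the 50 -> 45 example
    if [ "$lb_timeout" -le "$margin" ]; then
        echo "LB/firewall timeout must be larger than the margin" >&2
        return 1
    fi
    # Restart persistent searches shortly before the LB/firewall would
    # drop the idle connection.
    echo "com.sun.am.event.connection.idle.timeout=$((lb_timeout - margin))"
    # TTL-based refresh for user and non-user (role, group) entries.
    echo "com.sun.identity.idm.cache.entry.expire.enabled=true"
    echo "com.sun.identity.idm.cache.entry.user.expire.time=$ttl"
    echo "com.sun.identity.idm.cache.entry.default.expire.time=$ttl"
}

# Example: LB idle timeout of 50 minutes, cache TTL of 1 minute.
print_idrepo_cache_props 50 1
```

Keeping both mechanisms in one place makes it easier to check that the persistent search restart interval stays below the LB/firewall timeout while the TTL refresh still covers missed notifications.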

Packt
25 Jan 2011
4 min read

Choosing Styles of Various Graph Elements in R

  R Graph Cookbook Detailed hands-on recipes for creating the most useful types of graphs in R – starting from the simplest versions to more advanced applications Learn to draw any type of graph or visual data representation in R Filled with practical tips and techniques for creating any type of graph you need; not just theoretical explanations All examples are accompanied with the corresponding graph images, so you know what the results look like Each recipe is independent and contains the complete explanation and code to perform the task as efficiently as possible    Choosing plotting point symbol styles and sizes In this recipe, we will see how we can adjust the styling of plotting symbols, which is useful and necessary when we plot more than one set of points representing different groups of data on the same graph. Getting ready All you need to try out this recipe is to run R and type the recipe at the command prompt. You can also choose to save the recipe as a script so that you can use it again later on. We will also use the cityrain.csv example data file. Please read the file into R as follows: rain<-read.csv("cityrain.csv") The code file can be downloaded from here. How to do it... The plotting symbol and size can be set using the pch and cex arguments: plot(rnorm(100),pch=19,cex=2)   How it works... The pch argument stands for plotting character (symbol). It can take numerical values (usually between 0 and 25) as well as single character values. Each numerical value represents a different symbol. For example, 1 represents circles, 2 represents triangles, 3 represents plus signs, and so on. If we set the value of pch to a character such as "*" or "£" in inverted commas, then the data points are drawn as that character instead of the default circles. The size of the plotting symbol is controlled by the cex argument, which takes numerical values starting at 0 giving the amount by which plotting symbols should be magnified relative to the default. 
Note that cex takes relative values (the default is 1). So, the absolute size may vary depending on the defaults of the graphic device in use. For example, the size of plotting symbols with the same cex value may be different for a graph saved as a PNG file versus a graph saved as a PDF.

There’s more...

The most common use of pch and cex is when we don’t want to use color to distinguish between different groups of data points. This is often the case in scientific journals which do not accept color images. For example, let’s plot the city rainfall data as a set of points instead of lines:

plot(rain$Tokyo,
     ylim=c(0,250),
     main="Monthly Rainfall in major cities",
     xlab="Month of Year",
     ylab="Rainfall (mm)",
     pch=1)
points(rain$NewYork,pch=2)
points(rain$London,pch=3)
points(rain$Berlin,pch=4)
legend("top",
       legend=c("Tokyo","New York","London","Berlin"),
       ncol=4,
       cex=0.8,
       bty="n",
       pch=1:4)

Choosing line styles and width

Similar to plotting point symbols, R provides simple ways to adjust the style of lines in graphs.

Getting ready

All you need to try out this recipe is to run R and type the recipe at the command prompt. You can also choose to save the recipe as a script so that you can use it again later on. We will again use the cityrain.csv data file.

How to do it...

Line styles can be set by using the lty and lwd arguments (for line type and width respectively) in the plot(), lines(), and par() commands. Let’s take our rainfall example and apply different line styles keeping the color the same:

plot(rain$Tokyo,
     ylim=c(0,250),
     main="Monthly Rainfall in major cities",
     xlab="Month of Year",
     ylab="Rainfall (mm)",
     type="l",
     lty=1,
     lwd=2)
lines(rain$NewYork,lty=2,lwd=2)
lines(rain$London,lty=3,lwd=2)
lines(rain$Berlin,lty=4,lwd=2)
legend("top",
       legend=c("Tokyo","New York","London","Berlin"),
       ncol=4,
       cex=0.8,
       bty="n",
       lty=1:4,
       lwd=2)

How it works...

Both line type and width can be set with numerical values as shown in the previous example.
Line type number values correspond to types of lines:

0: blank
1: solid (default)
2: dashed
3: dotted
4: dotdash
5: longdash
6: twodash

We can also use the character strings instead of numbers, for example, lty="dashed" instead of lty=2.

The line width argument lwd takes positive numerical values. The default value is 1. In the example we used a value of 2, thus making the lines thicker than default.
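To check that the numeric and named forms of lty select the same style, here is a minimal sketch that draws each line type twice: once by number with the default width, and once by name with a larger lwd. The output file name and the plot layout are assumptions made for this illustration only.

```r
# Sketch: compare numeric and named line types, at two line widths.
# The file name and layout are arbitrary choices for this illustration.
out_file <- file.path(tempdir(), "lty_preview.pdf")
pdf(out_file, width = 7, height = 4)

lty_names <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

# Empty plotting region; one row per line type
plot(0, 0, type = "n",
     xlim = c(0, 10), ylim = c(0.5, 6.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "Line types 1 to 6")

for (i in seq_along(lty_names)) {
  # Left segment: line type chosen by number, default width
  lines(c(3, 6), c(i, i), lty = i, lwd = 1)
  # Right segment: same line type chosen by name, thicker line
  lines(c(6.5, 9.5), c(i, i), lty = lty_names[i], lwd = 3)
  # Row label showing the number and the matching name
  text(0, i, labels = paste0(i, ": ", lty_names[i]), pos = 4)
}

dev.off()
```

In each row the two segments should show the same dash pattern, differing only in thickness.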