
How-To Tutorials - Data


Disaster Recovery in MySQL for Python

Packt
25 Sep 2010
10 min read
MySQL for Python
Integrate the flexibility of Python and the power of MySQL to boost the productivity of your Python applications:
- Implement the outstanding features of Python's MySQL library to their full potential
- See how to make MySQL take the processing burden from your programs
- Learn how to employ Python with MySQL to power your websites and desktop applications
- Apply your knowledge of MySQL and Python to real-world problems instead of hypothetical scenarios
- A manual packed with step-by-step exercises to integrate your Python applications with the MySQL database server

Read more about this book

(For more resources on Python, see here.)

The purpose of the archiving methods covered in this article is to allow you, as the developer, to back up the databases that you use for your work without having to rely on the database administrator. As noted later in the article, there are more sophisticated backup methods than those we cover here, but they involve system-administration tasks that lie beyond the remit of any development post and are thus beyond the scope of this article.

Every database needs a backup plan

When archiving a database, one of the critical questions is how to take a snapshot backup without users changing the data in the process. If data changes in the midst of the backup, the result is an inconsistent backup that compromises the integrity of the archive. There are two strategic approaches to backing up a database system:

- Offline backups
- Live backups

Which you use depends on the dynamics of the system in question and the importance of the data being stored. In this article, we will look at each in turn and at the way to implement them.

Offline backups

Offline backups are done by shutting down the server so the records can be archived without the fear of them being changed by a user. Shutting down also helps to ensure that the server stops gracefully and that errors are avoided. The problem with using this method on most production systems is that it requires a temporary loss of access to the service. For most service providers, such a consequence is anathema to the business model.

The value of this method is that you can be certain the database has not changed at all while the backup is run. Further, in many cases the backup runs faster because the processor is not simultaneously serving data. For this reason, offline backups are usually performed in controlled environments or in situations where disruption is not critical to the user. These include internal databases, where administrators can inform all users about the disruption ahead of time, and small business websites that do not receive a lot of traffic. Offline backups also have the benefit that the backup is usually held in a single file, which can then be used to copy a database across hosts with relative ease.

Shutting down a server obviously requires system administrator-like authority, so creating an offline backup relies on the system administrator shutting down the server. If your responsibilities include database administration, you will also have sufficient permission to shut down the server.

Live backups

Live backups occur while the server continues to accept queries from users, that is, while it is still online. They work by locking down the tables so no new data may be written to them. Users usually do not lose access to the data, and the integrity of the archive for a particular point in time is assured.
Live backups are used by large, data-intensive sites such as Nokia's Ovi services and Google's web services. However, because they do not always require administrator access to the server itself, they tend to suit the backup needs of a development project.

Choosing a backup method

Having determined whether a database can be stopped for the backup, a developer can choose from three methods of archiving:

- Copying the data files (including administrative files such as logs and tablespaces)
- Exporting delimited text files
- Backing up with command-line programs

Which you choose depends on what permissions you have on the server and how you are accessing the data.

MySQL also allows for two other forms of backup: using the binary log and setting up replication (using master and slave servers). To be sure, these are the best ways to back up a MySQL database, but both are administrative tasks that require system-administrator authority; they are not typically available to a developer. However, you can read more about them in the MySQL documentation.

Use of the binary log for incremental backups is documented at:
http://dev.mysql.com/doc/refman/5.5/en/point-in-time-recovery.html

Setting up replication is dealt with further at:
http://dev.mysql.com/doc/refman/5.5/en/replication-solutions-backups.html

Copying the table files

The most direct way to back up database files is to copy them from where MySQL stores the database itself. The location naturally varies by platform. If you are unsure which directory holds the MySQL database files, you can query MySQL itself to check:

mysql> SHOW VARIABLES LIKE 'datadir';

Alternatively, the following shell command sequence will give you the same information:

$ mysqladmin variables | grep datadir
| datadir | /var/lib/mysql/ |

Note that the location of administrative files, such as binary logs and InnoDB tablespaces, is customizable and may not be in the data directory.

If you do not have direct access to the MySQL server, you can also write a simple Python program to get the information:

#!/usr/bin/env python
import MySQLdb

mydb = MySQLdb.connect('<hostname>', '<user>', '<password>')
cursor = mydb.cursor()
cursor.execute("SHOW VARIABLES LIKE 'datadir'")
results = cursor.fetchall()
for variable, value in results:
    print "%s: %s" % (variable, value)

A slight alteration of this program will also allow you to query several servers automatically. Simply change the login details and adapt the output to clarify which data is associated with which server.

Locking and flushing

If you are backing up an offline MyISAM system, you can copy any of the files once the server has been stopped. Before backing up a live system, however, you must lock the tables and flush the log files in order to get a consistent backup at a specific point in time. These tasks are handled by the LOCK TABLES and FLUSH commands, respectively. When you use MySQL and its ancillary programs (such as mysqldump) to perform a backup, these tasks are performed automatically. When copying files directly, you must ensure both are done. How you apply them depends on whether you are backing up an entire database or a single table.

LOCK TABLES

The LOCK TABLES command secures a specified table in a designated way. Tables can be referenced with aliases using AS and can be locked for reading or writing. For our purposes, we need only a read lock to create a backup. The syntax looks like this:

LOCK TABLES <tablename> READ;

This command requires two privileges: LOCK TABLES and SELECT.
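The same statement can be issued from a Python program with MySQLdb, the library used throughout this article. This is only a sketch: the connection details are placeholders and the table name mytable is hypothetical, not taken from the book.

#!/usr/bin/env python
import MySQLdb

# Placeholder connection details and table name -- adjust for your own server.
mydb = MySQLdb.connect('<hostname>', '<user>', '<password>', '<database>')
cursor = mydb.cursor()

cursor.execute("LOCK TABLES mytable READ")   # acquire a read lock on one table
# ... copy the relevant table files while the lock is held ...
cursor.execute("UNLOCK TABLES")              # release the lock (see below)

The lock belongs to the session that issued it, so the copy must happen while this connection stays open.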
It must be noted that LOCK TABLES does not lock all tables in a database, but only one. This is useful for performing smaller backups that will not interrupt services or put too severe a strain on the server. However, unless you automate the process, manually locking and unlocking tables as you back up data can be ridiculously inefficient.

FLUSH

The FLUSH command is used to reset MySQL's caches. By re-initiating the cache at the point of backup, we get a clear point of demarcation for the database backup, both in the database itself and in the logs. The basic syntax is straightforward:

FLUSH <the object to be reset>;

Use of FLUSH presupposes the RELOAD privilege for all relevant databases. What we reload depends on the process we are performing. For the purpose of backing up, we will always be flushing tables:

FLUSH TABLES;

How we "flush" the tables will depend on whether we have already used the LOCK TABLES command to lock the table. If we have already locked a given table, we can call FLUSH for that specific table:

FLUSH TABLES <tablename>;

However, if we want to copy an entire database, we can bypass the LOCK TABLES command by incorporating the same call into FLUSH:

FLUSH TABLES WITH READ LOCK;

This use of FLUSH applies across the database, and all tables will be subject to the read lock. If the account accessing the database does not have sufficient privileges for all databases, an error will be thrown.

Unlocking the tables

Once you have copied the files for a backup, you need to remove the read lock you imposed earlier. This is done by releasing all locks for the current session:

UNLOCK TABLES;

Restoring the data

Restoring copies of the actual storage files is as simple as copying them back into place. This is best done while MySQL is stopped, lest you risk corruption. Similarly, if you have a separate MySQL server and want to transfer a database, you simply need to copy the directory structure from one server to the other. On restarting, MySQL will see the new database and treat it as if it had been created natively. When restoring the original data files, it is critical to ensure that the permissions on the files and directories are appropriate and match those of the other MySQL databases.

Delimited backups within MySQL

MySQL allows data to be exported from the MySQL command line. To do so, we simply direct the output from a SELECT statement to an output file.

Using SELECT INTO OUTFILE to export data

Using sakila, we can save the data from film to a file called film.data as follows:

SELECT * INTO OUTFILE 'film.data' FROM film;

This results in the data being written in a tab-delimited format. The file will be written to the directory in which MySQL stores the sakila data. Therefore, the account under which the SELECT statement is executed must have the FILE privilege for writing the file, as well as login access on the server to view or retrieve it.

The OUTFILE option on SELECT can be used to write to any location on the server to which MySQL has write permission. One simply needs to prepend that directory location to the file name. For example, to write the same file to the /tmp directory on a Unix system, use:

SELECT * INTO OUTFILE '/tmp/film.data' FROM film;

Windows simply requires adjusting the directory structure accordingly.
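If you prefer to drive the export from Python rather than from the MySQL command line, the same statement can be executed through MySQLdb. This is a hedged sketch: the connection details are placeholders, and it assumes the connecting account has the FILE privilege and the sakila sample database installed.

#!/usr/bin/env python
import MySQLdb

# Placeholder connection details -- adjust for your own server.
mydb = MySQLdb.connect('<hostname>', '<user>', '<password>', 'sakila')
cursor = mydb.cursor()

# Write a tab-delimited copy of the film table to /tmp on the server.
cursor.execute("SELECT * INTO OUTFILE '/tmp/film.data' FROM film")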
Using LOAD DATA INFILE to import data

If you have an output file or a similar tab-delimited file and want to load it into MySQL, use the LOAD DATA INFILE command. The basic syntax is:

LOAD DATA INFILE '<filename>' INTO TABLE <tablename>;

For example, to import the film.data file from the /tmp directory into another table called film2, we would issue this command:

LOAD DATA INFILE '/tmp/film.data' INTO TABLE film2;

Note that LOAD DATA INFILE presupposes the creation of the table into which the data is being loaded. In the preceding example, if film2 had not been created, we would receive an error. If you are trying to mirror a table, remember to use the SHOW CREATE TABLE query to save yourself time in formulating the CREATE statement.

This discussion only touches on how to use LOAD DATA INFILE for inputting data created with the OUTFILE option of SELECT, but the command handles text files with just about any set of delimiters. To read more on how to use it for other file formats, see the MySQL documentation at:
http://dev.mysql.com/doc/refman/5.5/en/load-data.html
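The import can be scripted in the same way. Again, this is only a sketch with placeholder connection details, and it assumes film2 has already been created with the same structure as film.

#!/usr/bin/env python
import MySQLdb

# Placeholder connection details -- adjust for your own server.
mydb = MySQLdb.connect('<hostname>', '<user>', '<password>', 'sakila')
cursor = mydb.cursor()

# film2 must already exist, as noted above.
cursor.execute("LOAD DATA INFILE '/tmp/film.data' INTO TABLE film2")
mydb.commit()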


Understanding Model-based Clustering

Packt
14 Sep 2015
10 min read
In this article by Ashish Gupta, author of the book Apache Mahout Clustering Designs, we will discuss a model-based clustering algorithm. Model-based clustering is used to overcome some of the deficiencies that can occur with the K-means or Fuzzy K-means algorithms. We will discuss the following topics in this article:

- Learning model-based clustering
- Understanding Dirichlet clustering
- Understanding topic modeling

(For more resources related to this topic, see here.)

Learning model-based clustering

In model-based clustering, we assume that the data is generated by a model and try to recover that model from the data. The right model will fit the data better than other models. In the K-means algorithm, we provide the initial set of clusters, and K-means gives us the data points belonging to each cluster. Consider a case where the clusters are not normally distributed; K-means will not improve the clusters much. In this scenario, a model-based clustering algorithm will do the job. Another situation to think about when dividing clusters is hierarchical clustering, where we need to find the overlapping information. This situation is also covered by model-based clustering algorithms. If the components are not all well separated, a cluster can consist of multiple mixture components.

In simple terms, in model-based clustering the data is a mixture of two or more components. Each component has an associated probability and is described by a density function. Model-based clustering can capture the hierarchy and the overlap of the clusters at the same time. Partitions are determined by an EM (expectation-maximization) algorithm for maximum likelihood. The generated models are compared by the Bayesian Information Criterion (BIC), and the model with the lowest BIC is preferred. In the equation BIC = -2 log(L) + m log(n), L is the likelihood function, m is the number of free parameters to be estimated, and n is the number of data points.
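To make the BIC comparison concrete, here is a small sketch with made-up log-likelihood values; they are illustrative only and not derived from any real dataset.

from math import log

def bic(log_likelihood, m, n):
    # BIC = -2 log(L) + m log(n): m free parameters, n data points
    return -2 * log_likelihood + m * log(n)

# Hypothetical fits of two candidate models to the same 500 data points.
print(bic(-1200.0, m=4, n=500))    # simpler model
print(bic(-1150.0, m=12, n=500))   # richer model; the lower BIC is preferred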
Understanding Dirichlet clustering

Dirichlet clustering is a model-based clustering method. This algorithm is used to understand the data and to cluster it. Dirichlet clustering is a nonparametric, Bayesian modeling process; it is nonparametric because it can have an infinite number of parameters. Dirichlet clustering is based on the Dirichlet distribution. For this algorithm, we have a probabilistic mixture of a number of models that are used to explain the data, and each data point comes from one of the available models. The models are taken from a sample of a prior distribution of models, and points are assigned to these models iteratively. In each iteration, the probability that a point was generated by a particular model is calculated. After the points are assigned to a model, new parameters for each of the models are sampled. This sample is drawn from the posterior distribution of the model parameters, and it considers all the observed data points assigned to the model. This sampling provides more information than normal clustering:

- As we assign points to different models, we can find out how many models are supported by the data.
- We can also see how well the data is described by a model and how well two points are explained by the same model.

Topic modeling

In machine learning, topic modeling means discovering the topics of a text document using a statistical model. A document on a particular topic contains some characteristic words. For example, if you are reading an article on sports, there is a high chance that you will encounter words such as football, baseball, Formula One, and Olympics. So a topic model actually uncovers the hidden sense of an article or a document. Topic models are algorithms that can discover the main themes from a large set of unstructured documents; they uncover the semantic structure of the text. Topic modeling enables us to organize large-scale electronic archives.

Mahout has an implementation of one of the topic modeling algorithms: Latent Dirichlet Allocation (LDA). LDA is a statistical model of document collections that tries to capture the intuition behind the documents. In normal clustering algorithms, if words having the same meaning don't occur together, the algorithm will not associate them, but LDA can find out which two words are used in a similar context; LDA is better than other algorithms at finding such associations.

LDA is a generative, probabilistic model. It is generative because the model is tweaked to fit the data, and, using the parameters of the model, we can generate the data on which it fits. It is probabilistic because each topic is modeled as an infinite mixture over an underlying set of topic probabilities. The topic probabilities provide an explicit representation of a document. Graphically, an LDA model can be represented as follows (figure not reproduced here). The notation used in the model represents the following:

- M, N, and K represent the number of documents, the number of words in a document, and the number of topics respectively.
- α is the prior weight of topic k in a document.
- β is the prior weight of word w in a topic.
- φ is the probability of a word occurring in a topic.
- Θ is the topic distribution.
- z is the identity of a topic of all the words in all the documents.
- w is the identity of all the words in all the documents.

How does LDA work in map-reduce mode? These are the steps that LDA follows in the mapper and reducer:

- Mapper phase: The program starts with an empty topic model. All the documents are read by different mappers, and the probabilities of each topic for each word in the document are calculated.
- Reducer phase: The reducer receives the counts of probabilities. These counts are summed and the model is normalized.

This process is iterative; in each iteration the sum of the probabilities is calculated, and the process stops when it stops changing. A parameter, similar to the convergence threshold in K-means, is set to check for the changes. In the end, LDA estimates how well the model fits the data.

In Mahout, the Collapsed Variational Bayes (CVB) algorithm is implemented for LDA. LDA takes a term frequency vector as its input, not tf-idf vectors. We need to take care of two parameters while running the LDA algorithm: the number of topics and the number of words in the documents. A higher number of topics will produce very low-level topics, while a lower number will produce generalized, high-level topics such as sports. In Mahout, mean field variational inference is used to estimate the model. It is similar to expectation-maximization for hierarchical Bayesian models. The expectation step reads each document and calculates the probability of each topic for each word in every document. The maximization step takes the counts, sums all the probabilities, and normalizes them.

Running LDA using Mahout

To run LDA using Mahout, we will use the 20 Newsgroups dataset.
We will convert the corpus to vectors, run LDA on these vectors, and get the resultant topics. Let's run this example to see how topic modeling works in Mahout.

Dataset selection

We will use the 20 Newsgroups dataset for this exercise. Download the 20news-bydate.tar.gz dataset from http://qwone.com/~jason/20Newsgroups/.

Steps to execute CVB (LDA)

Perform the following steps to execute the CVB algorithm:

1. Create a 20newsdata directory and unzip the data there:

mkdir /tmp/20newsdata
cd /tmp/20newsdata
tar -xzvf /tmp/20news-bydate.tar.gz

There are two folders under 20newsdata: 20news-bydate-test and 20news-bydate-train. Now create another directory, 20newsdataall, and merge both the training and test data of the group.

2. Move to the home directory and execute the following commands:

mkdir /tmp/20newsdataall
cp -R /tmp/20newsdata/*/* /tmp/20newsdataall

3. Create a directory in Hadoop and save this data in HDFS:

hadoop fs -mkdir /usr/hue/20newsdata
hadoop fs -put /tmp/20newsdataall /usr/hue/20newsdata

4. Mahout CVB accepts the data in vector format. For this, first generate a sequence file from the directory as follows:

bin/mahout seqdirectory -i /user/hue/20newsdata/20newsdataall -o /user/hue/20newsdataseq-out

5. Convert the sequence file to a sparse vector using, as discussed earlier, the term frequency weight:

bin/mahout seq2sparse -i /user/hue/20newsdataseq-out/part-m-00000 -o /user/hue/20newsdatavec -lnorm -nv -wt tf

6. Convert the sparse vector to the input form required by the CVB algorithm:

bin/mahout rowid -i /user/hue/20newsdatavec/tf-vectors -o /user/hue/20newsmatrix

7. Run the CVB algorithm:

bin/mahout cvb -i /user/hue/20newsmatrix/matrix -o /user/hue/ldaoutput -k 10 -x 20 -dict /user/hue/20newsdatavec/dictionary.file-0 -dt /user/hue/ldatopics -mt /user/hue/ldamodel

The parameters used in the preceding command can be explained as follows:

- -i: the input path of the document vector
- -o: the output path of the topic-term distribution
- -k: the number of latent topics
- -x: the maximum number of iterations
- -dict: the term dictionary files
- -dt: the output path of the document-topic distribution
- -mt: the model state path after each iteration

Once the command finishes, you will see a summary of the run on the screen (screenshot not reproduced here).

8. To view the output, run the following command:

bin/mahout vectordump -i /user/hue/ldaoutput/ -d /user/hue/20newsdatavec/dictionary.file-0 -dt sequencefile -vs 10 -sort true -o /tmp/lda-output.txt

The parameters used in the preceding command can be explained as follows:

- -i: the input location of the CVB output
- -d: the dictionary file location created during vector creation
- -dt: the dictionary file type (sequence or text)
- -vs: the vector size
- -sort: a flag set to true or false
- -o: the output location on the local filesystem

Your output will now be saved in the local filesystem. Open the file and you will see each term together with its probability for the discovered topics (screenshot not reproduced here).

Summary

In this article, we learned about model-based clustering, the Dirichlet process, and topic modeling. In model-based clustering, we try to obtain the model from the data, while the Dirichlet process is used to understand the data.
Topic modeling helps us to identify the topics in an article or in a set of documents. We discussed how Mahout implements topic modeling using Latent Dirichlet Allocation, how it is implemented in map-reduce, and how to use Mahout to find the topic distribution over a set of documents.

Resources for Article:

Further resources on this subject:
- Learning Random Forest Using Mahout [article]
- Implementing the Naïve Bayes classifier in Mahout [article]
- Clustering [article]


Regular Expressions in Python 2.6 Text Processing

Packt
13 Jan 2011
17 min read
Python 2.6 Text Processing: Beginners Guide

Simple string matching

Regular expressions are notoriously hard to read, especially if you're not familiar with the obscure syntax. For that reason, let's start simple and look at some easy regular expressions at the most basic level. Before we begin, remember that Python raw strings allow us to include backslashes without the need for additional escaping. Whenever you define regular expressions, you should do so using the raw string syntax.

Time for action – testing an HTTP URL

In this example, we'll check values as they're entered via the command line as a means to introduce the technology. We'll dive deeper into regular expressions as we move forward. We'll be scanning URLs to ensure our end users inputted valid data.

Create a new file and name it url_regex.py. Enter the following code:

import sys
import re

# Make sure we have a single URL argument.
if len(sys.argv) != 2:
    print >>sys.stderr, "URL Required"
    sys.exit(-1)

# Easier access.
url = sys.argv[1]

# Ensure we were passed a somewhat valid URL.
# This is a superficial test.
if re.match(r'^https?:/{2}\w.+$', url):
    print "This looks valid"
else:
    print "This looks invalid"

Now, run the example script on the command line a few times, passing various different values to it:

(text_processing)$ python url_regex.py http://www.jmcneil.net
This looks valid
(text_processing)$ python url_regex.py http://intranet
This looks valid
(text_processing)$ python url_regex.py http://www.packtpub.com
This looks valid
(text_processing)$ python url_regex.py https://store
This looks valid
(text_processing)$ python url_regex.py httpsstore
This looks invalid
(text_processing)$ python url_regex.py https:??store
This looks invalid
(text_processing)$

What just happened?

We took a look at a very simple pattern and introduced you to the plumbing needed to perform a match test. Let's walk through this little example, skipping the boilerplate code.

First of all, we imported the re module. The re module, as you probably inferred from the name, contains all of Python's regular expression support. Any time you need to work with regular expressions, you'll need to import the re module. Next, we read a URL from the command line and bind a temporary attribute, which makes for cleaner code. Directly below that, you should notice a line that reads re.match(r'^https?:/{2}\w.+$', url). This line checks whether the string referenced by the url attribute matches the ^https?:/{2}\w.+$ pattern. If a match is found, we print a success message; otherwise, the end user receives some negative feedback indicating that the input value is incorrect.

This example leaves out a lot of details regarding HTTP URL formats. If you were performing validation on user input, one place to look would be http://formencode.org/. FormEncode is an HTML form-processing and data-validation framework written by Ian Bicking.
Understanding the match function

The most basic method of testing for a match is via the re.match function, as we did in the previous example. The match function takes a regular expression pattern and a string value. For example, consider the following snippet of code:

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits", or "license" for more information.
>>> import re
>>> re.match(r'pattern', 'pattern')
<_sre.SRE_Match object at 0x1004811d0>
>>>

Here, we simply passed a regular expression of "pattern" and a string literal of "pattern" to the re.match function. As they were identical, the result was a match. The returned Match object indicates the match was successful. The re.match function returns None otherwise.

>>> re.match(r'pattern', 'failure')
>>>

Learning basic syntax

A regular expression is generally a collection of literal string data and special metacharacters that represents a pattern of text. The simplest regular expression is just literal text that only matches itself. In addition to literal text, there are a series of special characters that can be used to convey additional meaning, such as repetition, sets, wildcards, and anchors. Generally, the punctuation characters field this responsibility.

Detecting repetition

When building up expressions, it's useful to be able to match certain repeating patterns without needing to duplicate values. It's also beneficial to perform conditional matches. This lets us check for content such as "match the letter a, followed by the number one at least three times, but no more than seven times." For example, the code below does just that:

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits", or "license" for more information.
>>> import re
>>> re.match(r'^a1{3,7}$', 'a1111111')
<_sre.SRE_Match object at 0x100481648>
>>> re.match(r'^a1{3,7}$', '1111111')
>>>

If the repetition operator follows a valid regular expression enclosed in parentheses, it will perform repetition on that entire expression. For example:

>>> re.match(r'^(a1){3,7}$', 'a1a1a1')
<_sre.SRE_Match object at 0x100493918>
>>> re.match(r'^(a1){3,7}$', 'a11111')
>>>

The following table details all of the special characters that can be used for marking repeating values within a regular expression (table not reproduced here).
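Before moving on, here is a quick sketch exercising the most common repetition characters; the example strings are arbitrary:

import re

print bool(re.match(r'^ab?c$', 'ac'))       # ?     : zero or one 'b'
print bool(re.match(r'^ab*c$', 'abbbc'))    # *     : zero or more 'b'
print bool(re.match(r'^ab+c$', 'ac'))       # +     : one or more 'b' -- False here
print bool(re.match(r'^a1{3,7}$', 'a11'))   # {m,n} : too few repetitions -- False here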
Specifying character sets and classes

In some circumstances, it's useful to collect groups of characters into a set such that any of the values in the set will trigger a match. It's also useful to match any character at all. The dot operator does just that.

A character set is enclosed within standard square brackets. A set defines a series of alternating (or) entities that will match a given text value. If the first character within a set is a caret (^), then a negation is performed: all characters not defined by that set would then match. There are a couple of additional interesting set properties. For ranged values, it's possible to specify an entire selection using a hyphen. For example, '[0-6a-d]' would match all values between 0 and 6, and a and d. Special characters listed within brackets lose their special meaning. The exceptions to this rule are the hyphen and the closing bracket. If you need to include a closing bracket or a hyphen within a regular expression, you can either place them as the first elements in the set or escape them by preceding them with a backslash.

As an example, consider the following snippet, which matches a string containing a hexadecimal number:

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits", or "license" for more information.
>>> import re
>>> re.match(r'^0x[a-f0-9]+$', '0xff')
<_sre.SRE_Match object at 0x100481648>
>>> re.match(r'^0x[a-f0-9]+$', '0x01')
<_sre.SRE_Match object at 0x1004816b0>
>>> re.match(r'^0x[a-f0-9]+$', '0xz')
>>>

In addition to the bracket notation, Python ships with some predefined classes. Generally, these are letter values prefixed with a backslash escape. When they appear within a set, the set includes all values for which they'll match. The \d escape matches all digit values. It would have been possible to write the above example in a slightly more compact manner:

>>> re.match(r'^0x[a-f\d]+$', '0x33')
<_sre.SRE_Match object at 0x100481648>
>>> re.match(r'^0x[a-f\d]+$', '0x3f')
<_sre.SRE_Match object at 0x1004816b0>
>>>

The following table outlines all of the character sets and classes available (table not reproduced here). One thing that should become apparent is that the lowercase classes are matches, whereas their uppercase counterparts are the inverse.

Applying anchors to restrict matches

There are times when it's important that patterns match at a certain position within a string of text. Why is this important? Consider a simple number validation test. If a user enters a digit but mistakenly includes a trailing letter, an expression checking for the existence of a digit alone will pass.

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits", or "license" for more information.
>>> import re
>>> re.match(r'\d', '1f')
<_sre.SRE_Match object at 0x1004811d0>
>>>

Well, that's unexpected. The regular expression engine sees the leading '1' and considers it a match. It disregards the rest of the string, as we've not instructed it to do anything else with it. To fix the problem that we have just seen, we need to apply anchors.

>>> re.match(r'^\d$', '6')
<_sre.SRE_Match object at 0x100481648>
>>> re.match(r'^\d$', '6f')
>>>

Now, attempting to sneak in a non-digit character results in no match. By preceding our expression with a caret (^) and terminating it with a dollar sign ($), we effectively said "between the start and the end of this string, there can only be one digit." Anchors, among various other metacharacters, are considered zero-width matches. Basically, this means that a match doesn't advance the regular expression engine within the test string. We're not limited to either end of a string, either. Here's a collection of all of the available anchors provided by Python (table not reproduced here).
Wrapping it up

Now that we've covered the basics of regular expression syntax, let's double back and take a look at the expression we used in our first example. It might be a bit easier if we break it down with a diagram (not reproduced here). With a bit of background, this pattern should now make sense.

We begin the regular expression with a caret, which matches the beginning of the string. The very next element is the literal http. As our caret matches the start of a string and must be immediately followed by http, this is equivalent to saying that our string must start with http. Next, we include a question mark after the s in https. The question mark states that the previous entity should be matched either zero or one time. By default, the evaluation engine is looking character-by-character, so the previous entity in this case is simply "s". We do this so our test passes for both secure and non-secure addresses.

As we advance forward in our string, the next special term we run into is {2}, and it follows a simple forward slash. This says that the forward slash should appear exactly two times. Now, in the real world, it would probably make more sense to simply type the second slash. Using the repetition check like this not only requires more typing, but it also causes the regular expression engine to work harder.

Immediately after the repetition match, we include a \w. The \w, if you'll remember from the previous tables, expands to [0-9a-zA-Z_], or any word character. This is to ensure that our URL doesn't begin with a special character. The dot character after the \w matches anything except a newline. Essentially, we're saying "match anything else; we don't much care what." The plus sign states that the preceding wildcard should match at least once. Finally, we're anchoring the end of the string; however, in this example, this isn't really necessary.

Have a go hero – tidying up our URL test

There are a few intentional inconsistencies and problems with this regular expression as designed. To name a few:

- Properly formatted URLs should only contain a few special characters. Other values should be URL-encoded using percent escapes. This regular expression doesn't check for that.
- It's possible to include newline characters towards the end of the URL, which is clearly not supported by any browser!
- The \w followed by the .+ implicitly sets a minimum limit of two characters after the protocol specification. A single letter is perfectly valid.

You guessed it. Using what we've covered thus far, it should be possible for you to backtrack and update our regular expression in order to fix these flaws. For more information on what characters are allowed, have a look at http://www.w3schools.com/tags/ref_urlencode.asp. One possible rewrite is sketched below.
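As a starting point for the exercise, here is one way the pattern could be tightened and annotated using re.VERBOSE. The character classes chosen here are an assumption for illustration, not the book's official solution:

import re

URL_RE = re.compile(r"""
    ^https?          # protocol: http or https
    ://              # literal separator, typed out instead of /{2}
    [0-9a-zA-Z]      # host must start with a word character
    [0-9a-zA-Z.-]*   # remaining host characters (an assumed subset)
    (/[^\s]*)?       # optional path containing no whitespace or newlines
    \Z               # end of string; stricter than $, which allows a trailing newline
    """, re.VERBOSE)

for url in ('http://www.jmcneil.net', 'https://store', 'https:??store'):
    print "%s -> %s" % (url, 'valid' if URL_RE.match(url) else 'invalid')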
""" line = line.strip() match = re.match(self._re, line) if not match: raise ParsingError("Malformed line: " + line) return { 'size': 0 if match.group(6) == '-' else int(match.group(6)), 'status': match.group('rcode'), 'file_requested': match.group(5).split()[1] } Running the logscan application should now produce the same output as it did when we were using a more basic, split-based approach. (text_processing)$ cat example3.log | logscan -c logscan.cfg What just happened? First of all, we imported the re module so that we have access to Python's regular expression services. Next, at the LogProcessor class level, we defined a regular expression. Though, this time we did so via re.compile rather than a simple string. Regular expressions that are used more than a handful of times should be "prepared" by running them through re.compile first. This eases the load placed on the system by frequently used patterns. The re.compile function returns a SRE_Pattern object that can be passed in just about anywhere you can pass in a regular expression. We then replace our split method to take advantage of regular expressions. As you can see, we simply pass self._re in as opposed to a string-based regular expression. If we don't have a match, we raise a ParsingError, which bubbles up and generates an appropriate error message, much like we would see on an invalid split case. Now, the end of the split method probably looks somewhat peculiar to you. Here, we've referenced our matched values via group identification mechanisms rather than by their list index into the split results. Regular expression components surrounded by parenthesis create a group, which can be accessed via the group method on the Match object later down the road. It's also possible to access a previously matched group from within the same regular expression. Let's look at a somewhat smaller example. >>> match = re.match(r'(0x[0-9a-f]+) (?P<two>1)', '0xff 0xff') >>> match.group(1) '0xff' >>> match.group(2) '0xff' >>> match.group('two') '0xff' >>> match.group('failure') Traceback (most recent call last): File "<stdin>", line 1, in <module> IndexError: no such group >>> Here, we surround two distinct regular expressions components with parenthesis, (0x[0-9a-f]+), and (?P&lttwo>1). The first regular expression matches a hexadecimal number. This becomes group ID 1. The second expression matches whatever was found by the first, via the use of the 1. The "backslash-one" syntax references the first match. So, this entire regular expression only matches when we repeat the same hexadecimal number twice, separated with a space. The ?P&lttwo> syntax is detailed below. As you can see, the match is referenced after-the-fact using the match.group method, which takes a numeric index as its argument. Using standard regular expressions, you'll need to refer to a matched group using its index number. However, if you'll look at the second group, we added a (?P&ltname>) construct. This is a Python extension that lets us refer to groupings by name, rather than by numeric group ID. The result is that we can reference groups of this type by name as opposed to simple numbers. Finally, if an invalid group ID is passed in, an IndexError exception is thrown. The following table outlines the characters used for building groups within a Python regular expression: Finally, it's worth pointing out that parenthesis can also be used to alter priority as well. For example, consider this code. 
Finally, it's worth pointing out that parentheses can also be used to alter priority. For example, consider this code:

>>> re.match(r'abc{2}', 'abcc')
<_sre.SRE_Match object at 0x1004818b8>
>>> re.match(r'a(bc){2}', 'abcc')
>>> re.match(r'a(bc){2}', 'abcbc')
<_sre.SRE_Match object at 0x1004937b0>
>>>

Whereas the first example matches c exactly two times, the second and third lines require us to repeat bc twice. This changes the meaning of the regular expression from "repeat the previous character twice" to "repeat the previous match within parentheses twice." The value within the group could have been its own complex regular expression, such as a([b-c]){2}.

Have a go hero – updating our stats processor to use named groups

Spend a couple of minutes and update our statistics processor to use named groups rather than integer-based references. This makes it slightly easier to read the assignment code in the split method. You do not need to create names for all of the groups; naming the ones we're actually using will do.

Using greedy versus non-greedy operators

Regular expressions generally like to match as much text as possible before giving up or yielding to the next token in a pattern string. If that behavior is unexpected and not fully understood, it can be difficult to get your regular expression right. Let's take a look at a small code sample to illustrate the point.

Suppose that, with your newfound knowledge of regular expressions, you decided to write a small script to remove the angled brackets surrounding HTML tags. You might be tempted to do it like this:

>>> match = re.match(r'(?P<tag><.+>)', '<title>Web Page</title>')
>>> match.group('tag')
'<title>Web Page</title>'
>>>

The result is probably not what you expected. The reason we got this result is that regular expressions are greedy by nature. That is, they'll attempt to match as much as possible. If you look closely, '<title>' is a match for the supplied regular expression, as is the entire '<title>Web Page</title>' string. Both start with an angled bracket, contain at least one character, and both end with an angled bracket. The fix is to insert the question mark character, or the non-greedy operator, directly after the repetition specification. So, the following code snippet fixes the problem:

>>> match = re.match(r'(?P<tag><.+?>)', '<title>Web Page</title>')
>>> match.group('tag')
'<title>'
>>>

The question mark changes our meaning from "match as much as you possibly can" to "match only the minimum required to actually match."
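The same distinction matters when pulling several tags out of one string. Here is a short sketch using re.findall, which collects every non-overlapping match; the HTML snippet is arbitrary:

import re

html = '<b>one</b> and <i>two</i>'
print re.findall(r'<.+>', html)    # greedy: ['<b>one</b> and <i>two</i>']
print re.findall(r'<.+?>', html)   # non-greedy: ['<b>', '</b>', '<i>', '</i>']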


Make Things Pretty with ggplot2

Packt
24 Feb 2016
30 min read
The objective of this article is to provide you with a general overview of the plotting environments in R and of the most efficient way of coding your graphs in it. We will go through the most important Integrated Development Environments (IDEs) available for R as well as the most important packages available for plotting data; this will help you get an overview of what is available in R and of how those packages compare with ggplot2. Finally, we will dig deeper into the grammar of graphics, which represents the basic concept on which ggplot2 was designed. But first, let's make sure that you have a working version of R on your computer.

(For more resources related to this topic, see here.)

Getting ggplot2 up and running

You can download the most up-to-date version of R from the R project website (http://www.r-project.org/). There, you will find a direct connection to the Comprehensive R Archive Network (CRAN), a network of FTP and web servers around the world that store identical, up-to-date versions of code and documentation for R. In addition to access to the CRAN servers, on the website of the R project you may also find information about R, a few technical manuals, the R Journal, and details about the packages developed for R and stored in the CRAN repositories. At the time of writing, the current version of R is 3.1.2.

If you have already installed R on your computer, you can check the actual version with the R.Version() code, or, for a more concise result, you can use the R.version.string code, which recalls only part of the output of the previous function.

Packages in R

In the next few pages of this article, we will quickly go through the most important visualization packages available in R, so in order to try the code, you will also need to have additional packages as well as ggplot2 up and running in your R installation. In the basic R installation, you will already have the graphics package available and loaded in the session; the lattice package is already available among the standard packages delivered with the basic installation, but it is not loaded by default. ggplot2, on the other hand, will need to be installed. You can install and load a package with the following code:

> install.packages("ggplot2")
> library(ggplot2)

Keep in mind that every time R is started, you will need to load the package you need with the library(name_of_the_package) command to be able to use the functions contained in the package. In order to get a list of all the packages installed on your computer, you can call the library() function without arguments. If, on the other hand, you would like to have a list of the packages currently loaded in the workspace, you can use the search() command. One more function that can turn out to be useful when managing your library of packages is .libPaths(), which provides you with the location of your R libraries. This function is very useful for tracing back the package libraries you are currently using, if any, in addition to the standard library of packages, which on Windows is located by default in a path of the kind C:/Program Files/R/R-3.1.2/library.
The following list is a short recap of the functions just discussed:

.libPaths()   # get library location
library()     # see all the packages installed
search()      # see the packages currently loaded

Integrated Development Environments (IDEs)

You will definitely be able to run the code and the examples explained in this article directly from the standard R Graphical User Interface (GUI). However, if you frequently work with R on more complex projects, or simply if you like to keep an eye on the different components of your code, such as scripts, plots, and help pages, you may well consider using an IDE. The number of IDEs that integrate with R is still limited, but some of them are quite efficient, well designed, and open source.

RStudio

RStudio (http://www.rstudio.com/) is a very nice and advanced programming environment developed specifically for R, and it would be my recommended choice of IDE as the R programming environment in most cases. It is available for all the major platforms (Windows, Linux, and Mac OS X), and it can be run on a local machine, such as your computer, or even over the Web using RStudio Server. With RStudio Server, you can connect a browser-based interface (the RStudio IDE) to a version of R running on a remote Linux server.

RStudio allows you to integrate several useful functionalities, particularly if you use R for more complex projects. The way the software interface is organized allows you to keep an eye on the different activities you very often deal with in R, such as working on different scripts, overviewing the installed packages, as well as having easy access to the help pages and the plots generated. This last feature is particularly interesting for ggplot2 since, in RStudio, you will be able to easily access the history of the plots created instead of visualizing only the last created plot, as is the case in the default R GUI.

One other very useful feature of RStudio is code completion. You can, in fact, start typing a command, and upon pressing the Tab key, the interface will provide you with functions matching what you have written. This feature will turn out to be very useful in ggplot2, so you will not necessarily need to remember all the functions, and you will also have guidance for the arguments of the functions. In Figure 1.1, you can see a screenshot from the current version of RStudio (v0.98.1091):

Figure 1.1: This is a screenshot of RStudio on Windows 8

The environment is composed of four different areas:

- Scripting area: In this area, you can open, create, and write scripts.
- Console area: This is the actual R console in which the commands are executed. It is possible to type commands directly here in the console or to write them in a script and then run them on the console (I would recommend the latter option).
- Workspace/History area: In this area, you can find a practical summary of all the objects created in the workspace in which you are working, as well as the history of the typed commands.
- Visualization area: Here, you can easily load packages, open R help files, and, even more importantly, visualize plots.

The RStudio website provides a lot of material on how to use the program, such as manuals, tutorials, and videos, so if you are interested, refer to the website for more details.

Eclipse and StatET

Eclipse (http://www.eclipse.org/) is a very powerful IDE that was mainly developed in Java and initially intended for Java programming.
Subsequently, several extension packages were developed to optimize the programming environment for other languages, such as C++ and Python. Thanks to its original objective of being a tool for advanced programming, this IDE is particularly suited to dealing with very complex programming projects, for instance if you are working on a big project folder with many different scripts. In these circumstances, Eclipse can help you keep your programming scripts in order and have easy access to them. One drawback of such a development environment is probably its large size (around 200 MB) and a slightly slow start-up.

Eclipse does not support interaction with R natively, so in order to be able to write your code and execute it directly in the R console, you need to add StatET to your basic Eclipse installation. StatET (http://www.walware.de/goto/statet) is a plugin for the Eclipse IDE, and it offers a set of tools for R coding and package building. More detailed information on how to install Eclipse and StatET and how to configure the connections between R and Eclipse/StatET can be found on the websites of the related projects.

Emacs and ESS

Emacs (http://www.gnu.org/software/emacs/) is a customizable text editor and is very popular, particularly in the Linux environment. Although this text editor comes with a very simple GUI, it is an extremely powerful environment, particularly thanks to the numerous keyboard shortcuts that allow very efficient interaction with the editor once you have had some practice. Even though the user interface of a typical IDE, such as RStudio, is more sophisticated and advanced, Emacs may be useful if you need to work with R on systems with a poor graphical interface, such as servers and terminal windows. Like Eclipse, Emacs does not support interfacing with R by default, so you will need to install an add-on package that enables such a connection: Emacs Speaks Statistics (ESS).

ESS (http://ess.r-project.org/) is designed to support the editing of scripts and interaction with various statistical analysis programs, including R. The objective of the ESS project is to provide efficient text editor support to statistical software, which in some cases comes with a more or less defined GUI, but for which the real power of the language is only accessible through the original scripting language.

The plotting environments in R

R provides a complete series of options for producing graphics, which makes it quite advanced with regard to data visualization. In the next few sections of this article, we will go through the most important R packages for data visualization, quickly discussing some high-level differences and analogies. If you already have some experience with other R packages for data visualization, in particular graphics or lattice, the following sections will provide you with some references and examples of how the code used in those packages compares with that used in ggplot2. Moreover, you will also get an idea of the typical layout of the plots created with a certain package, so you will be able to identify the tool used to create the plots you come across.

The core of graphics visualization in R is in the grDevices package, which provides the basic structure of data plotting, such as the colors and fonts used in the plots.
This graphics engine was then used as the starting point for the development of more advanced and sophisticated packages for data visualization, the most commonly used being graphics and grid. The graphics package is often referred to as the base or traditional graphics environment since, historically, it was the first package for data visualization available in R, and it provides functions that allow the generation of complete plots. The grid package, on the other hand, provides an alternative set of graphics tools. This package does not directly provide functions that generate complete plots, so it is not frequently used directly to generate graphics, but it is used in the development of advanced data visualization packages.

Among the grid-based packages, the most widely used are lattice and ggplot2, although they implement different visualization approaches: Trellis plots in the case of lattice and the grammar of graphics in the case of ggplot2. We will describe these principles in more detail in the coming sections. A diagram representing the connections between the tools just mentioned is shown in Figure 1.2. Just keep in mind that this is not a complete overview of the packages available but simply a small snapshot of the packages we will discuss. Many other packages are built on top of the tools just mentioned, but in the following sections we will focus on the most relevant packages used in data visualization, namely graphics, lattice and, of course, ggplot2. If you would like to get a more complete overview of the graphics tools available in R, you can have a look at the web page of the R project summarizing such tools, http://cran.r-project.org/web/views/Graphics.html.

Figure 1.2: This is an overview of the most widely used R packages for graphics

In order to see some examples of plots in graphics, lattice and ggplot2, we will go through a few examples of different plots over the following pages. The objective of providing these examples is not to make an exhaustive comparison of the three packages but simply to give you an idea of how the code, as well as the default plot layout, looks for these different plotting tools. For these examples, we will use the Orange dataset available in R; to load it into the workspace, simply write the following code:

> data(Orange)

This dataset contains records of the growth of orange trees. You can have a look at the data by recalling its first lines with the following code:

> head(Orange)

You will see that the dataset contains three columns. The first one, Tree, is an ID number indicating the tree on which the measurement was taken, while age and circumference refer to the age in days and the size of the tree in millimeters, respectively. If you want more information about this data, you can have a look at the help page of the dataset by typing the following code:

?Orange

Here, you will find the reference for the data as well as a more detailed description of the variables included.

Standard graphics and grid-based graphics

The existence of these two different graphics environments brings a question to most users' minds: which package should be used, and under which circumstances? For simple and basic plots, where the data simply needs to be represented in a standard plot type (such as a scatter plot, histogram, or boxplot) without any additional manipulation, all the plotting environments are fairly equivalent.
In fact, it would probably be possible to produce the same type of plot with graphics as well as with lattice or ggplot2. Nevertheless, in general, the default graphical output of ggplot2 or lattice will most likely be superior to that of graphics, since both of these packages are designed with the principles of human perception deeply in mind, in order to make the evaluation of the data contained in plots easier. When more complex data needs to be analyzed, the grid-based packages, lattice and ggplot2, offer more sophisticated support for the analysis of multivariate data. On the other hand, these tools require greater effort to become proficient with because of their flexibility and advanced functionality. In both cases, lattice and ggplot2 provide a full set of tools for data visualization, so you will not need to use grid directly in most cases; you will be able to do all your work directly with one of those packages.

Graphics and standard plots

The graphics package was originally developed based on the experience of the graphics environment in R. The approach implemented in this package is based on the principle of the pen-on-paper model, where the plot is drawn in the first function call and, once content is added, it cannot be deleted or modified. In general, the functions available in this package can be divided into high-level and low-level functions. High-level functions are capable of drawing the actual plot, while low-level functions are used to add content to a graph that was already created with a high-level function.

Let's assume that we would like to have a look at how age is related to the circumference of the trees in our Orange dataset; we could simply plot the data on a scatter plot using the high-level function plot(), as shown in the following code:

plot(age~circumference, data=Orange)

This code creates the graph in Figure 1.3. As you will have noticed, we obtained the graph directly with a call to a function that contains the variables to plot in the form y~x and the dataset in which to locate them. As an alternative, instead of using a formula expression, you can use a direct reference to x and y, using code in the form plot(x,y). In this case, you will have to use a direct reference to the data instead of using the data argument of the function. Type in the following code:

plot(Orange$circumference, Orange$age)

Figure 1.3: Simple scatter plot of the Orange dataset using graphics

For the time being, we are not interested in the plot's details, such as the title or the axes, but we will simply focus on how to add elements to the plot we just created. For instance, if we want to include a regression line as well as a smooth line to get an idea of the relation between the data, we should use low-level functions, such as abline() and lines(), to add the additional lines to the plot:

plot(age~circumference, data=Orange)   # Create the basic plot
abline(lm(Orange$age~Orange$circumference), col="blue")
lines(loess.smooth(Orange$circumference,Orange$age), col="red")

The graph generated as the output of this code is shown in Figure 1.4:

Figure 1.4: This is a scatter plot of the Orange data with a regression line (in blue) and a smooth line (in red) realized with graphics

As illustrated, with this package we have built a graph by first calling one function, which draws the main plot frame, and then including additional elements using other functions.
With graphics, only additional elements can be included in the graph, without changing the overall plot frame defined by the plot() function. This ability to add several graphical elements together to create a complex plot is one of the fundamental elements of R, and you will notice how all the different graphical packages rely on this principle. If you are interested in other code examples of plots in graphics, there is also some demo code available in R for this package, which can be run with demo(graphics).
In the coming sections, you will find a quick reference to how you can generate a similar plot using graphics and ggplot2. As will be described in more detail later on, in ggplot2 there are two main functions to realize plots, ggplot() and qplot(). The qplot() function is a wrapper designed to easily create basic plots with ggplot2, and its syntax is similar to that of the plot() function of graphics. Due to its simplicity, this function is the easiest way to start working with ggplot2, so we will use it in the examples in the following sections. The code in these sections also uses our example dataset Orange; in this way, you can run the code directly in your console and see the resulting output.
Scatterplot with individual data points
To generate the plot using graphics, use the following code:
plot(age~circumference, data=Orange)
The preceding code results in the following output:
To generate the plot using ggplot2, use the following code:
qplot(circumference, age, data=Orange)
The preceding code results in the following output:
Scatterplots with the line of one tree
To generate the plot using graphics, use the following code:
plot(age~circumference, data=Orange[Orange$Tree==1,], type="l")
The preceding code results in the following output:
To generate the plot using ggplot2, use the following code:
qplot(circumference, age, data=Orange[Orange$Tree==1,], geom="line")
The preceding code results in the following output:
Scatterplots with the line and points of one tree
To generate the plot using graphics, use the following code:
plot(age~circumference, data=Orange[Orange$Tree==1,], type="b")
The preceding code results in the following output:
To generate the plot using ggplot2, use the following code:
qplot(circumference, age, data=Orange[Orange$Tree==1,], geom=c("line","point"))
The preceding code results in the following output:
Boxplot of orange dataset
To generate the plot using graphics, use the following code:
boxplot(circumference~Tree, data=Orange)
The preceding code results in the following output:
To generate the plot using ggplot2, use the following code:
qplot(Tree, circumference, data=Orange, geom="boxplot")
The preceding code results in the following output:
Boxplot with individual observations
To generate the plot using graphics, use the following code:
boxplot(circumference~Tree, data=Orange)
points(circumference~Tree, data=Orange)
The preceding code results in the following output:
To generate the plot using ggplot2, use the following code:
qplot(Tree, circumference, data=Orange, geom=c("boxplot","point"))
The preceding code results in the following output:
Histogram of orange dataset
To generate the plot using graphics, use the following code:
hist(Orange$circumference)
The preceding code results in the following output:
To generate the plot using ggplot2, use the following code:
qplot(circumference, data=Orange, geom="histogram")
The preceding code results in the following output:
Histogram with reference line at median value in red
To generate the plot using graphics, use the following code:
hist(Orange$circumference)
abline(v=median(Orange$circumference), col="red")
The preceding code results in the following output:
To generate the plot using ggplot2, use the following code:
qplot(circumference, data=Orange, geom="histogram") + geom_vline(xintercept = median(Orange$circumference), colour="red")
The preceding code results in the following output:
Lattice and the Trellis plots
Along with graphics, the base R installation also includes the lattice package. This package implements a family of techniques known as Trellis graphics, proposed by William Cleveland to visualize complex datasets with multiple variables. The objective of those design principles was to ensure the accurate and faithful communication of the information contained in the data. These principles are embedded in the package and are already evident in the default plot design settings. One interesting feature of Trellis plots is the option of multipanel conditioning, which creates multiple plots by splitting the data on the basis of one variable. A similar option is also available in ggplot2, but in that case, it is called faceting.
In lattice, we also have functions that are able to generate a plot with a single call, but once the plot is drawn, it is already final. Consequently, plot details, as well as any additional elements that need to be included in the graph, need to be specified within the call to the main function. This is done by including all the specifications in the panel function argument. These specifications can be included directly in the main body of the function or specified in an independent function, which is then called; this last option usually generates more readable code, so it will be the approach used in the following examples. For instance, if we want to draw the same plot we just generated in the previous section with graphics, containing the age and circumference of the trees as well as the regression and smooth lines, we need to specify such elements within the function call. You can see an example of the code here; remember that lattice needs to be loaded in the workspace:
require(lattice)               ## Load lattice if needed
myPanel <- function(x,y){
  panel.xyplot(x,y)              # Add the observations
  panel.lmline(x,y, col="blue")  # Add the regression line
  panel.loess(x,y, col="red")    # Add the smooth line
}
xyplot(age~circumference, data=Orange, panel=myPanel)
This code produces the plot in Figure 1.5:
Figure 1.5: This is a scatter plot of the Orange data with the regression line (in blue) and the smooth line (in red) realized with lattice
As you will have noticed, code differences aside, the plot generated does not look very different from the one obtained with graphics. This is because we are not using any special visualization feature of lattice. As mentioned earlier, with this package, we have the option of multipanel conditioning, so let's take a look at this. Let's assume that we want to have the same plot but for the different trees in the dataset. Of course, in this case, you would not need the regression or the smooth line, since there will only be one tree in each plot window, but it could be nice to have the different observations connected.
This is shown in the following code:
myPanel <- function(x,y){
  panel.xyplot(x,y, type="b")   # The observations
}
xyplot(age~circumference | Tree, data=Orange, panel=myPanel)
This code generates the graph shown in Figure 1.6:
Figure 1.6: This is a scatterplot of the Orange data realized with lattice, with one subpanel representing the individual data of each tree. The tree number of each panel is reported in the upper part of the plot area
As illustrated, using the vertical bar |, we are able to obtain the plot conditioned on the value of the variable Tree. In the upper part of the panels, you will notice the reference to the value of the conditioning variable, which, in this case, is the column Tree. As mentioned before, ggplot2 offers this option too, and we will see an example of it in the next section. There, you will find a quick reference to how to convert a typical plot type from lattice to ggplot2. In this case, the examples are adapted to the typical plotting style of the lattice plots.
Scatterplot with individual observations
To plot the graph using lattice, use the following code:
xyplot(age~circumference, data=Orange)
The preceding code results in the following output:
To plot the graph using ggplot2, use the following code:
qplot(circumference, age, data=Orange)
The preceding code results in the following output:
Scatterplot of orange dataset with faceting
To plot the graph using lattice, use the following code:
xyplot(age~circumference|Tree, data=Orange)
The preceding code results in the following output:
To plot the graph using ggplot2, use the following code:
qplot(circumference, age, data=Orange, facets=~Tree)
The preceding code results in the following output:
Faceting scatterplot with line and points
To plot the graph using lattice, use the following code:
xyplot(age~circumference|Tree, data=Orange, type="b")
The preceding code results in the following output:
To plot the graph using ggplot2, use the following code:
qplot(circumference, age, data=Orange, geom=c("line","point"), facets=~Tree)
The preceding code results in the following output:
Scatterplots with grouping data
To plot the graph using lattice, use the following code:
xyplot(age~circumference, data=Orange, groups=Tree, type="b")
The preceding code results in the following output:
To plot the graph using ggplot2, use the following code:
qplot(circumference, age, data=Orange, color=Tree, geom=c("line","point"))
The preceding code results in the following output:
Boxplot of orange dataset
To plot the graph using lattice, use the following code:
bwplot(circumference~Tree, data=Orange)
The preceding code results in the following output:
To plot the graph using ggplot2, use the following code:
qplot(Tree, circumference, data=Orange, geom="boxplot")
The preceding code results in the following output:
Histogram of orange dataset
To plot the graph using lattice, use the following code:
histogram(Orange$circumference, type = "count")
To plot the graph using ggplot2, use the following code:
qplot(circumference, data=Orange, geom="histogram")
The preceding code results in the following output:
Histogram with reference line at median value in red
To plot the graph using lattice, use the following code:
histogram(~circumference, data=Orange, type = "count",
  panel=function(x, ...){
    panel.histogram(x, ...)
    panel.abline(v=median(x), col="red")
  })
The preceding code results in the following output:
To plot the graph using ggplot2, use the following code:
qplot(circumference, data=Orange,
  geom="histogram") + geom_vline(xintercept = median(Orange$circumference), colour="red")
The preceding code results in the following output:
ggplot2 and the grammar of graphics
The ggplot2 package was developed by Hadley Wickham and implements a completely different approach to statistical plots. As is the case with lattice, this package is also based on grid, providing a series of high-level functions that allow the creation of complete plots. The package is an implementation of the ideas described in The Grammar of Graphics by Leland Wilkinson. Briefly, the grammar of graphics assumes that a statistical graphic is a mapping of data to the aesthetic attributes and geometric objects used to represent the data, such as points, lines, bars, and so on. Besides the aesthetic attributes, the plot can also contain statistical transformations or groupings of the data. As in lattice, in ggplot2 we have the possibility of splitting the data by a certain variable, obtaining a representation of each subset of the data in an independent subplot; such a representation in ggplot2 is called faceting. In a more formal way, the main components of the grammar of graphics are the data and its mapping, aesthetics, geometric objects, statistical transformations, scales, coordinates, and faceting:
- The data that must be visualized is mapped to aesthetic attributes, which define how the data should be perceived
- Geometric objects describe what is actually displayed on the plot, such as lines, points, or bars; the geometric objects basically define which kind of plot you are going to draw
- Statistical transformations are applied to the data to group them; examples would be the smooth and regression lines of the previous examples, or the binning of the histograms
- Scales represent the connection between the aesthetic spaces and the actual values that should be represented; scales may also be used to draw legends
- Coordinates represent the coordinate system in which the data is drawn
- Faceting, which we have already mentioned, is the grouping of the data into subsets defined by a value of one variable
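To see how these components translate into code, here is a minimal sketch (an illustration added here, assuming that ggplot2 is loaded and using the Orange data; the axis label is an arbitrary placeholder) in which each layer of the plot corresponds to one element of the grammar. The ggplot() syntax used in it is introduced just below:
ggplot(data=Orange,                               ## Data
  aes(x=circumference, y=age)) +                  ## Aesthetic mapping
  geom_point() +                                  ## Geometric object
  stat_smooth(method="lm", se=FALSE) +            ## Statistical transformation
  scale_x_continuous(name="Circumference (mm)") + ## Scale
  coord_cartesian() +                             ## Coordinate system (the default)
  facet_wrap(~Tree)                               ## Faceting, one subplot per tree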
In ggplot2, there are two main high-level functions capable of directly creating a plot: qplot() and ggplot(). qplot() stands for quick plot, and it is a simple function that serves a purpose similar to that served by the plot() function in graphics. The ggplot() function, on the other hand, is a much more advanced function that allows the user to have more control over the plot layout and details. In our journey into the world of ggplot2, we will see some examples of qplot(), in particular when we go through the different kinds of graphs, but we will dig a lot deeper into ggplot(), since this last function is more suited to advanced examples. If you have a look at the different forums on R programming, there is quite a bit of discussion as to which of these two functions is more convenient to use. My general recommendation would be that it depends on the type of graph you draw more frequently. For simple and standard plots, where only the data should be represented and only minor modifications of standard layouts are required, the qplot() function will do the job. On the other hand, if you need to apply particular transformations to the data, or if you would just like to keep the freedom of controlling and defining the different details of the plot layout, I would recommend that you focus on ggplot().
As you will see, the code between these functions is not completely different, since they are both based on the same underlying philosophy, but the way in which the options are set is quite different, so if you want to adapt a plot from one function to the other, you will essentially need to rewrite your code. If you just want to focus on learning only one of them, I would definitely recommend that you learn ggplot().
In the following code, you will see an example of a plot realized with ggplot2, where you can identify some of the components of the grammar of graphics. The example is realized with the ggplot() function, which allows a more direct comparison with the grammar of graphics, but just after it you will also find the corresponding qplot() code. Both codes generate the graph depicted in Figure 1.7:
require(ggplot2)                             ## Load ggplot2
data(Orange)                                 ## Load the data

ggplot(data=Orange,                          ## Data used
  aes(x=circumference, y=age, color=Tree)) + ## Aesthetic
  geom_point() +                             ## Geometry
  stat_smooth(method="lm", se=FALSE)         ## Statistics

### Corresponding code with qplot()
qplot(circumference, age, data=Orange,       ## Data used
  color=Tree,                                ## Aesthetic mapping
  geom=c("point","smooth"), method="lm", se=FALSE)
This simple example can give you an idea of the role of each portion of code in a ggplot2 graph; you have seen how the main function body creates the connection between the data and the aesthetics we are interested in representing and how, on top of this, you add the components of the plot, in this case the geometry element of points and the statistical element of regression. You can also notice how the components that need to be added to the main function call are included using the + sign. One more thing worth mentioning at this point is that if you run just the main body of the ggplot() function, you will get an error message. This is because this call is not able to generate an actual plot. The plot is actually created only when you include the geometric attribute, which, in this case, is geom_point(). This is perfectly in line with the grammar of graphics since, as we have seen, the geometry represents the actual connection between the data and what is represented on the plot. This is the stage where we specify that the data should be represented as points; before that, nothing was specified about which plot we were interested in drawing.
Figure 1.7: This is an example of plotting the Orange dataset with ggplot2
Summary
To learn more about similar technologies, the following books and videos published by Packt Publishing (https://www.packtpub.com/) are recommended:
ggplot2 Essentials (https://www.packtpub.com/big-data-and-business-intelligence/ggplot2-essentials)
Video: Building Interactive Graphs with ggplot2 and Shiny (https://www.packtpub.com/big-data-and-business-intelligence/building-interactive-graphs-ggplot2-and-shiny-video)
Resources for Article:
Further resources on this subject:
Refresher [article]
Interactive Documents [article]
Driving Visual Analyses Automobile Data (Python) [article]
Protecting Your Bitcoins

Packt
29 Oct 2015
32 min read
In this article by Richard Caetano author of the book Learning Bitcoin, we will explore ways to safely hold your own bitcoin. We will cover the following topics: Storing your bitcoins Working with brainwallet Understanding deterministic wallets Storing Bitcoins in cold storage Good housekeeping with Bitcoin (For more resources related to this topic, see here.) Storing your bitcoins The banking system has a legacy of offering various financial services to its customers. They offer convenient ways to spend money, such as cheques and credit cards, but the storage of money is their base service. For many centuries, banks have been a safe place to keep money. Customers rely on the interest paid on their deposits, as well as on the government insurance against theft and insolvency. Savings accounts have helped make preserving the wealth easy, and accessible to a large population in the western world. Yet, some people still save a portion of their wealth as cash or precious metals, usually in a personal safe at home or in a safety deposit box. They may be those who have, over the years, experienced or witnessed the downsides of banking: government confiscation, out of control inflation, or runs on the bank. Furthermore, a large population of the world does not have access to the western banking system. For those who live in remote areas or for those without credit, opening a bank account is virtually impossible. They must handle their own money properly to prevent loss or theft. In some places of the world, there can be great risk involved. These groups of people, who have little or no access to banking, are called the "underbanked". For the underbanked population, Bitcoin offers immediate access to a global financial system. Anyone with access to the internet or who carries a mobile phone with the ability to send and receive SMS messages, can hold his or her own bitcoin and make global payments. They can essentially become their own bank. However, you must understand that Bitcoin is still in its infancy as a technology. Similar to the Internet of circa 1995, it has demonstrated enormous potential, yet lacks usability for a mainstream audience. As a parallel, e-mail in its early days was a challenge for most users to set up and use, yet today it's as simple as entering your e-mail address and password on your smartphone. Bitcoin has yet to develop through these stages. Yet, with some simple guidance, we can already start realizing its potential. Let's discuss some general guidelines for understanding how to become your own bank. Bitcoin savings In most normal cases, we only keep a small amount of cash in our hand wallets to protect ourselves from theft or accidental loss. Much of our money is kept in checking or savings accounts with easy access to pay our bills. Checking accounts are used to cover our rent, utility bills, and other payments, while our savings accounts hold money for longer-term goals, such as a down payment on buying a house. It's highly advisable to develop a similar system for managing your Bitcoin money. Both local and online wallets provide a convenient way to access your bitcoins for day-to-day transactions. Yet there is the unlikely risk that one could lose his or her Bitcoin wallet due to an accidental computer crash or faulty backup. With online wallets, we run the risk of the website or the company becoming insolvent, or falling victim to cybercrime. 
By developing a reliable system, we can adopt our own personal 'Bitcoin Savings' account to hold our funds for long-term storage. Usually, these savings are kept offline to protect them from any kind of computer hacking. With protected access to our offline storage, we can periodically transfer money to and from our savings. Thus, we can arrange our Bitcoin funds much as we manage our money with our hand wallets and checking/savings accounts. Paper wallets As explained, a private key is a large random number that acts as the key to spend your bitcoins. A cryptographic algorithm is used to generate a private key and, from it, a public address. We can share the public address to receive bitcoins and, with the private key, spend the funds sent to the address. Generally, we rely on our Bitcoin wallet software to handle the creation and management of our private keys and public addresses. As these keys are stored on our computers and networks, they are vulnerable to hacking, hardware failures, and accidental loss. Private keys and public addresses are, in fact, just strings of letters and numbers. This format makes it easy to move the keys offline for physical storage. Keys printed on paper are called "paper wallets" and are highly portable and convenient to store in a physical safe or a bank safety deposit box. With the private key generated and stored offline, we can safely send bitcoin to its public address. A paper wallet must include at least one private key and its computed public address. Additionally, the paper wallet can include a QR code to make it convenient to retrieve the key and address. Figure 3.1 is an example of a paper wallet generated by Coinbase: Figure 3.1 - Paper wallet generated from Coinbase The paper wallet includes both the public address (labeled Public key) and the private key, both with QR codes to easily transfer them back to your online wallet. Also included on the paper wallet is a place for notes. This type of wallet is easy to print for safe storage. It is recommended that copies are stored securely in multiple locations in case the paper is destroyed. As the private key is shown in plain text, anyone who has access to this wallet has access to the funds. Do not store your paper wallet on your computer. Loss of the paper wallet due to hardware failure, hacking, spyware, or accidental loss can result in the complete loss of your bitcoin. Make sure you have multiple copies of your wallet printed and securely stored before transferring your money. One time use paper wallets Transactions from bitcoin addresses must include the full amount. When sending a partial amount to a recipient, the remaining balance must be sent to a change address. Paper wallets that include only one private key are considered to be "one time use" paper wallets. While you can always send multiple transfers of bitcoin to the wallet, it is highly recommended that you spend the coins only once. Therefore, you shouldn't move a large number of bitcoins to the wallet expecting to spend a partial amount. With this in mind, when using one-time use paper wallets, it's recommended that you only save a usable amount to each wallet. This amount could be a block of coins that you'd like to fully redeem to your online wallet. Creating a paper wallet To create a paper wallet in Coinbase, simply log in with your username and password. Click on the Tools link on the left-hand side menu. Next, click on the Paper Wallets link from the top menu.
Coinbase will prompt you to Generate a paper wallet and Import a paper wallet. Follow the links to generate a paper wallet. You can expect to see the paper wallet rendered, as shown in figure 3.2: Figure 3.2 - Creating a paper wallet with Coinbase Coinbase generates your paper wallet entirely within your browser, without sending the private key back to its server. This is important to protect your private key from exposure to the network. You are generating the only copy of your private key. Make sure that you print and securely store multiple copies of your paper wallet before transferring any money to it. Loss of your wallet and private key will result in the loss of your bitcoin. By clicking the Regenerate button, you can generate multiple paper wallets and store various amounts of bitcoin on each wallet. Each wallet is easily redeemable in full at Coinbase or with other bitcoin wallet services. Verifying your wallet's balance After generating and printing multiple copies of your paper wallet, you're ready to transfer your funds. Coinbase will prompt you with an easy option to transfer the funds from your Coinbase wallet to your paper wallet: Figure 3.3 - Transferring funds to your paper wallet Figure 3.3 shows Coinbase's prompt to transfer your funds. It provides options to enter your amount in BTC or USD. Simply specify your amount and click Send. Note that Coinbase only keeps a copy of your public address. You can continue to send additional amounts to your paper wallet using the same public address. For your first time working with paper wallets, it's advisable to send only small amounts of bitcoin, to learn and experiment with the process. Once you feel comfortable with creating and redeeming paper wallets, you can feel secure transferring larger amounts. To verify that the funds have been moved to your paper wallet, we can use a blockchain explorer to check that the transfer has been confirmed by the network. Blockchain explorers make all the transaction data from the Bitcoin network available for public review. We'll use a service called Blockchain.info to verify our paper wallet. Simply open www.blockchain.info in your browser and enter the public address from your paper wallet in the search box. If found, Blockchain.info will display a list of the transaction activity on that address: Figure 3.4 - Blockchain.info showing transaction activity Shown in figure 3.4 is the transaction activity for the address starting with 16p9Lt. You can quickly see the total bitcoin received and the current balance. Under the Transactions section, you can find the details of the transactions recorded by the network. Also listed are the public addresses that were combined by the wallet software, as well as the change address used to complete the transfer. Note that at least six confirmations are required before the transaction is considered confirmed. Importing versus sweeping When importing your private key, the wallet software will simply add the key to its list of private keys. Your bitcoin wallet will manage your list of private keys. When sending money, it will combine the balances from multiple addresses to make the transfer. Any remaining amount will be sent back to the change address. The wallet software will automatically manage your change addresses. Some Bitcoin wallets offer the ability to sweep your private key. This involves a second step.
After importing your private key, the wallet software will make a transaction to move the full balance of your funds to a new address. This process will empty your paper wallet completely. The step to transfer the funds may require additional time to allow the network to confirm your transaction. This process could take up to one hour. In addition to the confirmation time, a small miner's fee may be applied. This fee could be in the amount of 0.0001 BTC. If you are certain that you are the only one with access to the private key, it is safe to use the import feature. However, if you believe someone else may have access to the private key, sweeping is highly recommended. Listed in the following table are some common bitcoin wallets that support importing a private key:
Coinbase (https://www.coinbase.com/): provides direct integration between your online wallet and your paper wallet. Sweeping: No
Electrum (https://electrum.org): provides the ability to import and sweep your private key for easy access to your wallet's funds. Sweeping: Yes
Armory (https://bitcoinarmory.com/): provides the ability to import your private key or "sweep" the entire balance. Sweeping: Yes
Multibit (https://multibit.org/): directly imports your private key. It may use a built-in address generator for change addresses. Sweeping: No
Table 1 - Wallets that support importing private keys
Importing your paper wallet To import your wallet, simply log into your Coinbase account. Click on Tools from the left-hand side menu, followed by Paper Wallet from the top menu. Then, click on the Import a paper wallet button. You will be prompted to enter the private key of your paper wallet, as shown in figure 3.5: Figure 3.5 - Coinbase importing from a paper wallet Simply enter the private key from your paper wallet. Coinbase will validate the key and ask you to confirm your import. If accepted, Coinbase will import your key and sweep your balance. The full amount will be transferred to your bitcoin wallet and become available after six confirmations. Paper wallet guidelines Paper wallets display your public and private keys in plain text. Make sure that you keep these documents secure. While you can send funds to your wallet multiple times, it is highly recommended that you spend your balance only once and in full. Before sending large amounts of bitcoin to a paper wallet, make sure you test your ability to generate and import a paper wallet with small amounts of bitcoin. When you're comfortable with the process, you can rely on paper wallets for larger amounts. As paper is easily destroyed or ruined, make sure that you keep multiple copies of your paper wallet in different locations. Make sure each location is secure from unwanted access. Be careful with online wallet generators. A malicious site operator can obtain the private key from your web browser. Only use trusted paper wallet generators. You can test an online paper wallet generator by opening the page in your browser while online, and then disconnecting your computer from the network. You should be able to generate your paper wallet when completely disconnected from the network, ensuring that your private keys are never sent back to the network. Coinbase is an exception in that it only sends the public address back to the server for reference. This public address is saved to make it easy to transfer funds to your paper wallet. The private key is never saved by Coinbase when generating a paper wallet.
Paper wallet services In addition to the services mentioned, there are other services that make paper wallets easy to generate and print. Listed next in Table 2 are just a few:
BitAddress (bitaddress.org): offers the ability to generate single wallets, bulk wallets, brainwallets, and more.
Bitcoin Paper Wallet (bitcoinpaperwallet.com): offers a nice, stylish design and easy-to-use features. Users can purchase holographic stickers for securing the paper wallets.
Wallet Generator (walletgenerator.net): offers printable paper wallets that fold nicely to conceal the private keys.
Table 2 - Services for generating paper wallets and brainwallets
Brainwallets Storing our private keys offline by using a paper wallet is one way we can protect our coins from attacks on the network. Yet, having a physical copy of our keys is similar to holding a gold bar: it's still vulnerable to theft if the attacker can physically obtain the wallet. One way to protect bitcoins from online or offline theft is to have the codes recallable by memory. As holding long random private keys in memory is quite difficult, even for the best of minds, we'll have to use another method to generate our private keys. Creating a brainwallet A brainwallet is a way to create one or more private keys from a long phrase of random words. From the phrase, called a passphrase, we're able to generate a private key, along with its public address, to store bitcoin. We can create any passphrase we'd like. The longer the phrase and the more random the characters, the more secure it will be. Brainwallet phrases should contain at least 12 words. It is very important that the phrase never comes from anything published, such as a book or a song. Hackers actively search for possible brainwallets by performing brute force attacks on commonly-published phrases. Here is an example of a brainwallet passphrase: gently continue prepare history bowl shy dog accident forgive strain dirt consume Note that the phrase is composed of 12 seemingly random words. One could use an easy-to-remember sentence rather than 12 words. Regardless of whether you record your passphrase on paper or memorize it, the idea is to use a passphrase that's easy to recall and type, yet difficult to crack. Don't let this happen to you: "Just lost 4 BTC out of a hacked brain wallet. The pass phrase was a line from an obscure poem in Afrikaans. Somebody out there has a really comprehensive dictionary attack program running." Reddit Thread (http://redd.it/1ptuf3) Unfortunately, this user lost their bitcoin because they chose a published line from a poem. Make sure that you choose a passphrase that is composed of multiple components of non-published text. Sadly, although warned, some users may resort to simple phrases that are easy to crack. Simple passwords such as 123456, password1, and iloveyou are still commonly used for e-mail and login accounts, and they are routinely cracked. Do not use simple passwords for your brainwallet passphrase. Make sure that you use at least 12 words with additional characters and numbers. Using the preceding passphrase, we can generate our private key and public address using the many tools available online. We'll use an online service called BitAddress to generate the actual brainwallet from the passphrase. Simply open www.bitaddress.org in your browser. At first, BitAddress will ask you to move your mouse cursor around to collect enough random points to generate a seed for generating random numbers.
This process could take a minute or two. Once opened, select the option Brain Wallet from the top menu. In the form presented, enter the passphrase and then enter it again to confirm. Click on View to see your private key and public address. For the example shown in figure 3.6, we'll use the preceding passphrase example: Figure 3.6 - BitAddress's brainwallet feature From the page, you can easily copy and paste the public address and use it for receiving bitcoin. Later, when you're ready to spend the coins, enter the same exact passphrase to generate the same private key and public address. Referring to our Coinbase example from earlier in the article, we can then import the private key into our wallet. Increasing brainwallet security As an early attempt to give people a way to "memorize" their Bitcoin wallet, brainwallets have become a target for hackers. Some users have chosen phrases or sentences from common books as their brainwallet. Unfortunately, hackers who had access to large amounts of computing power were able to search for these phrases and were able to crack some brainwallets. To improve the security of brainwallets, other derivation methods have been developed. One service, called brainwallet.io, executes a time-intensive cryptographic function over the brainwallet phrase to create a seed that is very difficult to crack. It's important to know that the passphrases used with BitAddress are not compatible with brainwallet.io. To use brainwallet.io to generate a more secure brainwallet, open http://brainwallet.io: Figure 3.7 - brainwallet.io, a more secure brainwallet generator Brainwallet.io needs a sufficient amount of entropy to generate a private key that is difficult to reproduce. Entropy, in computer science, describes data in terms of its predictability. When data has high entropy, it is difficult to reproduce from known sources. When generating private keys, it's very important to use data that has high entropy. For generating brainwallet keys, we need data with high entropy, yet it should be easy for us to duplicate. To meet this requirement, brainwallet.io accepts your random passphrase, or can generate one from a list of random words. Additionally, it can use data from a file of your choice. Either way, the more entropy given, the stronger your passphrase will be. If you specify a passphrase, choosing at least 12 words is recommended. Next, brainwallet.io prompts you for a salt, available in several forms: login info, personal info, or generic. Salts are used to add additional entropy to the generation of your private key. Their purpose is to prevent standard dictionary attacks against your passphrase. While using brainwallet.io, this information is never sent to the server. When ready, click the generate button, and the page will begin computing a scrypt function over your passphrase. Scrypt is a cryptographic function that requires computing time to execute. Due to the time required for each pass, it makes brute force attacks very difficult. brainwallet.io makes many thousands of passes to ensure that a strong seed is generated for the private key. Once finished, your new private key and public address, along with their QR codes, will be displayed for easy printing. As an alternative, WarpWallet is available at https://keybase.io/warp. WarpWallet also computes a private key based on many thousands of scrypt passes over a passphrase and salt combination.
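To make the difference between the two approaches more concrete, here is a rough Python sketch (an illustration added here, not taken from the book; hashlib.scrypt requires Python 3.6+ built against OpenSSL). A classic brainwallet derives its candidate 256-bit private key from a single SHA-256 hash of the passphrase, while the strengthened services described above run a deliberately slow, memory-hard scrypt derivation over the passphrase plus a salt. The salt value and the scrypt parameters below are illustrative assumptions, not the settings of brainwallet.io or WarpWallet, and turning the resulting number into a spendable Bitcoin private key and address involves further encoding steps that those services handle for you.
import hashlib

passphrase = "gently continue prepare history bowl shy dog accident forgive strain dirt consume"
salt = "user@example.com"   # hypothetical salt (login info, personal info, and so on)

# Classic brainwallet: one fast SHA-256 pass over the passphrase
fast_key = hashlib.sha256(passphrase.encode("utf-8")).hexdigest()
print("SHA-256 brainwallet key:", fast_key)

# Strengthened variant: a slow, memory-hard scrypt derivation over passphrase + salt
# (n, r, p, and dklen are illustrative values, not those of any specific service)
slow_key = hashlib.scrypt(passphrase.encode("utf-8"),
                          salt=salt.encode("utf-8"),
                          n=2**14, r=8, p=1, dklen=32).hex()
print("scrypt-derived key:", slow_key)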
Remember that brainwallet.io passphrases are not compatible with WarpWallet passphrases. Deterministic wallets We have introduced brainwallets that yield one private key and public address. They are designed for one time use and are practical for holding a fixed amount of bitcoin for a period of time. Yet, if we're making lots of transactions, it would be convenient to have the ability to generate unlimited public addresses so that we can use them to receive bitcoin from different transactions or to generate change addresses. A Type 1 Deterministic Wallet is a simple wallet schema based on a passphrase with an index appended. By incrementing the index, an unlimited number of addresses can be created. Each new address is indexed so that its private key can be quickly retrieved. Creating a deterministic wallet To create a deterministic wallet, simply choose a strong passphrase, as previously described, and then append a number to represent an individual private key and public address. It's practical to do this with a spreadsheet so that you can keep a list of public addresses on file. Then, when you want to spend the bitcoin, you simply regenerate the private key using the index. Let's walk through an example. First, we choose the passphrase: "dress retreat save scratch decide simple army piece scent ocean hand become" Then, we append an index, sequential number, to the passphrase: "dress retreat save scratch decide simple army piece scent ocean hand become0" "dress retreat save scratch decide simple army piece scent ocean hand become1" "dress retreat save scratch decide simple army piece scent ocean hand become2" "dress retreat save scratch decide simple army piece scent ocean hand become3" "dress retreat save scratch decide simple army piece scent ocean hand become4" Then, we take each passphrase, with the corresponding index, and run it through brainwallet.io, or any other brainwallet service, to generate the public address. Using a table or a spreadsheet, we can pre-generate a list of public addresses to receive bitcoin. Additionally, we can add a balance column to help track our money: Index Public Address Balance 0 1Bc2WZ2tiodYwYZCXRRrvzivKmrGKg2Ub9 0.00 1 1PXRtWnNYTXKQqgcxPDpXEvEDpkPKvKB82 0.00 2 1KdRGNADn7ipGdKb8VNcsk4exrHZZ7FuF2 0.00 3 1DNfd491t3ABLzFkYNRv8BWh8suJC9k6n2 0.00 4 17pZHju3KL4vVd2KRDDcoRdCs2RjyahXwt 0.00  Table 3 - Using a spreadsheet to track deterministic wallet addresses Spending from a deterministic wallet When we have money available in our wallet to spend, we can simply regenerate the private key for the index matching the public address. For example, let's say we have received 2BTC on the address starting with 1KdRGN in the preceding table. Since we know it belongs to index #2, we can reopen the brainwallet from the passphrase: "dress retreat save scratch decide simple army piece scent ocean hand become2" Using brainwallet.io as our brainwallet service, we quickly regenerate the original private key and public address: Figure 3.8 - Private key re-generated from a deterministic wallet Finally, we import the private key into our Bitcoin wallet, as described earlier in the article. If we don't want to keep the change in our online wallet, we can simply send the change back to the next available public address in our deterministic wallet. Pre-generating public addresses with deterministic wallets can be useful in many situations. Perhaps you want to do business with a partner and want to receive 12 payments over the course of one year. 
You can simply regenerate the 12 addresses and keep track of each payment using a spreadsheet. Another example could apply to an e-commerce site. If you'd like to receive payment for the goods or services being sold, you can pre-generate a long list of addresses. Only storing the public addresses on your website protects you from malicious attack on your web server. While Type 1 deterministic wallets are very useful, we'll introduce a more advanced version called the Type 2 Hierarchical Deterministic Wallet next. Type 2 Hierarchical Deterministic wallets Type 2 Hierarchical Deterministic (HD) wallets function similarly to Type 1 deterministic wallets, as they are able to generate an unlimited amount of private keys from a single passphrase, but they offer more advanced features. HD wallets are used by desktop, mobile, and hardware wallets as a way of securing an unlimited number of keys by a single passphrase. HD wallets are secured by a root seed. The root seed, generated from entropy, can be a number up to 64 bytes long. To make the root seed easier to save and recover, a phrase consisting of a list of mnemonic code words is rendered. The following is an example of a root seed: 01bd4085622ab35e0cd934adbdcce6ca To render the mnemonic code words, the root seed number plus its checksum is combined and then divided into groups of 11 bits. Each group of bits represents an index between 0 and 2047. The index is then mapped to a list of 2,048 words. For each group of bits, one word is listed, as shown in the following example, which generates the following phrase: essence forehead possess embarrass giggle spirit further understand fade appreciate angel suffocate BIP-0039 details the specifications for creating mnemonic code words to generate a deterministic key, and is available at https://en.bitcoin.it/wiki/BIP_0039. In the HD wallet, the root seed is used to generate a master private key and a master chain code. The master private key is used to generate a master public key, as with normal Bitcoin private keys and public keys. These keys are then used to generate additional children keys in a tree-like structure. Figure 3.9 illustrates the process of creating the master keys and chain code from a root seed: Figure 3.9 - Generating an HD Wallet's root seed, code words, and master keys Using a child key derivation function, children keys can be generated from the master or parent keys. An index is then combined with the keys and the chain code to generate and organize parent/child relationships. From each parent, two billion children keys can be created, and from each child's private key, the public key and public address can be created. In addition to generating a private key and a public address, each child can be used as a parent to generate its own list of child keys. This allows the organization of the derived keys in a tree-like structure. Hierarchically, an unlimited amount of keys can be created in this way. Figure 3.10 - The relationship between master seed, parent/child chains, and public addresses HD wallets are very practical as thousands of keys and public addresses can be managed by one seed. The entire tree of keys can be backed up and restored simply by the passphrase. HD wallets can be organized and shared in various useful ways. For example, in a company or organization, a parent key and chain code could be issued to generate a list of keys for each department. Each department would then have the ability to render its own set of private/public keys. 
Alternatively, a public parent key can be given out to generate child public keys, but not the private keys. This can be useful, for example, in an audit. The organization may want the auditor to verify the balances of a set of public keys, but without access to the private keys for spending. Another use case for generating public keys from a parent public key is e-commerce. As in the example mentioned previously, you may have a website and would like to generate an unlimited amount of public addresses. By generating a public parent key for the website, the shopping cart can create new public addresses in real time. HD wallets are very useful for Bitcoin wallet applications. Next, we'll look at a software package called Electrum for setting up an HD wallet to protect your bitcoins. Installing an HD wallet HD wallets are very convenient and practical. To show how we can manage an unlimited number of addresses with a single passphrase, we'll install an HD wallet software package called Electrum. Electrum is an easy-to-use desktop wallet that runs on Windows, OS X, and Linux. It implements a secure HD wallet that is protected by a 12-word passphrase. It is able to synchronize with the blockchain, using servers that index all the Bitcoin transactions, to provide quick updates to your balances. Electrum has some nice features to help protect your bitcoins. It supports multi-signature transactions, that is, transactions that require more than one key to spend coins. Multi-signature transactions are useful when you want to share the responsibility of a Bitcoin address between two or more parties, or to add an extra layer of protection to your bitcoins. Additionally, Electrum has the ability to create a watching-only version of your wallet. This allows you to give access to your public keys to another party without releasing the private keys. This can be very useful for auditing or accounting purposes. To install Electrum, simply open the URL https://electrum.org/#download and follow the instructions for your operating system. On first installation, Electrum will create a new wallet for you, identified by a passphrase. Make sure that you protect this passphrase offline! Figure 3.11 - Recording the passphrase from an Electrum wallet Electrum will proceed by asking you to re-enter the passphrase to confirm you have it recorded. Finally, it will ask you for a password. This password is used to encrypt your wallet's seed and any private keys imported into your wallet on disk. You will need this password any time you send bitcoins from your account. Bitcoins in cold storage If you are responsible for a large amount of bitcoin that may be exposed to online hacking or hardware failure, it is important to minimize your risk. A common schema for minimizing the risk is to split your holdings between a hot wallet and cold storage. A hot wallet refers to the online wallet used for everyday deposits and withdrawals. Based on your customers' needs, you store in it only the minimum needed to cover the daily business. For example, Coinbase claims to hold approximately five percent of the total bitcoins on deposit in their hot wallet. The remaining amount is stored in cold storage. Cold storage is an offline wallet for bitcoin. Addresses are generated, typically from a deterministic wallet, with their passphrase and private keys stored offline. Periodically, depending on day-to-day needs, bitcoins are transferred to and from the cold storage. Additionally, bitcoins may be moved to deep cold storage.
These bitcoins are generally more difficult to retrieve. While cold storage transfer may easily be done to cover the needs of the hot wallet, a deep cold storage schema may involve physically accessing the passphrase / private keys from a safe, a safety deposit box, or a bank vault. The reasoning is to slow down the access as much as possible. Cold storage with Electrum We can use Electrum to create a hot wallet and a cold storage wallet. To exemplify, let's imagine a business owner who wants to accept bitcoin from his PC cash register. For security reasons, he may want to allow access to the generation of new addresses to receive Bitcoin, but not access to spending them. Spending bitcoins from this wallet will be secured by a protected computer. To start, create a normal Electrum wallet on the protected computer. Secure the passphrase and assign a strong password to the wallet. Then, from the menu, select Wallet | Master Public Keys. The key will be displayed as shown in figure 3.12. Copy this number and save it to a USB key. Figure 3.12 - Your Electrum wallet's public master key Your master public key can be used to generate new public keys, but without access to the private keys. As mentioned in the previous examples, this has many practical uses, as in our example with the cash register. Next, from your cash register, install Electrum. On setup, or from File | New/Restore, choose Restore a wallet or import keys and the Standard wallet type: Figure 3.13 - Setting up a cash register wallet with Electrum On the next screen, Electrum will prompt you to enter your public master key. Once accepted, Electrum will generate your wallet from the public master key. When ready, your new wallet will be ready to accept bitcoin without access to the private keys. WARNING: If you import private keys into your Electrum wallet, they cannot be restored from your passphrase or public master key. They have not been generated by the root seed and exist independently in the wallet. If you import private keys, make sure to back up the wallet file after every import. Verifying access to a private key When working with public addresses, it may be important to prove that you have access to a private key. By using Bitcoin's cryptographic ability to sign a message, you can verify that you have access to the key without revealing it. This can be offered as proof from a trustee that they control the keys. Using Electrum's built-in message signing feature, we can use the private key in our wallet to sign a message. The message, combined with the digital signature and public address, can later be used to verify that it was signed with the original private key. To begin, choose an address from your wallet. In Electrum, your addresses can be found under the Addresses tab. Next, right click on an address and choose Sign/verify Message. A dialog box allowing you to sign a message will appear: Figure 3.14 - Electrum's Sign/Verify Message features As shown in figure 3.14, you can enter any message you like and sign it with the private key of the address shown. This process will produce a digital signature that can be shared with others to prove that you have access to the private key. To verify the signature on another computer, simply open Electrum and choose Tools | Sign | Verify Message from the menu. You will be prompted with the same dialog as shown in figure 3.14. Copy and paste the message, the address, and the digital signature, and click Verify. The results will be displayed. 
By requesting a signed message from someone, you can verify that they do, in fact, have control of the private key. This is useful for making sure that the trustee of a cold storage wallet has access to the private keys without releasing or sharing them. Another good use of message signing is to prove that someone has control of some quantity of bitcoin. By signing a message that includes the public address holding the funds, one can show that the party is the owner of the funds. Finally, signing and verifying a message can be useful for testing your backups. You can test your private key and public address completely offline without actually sending bitcoin to the address. Good housekeeping with Bitcoin To ensure the safekeeping of your bitcoin, it's important to protect your private keys by following a short list of best practices: Never store your private keys unencrypted on your hard drive or in the cloud: Unencrypted wallets can easily be stolen by hackers, viruses, or malware. Make sure your keys are always encrypted before being saved to disk. Never send money to a Bitcoin address without a backup of the private keys: It's really important that you have a backup of your private key before sending money to its public address. There are stories of early adopters who have lost significant amounts of bitcoin because of hardware failures or inadvertent mistakes. Always test your backup process by repeating the recovery steps: When setting up a backup plan, make sure to test your plan by backing up your keys, sending a small amount to the address, and recovering the amount from the backup. Message signing and verification is also a useful way to test your private key backups offline. Ensure that you have a secure location for your paper wallets: Unauthorized access to your paper wallets can result in the loss of your bitcoin. Make sure that you keep your wallets in a secure safe, in a bank safety deposit box, or in a vault. It's advisable to keep copies of your wallets in multiple locations. Keep multiple copies of your paper wallets: Paper can easily be damaged by water or direct sunlight. Make sure that you keep multiple copies of your paper wallets in plastic bags, protected from direct light with a cover. Consider writing a testament or will for your Bitcoin wallets: The testament should name who has access to the bitcoin and how they will be distributed. Make sure that you include instructions on how to recover the coins. Never forget your wallet's password or passphrase: This sounds obvious, but it must be emphasized. There is no way to recover a lost password or passphrase. Always use a strong passphrase: A strong passphrase should meet the following requirements: it should be long and difficult to guess; it should not come from a famous publication (literature, holy books, and so on); it should not contain personal information; it should be easy to remember and type accurately; and it should not be reused between sites and applications. Summary So far, we've covered the basics of how to get started with Bitcoin. We've provided a tutorial for setting up an online wallet and for buying bitcoin in 15 minutes. We've covered online exchanges and marketplaces, and how to safely store and protect your bitcoin. Resources for Article: Further resources on this subject: Bitcoins – Pools and Mining [article] Going Viral [article] Introduction to the Nmap Scripting Engine [article]
Linking Section Access to multiple dimensions

Packt
25 Jun 2013
3 min read
(For more resources related to this topic, see here.)
Getting ready
Load the following script:
Product:
LOAD * INLINE [
ProductID, ProductGroup, ProductName
1, GroupA, Great As
2, GroupC, Super Cs
3, GroupC, Mega Cs
4, GroupB, Good Bs
5, GroupB, Busy Bs
];

Customer:
LOAD * INLINE [
CustomerID, CustomerName, Country
1, Gatsby Gang, USA
2, Charly Choc, USA
3, Donnie Drake, USA
4, London Lamps, UK
5, Shylock Homes, UK
];

Sales:
LOAD * INLINE [
CustomerID, ProductID, Sales
1, 2, 3536
1, 3, 4333
1, 5, 2123
2, 2, 4556
2, 4, 1223
2, 5, 6789
3, 2, 1323
3, 3, 3245
3, 4, 6789
4, 2, 2311
4, 3, 1333
5, 1, 7654
5, 2, 3455
5, 3, 6547
5, 4, 2854
5, 5, 9877
];

CountryLink:
Load Distinct Country, Upper(Country) As COUNTRY_LINK
Resident Customer;
Load Distinct Country, 'ALL' As COUNTRY_LINK
Resident Customer;

ProductLink:
Load Distinct ProductGroup, Upper(ProductGroup) As PRODUCT_LINK
Resident Product;
Load Distinct ProductGroup, 'ALL' As PRODUCT_LINK
Resident Product;

//Section Access;
Access:
LOAD * INLINE [
ACCESS, USERID, PRODUCT_LINK, COUNTRY_LINK
ADMIN, ADMIN, *, *
USER, GM, ALL, ALL
USER, CM1, ALL, USA
USER, CM2, ALL, UK
USER, PM1, GROUPA, ALL
USER, PM2, GROUPB, ALL
USER, PM3, GROUPC, ALL
USER, SM1, GROUPB, UK
USER, SM2, GROUPA, USA
];
Section Application;
Note that there is a loop error generated on reload because there is a loop in the data structure.
How to do it…
Follow these steps to link Section Access to multiple dimensions:
1. Add list boxes to the layout for ProductGroup and Country.
2. Add a statistics box for Sales.
3. Remove // to uncomment the Section Access statement.
4. From the Settings menu, open Document Properties and select the Opening tab.
5. Turn on the Initial Data Reduction Based on Section Access option.
6. Reload and save the document.
7. Close QlikView.
8. Re-open QlikView and open the document. Log in as the Country Manager, CM1, user. Note that USA is the only country. Also, the product group, GroupA, is missing; there are no sales of this product group in USA.
9. Close QlikView and then re-open it again. This time, log in as the Sales Manager, SM2. You will not be allowed access to the document.
10. Log into the document as the ADMIN user.
11. Edit the script. Add a second entry for the SM2 user in the Access table as follows:
USER, SM2, GROUPA, USA
USER, SM2, GROUPB, UK
12. Reload, save, and close the document and QlikView.
13. Re-open and log in as SM2. Note the selections.
How it works…
Section Access is really quite simple. The user is connected to the data and the data is reduced accordingly. QlikView allows Section Access tables to be connected to multiple dimensions in the main data structure without causing issues with loops. Each associated field acts in the same way as a selection in the layout. The initial setting for the SM2 user contained values that were mutually exclusive. Because of the default Strict Exclusion setting, the SM2 user cannot log in. We changed the script and included multiple rows for the SM2 user. Intuitively, we might expect that, as the first row did not connect to the data, only the second row would connect to the data. However, each field value is treated as an individual selection and all of the values are included.
There's more…
If we wanted to include solely the composite association of Country and ProductGroup, we would need to derive a composite key in the data set and connect the user to that, as sketched in the example below. In this example, we used the USERID field to test using QlikView logins. However, we would normally use NTNAME to link the user to either a Windows login or a custom login.
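For illustration only, the composite key could be derived with a script fragment along the following lines (this sketch is an addition, not part of the original recipe; it assumes the data model from the Getting ready script, and the table and field names are invented). Because values in Section Access must be in uppercase, the derived key is wrapped in Upper():
MapCustomerCountry:
Mapping Load CustomerID, Country
Resident Customer;

MapProductGroup:
Mapping Load ProductID, ProductGroup
Resident Product;

CompositeLink:
Load Distinct CustomerID, ProductID,
  Upper(ApplyMap('MapCustomerCountry', CustomerID) & '|' &
        ApplyMap('MapProductGroup', ProductID)) As COUNTRY_PRODUCT_LINK
Resident Sales;
The Section Access table would then carry a single COUNTRY_PRODUCT_LINK field with values such as USA|GROUPA or UK|GROUPB. Note that, loaded this way, CompositeLink shares both CustomerID and ProductID with the Sales table, so QlikView will associate them through a synthetic key; in a real model, you would more likely join the derived field directly onto the Sales table.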
Resources for Article:

Further resources on this subject:

- Pentaho Reporting: Building Interactive Reports in Swing [Article]
- Visual ETL Development With IBM DataStage [Article]
- A Python Multimedia Application: Thumbnail Maker [Article]

Segmenting images in OpenCV

Packt
21 Jun 2011
7 min read
OpenCV 2 Computer Vision Application Programming Cookbook

Over 50 recipes to master this library of programming functions for real-time computer vision

OpenCV (Open Source Computer Vision) is an open source library containing more than 500 optimized algorithms for image and video analysis. Since its introduction in 1999, it has been largely adopted as the primary development tool by the community of researchers and developers in computer vision. OpenCV was originally developed at Intel by a team led by Gary Bradski as an initiative to advance research in vision and promote the development of rich, vision-based CPU-intensive applications.

In the previous article by Robert Laganière, author of OpenCV 2 Computer Vision Application Programming Cookbook, we took a look at image processing using morphological filters. In this article we will see how to segment images using watersheds and the GrabCut algorithm.

(For more resources related to this subject, see here.)

Segmenting images using watersheds

The watershed transformation is a popular image processing algorithm that is used to quickly segment an image into homogenous regions. It relies on the idea that when the image is seen as a topological relief, homogeneous regions correspond to relatively flat basins delimitated by steep edges. As a result of its simplicity, the original version of this algorithm tends to over-segment the image, which produces multiple small regions. This is why OpenCV proposes a variant of this algorithm that uses a set of predefined markers which guide the definition of the image segments.

How to do it...

The watershed segmentation is obtained through the use of the cv::watershed function. The input to this function is a 32-bit signed integer marker image in which each non-zero pixel represents a label. The idea is to mark some pixels of the image that are known to certainly belong to a given region. From this initial labeling, the watershed algorithm will determine the regions to which the other pixels belong. In this recipe, we will first create the marker image as a gray-level image, and then convert it into an image of integers. We conveniently encapsulated this step into a WatershedSegmenter class:

    class WatershedSegmenter {
      private:
        cv::Mat markers;
      public:
        void setMarkers(const cv::Mat& markerImage) {
          // Convert to image of ints
          markerImage.convertTo(markers,CV_32S);
        }
        cv::Mat process(const cv::Mat &image) {
          // Apply watershed
          cv::watershed(image,markers);
          return markers;
        }

The way these markers are obtained depends on the application. For example, some preprocessing steps might have resulted in the identification of some pixels belonging to an object of interest. The watershed would then be used to delimitate the complete object from that initial detection. In this recipe, we will simply use the binary image used in the previous article (OpenCV: Image Processing using Morphological Filters) in order to identify the animals of the corresponding original image.

Therefore, from our binary image, we need to identify pixels that certainly belong to the foreground (the animals) and pixels that certainly belong to the background (mainly the grass). Here, we will mark foreground pixels with label 255 and background pixels with label 128 (this choice is totally arbitrary, any label number other than 255 would work). The other pixels, that is the ones for which the labeling is unknown, are assigned value 0.
As it is now, the binary image includes too many white pixels belonging to various parts of the image. We will then severely erode this image in order to retain only pixels belonging to the important objects:

    // Eliminate noise and smaller objects
    cv::Mat fg;
    cv::erode(binary,fg,cv::Mat(),cv::Point(-1,-1),6);

The result is the following image. Note that a few pixels belonging to the background forest are still present. Let's simply keep them. Therefore, they will be considered to correspond to an object of interest. Similarly, we also select a few pixels of the background by a large dilation of the original binary image:

    // Identify image pixels without objects
    cv::Mat bg;
    cv::dilate(binary,bg,cv::Mat(),cv::Point(-1,-1),6);
    cv::threshold(bg,bg,1,128,cv::THRESH_BINARY_INV);

The resulting black pixels correspond to background pixels. This is why the thresholding operation immediately after the dilation assigns to these pixels the value 128. These images are combined to form the marker image:

    // Create markers image
    cv::Mat markers(binary.size(),CV_8U,cv::Scalar(0));
    markers = fg + bg;

Note how we used the overloaded operator+ here in order to combine the images. This is the image that will be used as input to the watershed algorithm. The segmentation is then obtained as follows:

    // Create watershed segmentation object
    WatershedSegmenter segmenter;
    // Set markers and process
    segmenter.setMarkers(markers);
    segmenter.process(image);

The marker image is then updated such that each zero pixel is assigned one of the input labels, while the pixels belonging to the found boundaries have value -1. The result is an image of labels and an image of the boundaries.

How it works...

As we did in the preceding recipe, we will use the topological map analogy in the description of the watershed algorithm. In order to create a watershed segmentation, the idea is to progressively flood the image starting at level 0. As the level of "water" progressively increases (to levels 1, 2, 3, and so on), catchment basins are formed. The size of these basins also gradually increases and, consequently, the water of two different basins will eventually merge. When this happens, a watershed is created in order to keep the two basins separated. Once the level of water has reached its maximal level, the sets of these created basins and watersheds form the watershed segmentation.

As one can expect, the flooding process initially creates many small individual basins. When all of these are merged, many watershed lines are created which results in an over-segmented image. To overcome this problem, a modification to this algorithm has been proposed in which the flooding process starts from a predefined set of marked pixels. The basins created from these markers are labeled in accordance with the values assigned to the initial marks. When two basins having the same label merge, no watersheds are created, thus preventing the oversegmentation.

This is what happens when the cv::watershed function is called. The input marker image is updated to produce the final watershed segmentation. Users can input a marker image with any number of labels with pixels of unknown labeling left to value 0. The marker image has been chosen to be an image of a 32-bit signed integer in order to be able to define more than 255 labels. It also allows the special value -1, to be assigned to pixels associated with a watershed. This is what is returned by the cv::watershed function.
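For readers who prefer to experiment from Python, the same marker-based pipeline can be sketched with OpenCV's Python bindings. The following is a minimal sketch rather than part of the original recipe: the file names are placeholders, and the erosion and dilation counts simply mirror the C++ code above.

    import cv2
    import numpy as np

    # Placeholder file names: a colour image and its binary (thresholded) version.
    image = cv2.imread("animals.jpg")
    binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)

    # Sure foreground: severe erosion keeps only pixels deep inside the objects (label 255).
    fg = cv2.erode(binary, None, iterations=6)

    # Sure background: large dilation, then an inverse threshold assigns label 128.
    bg = cv2.dilate(binary, None, iterations=6)
    _, bg = cv2.threshold(bg, 1, 128, cv2.THRESH_BINARY_INV)

    # Combine into a 32-bit signed marker image; unknown pixels stay at 0.
    markers = (fg + bg).astype(np.int32)

    # Run the watershed: boundary pixels are set to -1 in the returned markers.
    markers = cv2.watershed(image, markers)

    # Display helpers, equivalent to the getSegmentation() and getWatersheds() methods shown next.
    segmentation = np.uint8(np.clip(markers, 0, 255))
    watersheds = np.uint8(np.where(markers == -1, 0, 255))

The logic is identical to the C++ version: only certainly-foreground and certainly-background pixels are labeled, and the watershed decides the rest.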
To facilitate the displaying of the result, we have introduced two special methods. The first one returns an image of the labels (with watersheds at value 0). This is easily done through thresholding:

    // Return result in the form of an image
    cv::Mat getSegmentation() {
      cv::Mat tmp;
      // all segments with a label higher than 255
      // will be assigned value 255
      markers.convertTo(tmp,CV_8U);
      return tmp;
    }

Similarly, the second method returns an image in which the watershed lines are assigned value 0, and the rest of the image is at 255. This time, the cv::convertTo method is used to achieve this result:

    // Return watershed in the form of an image
    cv::Mat getWatersheds() {
      cv::Mat tmp;
      // Each pixel p is transformed into
      // 255p+255 before conversion
      markers.convertTo(tmp,CV_8U,255,255);
      return tmp;
    }

The linear transformation that is applied before the conversion allows -1 pixels to be converted into 0 (since -1*255+255=0). Pixels with a value greater than 255 are assigned the value 255. This is due to the saturation operation that is applied when signed integers are converted into unsigned chars.

See also

The article The viscous watershed transform by C. Vachier, F. Meyer, Journal of Mathematical Imaging and Vision, volume 22, issue 2-3, May 2005, for more information on the watershed transform.

The next recipe, which presents another image segmentation algorithm that can also segment an image into background and foreground objects.

Oracle GoldenGate 11g: Performance Tuning

Packt
01 Mar 2011
12 min read
Oracle GoldenGate 11g Implementer's guide

Design, install, and configure high-performance data replication solutions using Oracle GoldenGate

- The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
- Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
- Migrate your data replication solution from Oracle Streams to GoldenGate
- Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
- Written in a simple illustrative manner, providing step-by-step guidance with discussion points
- Goes way beyond the manual, appealing to Solution Architects, System Administrators and Database Administrators

Oracle states that GoldenGate can achieve near real-time data replication. However, out of the box, GoldenGate may not meet your performance requirements. Here we focus on the main areas that lend themselves to tuning, especially parallel processing and load balancing, enabling high data throughput and very low latency. Let's start by taking a look at some of the considerations before we start tuning Oracle GoldenGate.

Before tuning GoldenGate

There are a number of considerations we need to be aware of before we start the tuning process. For one, we must consider the underlying system and its ability to perform. Let's start by looking at the source of the data that GoldenGate needs for replication to work: the online redo logs.

Online redo

Before we start tuning GoldenGate, we must look at both the source and target databases and their ability to read/write data. Data replication is I/O intensive, so fast disks are important, particularly for the online redo logs. Redo logs play an important role in GoldenGate: they are constantly being written to by the database and concurrently being read by the Extract process. Furthermore, adding supplemental logging to a database can increase their size by a factor of 4!

Firstly, ensure that only the necessary amount of supplemental logging is enabled on the database. In the case of GoldenGate, the logging of the Primary Key is all that is required.

Next, take a look at the database wait events, in particular the ones that relate to redo. For example, if you are seeing "Log File Sync" waits, this is an indicator that either your disk writes are too slow or your application is committing too frequently, or a combination of both. RAID5 is another common problem for redo log writes. Ideally, these files should be placed on their own mirrored storage such as RAID1+0 (mirrored striped sets) or Flash disks. Many argue this to be a misconception with modern high speed disk arrays, but some production systems are still known to be suffering from redo I/O contention on RAID5.

An adequate number (and size) of redo groups must be configured to prevent "checkpoint not complete" or "cannot allocate new log" warnings appearing in the database instance alert log. This occurs when Oracle attempts to reuse a log file but the checkpoint that would flush the blocks in the DB buffer cache to disk is still required for crash recovery. The database must wait until that checkpoint completes before the online redo log file can be reused, effectively stalling the database and any redo generation.

Large objects (LOBs)

Know your data. LOBs can be a problem in data replication by virtue of their size and the ability to extract, transmit, and deliver the data from source to target.
Tables containing LOB datatypes should be isolated from regular data to use a dedicated Extract, Data Pump, and Replicat process group to enhance throughput. Also ensure that the target table has a primary key to avoid Full Table Scans (FTS), an Oracle GoldenGate best practice. LOB INSERT operations can insert an empty (null) LOB into a row before updating it with the data. This is because a LOB (depending on its size) can spread its data across multiple Logical Change Records, resulting in multiple DML operations required at the target database.

Base lining

Before we can start tuning, we must record our baseline. This will provide a reference point to tune from. We can later look back at our baseline and calculate the percentage improvement made from deploying new configurations. An ideal baseline is to find the "breaking point" of your application requirements. For example, the following questions must be answered:

- What is the maximum acceptable end to end latency?
- What are the maximum application transactions per second we must accommodate?

To answer these questions we must start with a single threaded data replication configuration having just one Extract, one Data Pump, and one Replicat process. This will provide us with a worst case scenario on which to build improvements. Ideally, our data source should be the application itself, inserting, deleting, and updating "real data" in the source database. However, simulated data with the ability to provide throughput profiles will allow us to gauge performance accurately. Application vendors can normally provide SQL injector utilities that simulate the user activity on the system.

Balancing the load across parallel process groups

The GoldenGate documentation states: "The most basic thing you can do to improve GoldenGate's performance is to divide a large number of tables among parallel processes and trails. For example, you can divide the load by schema." This statement is true as the bottleneck is largely due to the serial nature of the Replicat process, having to "replay" transactions in commit order. Although this can be a constraining factor due to transaction dependency, increasing the number of Replicat processes increases performance significantly. However, it is highly recommended to group tables with referential constraints together per Replicat.

The number of parallel processes is typically greater on the target system compared to the source. The number and ratio of processes will vary across applications and environments. Each configuration should be thoroughly tested to determine the optimal balance, but be careful not to over allocate, as each parallel process will consume up to 55 MB. Increasing the number of processes to an arbitrary value will not necessarily improve performance; in fact, it may be worse and you will waste CPU and memory resources. The following data flow diagram shows a load balancing configuration including two Extract processes, three Data Pumps, and five Replicats:

Considerations for using parallel process groups

To maintain data integrity, ensure to include tables with referential constraints between one another in the same parallel process group. It's also worth considering disabling referential constraints on the target database schema to allow child records to be populated before their parents, thus increasing throughput. GoldenGate will always commit transactions in the same order as the source, so data integrity is maintained.
Oracle best practice states that no more than 3 Replicat processes should read the same remote trail file. To avoid contention on Trail files, pair each Replicat with its own Trail files and Extract process. Also, remember that it is easier to tune an Extract process than a Replicat process, so concentrate on your source before moving your focus to the target.

Splitting large tables into row ranges across process groups

What if you have some large tables with a high data change rate within a source schema and you cannot logically separate them from the remaining tables due to referential constraints? GoldenGate provides a solution to this problem by "splitting" the data within the same schema via the @RANGE function. The @RANGE function can be used in the Data Pump and Replicat configuration to "split" the transaction data across a number of parallel processes.

The Replicat process is typically the source of performance bottlenecks because, in its normal mode of operation, it is a single-threaded process that applies operations one at a time by using regular DML. Therefore, to leverage parallel operation and enhance throughput, the more Replicats the better (dependent on the number of CPUs and memory available on the target system).

The RANGE function

The way the @RANGE function works is that it computes a hash value of the columns specified in the input. If no columns are specified, it uses the table's primary key. GoldenGate adjusts the total number of ranges to optimize the even distribution across the number of ranges specified. This concept can be compared to Hash Partitioning in Oracle tables as a means of dividing data.

With any division of data during replication, the integrity is paramount and will have an effect on performance. Therefore, tables having a relationship with other tables in the source schema must be included in the configuration. If all your source schema tables are related, you must include all the tables!

Adding Replicats with @RANGE function

The @RANGE function accepts two numeric arguments, separated by a comma:

- Range: The number assigned to a process group, where the first is 1 and the second 2 and so on, up to the total number of ranges.
- Total number of ranges: The total number of process groups you wish to divide using the @RANGE function.

The following example includes three related tables in the source schema and walks through the complete configuration from start to finish. For this example, we have an existing Replicat process on the target machine (dbserver2) named ROLAP01 that includes the following three tables:

- ORDERS
- ORDER_ITEMS
- PRODUCTS

We are going to divide the rows of the tables across two Replicat groups. The source database schema name is SRC and the target schema is TGT. The following steps add a new Replicat named ROLAP02 with the relevant configuration and adjust Replicat ROLAP01 parameters to suit. Note that before conducting any changes, stop the existing Replicat processes and determine their Relative Byte Address (RBA) and Trail file log sequence number. This is important information that we will use to tell the new Replicat process from which point to start.

First check if the existing Replicat process is running:

    GGSCI (dbserver2) 1> info all

    Program     Status      Group       Lag          Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    RUNNING     ROLAP01     00:00:00     00:00:02

Stop the existing Replicat process:

    GGSCI (dbserver2) 2> stop REPLICAT ROLAP01

    Sending STOP request to REPLICAT ROLAP01...
    Request processed.

Add the new Replicat process, using the existing trail file.
    GGSCI (dbserver2) 3> add REPLICAT ROLAP02, exttrail ./dirdat/tb
    REPLICAT added.

Now add the configuration by creating a new parameter file for ROLAP02.

    GGSCI (dbserver2) 4> edit params ROLAP02

    --
    -- Example Replicator parameter file to apply changes
    -- to target tables
    --
    REPLICAT ROLAP02
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap02.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (1,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (1,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (1,2));

Now edit the configuration of the existing Replicat process, and add the @RANGE function to the FILTER clause of the MAP statement. Note the inclusion of the GROUPTRANSOPS parameter to enhance performance by increasing the number of operations allowed in a Replicat transaction.

    GGSCI (dbserver2) 5> edit params ROLAP01

    --
    -- Example Replicator parameter file to apply changes
    -- to target tables
    --
    REPLICAT ROLAP01
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap01.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (2,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (2,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (2,2));

Check that both the Replicat processes exist.

    GGSCI (dbserver2) 6> info all

    Program     Status      Group       Lag          Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    STOPPED     ROLAP01     00:00:00     00:10:35
    REPLICAT    STOPPED     ROLAP02     00:00:00     00:12:25

Before starting both Replicat processes, obtain the log Sequence Number (SEQNO) and Relative Byte Address (RBA) from the original trail file.

    GGSCI (dbserver2) 7> info REPLICAT ROLAP01, detail

    REPLICAT   ROLAP01   Last Started 2010-04-01 15:35   Status STOPPED
    Checkpoint Lag       00:00:00 (updated 00:12:43 ago)
    Log Read Checkpoint  File ./dirdat/tb000279          <- SEQNO
                         2010-04-08 12:27:00.001016  RBA 43750979   <- RBA

    Extract Source            Begin              End
    ./dirdat/tb000279         2010-04-01 12:47   2010-04-08 12:27
    ./dirdat/tb000257         2010-04-01 04:30   2010-04-01 12:47
    ./dirdat/tb000255         2010-03-30 13:50   2010-04-01 04:30
    ./dirdat/tb000206         2010-03-30 13:50   First Record
    ./dirdat/tb000206         2010-03-30 04:30   2010-03-30 13:50
    ./dirdat/tb000184         2010-03-30 04:30   First Record
    ./dirdat/tb000184         2010-03-30 00:00   2010-03-30 04:30
    ./dirdat/tb000000         *Initialized*      2010-03-30 00:00
    ./dirdat/tb000000         *Initialized*      First Record

Adjust the new Replicat process ROLAP02 to adopt these values, so that the process knows where to start from on startup.

    GGSCI (dbserver2) 8> alter replicat ROLAP02, extseqno 279
    REPLICAT altered.

    GGSCI (dbserver2) 9> alter replicat ROLAP02, extrba 43750979
    REPLICAT altered.

Failure to complete this step will result in either duplicate data or ORA-00001 against the target schema, because GoldenGate will attempt to replicate the data from the beginning of the initial trail file (./dirdat/tb000000) if it exists, else the process will abend.

Start both Replicat processes. Note the use of the wildcard (*).

    GGSCI (dbserver2) 10> start replicat ROLAP*

    Sending START request to MANAGER ...
    REPLICAT ROLAP01 starting

    Sending START request to MANAGER ...
    REPLICAT ROLAP02 starting

Check if both Replicat processes are running.
    GGSCI (dbserver2) 11> info all

    Program     Status      Group       Lag          Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    RUNNING     ROLAP01     00:00:00     00:00:22
    REPLICAT    RUNNING     ROLAP02     00:00:00     00:00:14

Check the detail of the new Replicat processes.

    GGSCI (dbserver2) 12> info REPLICAT ROLAP02, detail

    REPLICAT   ROLAP02   Last Started 2010-04-08 14:18   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:06 ago)
    Log Read Checkpoint  File ./dirdat/tb000279
                         First Record  RBA 43750979

    Extract Source            Begin              End
    ./dirdat/tb000279         * Initialized *    First Record
    ./dirdat/tb000279         * Initialized *    First Record
    ./dirdat/tb000279         * Initialized *    2010-04-08 12:26
    ./dirdat/tb000279         * Initialized *    First Record

Generate a report for the new Replicat process ROLAP02.

    GGSCI (dbserver2) 13> send REPLICAT ROLAP02, report

    Sending REPORT request to REPLICAT ROLAP02 ...
    Request processed.

Now view the report to confirm the new Replicat process has started from the specified start point (RBA 43750979 and SEQNO 279). The following is an extract from the report:

    GGSCI (dbserver2) 14> view report ROLAP02

    2010-04-08 14:20:18  GGS INFO  379  Positioning with begin time: Apr 08, 2010 14:18:19 PM,
    starting record time: Apr 08, 2010 14:17:25 PM at extseqno 279, extrba 43750979.
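As an aside that is not part of the original GoldenGate material, the effect of splitting rows across Replicats by hashing a key can be illustrated with a few lines of Python. GoldenGate's internal hash is not public, so the crc32-based helper below is purely a conceptual stand-in for @RANGE(n, total):

    # Conceptual illustration only: this is not GoldenGate's actual hash function.
    import zlib

    def assigned_range(key, total_ranges):
        # Hash the key and map it onto 1..total_ranges, the way @RANGE(n, total)
        # conceptually buckets each row by its primary key.
        return (zlib.crc32(str(key).encode()) % total_ranges) + 1

    # Ten sample primary key values split across two Replicat groups.
    for pk in range(1, 11):
        print("PK", pk, "-> Replicat group", assigned_range(pk, 2))

Rows with the same key always hash to the same group, which is why related tables must be included in the same configuration for integrity to be preserved.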

Using Canvas and D3

Packt
21 Aug 2014
9 min read
In this article by Pablo Navarro Castillo, author of Mastering D3.js, we will see that most of the time we use D3 to create SVG-based charts and visualizations. If the number of elements to render is huge, or if we need to render raster images, it can be more convenient to render our visualizations using the HTML5 canvas element. In this article, we will learn how to use the force layout of D3 and the canvas element to create an animated network visualization with random data.

(For more resources related to this topic, see here.)

Creating figures with canvas

The HTML canvas element allows you to create raster graphics using JavaScript. It was first introduced in HTML5. It enjoys more widespread support than SVG, and can be used as a fallback option. Before diving deeper into integrating canvas and D3, we will construct a small example with canvas.

The canvas element should have the width and height attributes. This alone will create an invisible figure of the specified size:

    <!-- Canvas Element -->
    <canvas id="canvas-demo" width="650px" height="60px"></canvas>

If the browser supports the canvas element, it will ignore any element inside the canvas tags. On the other hand, if the browser doesn't support the canvas, it will ignore the canvas tags, but it will interpret the content of the element. This behavior provides a quick way to handle the lack of canvas support:

    <!-- Canvas Element -->
    <canvas id="canvas-demo" width="650px" height="60px">
        <!-- Fallback image -->
        <img src="img/fallback-img.png" width="650" height="60"></img>
    </canvas>

If the browser doesn't support canvas, the fallback image will be displayed. Note that unlike the <img> element, the canvas closing tag (</canvas>) is mandatory. To create figures with canvas, we don't need special libraries; we can create the shapes using the canvas API:

    <script>
        // Graphic Variables
        var barw = 65,
            barh = 60;

        // Append a canvas element, set its size and get the node.
        var canvas = document.getElementById('canvas-demo');

        // Get the rendering context.
        var context = canvas.getContext('2d');

        // Array with colors, to have one rectangle of each color.
        var color = ['#5c3566', '#6c475b', '#7c584f', '#8c6a44', '#9c7c39',
                     '#ad8d2d', '#bd9f22', '#cdb117', '#ddc20b', '#edd400'];

        // Set the fill color and render ten rectangles.
        for (var k = 0; k < 10; k += 1) {
            // Set the fill color.
            context.fillStyle = color[k];
            // Create a rectangle in incremental positions.
            context.fillRect(k * barw, 0, barw, barh);
        }
    </script>

We use the DOM API to access the canvas element with the canvas-demo ID and to get the rendering context. Then we set the color using the fillStyle method, and use the fillRect canvas method to create a small rectangle. Note that we need to change fillStyle every time or all the following shapes will have the same color. The script will render a series of rectangles, each one filled with a different color: a graphic created with canvas.

Canvas uses the same coordinate system as SVG, with the origin in the top-left corner, the horizontal axis augmenting to the right, and the vertical axis augmenting to the bottom. Instead of using the DOM API to get the canvas node, we could have used D3 to create the node, set its attributes, and created scales for the color and position of the shapes. Note that the shapes drawn with canvas don't exist in the DOM tree; so, we can't use the usual D3 pattern of creating a selection, binding the data items, and appending the elements if we are using canvas.
Creating shapes

Canvas has fewer primitives than SVG. In fact, almost all the shapes must be drawn with paths, and more steps are needed to create a path. To create a shape, we need to open the path, move the cursor to the desired location, create the shape, and close the path. Then, we can draw the path by filling the shape or rendering the outline. For instance, to draw a red semicircle centered in (325, 30) and with a radius of 20, write the following code:

    // Create a red semicircle.
    context.beginPath();
    context.fillStyle = '#ff0000';
    context.moveTo(325, 30);
    context.arc(325, 30, 20, Math.PI / 2, 3 * Math.PI / 2);
    context.fill();

The moveTo method is a bit redundant here, because the arc method moves the cursor implicitly. The arguments of the arc method are the x and y coordinates of the arc center, the radius, and the starting and ending angle of the arc. There is also an optional Boolean argument to indicate whether the arc should be drawn counterclockwise. This draws a basic shape with the canvas API.

Integrating canvas and D3

We will create a small network chart using the force layout of D3 and canvas instead of SVG. To make the graph look more interesting, we will randomly generate the data. We will generate 250 nodes sparsely connected. The nodes and links will be stored as the attributes of the data object:

    // Number of Nodes
    var nNodes = 250,
        createLink = false;

    // Dataset Structure
    var data = {nodes: [], links: []};

We will append nodes and links to our dataset. We will create nodes with a radius attribute, randomly assigning it a value of either 2 or 4 as follows:

    // Iterate in the nodes
    for (var k = 0; k < nNodes; k += 1) {
        // Create a node with a random radius.
        data.nodes.push({radius: (Math.random() > 0.3) ? 2 : 4});
        // Create random links between the nodes.
    }

We will create a link with probability of 0.1 only if the difference between the source and target indexes is less than 8. The idea behind this way to create links is to have only a few connections between the nodes:

    // Create random links between the nodes.
    for (var j = k + 1; j < nNodes; j += 1) {
        // Create a link with probability 0.1
        createLink = (Math.random() < 0.1) && (Math.abs(k - j) < 8);
        if (createLink) {
            // Append a link with variable distance between the nodes
            data.links.push({
                source: k,
                target: j,
                dist: 2 * Math.abs(k - j) + 10
            });
        }
    }

We will use the radius attribute to set the size of the nodes. The links will contain the distance between the nodes and the indexes of the source and target nodes. We will create variables to set the width and height of the figure:

    // Figure width and height
    var width = 650,
        height = 300;

We can now create and configure the force layout. As we did in the previous section, we will set the charge strength to be proportional to the area of each node. This time, we will also set the distance between the links, using the linkDistance method of the layout:

    // Create and configure the force layout
    var force = d3.layout.force()
        .size([width, height])
        .nodes(data.nodes)
        .links(data.links)
        .charge(function(d) { return -1.2 * d.radius * d.radius; })
        .linkDistance(function(d) { return d.dist; })
        .start();

We can create a canvas element now. Note that we should use the node method to get the canvas element, because the append and attr methods will both return a selection, which doesn't have the canvas API methods:

    // Create a canvas element and set its size.
    var canvas = d3.select('div#canvas-force').append('canvas')
        .attr('width', width + 'px')
        .attr('height', height + 'px')
        .node();

We get the rendering context. Each canvas element has its own rendering context. We will use the '2d' context to draw two-dimensional figures. At the time of writing this, there are some browsers that support the webgl context; more details are available at https://developer.mozilla.org/en-US/docs/Web/WebGL/Getting_started_with_WebGL. Refer to the following '2d' context:

    // Get the canvas context.
    var context = canvas.getContext('2d');

We register an event listener for the force layout's tick event. As canvas doesn't remember previously created shapes, we need to clear the figure and redraw all the elements on each tick:

    force.on('tick', function() {
        // Clear the complete figure.
        context.clearRect(0, 0, width, height);

        // Draw the links ...
        // Draw the nodes ...
    });

The clearRect method cleans the figure under the specified rectangle. In this case, we clean the entire canvas. We can draw the links using the lineTo method. We iterate through the links, beginning a new path for each link, moving the cursor to the position of the source node, and creating a line towards the target node. We draw the line with the stroke method:

    // Draw the links
    data.links.forEach(function(d) {
        // Draw a line from source to target.
        context.beginPath();
        context.moveTo(d.source.x, d.source.y);
        context.lineTo(d.target.x, d.target.y);
        context.stroke();
    });

We iterate through the nodes and draw each one. We use the arc method to represent each node with a black circle:

    // Draw the nodes
    data.nodes.forEach(function(d, i) {
        // Draws a complete arc for each node.
        context.beginPath();
        context.arc(d.x, d.y, d.radius, 0, 2 * Math.PI, true);
        context.fill();
    });

We obtain a constellation of disconnected network graphs, slowly gravitating towards the center of the figure, using the force layout and canvas to create a network chart.

We might think that erasing all the shapes and redrawing each shape again and again could have a negative impact on performance. In fact, sometimes it's faster to draw the figures using canvas, because this way the browser doesn't have to manage the DOM tree of the SVG elements (but it still has to redraw them if the SVG elements are changed).

Summary

In this article, we learned how to use the force layout of D3 and the canvas element to create an animated network visualization with random data.

Resources for Article:

Further resources on this subject:

- Interacting with Data for Dashboards [Article]
- Kendo UI DataViz – Advance Charting [Article]
- Visualizing my Social Graph with d3.js [Article]

Putting the Fun in Functional Python

Packt
17 Feb 2016
21 min read
Functional programming defines a computation using expressions and evaluation, often encapsulated in function definitions. It de-emphasizes or avoids the complexity of state change and mutable objects. This tends to create programs that are more succinct and expressive. In this article, we'll introduce some of the techniques that characterize functional programming. We'll identify some of the ways to map these features to Python. Finally, we'll also address some ways in which the benefits of functional programming accrue when we use these design patterns to build Python applications.

Python has numerous functional programming features. It is not a purely functional programming language. It offers enough of the right kinds of features that it confers the benefits of functional programming. It also retains all the optimization power available from an imperative programming language.

We'll also look at a problem domain that we'll use for many of the examples in this book. We'll try to stick closely to Exploratory Data Analysis (EDA) because its algorithms are often good examples of functional programming. Furthermore, the benefits of functional programming accrue rapidly in this problem domain. Our goal is to establish some essential principles of functional programming. We'll focus on Python 3 features in this book. However, some of the examples might also work in Python 2.

(For more resources related to this topic, see here.)

Identifying a paradigm

It's difficult to be definitive on what fills the universe of programming paradigms. For our purposes, we will distinguish between just two of the many programming paradigms: functional programming and imperative programming. One important distinguishing feature between these two is the concept of state.

In an imperative language, like Python, the state of the computation is reflected by the values of the variables in the various namespaces. The values of the variables establish the state of a computation; each kind of statement makes a well-defined change to the state by adding or changing (or even removing) a variable. A language is imperative because each statement is a command, which changes the state in some way. Our general focus is on the assignment statement and how it changes state. Python has other statements, such as global or nonlocal, which modify the rules for variables in a particular namespace. Statements like def, class, and import change the processing context. Other statements like try, except, if, elif, and else act as guards to modify how a collection of statements will change the computation's state. Statements like for and while, similarly, wrap a block of statements so that the statements can make repeated changes to the state of the computation. The focus of all these various statement types, however, is on changing the state of the variables.

Ideally, each statement advances the state of the computation from an initial condition toward the desired final outcome. This "advances the computation" assertion can be challenging to prove. One approach is to define the final state, identify a statement that will establish this final state, and then deduce the precondition required for this final statement to work. This design process can be iterated until an acceptable initial state is derived.

In a functional language, we replace state (the changing values of variables) with a simpler notion of evaluating functions. Each function evaluation creates a new object or objects from existing objects.
Since a functional program is a composition of functions, we can design lower-level functions that are easy to understand, and we can design higher-level compositions that are easier to visualize than a complex sequence of statements.

Function evaluation more closely parallels mathematical formalisms. Because of this, we can often use simple algebra to design an algorithm, which clearly handles the edge cases and boundary conditions. This makes us more confident that the functions work. It also makes it easy to locate test cases for formal unit testing.

It's important to note that functional programs tend to be relatively succinct, expressive, and efficient when compared to imperative (object-oriented or procedural) programs. The benefit isn't automatic; it requires a careful design. This design effort is often easier than functionally similar procedural programming.

Subdividing the procedural paradigm

We can subdivide imperative languages into a number of discrete categories. In this section, we'll glance quickly at the procedural versus object-oriented distinction. What's important here is to see how object-oriented programming is a subset of imperative programming. The distinction between procedural and object-orientation doesn't reflect the kind of fundamental difference that functional programming represents. We'll use code examples to illustrate the concepts. For some, this will feel like reinventing a wheel. For others, it provides a concrete expression of abstract concepts.

For some kinds of computations, we can ignore Python's object-oriented features and write simple numeric algorithms. For example, we might write something like the following to sum selected values over a range of numbers:

    s = 0
    for n in range(1, 10):
        if n % 3 == 0 or n % 5 == 0:
            s += n
    print(s)

We've made this program strictly procedural, avoiding any explicit use of Python's object features. The program's state is defined by the values of the variables s and n. The variable, n, takes on values such that 1 ≤ n < 10. As the loop involves an ordered exploration of values of n, we can prove that it will terminate when n == 10. Similar code would work in C or Java using their primitive (non-object) data types.

We can exploit Python's Object-Oriented Programming (OOP) features and create a similar program:

    m = list()
    for n in range(1, 10):
        if n % 3 == 0 or n % 5 == 0:
            m.append(n)
    print(sum(m))

This program produces the same result but it accumulates a stateful collection object, m, as it proceeds. The state of the computation is defined by the values of the variables m and n. The syntax of m.append(n) and sum(m) can be confusing. It causes some programmers to insist (wrongly) that Python is somehow not purely object-oriented because it has a mixture of the function() and object.method() syntax. Rest assured, Python is purely object-oriented. Some languages, like C++, allow the use of primitive data types such as int, float, and long, which are not objects. Python doesn't have these primitive types. The presence of prefix syntax doesn't change the nature of the language.

To be pedantic, we could fully embrace the object model by subclassing the list class and adding a sum method:

    class SummableList(list):
        def sum(self):
            s = 0
            for v in self.__iter__():
                s += v
            return s

If we initialize the variable, m, with the SummableList() class instead of the list() method, we can use the m.sum() method instead of the sum(m) method. This kind of change can help to clarify the idea that Python is truly and completely object-oriented.
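As a quick illustration that is not in the original text, the earlier accumulation loop can be rewritten around the SummableList subclass defined above:

    # Uses the SummableList class defined above.
    m = SummableList()
    for n in range(1, 10):
        if n % 3 == 0 or n % 5 == 0:
            m.append(n)
    print(m.sum())  # prints 23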
The use of prefix function notation is purely syntactic sugar. All three of these examples rely on variables to explicitly show the state of the program. They rely on the assignment statements to change the values of the variables and advance the computation toward completion. We can insert assert statements throughout these examples to demonstrate that the expected state changes are implemented properly.

The point is not that imperative programming is broken in some way. The point is that functional programming leads to a change in viewpoint, which can, in many cases, be very helpful. We'll show a functional view of the same algorithm. Functional programming doesn't make this example dramatically shorter or faster.

Using the functional paradigm

In a functional sense, the sum of the multiples of 3 and 5 can be defined in two parts:

- The sum of a sequence of numbers
- A sequence of values that pass a simple test condition, for example, being multiples of three and five

The sum of a sequence has a simple, recursive definition:

    def sum(seq):
        if len(seq) == 0:
            return 0
        return seq[0] + sum(seq[1:])

We've defined the sum of a sequence in two cases: the base case states that the sum of a zero length sequence is 0, while the recursive case states that the sum of a sequence is the first value plus the sum of the rest of the sequence. Since the recursive definition depends on a shorter sequence, we can be sure that it will (eventually) devolve to the base case.

The + operator on the last line of the preceding example and the initial value of 0 in the base case characterize the equation as a sum. If we change the operator to * and the initial value to 1, it would just as easily compute a product. Similarly, a sequence of values can have a simple, recursive definition, as follows:

    def until(n, filter_func, v):
        if v == n:
            return []
        if filter_func(v):
            return [v] + until(n, filter_func, v + 1)
        else:
            return until(n, filter_func, v + 1)

In this function, we've compared a given value, v, against the upper bound, n. If v reaches the upper bound, the resulting list must be empty. This is the base case for the given recursion. There are two more cases defined by the given filter_func() function. If the value of v is passed by the filter_func() function, we'll create a very small list, containing one element, and append the remaining values of the until() function to this list. If the value of v is rejected by the filter_func() function, this value is ignored and the result is simply defined by the remaining values of the until() function. We can see that the value of v will increase from an initial value until it reaches n, assuring us that we'll reach the base case soon.

Here's how we can use the until() function to generate the multiples of 3 or 5. First, we'll define a handy lambda object to filter values:

    mult_3_5 = lambda x: x % 3 == 0 or x % 5 == 0

(We will use lambdas to emphasize succinct definitions of simple functions. Anything more complex than a one-line expression requires the def statement.) We can see how this lambda works from the command prompt in the following example:

    >>> mult_3_5(3)
    True
    >>> mult_3_5(4)
    False
    >>> mult_3_5(5)
    True

This function can be used with the until() function to generate a sequence of values, which are multiples of 3 or 5. The until() function for generating a sequence of values works as follows:

    >>> until(10, lambda x: x%3==0 or x%5==0, 0)
    [0, 3, 5, 6, 9]

We can use our recursive sum() function to compute the sum of this sequence of values.
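Putting the two definitions together (this line is not in the original text, but it follows directly from the functions defined above) reproduces the result of the imperative versions:

    >>> sum(until(10, mult_3_5, 0))
    23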
The various functions, such as sum(), until(), and mult_3_5(), are defined as simple recursive functions. The values are computed without resorting to intermediate variables to store state. We'll return to the ideas behind this purely functional recursive function definition in several places. It's important to note here that many functional programming language compilers can optimize these kinds of simple recursive functions. Python can't do the same optimizations.

Using a functional hybrid

We'll continue this example with a mostly functional version of the previous example to compute the sum of the multiples of 3 and 5. Our hybrid functional version might look like the following:

    print(sum(n for n in range(1, 10) if n % 3 == 0 or n % 5 == 0))

We've used nested generator expressions to iterate through a collection of values and compute the sum of these values. The range(1, 10) method is an iterable and, consequently, a kind of generator expression; it generates a sequence of values. The more complex expression, n for n in range(1, 10) if n%3==0 or n%5==0, is also an iterable expression. It produces a set of values. A variable, n, is bound to each value, more as a way of expressing the contents of the set than as an indicator of the state of the computation. The sum() function consumes the iterable expression, creating a final object, 23.

The bound variable doesn't change once a value is bound to it. The variable, n, in the loop is essentially a shorthand for the values available from the range() function. The if clause of the expression can be extracted into a separate function, allowing us to easily repurpose this with other rules. We could also use a higher-order function named filter() instead of the if clause of the generator expression.

As we work with generator expressions, we'll see that the bound variable is at the blurry edge of defining the state of the computation. The variable, n, in this example isn't directly comparable to the variable, n, in the first two imperative examples. The for statement creates a proper variable in the local namespace. The generator expression does not create a variable in the same way as a for statement does:

    >>> sum(n for n in range(1, 10) if n%3==0 or n%5==0)
    23
    >>> n
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'n' is not defined

Because of the way Python uses namespaces, it might be possible to write a function that can observe the n variable in a generator expression. However, we won't. Our objective is to exploit the functional features of Python, not to detect how those features have an object-oriented implementation under the hood.

Looking at object creation

In some cases, it might help to look at intermediate objects as a history of the computation. What's important is that the history of a computation is not fixed. When functions are commutative or associative, then changes to the order of evaluation might lead to different objects being created. This might have performance improvements with no changes to the correctness of the results. Consider this expression:

    >>> 1+2+3+4
    10

We are looking at a variety of potential computation histories with the same result. Because the + operator is commutative and associative, there are a large number of candidate histories that lead to the same result. Of the candidate sequences, there are two important alternatives, which are as follows:

    >>> ((1+2)+3)+4
    10
    >>> 1+(2+(3+4))
    10

In the first case, we fold in values working from left to right.
This is the way Python works implicitly. Intermediate objects 3 and 6 are created as part of this evaluation. In the second case, we fold from right to left. In this case, intermediate objects 7 and 9 are created. In the case of simple integer arithmetic, the two results have identical performance; there's no optimization benefit.

When we work with something like the list append, we might see some optimization improvements when we change the association rules. Here's a simple example:

    >>> import timeit
    >>> timeit.timeit("((([]+[1])+[2])+[3])+[4]")
    0.8846941249794327
    >>> timeit.timeit("[]+([1]+([2]+([3]+[4])))")
    1.0207440659869462

In this case, there's some benefit in working from left to right. What's important for functional design is the idea that the + operator (or add() function) can be used in any order to produce the same results. The + operator has no hidden side effects that restrict the way this operator can be used.

The stack of turtles

When we use Python for functional programming, we embark down a path that will involve a hybrid that's not strictly functional. Python is not Haskell, OCaml, or Erlang. For that matter, our underlying processor hardware is not functional; it's not even strictly object-oriented; CPUs are generally procedural.

All programming languages rest on abstractions, libraries, frameworks and virtual machines. These abstractions, in turn, may rely on other abstractions, libraries, frameworks and virtual machines. The most apt metaphor is this: the world is carried on the back of a giant turtle. The turtle stands on the back of another giant turtle. And that turtle, in turn, is standing on the back of yet another turtle. It's turtles all the way down.

                                                                  – Anonymous
There's no practical end to the layers of abstractions. More importantly, the presence of abstractions and virtual machines doesn't materially change our approach to designing software to exploit the functional programming features of Python. Even within the functional programming community, there are more pure and less pure functional programming languages. Some languages make extensive use of monads to handle stateful things like filesystem input and output. Other languages rely on a hybridized environment that's similar to the way we use Python. We write software that's generally functional with carefully chosen procedural exceptions.

Our functional Python programs will rely on the following three stacks of abstractions:

- Our applications will be functions, all the way down, until we hit the objects
- The underlying Python runtime environment that supports our functional programming is objects, all the way down, until we hit the turtles
- The libraries that support Python are a turtle on which Python stands

The operating system and hardware form their own stack of turtles. These details aren't relevant to the problems we're going to solve.

A classic example of functional programming

As part of our introduction, we'll look at a classic example of functional programming. This is based on the classic paper Why Functional Programming Matters by John Hughes. The article appeared in a paper called Research Topics in Functional Programming, edited by D. Turner, published by Addison-Wesley in 1990. Here's a link to the paper Research Topics in Functional Programming: http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf

This discussion of functional programming in general is profound. There are several examples given in the paper. We'll look at just one: the Newton-Raphson algorithm for locating the roots of a function. In this case, the function is the square root. It's important because many versions of this algorithm rely on the explicit state managed via loops. Indeed, the Hughes paper provides a snippet of the Fortran code that emphasizes stateful, imperative processing.

The backbone of this approximation is the calculation of the next approximation from the current approximation. The next_() function takes x, an approximation to the sqrt(n) value, and calculates a next value that brackets the proper root. Take a look at the following example:

    def next_(n, x):
        return (x + n/x) / 2

This function computes a series of approximations, where each new value is (x + n/x)/2 computed from the previous one. The distance between the values is halved each time, so they'll quickly converge on the value x such that x = (x + n/x)/2, which means x = sqrt(n). We don't want to call the method next() because this name would collide with a built-in function. We call it the next_() method so that we can follow the original presentation as closely as possible.

Here's how the function looks when used in the command prompt:

    >>> n = 2
    >>> f = lambda x: next_(n, x)
    >>> a0 = 1.0
    >>> [round(x, 4) for x in (a0, f(a0), f(f(a0)), f(f(f(a0))),)]
    [1.0, 1.5, 1.4167, 1.4142]

We've defined the f() method as a lambda that will converge on sqrt(n). We started with 1.0 as the initial value for a0. Then we evaluated a sequence of recursive evaluations: a0, f(a0), f(f(a0)), and so on. We evaluated these functions using a generator expression so that we could round off each value. This makes the output easier to read and easier to use with doctest. The sequence appears to converge rapidly on sqrt(2), approximately 1.4142.
We can write a function which will (in principle) generate an infinite sequence of values converging on the proper square root:

    def repeat(f, a):
        yield a
        for v in repeat(f, f(a)):
            yield v

This function will generate approximations using a function, f(), and an initial value, a. If we provide the next_() function defined earlier, we'll get a sequence of approximations to the square root of the n argument.

The repeat() function expects the f() function to have a single argument; however, our next_() function has two arguments. We can use a lambda object, lambda x: next_(n, x), to create a partial version of the next_() function with one of the two variables bound.

The Python generator functions can't be trivially recursive; they must explicitly iterate over the recursive results, yielding them individually. Attempting to use a simple return repeat(f, f(a)) will end the iteration, returning a generator expression instead of yielding the sequence of values. We have two ways to return all the values instead of returning a generator expression, which are as follows:

- We can write an explicit for loop as follows: for x in some_iter: yield x.
- We can use the yield from statement as follows: yield from some_iter.

Both techniques of yielding the values of a recursive generator function are equivalent. We'll try to emphasize yield from. In some cases, however, the yield with a complex expression will be more clear than the equivalent mapping or generator expression.

Of course, we don't want the entire infinite sequence. We will stop generating values when two values are so close to each other that we can call either one the square root we're looking for. The common symbol for the value which is close enough is the Greek letter Epsilon, ε, which can be thought of as the largest error we will tolerate.

In Python, we'll have to be a little clever about taking items from an infinite sequence one at a time. It works out well to use a simple interface function that wraps a slightly more complex recursion. Take a look at the following code snippet:

    def within(ε, iterable):
        def head_tail(ε, a, iterable):
            b = next(iterable)
            if abs(a - b) <= ε:
                return b
            return head_tail(ε, b, iterable)
        return head_tail(ε, next(iterable), iterable)

We've defined an internal function, head_tail(), which accepts the tolerance, ε, an item from the iterable sequence, a, and the rest of the iterable sequence, iterable. The next item from the iterable is bound to a name, b. If abs(a-b) <= ε, then the two values are close enough together that we've found the square root. Otherwise, we use the b value in a recursive invocation of the head_tail() function to examine the next pair of values.

Our within() function merely seeks to properly initialize the internal head_tail() function with the first value from the iterable parameter. Some functional programming languages offer a technique that will put a value back into an iterable sequence. In Python, this might be a kind of unget() or previous() method that pushes a value back into the iterator. Python iterables don't offer this kind of rich functionality.

We can use the three functions next_(), repeat(), and within() to create a square root function, as follows:

    def sqrt(a0, ε, n):
        return within(ε, repeat(lambda x: next_(n, x), a0))

We've used the repeat() function to generate a (potentially) infinite sequence of values based on the next_(n,x) function. Our within() function will stop generating values in the sequence when it locates two values with a difference less than ε.
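A quick check at the interactive prompt (again, not in the original text; it simply exercises the three functions defined above) shows the composition converging on the square root of 3:

    >>> round(sqrt(1.0, .0001, 3), 4)
    1.7321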
When we use this version of the sqrt() method, we need to provide an initial seed value, a0, and an ε value. An expression like sqrt(1.0, .0001, 3) will start with an approximation of 1.0 and compute the value of sqrt(3) to within 0.0001. For most applications, the initial a0 value can be 1.0. However, the closer it is to the actual square root, the more rapidly this method converges.

The original example of this approximation algorithm was shown in the Miranda language. It's easy to see that there are few profound differences between Miranda and Python. The biggest difference is Miranda's ability to cons a value back into an iterable, doing a kind of unget. This parallelism between Miranda and Python gives us confidence that many kinds of functional programming can be easily done in Python.

Summary

We've looked at programming paradigms with an eye toward distinguishing the functional paradigm from two common imperative paradigms in detail. For more information, take a look at the following books, also by Packt Publishing:

Learning Python (https://www.packtpub.com/application-development/learning-python)
Mastering Python (https://www.packtpub.com/application-development/mastering-python)
Mastering Object-oriented Python (https://www.packtpub.com/application-development/mastering-object-oriented-python)

Resources for Article:

Further resources on this subject:

Saying Hello to Unity and Android [article]
Using Specular in Unity [article]
Unity 3.x Scripting-Character Controller versus Rigidbody [article]
Why choose R for your data mining project

Fatema Patrawala
15 Feb 2018
9 min read
[box type="note" align="" class="" width=""]Our article is an excerpt taken from the book R Data Mining, written by Andrea Cirillo. If you are a budding data scientist or a data analyst with basic knowledge of R, and you want to get into the intricacies of data mining in a practical manner, be sure to check out this book.[/box] In today’s post, we will analyze R's strengths, and understand why it is a savvy idea to learn this programming language for data mining. R's strengths You know that R is really popular, but why? R is not the only data analysis language out there, and neither is it the oldest one; so why is it so popular? If looking at the root causes of R's popularity, we definitely have to mention these three: Open source inside Plugin ready Data visualization friendly Open source inside One of the main reasons the adoption of R is spreading is its open source nature. R binary code is available for everyone to download, modify, and share back again (only in an open source way). Technically, R is released with a GNU general public license, meaning that you can take it and use it for whatever purpose; but you have to share every derivative with a GNU general public license as well. These attributes fit well for almost every target user of a statistical analysis language: Academic user: Knowledge sharing is a must for an academic environment, and having the ability to share work without the worry of copyright and license questions makes R very practical for academic research purposes Business user: Companies are always worried about budget constraints; having professional statistical analysis software at their disposal for free sounds like a dream come true Private user: This user merges together both of the benefits already mentioned, because they will find it great to have a free instrument with which to learn and share their own statistical analyses Plugin ready You could imagine the R language as an expandable board game. You know, games like 7 Wonders or Carcassonne, with a base set of characters and places and further optional places and characters, increasing the choices at your disposal and maximizing the fun. The R language can be compared to this kind of game. There is a base version of R, containing a group of default packages that are delivered along with the standard version of the software (you can skip to the Installing R and writing R code section for more on how to obtain and install it). The functionalities available through the base version are mainly related to file system manipulation, statistical analysis, and data Visualization. While this base version is regularly maintained and updated by the R core team, virtually every R user can add further new functionalities to those available within the package, developing and sharing custom packages. This is basically how the package development and sharing flow works: The R user develops a new package, for example a package introducing a new machine learning algorithm exposed within a freshly published academic paper. The user submits the package to the CRAN repository or a similar repository. The Comprehensive R Archive Network (CRAN) is the official repository for R related documents and packages. Every R user can gain access to the additional features introduced with any given package, installing and loading them into their R environment. 
If the package has been submitted to CRAN, installing and loading the package will result in running just the following two lines of R code (similar commands are available for alternative repositories such as Bioconductor):

install.packages("ggplot2")
library(ggplot2)

As you can see, this is a really convenient and effective way to expand R's functionality, and you will soon see how wide the range of functionality added through additional packages developed by R users is. More than 9,000 packages are available on CRAN, and this number is sure to increase further, making more and more additional features available to the R community.

Data visualization friendly

As a discipline, data visualization encompasses all of the principles and techniques employable to effectively display the information and messages contained within a set of data. Since we are living in an information-heavy age, the ability to effectively and concisely communicate articulated and complex messages through data visualization is a core asset for any professional. This is exactly why R is experiencing a great response in academic and professional fields: the data visualization capabilities of R place it at the cutting edge of these fields.

R has been noticed for its amazing data visualization features right from its beginning; when some of its peers were still drawing x axes built by aggregating + signs, R was already able to produce astonishing 3D plots. Nevertheless, a major improvement of R as a data visualization tool came when Auckland's Hadley Wickham developed the highly famous ggplot2 package based on The Grammar of Graphics, introducing into the R world an organic framework for data visualization tasks.

This package alone introduced the R community to a highly flexible way of producing almost every kind of data visualization, having also been designed as an expandable tool, so that new data visualization techniques can be incorporated as soon as they emerge. Finally, ggplot2 gives you the ability to highly customize your plot, adding every kind of graphical or textual annotation to it.

Nowadays, R is being used by the biggest tech companies, such as Facebook and Google, and by widely circulated publications such as the Economist and the New York Times, to visualize their data and convey their information to their stakeholders and readers.

To sum all this up: should you invest your precious time learning R? If you are a professional or a student who could gain advantages from knowing effective and cutting-edge techniques to manipulate, model, and present data, I can only give you a positive opinion: yes. You should definitely learn R, and consider it a long-term investment, since the points of strength we have seen place it in a great position to further expand its influence in the coming years in every industry and academic field.

Engaging with the community to learn R

Now that we are aware of R's popularity, we need to engage with the community to take advantage of it.
We will look at alternative and non-exclusive ways of engaging with the community: Employing community-driven learning material Asking for help from the community Staying ahead of language developments Employing community-driven learning material: There are two main kinds of R learning materials developed by the community: Papers, manuals, and books Online interactive courses Papers, manuals, and books: The first one is for sure the more traditional one, but you shouldn't neglect it, since those kinds of learning materials are always able to give you a more organic and systematic understanding of the topics they treat. You can find a lot of free material online in the form of papers, manuals, and books. Let me point out to you the more useful ones: Advanced R R for Data Science Introduction to Statistical Learning OpenIntro Statistics The R Journal Online interactive courses: This is probably the most common learning material nowadays. You can find different platforms delivering good content on the R language, the most famous of which are probably DataCamp, Udemy, and Packt itself. What all of them share is a practical and interactive approach that lets you learn the topic directly, applying it through exercises rather than passively looking at someone explaining theoretical stuff. Asking for help from the community: As soon as you start writing your first lines of R code, and perhaps before you even actually start writing it, you will come up with some questions related to your work. The best thing you can do when this happens is to resort to the community to solve those questions. You will probably not be the first one to come up with that question, and you should therefore first of all look online for previous answers to your question. Where should you look for answers? You can look everywhere, but most of the time you will find the answer you are looking for on one of the following (listed by the probability of finding the answer there): Stack Overflow R-help mailing list R packages documentation I wouldn't suggest you look for answers on Twitter, G+, and similar networks, since they were not conceived to handle these kinds of processes and you will expose yourself to the peril of reading answers that are out of date, or simply incorrect, because no review system is considered. If it is the case that you are asking an innovative question never previously asked by anyone, first of all, congratulations! That said, in that happy circumstance, you can ask your question in the same places that you previously looked for answers. Staying ahead of language developments: The R language landscape is constantly changing, thanks to the contributions of many enthusiastic users who take it a step further every day. How can you stay ahead of those changes? This is where social networks come in handy. Following the #rstats hashtag on Twitter, Google+ groups, and similar places, will give you the pulse of the language. Moreover, you will find the R-bloggers aggregator, which delivers a daily newsletter comprised of the R-related blog posts that were published the previous day really useful. Finally, annual R conferences and similar occasions constitute a great opportunity to get in touch with the most notorious R experts, gaining from them useful insights and inspiring speeches about the future of the language. To summarize, we looked why to choose R as your programming language for data mining and how we can engage with the R community. 
If you think this post is useful, you may further check out this book R Data Mining, to leverage data mining techniques across many different industries, including finance, medicine, scientific research, and more.    
Hands-on Tutorial for Getting Started with Amazon SimpleDB

Packt
28 May 2010
5 min read
(For more resources on SimpleDB, see here.) Creating an AWS account In order to start using SimpleDB, you will first need to sign up for an account with AWS. Visit http://aws.amazon.com/ and click on Create an AWS Account. You can sign up either by using your e-mail address for an existing Amazon account, or by creating a completely new account. You may wish to have multiple accounts to separate billing for projects. This could make it easier for you to track billing for separate accounts. After a successful signup, navigate to the main AWS page— http://aws.amazon.com/, and click on the Your Account link at any time to view your account information and make any changes to it if needed. Enabling SimpleDB service for AWS account Once you have successfully set up an AWS account, you must follow these steps to enable the SimpleDB service for your account: Log in to your AWS account. Navigate to the SimpleDB home page—http://aws.amazon.com/simpledb/. Click on the Sign Up For Amazon SimpleDB button on the right side of the page. Provide the requested credit card information and complete the signup process. You have now successfully set up your AWS account and enabled it for SimpleDB. All communication with SimpleDB or any of the Amazon web services must be through either the SOAP interface or the Query/ReST interface. The request messages sent through either of these interfaces is digitally signed by the sending user in order to ensure that the messages have not been tampered within transit, and that they really originate from the sending user. Requests that use the Query/ReST interface will use the access keys for signing the request, whereas requests to the SOAP interface will use the x.509 certificates. Your new AWS account is associated with the following items: A unique 12-digit AWS account number for identifying your account. AWS Access Credentials are used for the purpose of authenticating requests made by you through the ReST Request API to any of the web services provided by AWS. An initial set of keys is automatically generated for you by default. You can regenerate the Secret Access Key at any time if you like. Keep in mind that when you generate a new access key, all requests made using the old key will be rejected. An Access Key ID identifies you as the person making requests to a web service. A Secret Access Key is used to calculate the digital signature when you make requests to the web service. Be careful with your Secret Access Key, as it provides full access to the account, including the ability to delete all of your data. All requests made to any of the web services provided by AWS using the SOAP protocol use the X.509 security certificate for authentication. There are no default certificates generated automatically for you by AWS. You must generate the certificate by clicking on the Create a new Certificate link, then download them to your computer and make them available to the machine that will be making requests to AWS. Public and private key for the x.509 certificate. You can either upload your own x.509 certificate if you already have one, or you can just generate a new certificate and then download it to your computer. Query API and authentication There are two interfaces to SimpleDB. The SOAP interface uses the SOAP protocol for the messages, while the ReST Requests uses HTTP requests with request parameters to describe the various SimpleDB methods and operations. 
In this book, we will be focusing on using the ReST Requests for talking to SimpleDB, as it is a much simpler protocol and utilizes straightforward HTTP-based requests and responses for communication, and the requests are sent to SimpleDB using either a HTTP GET or POST method. The ReST Requests need to be authenticated in order to establish that they are originating from a valid SimpleDB user, and also for accounting and billing purposes. This authentication is performed using your access key identifiers. Every request to SimpleDB must contain a request signature calculated by constructing a string based on the Query API and then calculating an RFC 2104-compliant HMAC-SHA1 hash, using the Secret Access Key. The basic steps in the authentication of a request by SimpleDB are: You construct a request to SimpleDB. You use your Secret Access Key to calculate the request signature, a Keyed-Hashing for Message Authentication code (HMAC) with an SHA1 hash function. You send the request data, the request signature, timestamp, and your Access Key ID to AWS. AWS uses the Access Key ID in the request to look up the associated Secret Access Key. AWS generates a request signature from the request data using the retrieved Secret Access Key and the same algorithm you used to calculate the signature in the request. If the signature generated by AWS matches the one you sent in the request, the request is considered to be authentic. If the signatures are different, the request is discarded, and AWS returns an error response. If the timestamp is older than 15 minutes, the request is rejected. The procedure for constructing your requests is simple, but tedious and time consuming. This overview was intended to make you familiar with the entire process, but don't worry—you will not need to go through this laborious process every single time that you interact with SimpleDB. Instead, we will be leveraging one of the available libraries for communicating with SimpleDB, which encapsulates a lot of the repetitive stuff for us and makes it simple to dive straight into playing with and exploring SimpleDB!
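Although a library will do the signing for you, it can help to see the shape of that step. The following is a minimal Python sketch of the HMAC-SHA1 calculation described above; the way the canonical string is assembled here is a simplified assumption, not the exact SimpleDB wire format, so treat it as an illustration of what the library encapsulates rather than a drop-in implementation:

import base64
import hashlib
import hmac
import urllib.parse

def sign_request(params, secret_key, host="sdb.amazonaws.com", verb="GET", path="/"):
    # Build a canonical query string from the sorted, URL-encoded parameters.
    canonical = "&".join(
        "{0}={1}".format(urllib.parse.quote(str(k), safe="-_.~"),
                         urllib.parse.quote(str(v), safe="-_.~"))
        for k, v in sorted(params.items()))
    # The string to sign combines the HTTP verb, host, path, and query string.
    string_to_sign = "\n".join([verb, host, path, canonical])
    # RFC 2104 HMAC-SHA1 keyed with the Secret Access Key, then Base64-encoded.
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")

The resulting signature would then be attached to the request as an additional parameter, alongside your Access Key ID and a timestamp, before the request is sent.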
What is Hazelcast?

Packt
22 Aug 2013
10 min read
(For more resources related to this topic, see here.) Starting out as usual In most modern software systems, data is the key. For more traditional architectures, the role of persisting and providing access to your system's data tends to fall to a relational database. Typically this is a monolithic beast, perhaps with a degree of replication, although this tends to be more for resilience rather than performance. For example, here is what a traditional architecture might look like (which hopefully looks rather familiar) This presents us with an issue in terms of application scalability, in that it is relatively easy to scale our application layer by throwing more hardware at it to increase the processing capacity. But the monolithic constraints of our data layer would only allow us to do this so far before diminishing returns or resource saturation stunted further performance increases; so what can we do to address this? In the past and in legacy architectures, the only solution would be to increase the performance capability of our database infrastructure, potentially by buying a bigger, faster server or by further tweaking and fettling the utilization of currently available resources. Both options are dramatic, either in terms of financial cost and/or manpower; so what else could we do? Data deciding to hang around In order for us to gain a bit more performance out of our existing setup, we can hold copies of our data away from the primary database and use these in preference wherever possible. There are a number of different strategies we could adopt, from transparent second-level caching layers to external key-value object storage. The detail and exact use of each varies significantly depending on the technology or its place in the architecture, but the main desire of these systems is to sit alongside the primary database infrastructure and attempt to protect it from an excessive load. This would then tend to lead to an increased performance of the primary database by reducing the overall dependency on it. However, this strategy tends to be only particularly valuable as a short-term solution, effectively buying us a little more time before the database once again starts to reach saturation. The other downside is that it only protects our database from read-based load; if our application is predominately write-heavy, this strategy has very little to offer. So our expanded architecture could look a bit like the following figure: Therein lies the problem However, in insulating the database from the read load, we have introduced a problem in the form of a cache consistency issue, in that, how does our local data cache deal with changing data underneath it within the primary database? The answer is rather depressing: it can't! The exact manifestation of any issues will largely depend on the data needs of the application and how frequently the data changes; but typically, caching systems will operate in one of the two following modes to combat the problem: Time bound cache: Holds entries for a defined period (time-to-live or TTL) Write through cache: Holds entries until they are invalidated by subsequent updates Time bound caches almost always have consistency issues, but at least the amount of time that the issue would be present is limited to the expiry time of each entry. However, we must consider the application's access to this data, because if the frequency of accessing a particular entry is less than the cache expiry time of it, the cache is providing no real benefit. 
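To make the time-bound idea concrete, here's a minimal Python sketch of a TTL cache; it illustrates the concept only and is not any particular product's API:

import time

class TTLCache:
    # A minimal time-bound cache: entries silently expire after ttl_seconds.
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            # The entry is stale; drop it and let the caller go back to the database.
            del self._store[key]
            return None
        return value

Until an entry expires, it can of course disagree with the primary database, which is exactly the consistency window described above.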
Write through caches are consistent in isolation and can be configured to offer strict consistency, but if multiple write through caches exist within the overall architecture, then there will be consistency issues between them. We can avoid this by having a more intelligent cache, which features a communication mechanism between nodes so that entry invalidations can be propagated to each other.

In practice, an ideal cache would feature a combination of both approaches, so that entries are held for a known maximum time, but invalidations are also passed around as changes are made. So our evolved architecture would look a bit like the following figure:

So far we've had a look through the general issues in scaling our data layer, and introduced strategies to help combat the trade-offs we will encounter along the way; however, the real world isn't quite as simple. There are various cache servers and in-memory database products in this area; however, most of these are stand-alone single instances, perhaps with some degree of distribution bolted on or provided by other supporting technologies. This tends to bring about the same issues we experienced with just our primary database, in that we could encounter resource saturation or capacity issues if the product is a single instance, or, if the distribution doesn't provide consistency control, perhaps inconsistent data that might harm our application.

Breaking the mould

Hazelcast is a radical new approach to data, designed from the ground up around distribution. It embraces a new scalable way of thinking, in that data should be shared around for both resilience and performance, while allowing us to configure the trade-offs surrounding consistency as the data requirements dictate.

The first major feature to understand about Hazelcast is its masterless nature; each node is configured to be functionally the same. The oldest node in the cluster is the de facto leader and manages the membership, automatically delegating which node is responsible for which data. In this way, as new nodes join or drop out, the process is repeated and the cluster rebalances accordingly. This makes Hazelcast incredibly simple to get up and running, as the system is self-discovering, self-clustering, and works straight out of the box.

However, the second feature to remember is that we are persisting data entirely in-memory; this makes it incredibly fast, but this speed comes at a price. When a node is shut down, all the data that was held on it is lost. We combat this risk to resilience through replication, by holding enough copies of a piece of data across multiple nodes. In the event of failure, the overall cluster will not suffer any data loss. By default, the standard backup count is 1, so we can immediately enjoy basic resilience. But don't pull the plug on more than one node at a time, until the cluster has reacted to the change in membership and reestablished the appropriate number of backup copies of data. So when we introduce our new masterless distributed cluster, we get something like the following figure:

We previously identified that multi-node caches tend to suffer from either saturation or consistency issues. In the case of Hazelcast, each node is the owner of a number of partitions of the overall data, so the load will be fairly spread across the cluster. Hence, any saturation would be at the cluster level rather than any individual node. We can address this issue simply by adding more nodes.
In terms of consistency, by default the backup copies of the data are internal to Hazelcast and not directly used, as such we enjoy strict consistency. This does mean that we have to interact with a specific node to retrieve or update a particular piece of data; however, exactly which node that is an internal operational detail and can vary over time — we as developers never actually need to know. If we imagine that our data is split into a number of partitions, that each partition slice is owned by one node and backed up on another, we could then visualize the interactions like the following figure: This means that for data belonging to Partition 1, our application will have to communicate to Node 1, Node 2 for data belonging to Partition 2, and so on. The slicing of the data into each partition is dynamic; so in practice, where there are more partitions than nodes, each node will own a number of different partitions and hold backups for others. As we have mentioned before, all of this is an internal operational detail, and our application does not need to know it, but it is important that we understand what is going on behind the scenes. Moving to new ground So far we have been talking mostly about simple persisted data and caches, but in reality, we should not think of Hazelcast as purely a cache, as it is much more powerful than just that. It is an in-memory data grid that supports a number of distributed collections and features. We can load in data from various sources into differing structures, send messages across the cluster; take out locks to guard against concurrent activity, and listen to the goings on inside the workings of the cluster. Most of these implementations correspond to a standard Java collection, or function in a manner comparable to other similar technologies, but all with the distribution and resilience capabilities already built in. Standard utility collections Map: Key-value pairs List: Collection of objects? Set: Non-duplicated collection Queue: Offer/poll FIFO collection Specialized collection Multi-Map: Key-list of values collection Lock: Cluster wide mutex Topic: Publish/subscribe messaging Concurrency utilities AtomicNumber: Cluster-wide atomic counter IdGenerator: Cluster-wide unique identifier generation Semaphore: Concurrency limitation CountdownLatch: Concurrent activity gate-keeping Listeners: Application notifications as things happen In addition to data storage collections, Hazelcast also features a distributed executor service allowing runnable tasks to be created that can be run anywhere on the cluster to obtain, manipulate, and store results. We could have a number of collections containing source data, then spin up a number of tasks to process the disparate data (for example, averaging or aggregating) and outputting the results into another collection for consumption. Again, just as we could scale up our data capacities by adding more nodes, we can also increase the execution capacity in exactly the same way. This essentially means that by building our data layer around Hazelcast, if our application needs rapidly increase, we can continuously increase the number of nodes to satisfy seemingly extensive demands, all without having to redesign or re-architect the actual application. With Hazelcast, we are dealing more with a technology than a server product, a library to build a system around rather than retrospectively bolting it on, or blindly connecting to an off-the-shelf commercial system. 
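To picture how a key finds its way to a particular node, here's a deliberately simplified Python sketch of hash-based partition ownership. It is not Hazelcast's internal implementation; it only illustrates the idea that every key maps to one partition, and each partition has a current owner that the client library resolves for us:

PARTITION_COUNT = 271  # Hazelcast's default number of partitions

def partition_id(key):
    # Every key hashes to exactly one partition.
    return hash(key) % PARTITION_COUNT

def owner_node(key, nodes):
    # Pretend partitions are spread evenly across the current members;
    # the real cluster rebalances ownership as members join and leave.
    return nodes[partition_id(key) % len(nodes)]

nodes = ["node1", "node2", "node3"]
print(owner_node("some-key", nodes))  # the member our request would be routed to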
While it is possible (and in some simple cases quite practical) to run Hazelcast as a separate server-like cluster and connect to it remotely from our application; some of the greatest benefits come when we develop our own classes and tasks run within it and alongside it. With such a large range of generic capabilities, there is an entire world of problems that Hazelcast can help solve. We can use the technology in many ways; in isolation to hold data such as user sessions, run it alongside a more long-term persistent data store to increase capacity, or shift towards performing high performance and scalable operations on our data. By moving more and more responsibility away from monolithic systems to such a generic scalable one, there is no limit to the performance we can unlock. This will allow us to keep our application and data layers separate, but enabling the ability to scale them up independently as our application grows. This will avoid our application becoming a victim of its own success, while hopefully taking the world by storm. Summary In this article, we learned about Hazelcast. With such a large range of generic capabilities, Hazelcast can solve a world of problems. Resources for Article: Further resources on this subject: JBoss AS Perspective [Article] Drools Integration Modules: Spring Framework and Apache Camel [Article] JBoss RichFaces 3.3 Supplemental Installation [Article]
Installing PostgreSQL

Packt
01 Apr 2015
16 min read
In this article by Hans-Jürgen Schönig, author of the book Troubleshooting PostgreSQL, we will cover what can go wrong during the installation process and what can be done to avoid those things from happening. At the end of the article, you should be able to avoid all of the pitfalls, traps, and dangers you might face during the setup process. (For more resources related to this topic, see here.) For this article, I have compiled some of the core problems that I have seen over the years, as follows: Deciding on a version during installation Memory and kernel issues Preventing problems by adding checksums to your database instance Wrong encodings and subsequent import errors Polluted template databases Killing the postmaster badly At the end of the article, you should be able to install PostgreSQL and protect yourself against the most common issues popping up immediately after installation. Deciding on a version number The first thing to work on when installing PostgreSQL is to decide on the version number. In general, a PostgreSQL version number consists of three digits. Here are some examples: 9.4.0, 9.4.1, or 9.4.2 9.3.4, 9.3.5, or 9.3.6 The last digit is the so-called minor release. When a new minor release is issued, it generally means that some bugs have been fixed (for example, some time zone changes, crashes, and so on). There will never be new features, missing functions, or changes of that sort in a minor release. The same applies to something truly important—the storage format. It won't change with a new minor release. These little facts have a wide range of consequences. As the binary format and the functionality are unchanged, you can simply upgrade your binaries, restart PostgreSQL, and enjoy your improved minor release. When the digit in the middle changes, things get a bit more complex. A changing middle digit is called a major release. It usually happens around once a year and provides you with significant new functionality. If this happens, we cannot just stop or start the database anymore to replace the binaries. If the first digit changes, something really important has happened. Examples of such important events were introductions of SQL (6.0), the Windows port (8.0), streaming replication (9.0), and so on. Technically, there is no difference between the first and the second digit—they mean the same thing to the end user. However, a migration process is needed. The question that now arises is this: if you have a choice, which version of PostgreSQL should you use? Well, in general, it is a good idea to take the latest stable release. In PostgreSQL, every version number following the design patterns I just outlined is a stable release. As of PostgreSQL 9.4, the PostgreSQL community provides fixes for versions as old as PostgreSQL 9.0. So, if you are running an older version of PostgreSQL, you can still enjoy bug fixes and so on. Methods of installing PostgreSQL Before digging into troubleshooting itself, the installation process will be outlined. The following choices are available: Installing binary packages Installing from source Installing from source is not too hard to do. However, this article will focus on installing binary packages only. Nowadays, most people (not including me) like to install PostgreSQL from binary packages because it is easier and faster. Basically, two types of binary packages are common these days: RPM (Red Hat-based) and DEB (Debian-based). Installing RPM packages Most Linux distributions include PostgreSQL. 
However, the shipped PostgreSQL version is somewhat ancient in many cases. Recently, I saw a Linux distribution that still featured PostgreSQL 8.4, a version already abandoned by the PostgreSQL community. Distributors tend to ship older versions to ensure that new bugs are not introduced into their systems. For high-performance production servers, outdated versions might not be the best idea, however. Clearly, for many people, it is not feasible to run long-outdated versions of PostgreSQL. Therefore, it makes sense to make use of repositories provided by the community. The Yum repository shows which distributions we can use RPMs for, at http://yum.postgresql.org/repopackages.php. Once you have found your distribution, the first thing is to install this repository information for Fedora 20 as it is shown in the next listing: yum install http://yum.postgresql.org/9.4/fedora/fedora-20-x86_64/pgdg-fedora94-9.4-1.noarch.rpm Once the repository has been added, we can install PostgreSQL: yum install postgresql94-server postgresql94-contrib /usr/pgsql-9.4/bin/postgresql94-setup initdb systemctl enable postgresql-9.4.service systemctl start postgresql-9.4.service First of all, PostgreSQL 9.4 is installed. Then a so-called database instance is created (initdb). Next, the service is enabled to make sure that it is always there after a reboot, and finally, the postgresql-9.4 service is started. The term database instance is an important concept. It basically describes an entire PostgreSQL environment (setup). A database instance is fired up when PostgreSQL is started. Databases are part of a database instance. Installing Debian packages Installing Debian packages is also not too hard. By the way, the process on Ubuntu as well as on some other similar distributions is the same as that on Debian, so you can directly use the knowledge gained from this article for other distributions. A simple file called /etc/apt/sources.list.d/pgdg.list can be created, and a line for the PostgreSQL repository (all the following steps can be done as root user or using sudo) can be added: deb http://apt.postgresql.org/pub/repos/apt/ YOUR_DEBIAN_VERSION_HERE-pgdg main So, in the case of Debian Wheezy, the following line would be useful: deb http://apt.postgresql.org/pub/repos/apt/ wheezy-pgdg main Once we have added the repository, we can import the signing key: $# wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - OK Voilà! Things are mostly done. In the next step, the repository information can be updated: apt-get update Once this has been done successfully, it is time to install PostgreSQL: apt-get install "postgresql-9.4" If no error is issued by the operating system, it means you have successfully installed PostgreSQL. The beauty here is that PostgreSQL will fire up automatically after a restart. A simple database instance has also been created for you. If everything has worked as expected, you can give it a try and log in to the database: root@chantal:~# su - postgres $ psql postgres psql (9.4.1) Type "help" for help. postgres=# Memory and kernel issues After this brief introduction to installing PostgreSQL, it is time to focus on some of the most common problems. Fixing memory issues Some of the most important issues are related to the kernel and memory. Up to version 9.2, PostgreSQL was using the classical system V shared memory to cache data, store locks, and so on. Since PostgreSQL 9.3, things have changed, solving most issues people had been facing during installation. 
However, in PostgreSQL 9.2 or before, you might have faced the following error message: FATAL: Could not create shared memory segment DETAIL: Failed system call was shmget (key=5432001, size=1122263040, 03600) HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 1122263040 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections. If the request size is already small, it's possible that it is less than your kernel's SHMMIN parameter, in which case raising the request size or reconfiguring SHMMIN is called for. The PostgreSQL documentation contains more information about shared memory configuration. If you are facing a message like this, it means that the kernel does not provide you with enough shared memory to satisfy your needs. Where does this need for shared memory come from? Back in the old days, PostgreSQL stored a lot of stuff, such as the I/O cache (shared_buffers, locks, autovacuum-related information and a lot more), in the shared memory. Traditionally, most Linux distributions have had a tight grip on the memory, and they don't issue large shared memory segments; for example, Red Hat has long limited the maximum amount of shared memory available to applications to 32 MB. For most applications, this is not enough to run PostgreSQL in a useful way—especially not if performance does matter (and it usually does). To fix this problem, you have to adjust kernel parameters. Managing Kernel Resources of the PostgreSQL Administrator's Guide will tell you exactly why we have to adjust kernel parameters. For more information, check out the PostgreSQL documentation at http://www.postgresql.org/docs/9.4/static/kernel-resources.htm. This article describes all the kernel parameters that are relevant to PostgreSQL. Note that every operating system needs slightly different values here (for open files, semaphores, and so on). Adjusting kernel parameters for Linux In this article, parameters relevant to Linux will be covered. If shmget (previously mentioned) fails, two parameters must be changed: $ sysctl -w kernel.shmmax=17179869184 $ sysctl -w kernel.shmall=4194304 In this example, shmmax and shmall have been adjusted to 16 GB. Note that shmmax is in bytes while shmall is in 4k blocks. The kernel will now provide you with a great deal of shared memory. Also, there is more; to handle concurrency, PostgreSQL needs something called semaphores. These semaphores are also provided by the operating system. The following kernel variables are available: SEMMNI: This is the maximum number of semaphore identifiers. It should be at least ceil((max_connections + autovacuum_max_workers + 4) / 16). SEMMNS: This is the maximum number of system-wide semaphores. It should be at least ceil((max_connections + autovacuum_max_workers + 4) / 16) * 17, and it should have room for other applications in addition to this. SEMMSL: This is the maximum number of semaphores per set. It should be at least 17. SEMMAP: This is the number of entries in the semaphore map. SEMVMX: This is the maximum value of the semaphore. It should be at least 1000. Don't change these variables unless you really have to. Changes can be made with sysctl, as was shown for the shared memory. 
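As a quick sanity check, the semaphore minimums above can be derived from your connection settings. Here is a small helper based on the formulas just listed (the function name is ours, purely for illustration):

import math

def semaphore_minimums(max_connections, autovacuum_max_workers):
    # ceil((max_connections + autovacuum_max_workers + 4) / 16), as stated above.
    sets = math.ceil((max_connections + autovacuum_max_workers + 4) / 16)
    # SEMMNS should also leave room for other applications on the host.
    return {"SEMMNI": sets, "SEMMNS": sets * 17, "SEMMSL": 17}

print(semaphore_minimums(max_connections=100, autovacuum_max_workers=3))
# {'SEMMNI': 7, 'SEMMNS': 119, 'SEMMSL': 17}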
Adjusting kernel parameters for Mac OS X

If you happen to run Mac OS X and plan to run a large system, there are also some kernel parameters that need changes. Again, /etc/sysctl.conf has to be changed. Here is an example:

kern.sysv.shmmax=4194304
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.sysv.shmall=1024

Mac OS X is somewhat nasty to configure. The reason is that you have to set all five parameters to make this work. Otherwise, your changes will be silently ignored, and this can be really painful. In addition to that, it has to be assured that SHMMAX is an exact multiple of 4096. If it is not, trouble is near.

If you want to change these parameters on the fly, recent versions of OS X provide a sysctl command just like Linux. Here is how it works:

sysctl -w kern.sysv.shmmax
sysctl -w kern.sysv.shmmin
sysctl -w kern.sysv.shmmni
sysctl -w kern.sysv.shmseg
sysctl -w kern.sysv.shmall

Fixing other kernel-related limitations

If you are planning to run a large-scale system, it can also be beneficial to raise the maximum number of open files allowed. To do that, /etc/security/limits.conf can be adapted, as shown in the next example:

postgres   hard   nofile   1024
postgres   soft   nofile   1024

This example says that the postgres user can have up to 1,024 open files per session. Note that this is only important for large systems; open files won't hurt an average setup.

Adding checksums to a database instance

When PostgreSQL is installed, a so-called database instance is created. This step is performed by a program called initdb, which is a part of every PostgreSQL setup. Most binary packages will do this for you and you don't have to do this by hand. Why should you care then? If you happen to run a highly critical system, it could be worthwhile to add checksums to the database instance.

What is the purpose of checksums? In many cases, it is assumed that crashes happen instantly—something blows up and a system fails. This is not always the case. In many scenarios, the problem starts silently. RAM may start to break, or the filesystem may start to develop slight corruption. When the problem surfaces, it may be too late. Checksums have been implemented to fight this very problem. Whenever a piece of data is written or read, the checksum is checked. If this is done, a problem can be detected as it develops.

How can those checksums be enabled? All you have to do is to add -k to initdb (just change your init scripts to enable this during instance creation). Don't worry! The performance penalty of this feature can hardly be measured, so it is safe and fast to enable its functionality. Keep in mind that this feature can really help prevent problems at fairly low costs (especially when your I/O system is lousy).

Preventing encoding-related issues

Encoding-related problems are some of the most frequent problems that occur when people start with a fresh PostgreSQL setup. In PostgreSQL, every database in your instance has a specific encoding. One database might be en_US@UTF-8, while some other database might have been created as de_AT@UTF-8 (which denotes German as it is used in Austria). To figure out which encodings your database system is using, try to run psql -l from your Unix shell. What you will get is a list of all databases in the instance, including their encodings.

So where can we actually expect trouble? Once a database has been created, many people would want to load data into the system. Let's assume that you are loading data into a UTF-8 database.
However, the data you are loading contains some special characters such as ä, ö, and so on. In an old single-byte encoding, the code for ö is 148, but the single byte 148 is not valid UTF-8; in Unicode, U+00F6 is needed. Boom! Your import will fail and PostgreSQL will error out.

If you are planning to load data into a new database, ensure that the encoding or character set of the data is the same as that of the database. Otherwise, you may face ugly surprises. To create a database using the correct locale, check out the syntax of CREATE DATABASE:

test=# \h CREATE DATABASE
Command:     CREATE DATABASE
Description: create a new database
Syntax:
CREATE DATABASE name
    [ [ WITH ] [ OWNER [=] user_name ]
           [ TEMPLATE [=] template ]
           [ ENCODING [=] encoding ]
           [ LC_COLLATE [=] lc_collate ]
           [ LC_CTYPE [=] lc_ctype ]
           [ TABLESPACE [=] tablespace_name ]
           [ CONNECTION LIMIT [=] connlimit ] ]

ENCODING and the LC* settings are used here to define the proper encoding for your new database.

Avoiding template pollution

It is somewhat important to understand what happens during the creation of a new database in your system. The most important point is that CREATE DATABASE (unless told otherwise) clones the template1 database, which is available in all PostgreSQL setups. This cloning has some important implications. If you have loaded a very large amount of data into template1, all of that will be copied every time you create a new database. In many cases, this is not really desirable but happens by mistake. People new to PostgreSQL sometimes put data into template1 because they don't know where else to place new tables and so on. The consequences can be disastrous.

However, you can also use this common pitfall to your advantage. You can place the functions you want in all your databases in template1 (maybe for monitoring or whatever benefits).

Killing the postmaster

After PostgreSQL has been installed and started, many people wonder how to stop it. The most simplistic way is, of course, to use your service postgresql stop or /etc/init.d/postgresql stop init scripts. However, some administrators tend to be a bit crueler and use kill -9 to terminate PostgreSQL. In general, this is not really beneficial because it will cause some nasty side effects. Why is this so?

The PostgreSQL architecture works like this: when you start PostgreSQL, you are starting a process called postmaster. Whenever a new connection comes in, this postmaster forks and creates a so-called backend process (BE). This process is in charge of handling exactly one connection. In a working system, you might see hundreds of processes serving hundreds of users. The important thing here is that all of those processes are synchronized through some common chunk of memory (traditionally, shared memory, and in the more recent versions, mapped memory), and all of them have access to this chunk.

What might happen if a database connection or any other process in the PostgreSQL infrastructure is killed with kill -9? A process modifying this common chunk of memory might die while making a change. The killed process cannot defend itself against the onslaught, so who can guarantee that the shared memory is not corrupted due to the interruption? This is exactly when the postmaster steps in. It notices that one of these backend processes has died unexpectedly. To prevent the potential corruption from spreading, it kills every other database connection, goes into recovery mode, and fixes the database instance.
Then new database connections are allowed again. While this makes a lot of sense, it can be quite disturbing to those users who are connected to the database system. Therefore, it is highly recommended not to use kill -9. A normal kill will be fine. Keep in mind that a kill -9 cannot corrupt your database instance, which will always start up again. However, it is pretty nasty to kick everybody out of the system just because of one process! Summary In this article we have learned how to install PostgreSQL using binary packages. Some of the most common problems and pitfalls, including encoding-related issues, checksums, and versioning were discussed. Resources for Article: Further resources on this subject: Getting Started with PostgreSQL [article] PostgreSQL Cookbook - High Availability and Replication [article] PostgreSQL – New Features [article]
Ease the Chaos with Automated Patching

Packt
02 Apr 2013
19 min read
(For more resources related to this topic, see here.) We have seen how the provisioning capabilities of the Oracle Enterprise Manager's Database Lifecycle Management (DBLM) Pack enable you to deploy fully patched Oracle Database homes and databases, as replicas of the gold copy in the Software Library of Enterprise Manager. However, nothing placed in production should be treated as static. Software changes in development cycles, enhancements take place, or security/functional issues are found. For almost anything in the IT world, new patches are bound to be released. These will also need to be applied to production, testing, reporting, staging, and development environments in the data center on an ongoing basis. For the database side of things, Oracle releases quarterly a combination of security fixes known as the Critical Patch Update (CPU). Other patches are bundled together and released every quarter in the form of a Patch Set Update (PSU), and this also includes the CPU for that quarter. Oracle strongly recommends applying either the PSU or the CPU every calendar quarter. If you prefer to apply the CPU, continue doing so. If you wish to move to the PSU, you can do so, but in that case continue only with the PSU. The quarterly patching requirement, as a direct recommendation from Oracle, is followed by many companies that prefer to have their databases secured with the latest security fixes. This underscores the importance of patching. However, if there are hundreds of development, testing, staging, and production databases in the data center to be patched, the situation quickly turns into a major manual exercise every three months. DBAs and their managers start planning for the patch exercise in advance, and a lot of resources are allocated to make it happen—with the administrators working on each database serially, at times overnight and at times over the weekend. There are a number of steps involved in patching each database, such as locating the appropriate patch in My Oracle Support (MOS), downloading the patch, transferring it to each of the target servers, upgrading the OPATCH facility in each Oracle home, shutting down the databases and listeners running from that home, applying the patch, starting each of the databases in restricted mode, applying any supplied SQL scripts, restarting the databases in normal mode, and checking the patch inventory. These steps have to be manually repeated on every database home on every server, and on every database in that home. Dull repetition of these steps in patching the hundreds of servers in a data center is a very monotonous task, and it can lead to an increase in human errors. To avoid these issues inherent in manual patching, some companies decide not to apply the quarterly patches on their databases. They wait for a year, or a couple of years before they consider patching, and some even prefer to apply year-old patches instead of the latest patches. This is counter-productive and leads to their databases being insecure and vulnerable to attacks, since the latest recommended CPUs from Oracle have not been applied. What then is the solution, to convince these companies to apply patches regularly? If the patching process can be mostly automated (but still under the control of the DBAs), it would reduce the quarterly patching effort to a great extent. 
Companies would then have the confidence that their existing team of DBAs would be able to manage the patching of hundreds of databases in a controlled and automated manner, keeping human error to a minimum. The Database Lifecycle Management Pack of Enterprise Manager Cloud Control 12c is able to achieve this by using its Patch Automation capability. We will now look into Patch Automation and the close integration of Enterprise Manager with My Oracle Support. Recommended patches By navigating to Enterprise | Summary, a Patch Recommendations section will be visible in the lower left-hand corner, as shown in the following screenshot: The graph displays either the Classification output of the recommended patches, or the Target Type output. Currently for this system, more than five security patches are recommended as can be seen in this graph. This recommendation has been derived via a connection to My Oracle Support (the OMS can be connected either directly to the Internet, or by using a proxy server). Target configuration information is collected by the Enterprise Manager Agent and is stored in the Configuration Management Database (CMDB) within the repository. This configuration information is collated regularly by the Enterprise Manager's Harvester process and pushed to My Oracle Support. Thus, configuration information about your targets is known to My Oracle Support, and it is able to recommend appropriate patches as and when they are released. However, the recommended patch engine also runs within Enterprise Manager 12c at your site, working off the configuration data in the CMDB in Enterprise Manager, so recommendations can in fact be achieved without the configuration having been uploaded on MOS by the Harvester (this upload is more useful now for other purposes, such as attaching configuration details during SR creation). It is also possible to get metadata about the latest available patches from My Oracle Support in offline mode, but more manual steps are involved in this case, so Internet connectivity is recommended to get the full benefits of Enterprise Manager's integration with My Oracle Support. To view the details about the patches, click on the All Recommendations link or on the graph itself. This connects to My Oracle Support (you may be asked to log in to your company-specific MOS account) and brings up the list of the patches in the Patch Recommendations section. The database (and other types of) targets managed by the Enterprise Manager system are displayed on the screen, along with the recommended CPU (or other) patches. We select the CPU July patch for our saiprod database. This displays the details about the patch in the section in the lower part of the screen. We can see the list Bugs Resolved by This Patch, the Last Updated date and Size of the patch and also Read Me—which has important information about the patch. The number of total downloads for this patch is visible, as is the Community Discussion on this patch in the Oracle forums. You can add your own comment for this patch, if required, by selecting Reply to the Discussion. Thus, at a glance, you can find out how popular the patch is (number of downloads) and any experience of other Oracle DBAs regarding this patch—whether positive or negative. Patch plan You can view the information about the patch by clicking on the Full Screen button. You can download the patch either to the Software Library in Enterprise Manager or to your desktop. 
Finally, you can directly add this patch to a new or existing patch plan, which we will do next. Go to Add to Plan | Add to New, and enter Plan Name as Sainath_patchplan. Then click on Create Plan. If you would like to add multiple patches to the plan, select both the patches first and then add to the plan. (You can also add patches later to the plan). After the plan is created, click on View Plan. This brings up the following screen: A patch plan is nothing but a collection of patches that can be applied as a group to one or more targets. On the Create Plan page that appears, there are five steps that can be seen in the left-hand pane. By default, the second step appears first. In this step, you can see all the patches that have been added to the plan. It is possible to include more patches by clicking on the Add Patch... button. Besides the ability to manually add a patch to this list, the analysis process may also result in additional patches being added to the plan. If you click on the first step, Plan Information, you can put in a description for this plan. You can also change the plan permissions, either Full or View, for various Enterprise Manager roles. Note that the Full permission allows the role to validate the plan, however, the View permission does not allow validation. Move to step 3, Deployment Options. The following screen appears. Out-of-place patching A new mechanism for patching has been provided in the Enterprise Manager Cloud Control 12c version, known as out-of-place patching. This is now the recommended method and creates a new Oracle home which is then patched while the previous home is still operational. All this is done using an out of the box deployment procedure in Enterprise Manager. Using this mechanism means that the only downtime will take place when the databases from the previous home are switched to run from the new home. If there is any issue with the database patch, you can switch back to the previous unpatched home since it is still available. So, patch rollback is a lot faster. Also, if there are multiple databases running in the previous home, you can decide which ones to switch to the new patched home. This is obviously an advantage, otherwise you would be forced to simultaneously patch all the databases in a home. A disadvantage of this method would be the space requirements for a duplicate home. Also, if proper housekeeping is not carried out later on, it can lead to a proliferation of Oracle homes on a server where patches are being applied regularly using this mechanism. This kind of selective patching and minimal downtime is not possible if you use the previously available method of in-place patching, which uses a separate deployment procedure to shut down all databases running from an Oracle home before applying the patches on the same home. The databases can only be restarted normally after the patching process is over, and this obviously takes more downtime and affects all databases in a home. Depending on the method you choose, the appropriate deployment procedure will be automatically selected and used. We will now use the out-of-place method in this patch plan. On the Step 3: Deployment Options page, make sure the Out of Place (Recommended) option is selected. Then click on Create New Location. Type in the name and location of the new Oracle home, and click on the Validate button. This checks the Oracle home path on the Target server. After this is done, click on the Create button. 
The deployment options of the patch plan are successfully updated, and the new home appears on the Step 3 page. Click on the Credentials tab. Here you need to select or enter the normal and privileged credentials for the Oracle home. Click on the Next button. This moves us to step 4, the Validation step.

Pre-patching analysis

Click on the Analyze button. A job that performs the pre-patching analysis starts in the background. It compares the software and patches installed on the targets with the new patches you have selected in your plan and attempts to validate them. The validation may take a few minutes to complete, since it also checks the Oracle home for readiness, computes the space requirements for the home, and performs other checks, such as cluster node connectivity if you are patching a RAC database.

If you drill down to the analysis job itself by clicking on Show Detailed Progress here, you can see that it runs a number of checks: it validates whether the targets are supported for patching, verifies the normal and super user credentials for the Oracle home, verifies the target tools, commands, and permissions, upgrades OPatch to the latest version, stages the selected patches to the Oracle homes, and then runs the prerequisite checks, including those for cloning an Oracle home. If the prerequisite checks succeed, the analysis job skips the remaining steps and stops at this point with a successful status. The patch is then shown as Ready for Deployment.

If there are any issues, they show up at this point. For example, if there is a conflict with any of the patches, a replacement patch or a merge patch may be suggested. If no replacement or merge patch exists and you want to request one, you can do so directly from the screen. If you are applying a PSU and the CPU for the same release (for example, the July 2011 CPU) is already applied to the Oracle home, then, because the PSU is a superset of the CPU, the MOS analysis will stop and report that the existing patch already fixes the issues. Such a message can be seen in the Informational Messages section of the Validation page.

Deployment

In our case, the patch is Ready for Deployment. At this point, you can move directly to step 5, Review & Deploy, by clicking on it in the left-hand pane. On the Review & Deploy page, the patch plan is described in detail along with the Impacted Targets. Besides the database in the patch plan, a new impacted target has been found by the analysis process and added to the list: the listener running from the home that is to be cloned and patched. The patches to be applied are also listed on this review page; in our case, the CPUJUL2011 patch is shown with the status Conflict Free.

The deployment procedure that will be used is Clone and Patch Oracle Database, since out-of-place patching is being used and all instances and listeners running in the previous Oracle home are being switched to the new home. Click on the Prepare button. The status on the screen changes to Preparation in Progress, and a job that prepares the out-of-place patching starts; it clones the original Oracle home and applies the patches to the cloned home. No downtime is required while this job is running; it can happen in the background. This preparation phase acts as a pre-deploy and is only possible with out-of-place patching; with in-place patching there is no Prepare button and you deploy straight away.
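For reference, what the Prepare job automates here corresponds roughly to the manual OPatch workflow a DBA would otherwise follow against the cloned home: check the staged patches for conflicts, then apply them while the original home keeps serving the databases. The following is a minimal sketch of that manual workflow only, not the Clone and Patch Oracle Database deployment procedure itself; the paths are hypothetical, and the exact OPatch options should be confirmed for your OPatch version (opatch -help):

```python
import os
import subprocess

CLONED_HOME = "/u01/app/oracle/product/11.2.0/dbhome_2"   # hypothetical cloned home
STAGE_DIR = "/u01/stage/patches"                          # hypothetical staging area of unzipped patches
OPATCH = os.path.join(CLONED_HOME, "OPatch", "opatch")

def run(cmd, cwd=None):
    """Echo and run a command, returning its exit code."""
    print("+", " ".join(cmd))
    return subprocess.run(cmd, cwd=cwd).returncode

# List what is already installed in the cloned home.
run([OPATCH, "lsinventory", "-oh", CLONED_HOME])

# Check all staged patches for conflicts against the cloned home.
rc = run([OPATCH, "prereq", "CheckConflictAgainstOHWithDetail",
          "-phBaseDir", STAGE_DIR, "-oh", CLONED_HOME])

if rc == 0:
    # Apply each staged patch from inside its own directory (OPatch may prompt interactively).
    for patch_dir in sorted(os.listdir(STAGE_DIR)):
        run([OPATCH, "apply", "-oh", CLONED_HOME], cwd=os.path.join(STAGE_DIR, patch_dir))
else:
    print("Resolve the reported conflicts before patching the cloned home.")
```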
Clicking on Show Detailed Progress here opens a new window showing the job details. When the preparation job has completed successfully (after about two hours in our virtual machine), we can see that it clones the Oracle home, applies the patches to the new home, validates the patches, runs the post-patch scripts, and then skips all the remaining steps. It also collects the target properties for the Oracle home in order to refresh the configuration information in Enterprise Manager. The Review & Deploy page now shows Preparation Successful!. The plan is now ready to be deployed.

Click on the Deploy button. The status on the screen changes to Deployment in Progress, and a job that deploys the out-of-place patching starts. At this point downtime is required, since the database instances using the previous Oracle home are shut down and switched across. The deploy job completes successfully (after about 21 minutes in our virtual machine); we can see that it works iteratively over the list of hosts and Oracle homes in the patch plan. It starts a blackout for the database instances in the Oracle home (so that no alerts are raised), stops the instances, migrates them to the cloned Oracle home, starts them in upgrade mode, applies SQL scripts to patch each instance, applies the post-SQL scripts, and then restarts the database in normal mode. The deploy job applies further SQL scripts and recompiles invalid objects (except in the case of patch sets). It then migrates the listener from the previous Oracle home using the Network Configuration Assistant (NetCA), updates the target properties, stops the blackout, and detaches the previous Oracle home. Finally, the configuration information of the cloned Oracle home is refreshed. The Review & Deploy page of the patch plan now shows the status Deployment Successful!, as can be seen in the following screenshot:

Plan template

On the Deployment Successful page, you can click on Save as Template at the bottom of the screen to save the patch plan as a plan template. A patch plan must have been successfully analyzed and be deployable, or have been successfully deployed, before it can be saved as a template. The plan template created in this way does not include any targets, so it can be used to apply the successful patch plan to multiple other targets. Inside the plan template, the Create Plan button is used to create a new plan based on the template, and this can be done repeatedly for multiple targets.

Go to Enterprise | Provisioning and Patching | Patches & Updates; this screen displays a list of all the patch plans and plan templates that have been created. The successfully deployed Sainath_patchplan and the new plan template also show up here. To see a list of the saved patches in the Software Library, go to Enterprise | Provisioning and Patching | Saved Patches. This brings up the following screen:

This page also allows you to upload patches manually to the Software Library. This is mostly used when there is no connection to the Internet (either direct or via a proxy server) from the Enterprise Manager OMS servers, and you consequently need to download the patches manually.
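In such an offline setup, the patch recommendation metadata also has to be brought in by hand: you download the catalog file from My Oracle Support on a machine that does have Internet access and import it into Enterprise Manager with EMCLI. The following is a minimal sketch of that import, assuming the import_update_catalog EMCLI verb and the em_catalog.zip file name used by the offline procedure; verify the exact verb and options for your release with emcli help import_update_catalog:

```python
import subprocess

# Hypothetical location of the catalog file downloaded from My Oracle Support
# on an Internet-connected workstation and copied to the OMS host.
CATALOG_ZIP = "/u01/stage/em_catalog.zip"

# Log in to EMCLI first (the administrator name is a placeholder; you will be prompted for the password).
subprocess.run(["emcli", "login", "-username=SYSMAN"], check=True)

# Import the catalog; -omslocal indicates the file already resides on the OMS host.
subprocess.run(["emcli", "import_update_catalog",
                f"-file={CATALOG_ZIP}", "-omslocal"], check=True)
```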
For more details on setting up the offline mode and downloading the patch recommendations and the latest patch information as XML files from My Oracle Support, please refer to Oracle Enterprise Manager Lifecycle Management Administrator's Guide 12c Release 2 (12.1.0.2) at the following URL:

http://docs.oracle.com/cd/E24628_01/em.121/e27046/pat_mosem_new.htm#BABBIEAI

Patching roles

Enterprise Manager Cloud Control 12c supplies out-of-the-box administrator roles specifically for patching: EM_PATCH_ADMINISTRATOR, EM_PATCH_DESIGNER, and EM_PATCH_OPERATOR. You need to grant these roles to the appropriate administrators. Move to Setup | Security | Roles and, on this page, search for the roles specifically meant for patching. The three roles appear as follows:

The EM_PATCH_ADMINISTRATOR role can create, edit, deploy, or delete any patch plan and can also grant privileges to other administrators after creating them. This role has full privileges on any patch plan or plan template in the Enterprise Manager system and maintains the patching infrastructure.

The EM_PATCH_DESIGNER role normally identifies the patches to be used in the patching cycle across development, testing, and production; in real life this would be the senior DBA. The patch designer creates patch plans and plan templates, and grants privileges on these plan templates to the EM_PATCH_OPERATOR role. As an example, the patch designer selects a set of recommended and manually chosen patches for an Oracle 11g database and creates a patch plan. The designer then tests the patching process in a development environment, saves the successfully analyzed or deployed patch plan as a plan template, and publishes the Oracle 11g database patching plan template to the patch operator, typically the junior DBA or application DBA.

Next, the patch operator creates new patch plans from the template (but cannot create a template) and adds a different list of targets, such as other Oracle 11g databases in the test, staging, or production environments. This role then schedules the deployment of the patches to all these environments, using the same template again and again.
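Granting these roles can be done from the console page mentioned above, or scripted with EMCLI when many administrators are involved. The sketch below assumes the grant_roles EMCLI verb and uses placeholder administrator names; confirm the verb and its parameters for your release with emcli help grant_roles:

```python
import subprocess

# Placeholder administrator-to-role mapping; substitute your own Enterprise Manager users.
assignments = {
    "SENIOR_DBA": "EM_PATCH_DESIGNER",
    "JUNIOR_DBA": "EM_PATCH_OPERATOR",
}

# Log in to EMCLI first (the administrator name is a placeholder; you will be prompted for the password).
subprocess.run(["emcli", "login", "-username=SYSMAN"], check=True)

for admin, role in assignments.items():
    # Grant the patching role to the administrator.
    subprocess.run(["emcli", "grant_roles",
                    f"-name={admin}", f"-roles={role}"], check=True)
```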
Summary

Enterprise Manager Cloud Control 12c allows you to automate the tedious patching procedures used in many organizations today to patch their Oracle databases and servers. This is achieved via the Database Lifecycle Management Pack, one of the main licensable packs of Enterprise Manager. Sophisticated deployment procedures are provided out of the box to fulfill many different types of patching tasks, helping you achieve mass patching of multiple targets with multiple patches in a fully automated manner and saving a great deal of administrative time and effort. Some companies have estimated savings of up to 98 percent in patching tasks in their data centers. Different types of patches can be applied in this manner, including CPUs, PSUs, patch sets, and other one-off patches, and different database versions are supported, such as 9i, 10g, and 11g. For the first time, the upgrade of single-instance databases is also possible via Enterprise Manager Cloud Control 12c.

There is full integration of the patching capabilities of Enterprise Manager with My Oracle Support (MOS). The support site retains the configuration of all the components managed by Enterprise Manager inside the company. Since the current version and patch information of the components is known, My Oracle Support is able to provide appropriate patch recommendations for many targets, including the latest security fixes. This ensures that the company stays up to date with regard to security protection.

A full division of roles is available, such as Patch Administrator, Designer, and Operator. It is possible to take the My Oracle Support recommendations, select patches for targets, put them into a patch plan, deploy the patch plan, and then create a plan template from it. The template can then be published to any operator, who can create their own patch plans for other targets. In this way patching can be tested, verified, and then pushed to production.

In all, Enterprise Manager Cloud Control 12c offers valuable automation for mass patching, allowing administrators to ensure that their systems have the latest security patches and to control the application of patches on development, test, and production servers from the centralized location of the Software Library.