Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials - Data

1229 Articles
article-image-author-podcast-ronald-rood-discusses-birth-oracle-scheduler
Packt
05 Aug 2010
1 min read
Save for later

Author Podcast - Ronald Rood discusses the birth of Oracle Scheduler

Packt
05 Aug 2010
1 min read
You can listen to the podcast here, or hit play in the media player below.
Read more
  • 0
  • 0
  • 1174

article-image-packt-make-epub-downloads-available-its-website
Packt
20 Jul 2010
2 min read
Save for later

Packt to make ePub downloads available from its website

Packt
20 Jul 2010
2 min read
ePub (short for electronic publication) is a free and open eBook standard by the International Digital Publishing Forum (IDPF). ePub allows greater flexibility with content. With an ePub-formatted book, the display of text can be optimized for the reader's device.  Other ePub features include improved design and layout, Inline raster and vector images, better search functionality, while enabling DRM protection and embedded metadata. The company’s marketing manager Damian Carvill said “Packt listens to all feedback from customers and attempts to act on it accordingly. After receiving a number of requests, we made it a priority to convert all of our existing eBooks into the ePub format. All of our future books, and we hope to publish over 20 in July, will also be available in this popular eBook format.” Packt is inviting all customers, who have purchased an eBook, to download the ePub version of the book and enjoy its features and great flexibility. Alternatively, customers can sample the ePub format by accessing one of Packt’s free eBooks. All purchases of new eBooks from today will be made available to download as an ePub file, as well as the standard PDF. Packt ePub formatted eBooks are available from Tuesday 20th July 2010. To access your first Packt ePub eBook, either previously purchased or one of Packt’s free eBooks, please log into www.PacktPub.com and go to My account.  
Read more
  • 0
  • 0
  • 5812

article-image-use-templates-report-using-birt
Packt
19 Jul 2010
2 min read
Save for later

Use of Templates in Report using BIRT

Packt
19 Jul 2010
2 min read
Templates Templates are predesigned report layouts. Building a report from a template Templates, for the most part, take the leg work out of having to recreate the same layout over and over again.With templates, we could either use the existing, canned Listing Report template or build our own, saving your time for more complex tasks. It's as simple as creating a new report, using a template, and following the Cheat Sheets. In this example, we are going to use a Grouped Listing report template to create a simple Employee Sales Detail report, which we will use as a target for a drill down report. Create a new report called Employee Sales Details Report.rptDesign.Do not go through the wizard; stop at the template selection dialog. On the New Report – Template screen, select Grouped Listing report. When the report design opens, the Cheat Sheet is open on the right-hand side. It lays out a series of steps to create a basic report. As we already have the data source created in our library, go ahead and add the data source from the library, and click on the skip option for the data source hint. For the dataset, click on the Click to perform option. Name the dataset Get Employee Sales, and use the following query: select EMPLOYEES.EMPLOYEENUMBER, EMPLOYEES.LASTNAME || ', ' || EMPLOYEES.FIRSTNAME name, ORDERDETAILS.PRICEEACH sales, ORDERS.ORDERDATEfrom EMPLOYEES, CUSTOMERS, ORDERS, ORDERDETAILSwhere ORDERS.ORDERNUMBER = ORDERDETAILS.ORDERNUMBER and EMPLOYEES.EMPLOYEENUMBER = CUSTOMERS. SALESREPEMPLOYEENUMBER and ORDERS.CUSTOMERNUMBER = CUSTOMERS.CUSTOMERNUMBER and ORDERS.ORDERDATE between ? and ? Create two report parameters called startDate and endDate, and bind them to the data set parameters. For startDate, use the default value of 2005-01-01, and for endDate, use the default parameter of 2005-04-01. When we are back at the Cheat Sheet, we click on Click to Complete for the dataset. For the Edit Date Binding cheat sheet section, drag the fields over like next screenshot. Make the Group Header row appear with a silver background and the text appear in bold. When we drag EmployeeNumber over, it will automatically create the group on the EmployeeNumber express. Just be sure to drag it to the GroupHeader section as illustrated in the screenshot. Select the column with the Name and OrderDate fields, and select Suppress duplicates.
Read more
  • 0
  • 0
  • 3086

article-image-use-stylesheets-report-designing-using-birt
Packt
17 Jul 2010
3 min read
Save for later

Use of Stylesheets for Report Designing using BIRT

Packt
17 Jul 2010
3 min read
Stylesheets BIRT, being a web-based reporting environment, takes a page from general web development toolkits by importing stylesheets. However, BIRT stylesheets function slightly differently to regular stylesheets in a web development environment. We are going to add on to the Customer Orders report we have been working with, and will create some styles that will be used in this report. Open Customer Order.rptDesign. Right-click on the getCustomerInformation dataset and choose Insert into Layout. Modify the table visually to look like the next figure. Create a new dataset called getCustomerOrders using the following query: //insert code 1 Link the dataset parameter to rprmCustomerID. Save the dataset, right-click on it, and select Insert to layout. Select the first ORDERNUMBER column. Under the Property Editor, Select Advanced. In the Property Editor, go to the Suppress duplicates option, and change it to true. This will prevent the OrderNumber data item from repeating the value it displays down the page. In the Outline, right-click on Styles and choose New Style…. In the Pre-Defined Style drop down, choose table-header. A predefined style is an element that is already defined in the BIRT report. When selecting a predefined style, this will affect every element of that type within a report. In this case, for every table in the report, the table header will have this style applied. Under the Font section, apply the following settings: Font: Sans-Serif Font Color: White Size: Large Weight: Bold Under the Background section, set the Background Color to >b>Black. Click OK. Now, when we run the report, we can see that the header line is formatted with a black background and white font. Custom stylesheets In the example we just saw, we didn't have to apply this style to any element, it was automatically applied to the header of the order details table as it was using a predefined style. This would be the case for any table that had the header row populated with something and the same is the case for any of the predefined styles in BIRT. So next, let's look at a custom defined style and apply it to our customer information table. Right-click on the Styles section under the Outline tab and create a new style. Under the Custom Style textbox, enter CustomerHeaderInfo. Under the Font section, enter the following information: Font: Sans Serif Color: White Size: Large Weight: Bold Under the Background section, set the Background Color to Gray. Under the Box section, enter 1 points for all sections. Under the Border section, enter the following information: Style (All): Solid Color (All): White Width (All): Thin Click OK and then click Save. Select the table which contains the customer information. Select the first column. Under the Property Editor, in the list box for the Styles, select CustomerHeaderInfo. The preview report will look like the following screenshot: Right-click on the Styles section, and create a new custom style called CustomerHeaderData. Under Box, put in 1 points for all fields. Under Border, enter the following information: Style – Top: Solid Style – Bottom: Solid Color (All): Gray Click OK. Select the Customer Information table. Select the second column. Right-click on the column selector and select Style | Apply Style | CustomHeaderData. The finished report should look something like the next screenshot:
Read more
  • 0
  • 0
  • 6694

article-image-creating-themes-report-using-birt
Packt
17 Jul 2010
3 min read
Save for later

Creating Themes for a Report using BIRT

Packt
17 Jul 2010
3 min read
Creating themes Using the power of stylesheets and libraries, one has the ability to apply general formatting to an entire report design using themes and reuse these among different reports. Themes provide a simple mechanism to apply a wide range of styles to an entire report design without the need to manually apply them. The following example will move the styles that we have created in our library and will show how to apply them to our report using a theme. For each of the styles we have created, select them under the Outline tab and choose Export to Library…. Choose the ClassicCarsLibrary.rptLibrary file. All of the styles will reside under the defaultTheme themes section, so select this from the drop-down list that appears next to the Theme option. Repeat these steps for all styles we have created in Customer Orders.rptDesign. Delete all of the styles stored in the Customer Orders.rptDesign file. You will notice all the styles disappear from the report designer. In the Outline tab, under the Customer Orders.rptDesign file, select the root element titled Customer Orders.rptDesign. Right-click the element and select Refresh Library. The library should already be shared since we built the report using the library's data source and datasets. If it is not, open the Resource Explorer, choose ClassicCarsLibrary.rptLibrary, right-click and choose Use Library. Under the Property Editor, change the Themes drop down to ClassicCarsLibrarydefaultTheme. When we apply the theme, we will see the detail table header automatically apply the style for table-header. Apply the remaining custom styles to the two columns in the customer information section and the order detail row. Now, we know that we can create several different themes by grouping styles together in libraries. So, when developing reports, you can create several different looks that can be applied to reports, simply by applying themes to reports with the help of libraries. Using external CSS stylesheets Another stylesheet feature is the ability to use external stylesheets and simply link to them. This works out very well when report documents are embedded into existing web portals by using the portals stylesheets to keep a consistent look and feel. This creates a sense of uniformity in the overall site. Imagine that our graphics designer gives us a CSS file and asks us to design our reports around it. There are two ways one can use CSS files in BIRT: Importing CSS files Using CSS as a resource In the following examples we are going to illustrate both scenarios. I have a CSS file containing six styles—five styles that are for predefined elements in reports and one style that is a custom style. The following is the CSS stylesheet for the given report: .page { background-color: #FFFFFF; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; color: #336699;}.table-group-footer-1 { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 12px; line-height: 24px; color: #333333; background-color: #FFFFCC;}.title { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 24px; line-height: 40px; background-color: #99CC00; color: #003333; font-weight: bolder;}.table-header { font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 20px; background-color: #669900; color: #FFFF33;}.table-footer { font-family: Arial, Helvetica, sans-serif; font-size: 14px; font-weight: bold; line-height: 22px; color: #333333; background-color: #CCFF99;}
Read more
  • 0
  • 0
  • 5108

article-image-creating-reporting-site-using-birt-extension
Packt
16 Jul 2010
3 min read
Save for later

Creating a Reporting Site using BIRT-An Extension

Packt
16 Jul 2010
3 min read
Bug status The next report we are going to undertake is the report to show open issues compared to all bugs. What we are trying to build here is a graph that shows, by category, the bugs that are open in relation to the total number of bugs for that category. No other grouping is required. Under the graph, we also want to show a list of open bugs that we can click on and have them drill down to the detail for that bug so that we can see the history of changes to that bug. What this means is we need a bar graph showing all the status. Open BugzillaReportsTemplate.rptTemplate and save as bugStatusReport.rptDesign. Drag getAllBugs over to the Report Designer. Delete all columns except the bug_status field and short_desc field. In the table, group data by the components' name. With the new category created in the Table, insert a new column on the right. Delete the header labels. Move the data fields to look like the following screenshot: In the Outline tab, select the root element. Apply bugZillaTheme. In the group header row with the name, apply the DetailTableHeader style. In the Detail row, apply the DetailTableRow style. In the header row, select all the cells and merge them. In to the new merged cell, insert a chart. Select a Bar Chart and change the Output Format to PNG. Open the Select Data tab. Set the Inherit Data from Container drop-down list to Inherit Columns Only. Drag the bug_status field to the Optional Y Series Grouping slot. Drag the name field to the Category (X) Series slot. Click on the Edit group and sorting button. In the Group and sorting dialog, check the Enabled checkbox. Set the Type to Text. Set the Interval to 0. Set the Aggregate Expression to Count. As the Value (Y) Series, enter 1. Under the Format Chart tab, go to Title. Enter the title as Bug Status Report. Select the Axis option. Under X-Axis, check the Stagger checkbox. Click Finish. Resize the chart to fit the number of categories. The last thing we need to do is add the drill-through from the descriptions to the bug detail. Select the short_desc data item in the report designer. Under the Property Editor, select the Hyperlink tab. Click on the Edit... button next to Link To. From the Hyperlink dialog, select the Drill-through as Hyperlink type. Select BugzillaDetailReport.rptDesign as the target report. Set up the target report parameter bugID to be linked to row["bug_id"]. Click OK and save the report.
Read more
  • 0
  • 0
  • 2243
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at ₹800/month. Cancel anytime
Packt
12 Jul 2010
10 min read
Save for later

Understanding ShapeSheet™ in Microsoft Visio 2010

Packt
12 Jul 2010
10 min read
In this article by David J. Parker, author of Microsoft Visio 2010 Business Process Diagramming and Validation, we will discuss Microsoft Visio ShapeSheet™ and the key sections, rows, and cells, along with the functions available for writing ShapeSheet™ formulae, where relevant for structured diagrams. Microsoft Visio is a unique data diagramming system, and most of that uniqueness is due to the power of the ShapeSheet, which is a window on the Visio object model. It is the ShapeSheet that enables you to encapsulate complex behavior into apparently simple shapes by adding formulae to the cells using functions. The ShapeSheet was modeled on a spreadsheet, and formulae are entered in a similar manner to cells in an Excel worksheet. Validation rules are written as quasi-ShapeSheet formulae so you will need to understand how they are written. Validation rules can check the contents of ShapeSheet cells, in addition to verifying the structure of a diagram. Therefore, in this article you will learn about the structure of the ShapeSheet and how to write formulae. Where is the ShapeSheet? There is a ShapeSheet behind every single Document, Page, and Shape, and the easiest way to access the ShapeSheet window is to run Visio in Developer mode. This mode adds the Developer tab to the Fluent UI, which has a Show ShapeSheet button. The drop-down list on the button allows you to choose which ShapeSheet window to open. Alternatively, you can use the right-mouse menu of a shape or page, or on the relevant level within the Drawing Explorer window as shown in the following screenshot: The ShapeSheet window, opened by the Show ShapeSheet menu option, displays the requested sections, rows, and cells of the item selected when the window was opened. It does not automatically change to display the contents of any subsequently selected shape in the Visio drawing page—you must open the ShapeSheet window again to do that. The ShapeSheet Tools tab, which is displayed when the ShapeSheet window is active, has a Sections button on the View group to allow you to vary the requested sections on display. You can also open the View Sections dialog from the right-mouse menu within the ShapeSheet window. You cannot alter the display order of sections in the ShapeSheet window, but you can expand/collapse them by clicking the section header. The syntax for referencing the shape, page, and document objects in ShapeSheet formula is listed in the following table. Object ShapeSheet formula Comment Shape Sheet.n! Where n is the ID of the shape Can be omitted when referring to cells in the same shape. Page.PageSheet ThePage! Used in the ShapeSheet formula of shapes within the page.   Pages[page name]! Used in the ShapeSheet formula of shapes in other pages. Document.DocumentSheet TheDoc! Used in the ShapeSheet formula in pages or shapes of the document. What are sections, rows, and cells? There are a finite number of sections in a ShapeSheet, and some sections are mandatory for the type of element they are, whilst others are optional. For example, the Shape Transform section, which specifies the shape's size (that is, angle and position) exists for all types of shapes. However, the 1-D Endpoints section, which specifies the co-ordinates of either end of the line, is only relevant, and thus displayed for OneD shapes. Neither of these sections is optional, because they are required for the specific type of shape. Sections like User-defined Cells and Shape Data are optional and they may be added to the ShapeSheet if they do not exist already. If you press the Insert button on the ShapeSheet Tools tab, under the Sections group, then you can see a list of the sections that you may insert into the selected ShapeSheet. In the above example, User-defined Cells option is grayed out because this optional section already exists. It is possible for a shape to have multiple Geometry, Ellipse, or Infinite line sections. In fact, a shape can have a total of 139 of them. Reading a cell's properties If you select a cell in the ShapeSheet, then you will see the formula in the formula edit bar immediately below the ribbon. Move the mouse over the image to enlarge it. You can view the ShapeSheet Formulas (and I thought the plural was formulae!) or Values by clicking the relevant button in the View group on the ShapeSheet Tools ribbon. Notice that Visio provides IntelliSense when editing formulae. This is new in Visio 2010, and is a great help to all ShapeSheet developers. Also notice that the contents of some of the cells are shown in blue text, whilst others are black. This is because the blue text denotes that the values are stored locally with this shape instance, whilst the black text refers to values that are stored in the Master shape. Usually, the more black text you see, the more memory efficient the shape is, since less is needed to be stored with the shape instance. Of course, there are times when you cannot avoid storing values locally, such as the PinX and PinY values in the above screenshot, since these define where the shape instance is in the page. The following VBA code returns 0 (False): ActivePage.Shapes("Task").Cells("PinX").IsInherited But the following code returns -1 (True) : ActivePage.Shapes("Task").Cells("Width").IsInherited The Edit Formula button opens a dialog to enable you to edit multiple lines, since the edit formula bar only displays a single line, and some formulae can be quite large. You can display the Formula Tracing window using the Show Window button in the Formula Tracing group on the ShapeSheet Tools present in Design tab. You can decide whether to Trace Dependents, which displays other cells that have a formula that refers to the selected cell or Trace Precedents, which displays other cells that the formula in this cell refers to. Of course, this can be done in code too. For example, the following VBA code will print out the selected cell in a ShapeSheet into the Immediate Window: Public Sub DebugPrintCellProperties ()'Abort if ShapeSheet not selected in the Visio UI If Not Visio.ActiveWindow.Type = Visio.VisWinTypes.visSheet Then Exit Sub End IfDim cel As Visio.Cell Set cel = Visio.ActiveWindow.SelectedCell'Print out some of the cell properties Debug.Print "Section", cel.Section Debug.Print "Row", cel.Row Debug.Print "Column", cel.Column Debug.Print "Name", cel.Name Debug.Print "FormulaU", cel.FormulaU Debug.Print "ResultIU", cel.ResultIU Debug.Print "ResultStr("""")", cel.ResultStr("") Debug.Print "Dependents", UBound(cel.Dependents)'cel.Precedents may cause an errorOn Error Resume Next Debug.Print "Precedents", UBound(cel.Precedents) End Sub In the previous screenshot, where the Actions.SetDefaultSize.Action cell is selected in the Task shape from the BPMN Basic Shapes stencil, the DebugPrintCellProperties macro outputs the following: Section 240 Row 2 Column 3 Name Actions.SetDefaultSize.Action FormulaU SETF(GetRef(Width),User.DefaultWidth)+SETF(GetRef(Height),User.DefaultHeight) ResultIU 0 ResultStr("") 0.0000 Dependents 0 Precedents 4     Firstly, any cell can be referred to by either its name, or section/row/column indices, commonly referred to as SRC. Secondly, the FormulaU should produce a ResultIU of 0, if the formula is correctly formed and there is no numerical output from it. Thirdly, the Precedents and Dependents are actually an array of referenced cells. Can I print out the ShapeSheet settings? You can download and install the Microsoft Visio SDK from the Visio Developer Center (visit http://msdn.microsoft.com/en-us/office/aa905478.aspx). This will install an extra group, Visio SDK, on the Developer ribbon and one extra button Print ShapeSheet. I have chosen the Clipboard option and pasted the report into an Excel worksheet, as in the following screenshot: The output displays the cell name, value, and formula in each section, in an extremely verbose manner. This makes for many rows in the worksheet, and a varying number of columns in each section. What is a function? A function defines a discrete action, and most functions take a number of arguments as input. Some functions produce an output as a value in the cell that contains the formula, whilst others redirect the output to another cell, and some do not produce a useful output at all. The Developer ShapeSheet Reference in the Visio SDK contains a description of each of the 197 functions available in Visio 2010, and there are some more that are reserved for use by Visio itself. Formulae can be entered into any cell, but some cells will be updated by the Visio engine or by specific add-ons, thus overwriting any formula that may be within the cell. Formulae are entered starting with the = (equals) sign, just as in Excel cells, so that Visio can understand that a formula is being entered rather than just a text. Some cells have been primed to expect text (strings) and will automatically prefix what you type with =" (equals double-quote) and close with "(double-quote) if you do not start typing with an equal sign. For example, the function NOW(), returns the current date time value, which you can modify by applying a format, say, =FORMAT(NOW(),"dd//MM/YYYY"). In fact, the NOW() function will evaluate every minute unless you specify that it only updates at a specific event. You could, for example, cause the formula to be evaluated only when the shape is moved, by adding the DEPENDSON() function: =DEPENDSON(PinX,PinY)+NOW() The normal user will not see the result of any values unless there is something changing in the UI. This could be a value in the Shape Data that could cause linked Data Graphics to change. Or there could be something more subtle, such as the display of some geometry within the shape, like the Compensation symbol in the BPMN Task shape. In the above example, you can see that the Compensation right-mouse menu option is checked, and the IsForCompensation Shape Data value is TRUE. These values are linked, and the Task shape itself displays the two triangles at the bottom edge. The custom right-mouse menu options are defined in the Actions section of the shape's ShapeSheet, and one of the cells, Checked, holds a formula to determine if a tick should be displayed or not. In this case, the Actions.Compensation.Checked cell contains the following formula, which is merely a cell reference: =Prop.BpmnIsForCompensation Prop is the prefix used for all cells in the Shape Data section because this section used to be known as Custom Properties. The Prop.BpmnIsForCompensation row is defined as a Boolean (True/False) Type, so the returned value is going to be 1 or 0 (True or False). Thus, if you were to build a validation rule that required a Task to be for Compensation, then you would have to check this value. You will often need to branch expressions using the following: IF(logical_expression, value_if_true, value_if_false)
Read more
  • 0
  • 0
  • 12345

article-image-indexing-mysql-admin
Packt
29 Jun 2010
6 min read
Save for later

Indexing in MySQL Admin

Packt
29 Jun 2010
6 min read
Using prefix primary keys In this example we will add indexes to two tables that are almost identical. The only difference will be the definition of their primary keys. You will see the difference in space consumption for secondary indexes between a regular full column primary key and a prefix primary key. The sample table structure and data are designed to demonstrate the effect very evidently. In real-world scenarios the effect will most certainly be less severe. Getting ready... Connect to the database server with your administrative account. How to do it... Download the sample script for this article from the book's website and save it to your local disk. In the example below, we will assume it is stored in /tmp/idxsizeestimate_sample.sql. Create a new database and make it the default database:CREATE DATABASE pktests; USE pktests; Import the sample data from the downloaded file. When done, you will be presented with some statistics about the two tables loaded. Note that both tables have an Index Length of 0. SOURCE /tmp/idxsizeestimate_sample.sql; Now with the sample tables present, add an index to each of them: ALTER TABLE LongCharKey ADD INDEX IDX_PAY_10(Payload(10)); ALTER TABLE LongCharKey10 ADD INDEX IDX_PAY_10(Payload(10)); Display the data and index sizes of the tables now:SHOW TABLE STATUS LIKE 'LongCharKey%'; Add another index to each table to make the difference even more evident:ALTER TABLE LongCharKey ADD INDEX IDX2_PAY_10(Payload(10));ALTER TABLE LongCharKey10 ADD INDEX IDX2_PAY_10(Payload(10)); Display the data and index sizes of the tables again and compare with the previous values:SHOW TABLE STATUS LIKE 'LongCharKey%'; Name Rows Data Length Index Length Index/Data Ratio LongCharKey 50045 30392320 28868608 94.99% LongCharKey10 50045 29949952 3178496 10.61%   With the second index added, the difference in index length becomes even clearer.   How it works... Executing the downloaded script will set up two tables with the following structures: CREATE TABLE `LongCharKey` (`LongChar` char(255) NOT NULL,`Payload` char(255) DEFAULT NULL,PRIMARY KEY (`LongChar`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;CREATE TABLE `LongCharKey10` (`LongChar` char(255) NOT NULL,`Payload` char(255) DEFAULT NULL,PRIMARY KEY (`LongChar`(10))) ENGINE=InnoDB DEFAULT CHARSET=latin1; The two tables are almost identical, except for the primary key definition. They are pre-filled with 50,000 records of sample data. The tables are populated with exactly the same 50,000 records of pseudo-random data. The Payload column is filled with sequences of 255 random letters each. The LongChar column is filled with a sequential number in the first 10 characters and then filled up to use all remaining 245 character with the same sort of random data.SELECT LEFT(LongChar,20), LEFT(Payload, 20) from LongCharKey LIMIT 5; LEFT(LongChar,20) LEFT(Payload, 20) 0000000000KEAFAYVEJD RHSKMEJITOVBPOVAGOGM 0000000001WSSGKGMIJR VARLGOYEONSLEJVTVYRP 0000000002RMNCFBJSTL OVWGTTSHEQHJHTHMFEXV 0000000003SAQVOQSINQ AHDYUXTAEWRSHCLJYSMW 0000000004ALHYUDSRBH DPLPXJVERYHUOYGGUFOS While the LongKeyChar table simply marks the whole LongChar column as a primary key with its entire 255 characters length, the LongCharKey10 table limits the primary key to the first 10 characters of that column. This is perfectly fine for this table, because the test data was crafted to be unique in this range. Neither one of the two tables has any secondary indexes defined. Looking at some relevant table data shows they are equally big (some columns left out for brevity): SHOW TABLE STATUS LIKE 'LongCharKey%'; Name Rows Data Length Index Length LongCharKey 50045 30392320 0 LongCharKey10 50045 29949952 0 With each index added, the Index Length for the first table will increase significantly, while for the second one its growth is much slower. In case of the LongCharKey table, each secondary index record will carry around with it a complete copy of the LongChar column because it is the primary key without limitation. Assuming a single byte character encoding, this means every secondary index record is blown up in size by 255 bytes on top of the 10 bytes needed for the actual index entry. This means a whole kilobyte is spent just for the primary key reference for every 4 records! In contrast to that, the primary key definition of the LongCharKey10 table only includes the leading 10 characters of the LongChar column, making the secondary index entry 245 bytes shorter and thereby explaining the much slower growth upon adding further indexes. Choosing InnoDB primary key columns In the article introduction we promised to shed some light on how to choose your InnoDB primary key columns sensibly. Be advised that choosing good primary key columns is not an exact science—there are multiple aspects that influence this decision. Depending on your needs and preconditions you will want to prioritize them differently from one table to the next. Consider the following as general advice rather than hard rules that must be obeyed unconditionally. Getting ready In order to make reasonable decisions on primary key columns, it is important to have a very clear understanding of what the data looks like in the table at hand. If you already have existing data that is to be stored in an InnoDB table—for example in MyISAM format—it can be helpful to compare it with the criteria below. If you are planning a new schema, you might have to guess about some characteristics of the future data. As is often the case, the quality of your choices is directly proportional to how good those guesses are. This recipe is less strict step-by-step instructions that must be followed from top to bottom and should be considered a list of properties a good primary key should have, even though you might decide some of them do not apply to your actual environment. As a rule of thumb, however, a column that fulfills all or most of the attributes described below is most probably a sensible choice for a primary key. See the How it works... section for details on the individual items. How to do it... Identify unique attributes: This is an absolute (technical) requirement for primary keys in general. Any data attribute that is not strictly guaranteed to be free of duplicates cannot be used alone as a primary key. Identify immutable attributes: While not absolutely necessary, a good primary key is never changed once it has been assigned. For all intents and purposes, you should avoid columns that have even a small chance of being changed for existing records Use reasonably short keys: This is the "softest" criterion of all. In general, longer keys have negative impacts on overall database performance—the longer the worse. Also, consider a prefix primary key. See Using prefix primary keys earlier in this article for more information. Prefer single-column keys: Even though nothing prevents you from choosing a composite primary key (a combination of columns that together form the uniqueness), this can easily become a hassle to work with, especially when handling foreign keys. Consider the clustered index nature of the primary key: As InnoDB's primary key is also clustered, you should take this special nature into account as well. It can speed up read access a lot, if you often have to query for key ranges, because disk seek times will be minimized.
Read more
  • 0
  • 0
  • 2254

article-image-oracle-environmental-variables-and-scripting
Packt
22 Jun 2010
10 min read
Save for later

Oracle: Environmental Variables and Scripting

Packt
22 Jun 2010
10 min read
(For more resources on Oracle, see here.) Unix scripting will be one of the most powerful tools in your arsenal and only with constant use will you become proficient. Your standards for scripting need to produce code that is robust, modular, consistent, and scalable. This article won't cover everything you need to know about scripting. There are many excellent lessons available from your operating system vendor. See the following documents for more information specific to how Oracle products behave in a Unix environment: Note: 131207.1 Subject: How to Set Unix Environment Variables Note: 1011994.6 Subject: How do Applications Act in Different Unix Shells. How to Integrate the Shell, SQL*Plus Scripts and PL/SQL in any Permutation? [ID 400195.1] (Excellent document) It is suggested that the reader should read the Oracle Database Concepts 11g Release 2 (11.2) of the documentation, the book every DBA should start with. Usually cron is used for scheduling in Unix and the AT command with Windows. For Oracle utilities that run outside the database (or which must only run when the database is down), Unix shell scripting is best used. A well-written script would parse /etc/oratab for a list of ORACLE_SIDs on a single server for tasks such as parsing log files, consistent backups, exports, dbverify, and RMAN. If you have a solitary database, then DBMS_SCHEDULER can be used with a combination of SQL and PL/SQL integrated with OS commands. Occasionally, DBAs rename the oraenv located in $ORACLE_HOME/bin when they have added custom code to the /usr/local/bin/oraenv (default location on Linux), so that they can make sure they know exactly which one is executed at run time. If you have any problems related to running a modified oraenv file, check which one is being executed and adjust the $PATH variable as needed. The following is a list of some of the Oracle-provided Unix commands with a quick synopsis of their purpose. Most of them are located in $ORACLE_HOME/bin: wrap—encrypts stored procedures for advanced use oerr—displays oracle errors. Syntax: oerr ora 12154 Sysresv—instance and shared memory segments Tkprof—formats output trace file into readable format Dbshut—shell script to shut down all instances dbstart—shell script to start up all instances at boot Dbhome—sets ORACLE_HOME Oraenv—sets environmental variables for ORACLE_SID trcasst—trace assistant Guidelines for scripting These are general suggestions for someone with some experience with a Unix operating system. You will need more background information to understand this section. The best sources of information will be the software vendor of your operating system, because there are small differences between the different versions and flavors of Unix. As with any code, you must test on a non-production system first, as inexperience may lead to unexpected results. Separating the configuration file Use the OS-provided default profile for environmental variables, but use a separate configuration file to set the $ORACLE_SID variable. There are several key environmental variables that are commonly used with Oracle software, all of which are found in the documentation specific to the operating system. Optimal Flexible Architecture (OFA) mentions setting the $ORACLE_SID in the profile, but if you have more than one database or Oracle product installed on a system, it is best done interactively. A default profile is loaded when you first log in to Unix. So if the $ORACLE_SID is loaded when you log on, what happens when you want to change ORACLE_SID(s)? This is when the environment becomes mixed. It just keeps appending the $PATH variable each time you execute the oraenv script. Which set of executables will you find first? It will be those executables for which you ran oraenv the first time. At this point I wanted to execute SQL*Plus out of the 11g directory, but was able to determine that the 10gR2 client was first in the $PATH. How did I know that? Use the which Unix command to find out. It is also a good practice to use a separate terminal window for each ORACLE_HOME. Normally, you will operate with multiple windows open, one for each ORACLE_HOME or ORACLE_SID in different colors to visually remind you which one is production. The example profile is provided in the code: example_profile.txt. The profile sets the entire user environment at first logon. This one is specific to the ksh or korn shell on the Linux operating system and will also work for bash. Differences in bash include that the line history is scrollable with the up and down arrows instead of vi commands. To set the ORACLE_SID and activate all of the variables located in profile, source the file oraenv (bash, Bourne, or korn shell) or coraenv (C shell). Source means the variables will be in effect for the entire session and not just the current line in the command window. You indicate source by putting a '.' (dot) in front of the file. As the oraenv file is located in /usr/local/bin (on Linux) and this location is in the $PATH, typing it at the command line works. Putting key Oracle files, such as oraenv, oratab, and oraInst.loc, in locations that ORACLE_HOMEwill not be affected by standard Oracle installations is also part of the OFA. The oraenv script is installed into /usr/local/ bin/ automatically when running .runInstaller for the first time. Notice the prompt that you will see if you use the command as in the profile listed above: A note about prompts: Every person will want to customize their own prompt so; look around for various versions that tickle your fancy. This one is better than most examples to compare to. Notice the small difference in the prompt before and after? ORACLE_SID is now defined because oraenv is executed, which also runs dbhome (also located in /usr/local/bin), but these scripts require a valid entry in /etc/oratab. If you type the ORACLE_SID incorrectly on Unix, this will be case sensitive. It will ask where the ORACLE_HOME is if there is no entry in oratab. Making an ORACLE_SID lowercase conforms to the OFA standard (see the install guide for your OS for more information). Some DBAs use an uppercase ORACLE_SID because it makes it more prominent for any type of naming convention and is meant to reduce human error. You can use an ORACLE_SID entry in the oratab file to reference other Oracle products such as the Grid Control Intelligent Agent. The ea, which is an ORACLE_SID in the following oratab example, is what I use to indicate the Enterprise Manager Agent. The third letter after the ORACLE_HOME (N) indicates when Oracle-supplied utilities (like db_shut and db_start) are not to execute against this ORACLE_HOME. I personally use the N for my own scripting to indicate which utilities shouldn't run against this ORACLE_HOME. What this does is take advantage of Oracle-provided files—oratab and oraenv—to accomplish other types of tasks. This is only a suggested use. There are other ways of setting environmental variables for non-database products. You will need to create a test database to work through all of the examples and practice scenarios. How should you create the test database? Use the Oracle-provided Database Configuration Assistant (DBCA) utility to create a test database. There are default templates provided that will work for most of the tasks outlined in this article. If you are interested in duplicating some of the advanced tasks (like Data Guard), then it will require the installation of the Enterprise Edition of Oracle Database. All tasks in this article were done with 11.1.0.7 version of Oracle Database with some references to 11.2.0.1, which had just been released. Host commands relative location This will be important as you begin scripting. Host commands are relative to the location of the executable. As a general rule, you should execute database-specific utilities (imp, exp, datapump, RMAN, and so forth) on the server where the database is located in the correct ORACLE_HOME. This reduces the amount of issues such as core dumps and version compatibilities. This is different from what is usually thought of as a client utilities such as SQL*Plus. There are exceptions to this rule, for it is recommended to run a compiled code (C, C++, Cobol) on a separate server rather than a database. See the following document for setting the TWO_TASK variable when using a separate node for compiled programs. TWO_TASK is an environmental variable. Subject: How to Customize Pro*C and Pro*Cobol Makefiles demo_proc.mk And demo_procob.mk On Linux/Unix [Doc ID: 602936.1]. Notice the WARNING! message that is set using the new 11g sqlnet.ora parameter SEC_USER_UNAUTHORIZED_ACCESS_BANNER. The sqlnet.ora file is part of the SQL*Net components of Oracle RDBMS, which handle the communication between clients and the database. oracle@nodename:/u01/app/oracle/admin/newdb[newdb]> sqlplus /nologSQL*Plus: Release 11.1.0.7.0 - Production on Thu Nov 5 19:00:292009Copyright (c) 1982, 2008, Oracle. All rights reserved.@> connect / as sysdba####################################################################WARNING! This computer system is the property of YOURORGANIZATIONand may be accessed only by authorized users.Unauthorized use of this system is strictly prohibited and may besubject to criminal prosecution.Connected.SYS@newdb> If you wanted to execute something that is available on the operating system level, then you would use a host command (either Windows or Unix), or on Unix the ! symbol. The output below shows that I am logged into the newdb as sys and lists (ls command) the files located in the $ORACLE_HOME/sqlplus/admin directory: Notice how the SQL prompt is populated with the ORACLE_SID and the username that is connected to the database. This is done by adding a line to the glogin.sql file, which can be done within SQL*Plus as shown below (I used the text editor vi): Host commands work based on the location of SQL*Plus. If you want to execute these same commands from a Windows desktop connecting to a remote Unix database, then it would require a Windows equivalent command like Notepad or another editor. If you have Cygwin installed and configured on the Windows desktop, then it would allow you to run Unix-equivalent commands like vi. Separating the variable part of the script into its own configuration file There are scripts that will need some variables set, but you don't necessarily want to use the profile to set a variable at every login. The variables may need to contain commands specific to applications such as RMAN, SQL*Plus or specific to certain tasks. This is where a configuration file comes in handy, which is a personal preference as to what you call them. Be sure not to use reserved keywords for variables, as that leads to unexpected results. In the example below, we use emal instead of the word email. To call this configuration file, you source it starting with the prerequisite . (dot) within a script. This file can also be used in the custom code section of the ioraenv file.
Read more
  • 0
  • 0
  • 5671

article-image-amazon-simpledb-versus-rdbms
Packt
08 Jun 2010
7 min read
Save for later

Amazon SimpleDB versus RDBMS

Packt
08 Jun 2010
7 min read
(For more resources on SimpleDB, see here.) We have all used a Relational Database Management System (RDBMS) at some point in our careers. These relational databases are ubiquitous and are available from a wide range of companies such as Oracle, Microsoft, IBM, and so on. These databases have served us well for our application needs. However, there is a new breed of applications coming to the forefront in the current Internet-driven and socially networked economy. The new applications require large scaling to meet demand peaks that can quickly reach massive levels. This is a scenario that is hard to satisfy using a traditional relational database, as it is impossible to requisition and provision the hardware and software resources that will be needed to service the demand peaks. It is also non-trivial and difficult to scale a normal RDBMS to hundreds or thousands of nodes. The overwhelming complexity of doing this makes the RDBMS not viable for these kinds of applications. SimpleDB provides a great alternative to an RDBMS and can provide a solution to all these problems. However, in order to provide this solution, SimpleDB makes some choices and design decisions that you need to understand in order to make an informed choice about the data storage for your application domain. No normalization Normalization is a process of organizing data efficiently in a relational database by eliminating redundant data, while at the same time ensuring that the data dependencies make sense. SimpleDB data models do not conform to any of the normalization forms, and tend to be completely de-normalized. The lack of need for normalization in SimpleDB allows you a great deal of flexibility with your model, and enables you to use the power of multi-valued attributes in your data. Let's look at a simple example of a database starting with a basic spreadsheet structure and then design it for an RDBMS and a SimpleDB. In this example, we will create a simple contact database, with contact information as raw data. ID First_Name Last_Name Phone_Num 101 John Smith 555-845-7854 101 John Smith 555-854-9885 101 John Smith 555-695-7485 102 Bill Jones 555-748-7854 102 Bill Jones 555-874-8654 The obvious issue is the repetition of the name data. The table is inefficient and would require care to update to keep the name data in sync. To find a person by his or her phone number is easy. SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885' So let's analyze the strengths and weaknesses of this database design. SCORE-Raw data Strength Weakness Efficient storage   No Efficient search by phone number Yes   Efficient search by name   No Easy-to-add another phone number Yes   The design is simple, but as the name data is repeated, it would require care to keep the data in sync. Searching for phone numbers by name would be ugly if the names got out of sync. To improve the design, we can rationalize the data. One approach would be to create multiple phone number fields such as the following. While this is a simple solution, it does limit the phone numbers to three. Add e-mail and Twitter, and the table becomes wider and wider. ID First_Name Last_Name Phone_Num_1 Phone_Num_2 Phone_Num_3 101 John Smith 555-845-7854 555-854-9885 555-695-7485 102 Bill Jones 555-748-7854 555-874-8654   Finding a person by a phone number is ugly. SELECT * FROM Contact_Info WHERE Phone_Num_1 = '555-854-9885'OR Phone_Num_2 = '555-854-9885'OR Phone_Num_3 = '555-854-9885' Now let's analyze the strengths and weaknesses of this database design. SCORE-Rationalize data Strength Weakness Efficient storage Yes   Efficient search by phone number   No Efficient search by name Yes   Easy to add another phone number   No The design is simple, but the phone numbers are limited to three, and searching by phone number involves three index searches. Another approach would be to use a delimited list for the phone number as follows: ID First_Name Last_Name Phone_Nums 101 John Smith 555-845-7854;555-854-9885;555-695-7485 102 Bill Jones 555-748-7854;555-874-8654 This approach has the advantage of no data repetition and is easy to maintain, compact, and extendable, but the only way to find a record by the phone number is with a substring search. SELECT * FROM Contact_Info WHERE Phone_Nums LIKE %555-854-9885% This type of SQL forces a complete table scan. Do this with a small table and no one will notice, but try this on a large database with millions of records, and the performance of the database will suffer. SCORE-Delimited Data Strength Weakness Efficient storage Yes   Efficient search by phone number   No Efficient search by name Yes   Easy to add another phone number Yes   A delimited field is good for data that is of one type and will only be retrieved. The normalization for relational databases results in splitting up your data into separate tables that are related to one another by keys. A join is an operation that allows you to retrieve the data back easily across the multiple tables. Let's first normalize this data. This is the Person_Info table: ID First_Name Last_Name 101 John Smith 102 Bill Jones And this is the Phone_Info table: ID Phone_Num 101 555-845-7854 101 555-854-9885 101 555-695-7485 102 555-748-7854 102 555-874-8654 Now a join of the Person_Info table with the Phone_Info can retrieve the list of phone numbers as well as the e-mail addresses. The table structure is clean and other than the ID primary key, no data is duplicated. Provided Phone_Num is indexed, retrieving a contact by the phone number is efficient. SELECT First_Name, Last_Name, Phone_num, Person_Info.IDFROM Person_Info JOIN Phone_InfoON Person_Info.ID = Phone_Info.IDWHERE Phone_Num = '555-854-9885' So if we analyze the strengths and weaknesses of this database design, we get: SCORE-Relational Data Strength Weakness Efficient storage Yes   Efficient search by phone number Yes   Efficient search by name Yes   Easy to add another phone number Yes   While this is an efficient relational model, there is no join command in SimpleDB. Using two tables would force two selects to retrieve the complete contact information. Let's look at how this would be done using the SimpleDB principles. No joins SimpleDB does not support the concept of joins. Instead, SimpleDB provides you with the ability to store multiple values for an attribute, thus avoiding the necessity to perform a join to retrieve all the values. ID       101 First_Name=John Last_Name=Smith Phone_Num =555-845-7854Phone_Num =555-854-9885Phone_Num =555-695-7485 102 First_Name=Bill Last_Name=Jones Phone_Num =555-748-7854Phone_Num =555-874-8654 In the SimpleDB table, each record is stored as an item with attribute/value pairs. The difference here is that the Phone_Num field has multiple values. Unlike a delimited list field, SimpleDB indexes all values enabling an efficient search each value. SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885' This SELECT is very quick and efficient. It is even possible to use Phone_Num multiple times such as follows: SELECT * FROM Contact_Info WHERE Phone_Num = '555-854-9885'OR Phone_Num = '555-748-7854' Let's analyze the strengths and weaknesses of this approach: SCORE-SimpleDB Data Strength Weakness Efficient storage Yes   Efficient search by phone number Yes   Efficient search by name Yes   Easy to add another phone number Yes  
Read more
  • 0
  • 0
  • 2437
article-image-hands-tutorial-getting-started-amazon-simpledb
Packt
28 May 2010
5 min read
Save for later

Hands-on Tutorial for Getting Started with Amazon SimpleDB

Packt
28 May 2010
5 min read
(For more resources on SimpleDB, see here.) Creating an AWS account In order to start using SimpleDB, you will first need to sign up for an account with AWS. Visit http://aws.amazon.com/ and click on Create an AWS Account. You can sign up either by using your e-mail address for an existing Amazon account, or by creating a completely new account. You may wish to have multiple accounts to separate billing for projects. This could make it easier for you to track billing for separate accounts. After a successful signup, navigate to the main AWS page— http://aws.amazon.com/, and click on the Your Account link at any time to view your account information and make any changes to it if needed. Enabling SimpleDB service for AWS account Once you have successfully set up an AWS account, you must follow these steps to enable the SimpleDB service for your account: Log in to your AWS account. Navigate to the SimpleDB home page—http://aws.amazon.com/simpledb/. Click on the Sign Up For Amazon SimpleDB button on the right side of the page. Provide the requested credit card information and complete the signup process. You have now successfully set up your AWS account and enabled it for SimpleDB. All communication with SimpleDB or any of the Amazon web services must be through either the SOAP interface or the Query/ReST interface. The request messages sent through either of these interfaces is digitally signed by the sending user in order to ensure that the messages have not been tampered within transit, and that they really originate from the sending user. Requests that use the Query/ReST interface will use the access keys for signing the request, whereas requests to the SOAP interface will use the x.509 certificates. Your new AWS account is associated with the following items: A unique 12-digit AWS account number for identifying your account. AWS Access Credentials are used for the purpose of authenticating requests made by you through the ReST Request API to any of the web services provided by AWS. An initial set of keys is automatically generated for you by default. You can regenerate the Secret Access Key at any time if you like. Keep in mind that when you generate a new access key, all requests made using the old key will be rejected. An Access Key ID identifies you as the person making requests to a web service. A Secret Access Key is used to calculate the digital signature when you make requests to the web service. Be careful with your Secret Access Key, as it provides full access to the account, including the ability to delete all of your data. All requests made to any of the web services provided by AWS using the SOAP protocol use the X.509 security certificate for authentication. There are no default certificates generated automatically for you by AWS. You must generate the certificate by clicking on the Create a new Certificate link, then download them to your computer and make them available to the machine that will be making requests to AWS. Public and private key for the x.509 certificate. You can either upload your own x.509 certificate if you already have one, or you can just generate a new certificate and then download it to your computer. Query API and authentication There are two interfaces to SimpleDB. The SOAP interface uses the SOAP protocol for the messages, while the ReST Requests uses HTTP requests with request parameters to describe the various SimpleDB methods and operations. In this book, we will be focusing on using the ReST Requests for talking to SimpleDB, as it is a much simpler protocol and utilizes straightforward HTTP-based requests and responses for communication, and the requests are sent to SimpleDB using either a HTTP GET or POST method. The ReST Requests need to be authenticated in order to establish that they are originating from a valid SimpleDB user, and also for accounting and billing purposes. This authentication is performed using your access key identifiers. Every request to SimpleDB must contain a request signature calculated by constructing a string based on the Query API and then calculating an RFC 2104-compliant HMAC-SHA1 hash, using the Secret Access Key. The basic steps in the authentication of a request by SimpleDB are: You construct a request to SimpleDB. You use your Secret Access Key to calculate the request signature, a Keyed-Hashing for Message Authentication code (HMAC) with an SHA1 hash function. You send the request data, the request signature, timestamp, and your Access Key ID to AWS. AWS uses the Access Key ID in the request to look up the associated Secret Access Key. AWS generates a request signature from the request data using the retrieved Secret Access Key and the same algorithm you used to calculate the signature in the request. If the signature generated by AWS matches the one you sent in the request, the request is considered to be authentic. If the signatures are different, the request is discarded, and AWS returns an error response. If the timestamp is older than 15 minutes, the request is rejected. The procedure for constructing your requests is simple, but tedious and time consuming. This overview was intended to make you familiar with the entire process, but don't worry—you will not need to go through this laborious process every single time that you interact with SimpleDB. Instead, we will be leveraging one of the available libraries for communicating with SimpleDB, which encapsulates a lot of the repetitive stuff for us and makes it simple to dive straight into playing with and exploring SimpleDB!
Read more
  • 0
  • 0
  • 6003

article-image-use-macros-ibm-cognos-8-report-studio
Packt
25 May 2010
13 min read
Save for later

Use of macros in IBM Cognos 8 Report Studio

Packt
25 May 2010
13 min read
Cognos Report Studio is widely used for creating and managing business reports in medium to large companies. It is simple enough for any business analyst, power user, or developer to pick up and start developing basic reports. However, when it comes to developing more sophisticated, fully functional business reports for wider audiences, report authors will need guidance. In this article, by Abhishek Sanghani, author of IBM Cognos 8 Report Studio Cookbook, we will show you  that even though macros are often considered a Framework Modeler's tool, they can be used within Report Studio as well. These recipes will show you some very useful macros around security, string manipulation, and prompting. (Read more interesting articles on Compiere here.) Introduction This article will introduce you to an interesting and useful tool of Cognos BI, called 'macros'. They can be used in Framework Manager as well as Report Studio. The Cognos engine understands the presence of a macro as it is written within a pair of hashes (#). It executes the macros first and puts the result back into report specification like a literal string replacement. We can use this to alter data items, filters, and slicers at run time. You won't find the macro functions and their details within Report Studio environment (which is strange, as it fully supports them). Anyways, you can always open Framework Manager and check different macro functions and their syntaxes from there. Also, there is documentation available in Cognos' help and online materials. Working with Dimensional Model (in the"Swapping dimension" recipe). In this article, I will show you more examples and introduce you to more functions which you can later build upon to achieve sophisticated functionalities. We will be writing some SQL straight against the GO Data Warehouse data source. Also, we will use the "GO Data Warehouse (Query)" package for some recipes. Add data level security using CSVIdentityMap macro A report shows the employee names by region and country. We need to implement data security in this report such that a user can see the records only for the country he belongs to. There are already User Groups defined on the Cognos server (in the directory) and users are made members of appropriate groups. For this sample, I have added my user account to a user group called 'Spain'. Getting ready Open a new list report with GO Data Warehouse (Query) as the package. How to do it... Drag the appropriate columns (Region, Country, and Employee name) on to the report from Employee by Region query subject. Go to Query Explorer and drag a new detail filter. Define the filter as: [Country] in (#CSVIdentityNameList(',')#) Run the report to test it. You will notice that a user can see only the rows of the country/countries of which he is a member. How it works... Here we are using a macro function called CSVIdentityNameList. This function returns a list of groups and roles that the user belongs to, along with the user's account name. Hence, when I run the report, one of the values returned will be 'Spain' and I will see data for Spain. The function accepts a string parameter which is used as a separator in the result. Here we are passing a comma (,) as the separator. If a user belongs to multiple country groups, he will see data for all the countries listed in the result of a macro. There's more... This solution, conspicuously, has its limitations. None of the user accounts or roles should be same as a country name, because that will wrongly show data for a country the user doesnot belong to. For example, for a user called 'Paris', it will show data for the 'Paris' region. So, there need to be certain restrictions. However, you can build upon the knowledge of this macro function and use it in many practical business scenarios. Using prompt macro in native SQL In this recipe, we will write an SQL statement straight to be fired on the data source. We will use the Prompt macro to dynamically change the filter condition. We will write a report that shows list of employee by Region and Country. We will use the Prompt macro to ask the users to enter a country name. Then the SQL statement will search for the employee belonging to that country. Getting ready Create a new blank list report against 'GO Data Warehouse (Query)' package. How to do it... Go to the Query Explorer and drag an SQL object on the Query Subject that is linked to the list (Query1 in usual case). Select the SQL object and ensure that great_outdoor_warehouse is selected as the data source. Open the SQL property and add the following statement: select distinct "Branch_region_dimension"."REGION_EN" "Region" ,"Branch_region_dimension"."COUNTRY_EN" "Country" , "EMP_EMPLOYEE_DIM"."EMPLOYEE_NAME" "Employee_name"from "GOSALESDW"."GO_REGION_DIM" "Branch_region_dimension","GOSALESDW"."EMP_EMPLOYEE_DIM" "EMP_EMPLOYEE_DIM","GOSALESDW"."GO_BRANCH_DIM" "GO_BRANCH_DIM"where ("Branch_region_dimension"."COUNTRY_EN" in(#prompt('Region')#))and "Branch_region_dimension"."COUNTRY_CODE" = "GO_BRANCH_DIM"."COUNTRY_CODE" and "EMP_EMPLOYEE_DIM"."BRANCH_CODE" = "GO_BRANCH_DIM"."BRANCH_CODE" Hit the OK button. This will validate the query and will close the dialog box. You will see that three data items (Region, Country, and Employee_Name) are added to Query1. Now go to the report page. Drag these data items on the list and run the report to test it. How it works... Here we are using the macro in native SQL statement. Native SQL allows us to directly fire a query on the data source and use the result on the report. This is useful in certain scenarios where we don't need to define any Framework Model. If you examine the SQL statement, you will notice that it is a very simple one that joins three tables and returns appropriate columns. We have added a filter condition on country name which is supposed to dynamically change depending on the value entered by user. The macro function that we have used here is Prompt(). As the name suggests, it is used to generate a prompt and returns the parameter value back to be used in an SQL statement. Prompt() function takes five arguments. The first argument is the parameter name and it is mandatory. It allows us to link a prompt page object (value prompt, date prompt, and so on) to the prompt function. The rest of the four arguments are optional and we are not using them here. You will read about them in the next recipe. Please note that we also have an option of adding a detail filter in the query subject instead of using PROMPT() macro within query. However, sometimes you would want to filter a table before joining it with other tables. In that case, using PROMPT() macro within the query helps. There's more... Similar to the Prompt() function, there is a i macro function. This works in exactly the same way and allows users to enter multiple values for the parameter. Those values are returned as a comma-separated list. Making prompt optional The previous recipe showed you how to generate a prompt through a macro. In this recipe, we will see how to make it optional using other arguments of the function. We will generate two simple list reports, both based on a native SQL. These lists will show product details for selected product line. However, the product line prompt will be made optional using two different approaches. Getting ready Create a report with two simple list objects based on native SQL. For that, create the Query Subjects in the same way as we did in the previous recipe. Use the following query in the SQL objects: select distinct "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" "Product_line" , "SLS_PRODUCT_LOOKUP"."PRODUCT_NAME" "Product_name" , "SLS_PRODUCT_COLOR_LOOKUP"."PRODUCT_COLOR_EN" "Product_color" , "SLS_PRODUCT_SIZE_LOOKUP"."PRODUCT_SIZE_EN" "Product_size"from "GOSALESDW"."SLS_PRODUCT_DIM" "SLS_PRODUCT_DIM","GOSALESDW"."SLS_PRODUCT_LINE_LOOKUP" "SLS_PRODUCT_LINE_LOOKUP","GOSALESDW"."SLS_PRODUCT_TYPE_LOOKUP" "SLS_PRODUCT_TYPE_LOOKUP", "GOSALESDW"."SLS_PRODUCT_LOOKUP" "SLS_PRODUCT_LOOKUP","GOSALESDW"."SLS_PRODUCT_COLOR_LOOKUP" "SLS_PRODUCT_COLOR_LOOKUP","GOSALESDW"."SLS_PRODUCT_SIZE_LOOKUP" "SLS_PRODUCT_SIZE_LOOKUP","GOSALESDW"."SLS_PRODUCT_BRAND_LOOKUP" "SLS_PRODUCT_BRAND_LOOKUP"where "SLS_PRODUCT_LOOKUP"."PRODUCT_LANGUAGE" = N'EN' and "SLS_PRODUCT_DIM"."PRODUCT_LINE_CODE" = "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_NUMBER" = "SLS_PRODUCT_LOOKUP"."PRODUCT_NUMBER" and "SLS_PRODUCT_DIM"."PRODUCT_SIZE_CODE"= "SLS_PRODUCT_SIZE_LOOKUP"."PRODUCT_SIZE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_TYPE_CODE" = "SLS_PRODUCT_TYPE_LOOKUP"."PRODUCT_TYPE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_COLOR_CODE" = "SLS_PRODUCT_COLOR_LOOKUP"."PRODUCT_COLOR_CODE" and "SLS_PRODUCT_BRAND_LOOKUP"."PRODUCT_BRAND_CODE" = "SLS_PRODUCT_DIM"."PRODUCT_BRAND_CODE" This is a simple query that joins product related tables and retrieves required columns. How to do it... We have created two list reports based on two SQL query subjects. Both the SQL objects use the same query as mentioned above. Now, we will start with altering them. For that open Query Explorer. Rename first query subject as Optional_defaultValue and the second one as Pure_Optional. In the Optional_defaultValue SQL object, amend the query with following lines: and "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" = #sq(prompt ('ProductLine','string','Golf Equipment'))# Similarly, amend the Pure_Optional SQL object query with the following line: #prompt ('Product Line','string','and 1=1', ' and "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" = ')# Now run the report. You will be prompted to enter a product line. Don't enter any value and just hit OK button. Notice that the report runs (which means the prompt is optional). First, list object returns rows for 'Golf Equipment'. The second list is populated by all the products. How it works... Fundamentally, this report works the same as the one in the previous report. We are firing the SQL statements straight on the data source. The filter condition in the WHERE clause are using the PROMPT macro. Optional_defaultValue In this query, we are using the second and third arguments of Prompt() function. Second argument defines the data type of value which is 'String' in our case. The third argument defines default value of the prompt. When the user doesn't enter any value for the prompt, this default value is used. This is what makes the prompt optional. As we have defined 'Golf Equipment' as the default value, the first list object shows data for 'Golf Equipment' when prompt is left unfilled. Pure_Optional In this query, we are using fourth argument of Prompt() function. This argument is of string type. If the user provides any value for the prompt, the prompt value is concatenated to this string argument and the result is returned. In our case, the fourth argument is the left part of filtering condition that is, 'and . 'and "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" ='. So, if the user enters the value as 'XYZ', the macro is replaced by the following filter: and "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" = 'XYZ' Interestingly, if the user doesn't provide any prompt value, then the fourth argument is simply ignored. The macro is then replaced by the third argument which is in our case is 'and 1=1'. Hence, the second list returns all the rows when user doesn't provide any value for the prompt. This way it makes the PRODUCT_LINE_EN filter purely optional. There's more... Prompt macro accepts two more arguments (fifth and sixth). Please check the help documents or internet sources to find information and examples about them. Adding token using macro In this recipe, we will see how to dynamically change the field on which filter is being applied using macro. We will use prompt macro to generate one of the possible tokens and then use it in the query. Getting ready Create a list report based on native SQL similar to the previous recipe. We will use the same query that works on the product tables but filtering will be different. For that, define the SQL as following: select distinct "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" "Product_line" , "SLS_PRODUCT_LOOKUP"."PRODUCT_NAME" "Product_name" , "SLS_PRODUCT_COLOR_LOOKUP"."PRODUCT_COLOR_EN" "Product_color" , "SLS_PRODUCT_SIZE_LOOKUP"."PRODUCT_SIZE_EN" "Product_size"from "GOSALESDW"."SLS_PRODUCT_DIM" "SLS_PRODUCT_DIM","GOSALESDW"."SLS_PRODUCT_LINE_LOOKUP" "SLS_PRODUCT_LINE_LOOKUP","GOSALESDW"."SLS_PRODUCT_TYPE_LOOKUP" "SLS_PRODUCT_TYPE_LOOKUP", "GOSALESDW"."SLS_PRODUCT_LOOKUP" "SLS_PRODUCT_LOOKUP","GOSALESDW"."SLS_PRODUCT_COLOR_LOOKUP" "SLS_PRODUCT_COLOR_LOOKUP","GOSALESDW"."SLS_PRODUCT_SIZE_LOOKUP" "SLS_PRODUCT_SIZE_LOOKUP","GOSALESDW"."SLS_PRODUCT_BRAND_LOOKUP" "SLS_PRODUCT_BRAND_LOOKUP"where "SLS_PRODUCT_LOOKUP"."PRODUCT_LANGUAGE" = N'EN' and "SLS_PRODUCT_DIM"."PRODUCT_LINE_CODE" = "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_NUMBER" = "SLS_PRODUCT_LOOKUP"."PRODUCT_NUMBER" and "SLS_PRODUCT_DIM"."PRODUCT_SIZE_CODE"= "SLS_PRODUCT_SIZE_LOOKUP"."PRODUCT_SIZE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_TYPE_CODE" = "SLS_PRODUCT_TYPE_LOOKUP"."PRODUCT_TYPE_CODE" and "SLS_PRODUCT_DIM"."PRODUCT_COLOR_CODE" = "SLS_PRODUCT_COLOR_LOOKUP"."PRODUCT_COLOR_CODE" and "SLS_PRODUCT_BRAND_LOOKUP"."PRODUCT_BRAND_CODE" = "SLS_PRODUCT_DIM"."PRODUCT_BRAND_CODE"and#prompt ('Field','token','"SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN"')# like #prompt ('Value','string')# This is the same basic query that joins the product related tables and fetches required columns. The last statement in WHERE clause uses two prompt macros. We will talk about it in detail. How to do it... We have already created a list report based on an SQL query subject as mentioned previously. Drag the columns from the query subject on the list over the report page. Now create a new prompt page. Add a value prompt on the prompt page. Define two static choices for this. Display value Use value Filter on product line "SLS_PRODUCT_LINE_LOOKUP"."PRODUCT_LINE_EN" Filter on product name "SLS_PRODUCT_LOOKUP"."PRODUCT_NAME Set the parameter for this prompt to 'Field'. This will come pre-populated as existing parameter, as it is defined in the query subject. Choose the UI as radio button group and Filter on Product Line as default selection. Now add a text box prompt on to the prompt page. Set its parameter to Value which comes as a choice in an existing parameter (as it is already defined in the query). Run the report to test it. You will see an option to filter on product line or product name. The value you provide in the text box prompt will be used to filter either of the fields depending on the choice selected in radio buttons.
Read more
  • 0
  • 0
  • 9025

article-image-oracle-rdbms-log-miner-utility-fra-and-aum
Packt
20 May 2010
7 min read
Save for later

Oracle: RDBMS Log Miner Utility, FRA, and AUM

Packt
20 May 2010
7 min read
Log Miner can help when questions such as the following come up: What was changed? Who changed it? And in what order? When unauthorized people change data, they may assume that the record does not retain all changes if that information isn't viewable at the application level. There is a record of all changes that are logged, but it takes time and trouble to find that information. The tool most often used is the PL/SQL package DBMS_LOGMNR, but the GUI Interface called Log Miner Viewer has been added to the OEM. There are quite a few examples in the Oracle Database Utilities Guide of how to use this utility for both the browser-based and PL/SQL versions. We will concentrate on when and how to find the data to restore. You already should have a good understanding of the database structures that include the undo and redo logs: undo is generated when an end user starts changing data and redo is generated after the commit. Each is written to their own set of files. While undo and redo are both online (database is open), archived redo is offline and written to a disk. Archived redo logs are no longer needed for the transactions inside the database because they have been committed and written to disk. Archive logs are still important in order to restore the previously committed transactions in a recovery situation. Making an archive log offl ine allows backup procedures (RMAN, third-party backup software or OS utilities) to manipulate the files at the operating system level. Recovery is a database process that will: Roll forward changes from redo logs and then rollback statements any end user used the rollback command for. Roll back any uncommitted changes found in the UNDO segments. There are specific Oracle processes such as LGWR that write the redo to the online logs and then an archiver process (ARC) writes to the archived logs. The only way to ensure every transaction in a database has been logged for recovery purposes is to operate in ARCHIVELOG mode. There are special situations that will call for running in noarchivelog mode. It is assumed that any transactions lost between backups can be recreated. Archived redo logs can be used to restore transactions that occurred between regular backups. From the last exercise, you also have a good understanding of read consistency available from undo segments, which also contribute to redo entries. The DBMS_LOGMNR package is used to find data in both the undo and redo database structures. It is also useful for analyzing patterns over time for specific tuning needs, schema changes, and forecasting the time for hardware upgrades. With the DBMS_LOGMNR package, you can extract data that populates the V$LOGMNR_CONTENTS view with the actual transactions that have been executed. These entries contain both the REDO and UNDO statements. You can operate Log Miner on the original database that created the log entries or almost any other Oracle database of a higher version that is running the same character set, database block size, and operating system. This is why it is critical that you protect the online redo, undo, and archive logs—they can be mined for information. Most often a DBA will actually use a different database to do the mining so that it doesn't consume additional resources in a production database. If you use a different database than where the original transactions were created, you will have to rebuild the Log Miner data dictionary (online, offline, or a standalone flat file). The dictionary translates the internal object identifiers and types to table and column names that can be queried, but those object IDs will vary between databases, making the rebuild a requirement. The Log Miner example task requires several preparatory steps to be completed first, with some additional discussion along the way. Discussion includes archiving, supplemental logging, and Flashback technologies. You won't get to an actual logminer example for quite a few pages. Since logminer has extensive documentation detailing all of the steps for various scenarios, it was decided to only include a lesser known method of using logminer. Turn on archivelog mode Before we delve into the mining exercise, we will cover more information about SCNs, as they relate to checkpoints and log switches while turning on archiving for the database. Transactions in a database produce redo entries in the redo log buffer (in memory), but that is always being written to the online redo logs. That occurs according to different triggering events that can happen in the redo stream—a certain amount of data, commits, 3 seconds or 1/3 full redo log buffer. Whether these triggering events occur or not depends on the type and frequency of transactions. A checkpoint synchronizes modified data blocks in the redo log buffer with the actual data files, keeping the data consistent. In the case of a database crash, this identifies the point where all outstanding data (transactions) have been written to disk. This checkpoint isn't synchronized with the SCN of a transaction commit and it does not behave like a log switch. The files you will need as you work through this exercise are included in the code as follows: sys_archive.sql sysarchive.lst Open up the file sysarchive.lst. One of the most important views (anything labeled v$ is called a dynamic view) in the database is v$database. SYS@NEWDB> SELECT LOG_MODE, NAME, CURRENT_SCN, ARCHIVE_CHANGE#, OPEN_MODE FROM V$DATABASE; Find this section for the statement from v$log_history farther down in sysarchive.lst. What are all these entries if we aren't in archivelog mode? These are the log switches to the online redo logs. They are overwritten once that section of the redo log is no longer needed by a transaction to maintain consistency. This is where a checkpoint comes into play. It ensures that data is written to the disk and is independent of the ARC log switch process. Once we switch to archivelog mode, the online redo will still be overwritten, but the ARC process will write a copy of that log to an archive destination. Below you will see that each log contains a range of database SCNs. This log contains database changes from the first SCN number to the next. Now we try to correlate archive_change# and checkpoint_change#. Also notice that the checkpoint_change# for each data file is consistent for normal database operations. I am showing only the partial output from the following command for the single data file created: At this point, we have started the database in mount mode (the controlfile needs to be accessed, but the database is not opened for full use), turned on the archiving process, and verified that archiving has started and also verified the location of the archived logs. Making a log switch from one online redo to another doesn't sync the checkpoint_change# with what the controlfile has (controlfile_change# is what is also called a thread checkpoint). Only when we do a manual checkpoint (instead of a database-activated checkpoint) do tho se numbers coincide. They can be verified with the dynamic view v$datafile as shown below: Additional information for troubleshooting archiving issues comes from another dynamic view, V$INSTANCE: The archiver column can also indicate when the ARC process failed to switch logs with an automatic retry in another five minutes. The log_switch_wait will indicate the wait event the log switching process is waiting on—ARCHIVE LOG, CLEAR LOG, or CHECKPOINT. All of the activity associated with log switches and checkpoints will influence database performance. We shall continue now with the further required setup steps to complete all of the tasks.
Read more
  • 0
  • 0
  • 2583
article-image-mysql-cluster-management-part-2
Packt
10 May 2010
11 min read
Save for later

MySQL Cluster Management : Part 2

Packt
10 May 2010
11 min read
Replication between clusters with a backup channel The previous recipe showed how to connect a MySQL Cluster to another MySQL server or another MySQL Cluster using a single replication channel. Obviously, this means that this replication channel has a single point of failure (if either of the two replication agents {machines} fail, the channel goes down). If you are designing your disaster recovery plan to rely on MySQL Cluster replication, then you are likely to want more reliability than that. One simple thing that we can do is run multiple replication channels between two clusters. With this setup, in the event of a replication channel failing, a single command can be executed on one of the backup channel slaves to continue the channel. It is not currently possible to automate this process (at least, not without scripting it yourself). The idea is that with a second channel ready and good monitoring of the primary channel, you can quickly bring up the replication channel in the case of failure, which means significantly less time spent with the replication channel down. How to do it… Setting up this process is not vastly different, however, it is vital to ensure that both channels are not running at any one time, or the data at the slave site will become a mess and the replication will stop. To guarantee this, the first step is to add the following to the mysqld section of /etc/my.cnf on all slave MySQL Servers (of which there are likely to be two): skip-slave-start Once added, restart mysqld. This my.cnf parameter prevents the MySQL Server from automatically starting the slave process. You should start one of the channels (normally, whichever channel you decide will be your master) normally, while following the steps in the previous recipe. To configure the second slave, follow the instructions in the previous recipe, but stop just prior to the CHANGE MASTER TO step on the second (backup) slave. If you configure two replication channels simultaneously (that is, forget to stop the existing replication channel when testing the backup), you will end up with a broken setup. Do not proceed to run CHANGE MASTER TO on the backup slave unless the primary channel is not operating. As soon as the primary communication channel fails, you should execute the following command on any one of the SQL nodes in your slave (destination) cluster and record the result: [slave] mysql> SELECT MAX(epoch) FROM mysql.ndb_apply_status;+---------------+| MAX(epoch) |+---------------+| 5952824672272 |+---------------+1 row in set (0.00 sec) The previous highlighted number is the ID of the most recent global checkpoint, which is run every couple of seconds on all storage nodes in the master cluster and as a result, all the REDO logs are synced to disk. Checking this number on a SQL node in the slave cluster tells you what the last global checkpoint that made it to the slave cluster was. You can run a similar command SELECT MAX(epoch) FROM mysql.ndb_binlog_index on any SQL node in the master (source) cluster to find out what the most recent global checkpoint on the master cluster is. Clearly, if your replication channel goes down, then these two numbers will diverge quickly. Use this number (5952824672272 in our example) to find the correct logfile and position that you should connect to. You can do this by executing the following command on any SQL node in the master (source) cluster that you plan to make the new master, ensuring that you substitute the output of the previous command with the correct number as an epoch field as follows: mysql> SELECT-> File,-> Position-> FROM mysql.ndb_binlog_index-> WHERE epoch > 5952824672272-> ORDER BY epoch ASC LIMIT 1;+--------------------+----------+| File | Position |+--------------------+----------+| ./node2-bin.000003 | 200998 |+--------------------+----------+1 row in set (0.00 sec) If this returns NULL, firstly, ensure that there is some activity in your cluster since the failure (if you are using batched updates, then there should be 32 KB of updates or more) and secondly, ensure that there is no active replication channel between the nodes (that is, ensure the primary channel has really failed). Using the filename and position mentioned previously, run the following command on the backup slave: It is critical that you run these commands on the correct node. The previous command, from which you get the filename and position, must be run on the new master (this is in the "source" cluster). The following command, which tells the new slave which master to connect to and its relevant position and filename, must be executed on the new slave (this is the "destination" cluster). While it is technically possible to connect the old slave to a new master or vice versa, this configuration is not recommended by MySQL and should not be used. If all is okay, then the highlighted rows in the preceding output will show that the slave thread is running and waiting for the master to send an event. [NEW slave] mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.2', MASTER_USER='slave', MASTER_PASSWORD='password', MASTER_LOG_FILE='node2-bin.000003', MASTER_LOG_POS=200998;Query OK, 0 rows affected (0.01 sec)mysql> START SLAVE;Query OK, 0 rows affected (0.00 sec)mysql> show slave statusG;*************************** 1. row ***************************Slave_IO_State: Waiting for master to send eventMaster_Host: 10.0.0.2Master_User: slaveMaster_Port: 3306[snip]Relay_Master_Log_File: node2-bin.000003Slave_IO_Running: YesSlave_SQL_Running: YesReplicate_Do_DB:Replicate_Ignore_DB:Replicate_Do_Table:Replicate_Ignore_Table:[snip]Seconds_Behind_Master: 233 After a while, the Seconds_Behind_Master value should return to 0 (if the primary replication channel has been down for some time or if the master cluster has a very high write rate, then this may take some time) There's more… It is possible to increase the performance of MySQL Cluster replication by enabling batched updates. This can be accomplished by starting slave mysqld processes with the slave-allow-batching option (or add the slave-allow-batching option line to the [mysqld] section in my.cnf). This has the effect of applying updates in 32 KB batches rather than as soon as they are received, which generally results in lower CPU usage and higher throughput (particularly when the mean update size is low). See also To know more about Replication Compatibility Between MySQL Versions visit: http://dev.mysql.com/doc/refman/5.1/en/replication-compatibility.html User-defined partitioning MySQL Cluster vertically partitions data, based on the primary key, unless you configure it otherwise. The main aim of user-defined partitioning is to increase performance by grouping data likely to be involved in common queries onto a single node, thus reducing network traffic between nodes while satisfying queries. In this recipe, we will show how to define our own partitioning functions. If the NoOfReplicas in the global cluster configuration file is equal to the number of storage nodes, then each storage node contains a complete copy of the cluster data and there is no partitioning involved. Partitioning is only involved when there are more storage nodes than replicas. Getting ready Look at the City table in the world dataset; there are two integer fields (ID and Population). MySQL Cluster will choose ID as the default partitioning scheme as follows: mysql> desc City;+-------------+----------+------+-----+---------+----------------+| Field | Type | Null | Key | Default | Extra |+-------------+----------+------+-----+---------+----------------+| ID | int(11) | NO | PRI | NULL | auto_increment || Name | char(35) | NO | | | || CountryCode | char(3) | NO | | | || District | char(20) | NO | | | || Population | int(11) | NO | | 0 | |+-------------+----------+------+-----+---------+----------------+5 rows in set (0.00 sec) Therefore, a query that searches for a specific ID will use only one partition. In the following example, partition p3 is used: mysql> explain partitions select * from City where ID=1;+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+| 1 | SIMPLE | City | p3 | const | PRIMARY | PRIMARY | 4 | const | 1 | |+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+1 row in set (0.00 sec) However, searching for a Population involves searching all partitions as follows: mysql> explain partitions select * from City where Population=42;+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------+| 1 | SIMPLE | City | p0,p1,p2,p3 | ALL | NULL | NULL | NULL | NULL | 4079 | Using where with pushed condition |+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------+1 row in set (0.01 sec) The first thing to do when considering user-defined partitioning is to decide if you can improve on the default partitioning scheme. In this case, if your application makes a lot of queries against this table specifying the City ID, it is unlikely that you can improve performance with user-defined partitioning. However, in case it makes a lot of queries by the Population and ID fields, it is likely that you can improve performance by switching the partitioning function from a hash of the primary key to a hash of the primary key and the Population field. How to do it... In this example, we are going to add the field Population to the partitioning function used by MySQL Cluster. We will add this field to the primary key rather than solely using this field. This is because the City table has an auto-increment field on the ID field, and in MySQL Cluster, an auto-increment field must be part of the primary key. Firstly, modify the primary key in the table to add the field that we will use to partition the table by: mysql> ALTER TABLE City DROP PRIMARY KEY, ADD PRIMARY KEY(ID, Population);Query OK, 4079 rows affected (2.61 sec)Records: 4079 Duplicates: 0 Warnings: 0 Now, tell MySQL Cluster to use the Population field as a partitioning function as follows: mysql> ALTER TABLE City partition by key (Population);Query OK, 4079 rows affected (2.84 sec)Records: 4079 Duplicates: 0 Warnings: 0 Now, verify that queries executed against this table only use one partition as follows: mysql> explain partitions select * from City where Population=42;+----+-------------+-------+------------+------+---------------+------+---------+------+------+------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+------------+------+---------------+------+---------+------+------+------------+| 1 | SIMPLE | City | p3 | ALL | NULL | NULL | NULL | NULL | 4079 | Using where with pushed condition |+----+-------------+-------+------------+------+---------------+------+---------+------+------+------------+1 row in set (0.01 sec) Now, notice that queries against the old partitioning function, ID, use all partitions as follows: mysql> explain partitions select * from City where ID=1;+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+| 1 | SIMPLE | City | p0,p1,p2,p3 | ref | PRIMARY | PRIMARY | 4 | const | 10 | |+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+1 row in set (0.00 sec) Congratulations! You have now set up user-defined partitioning. Now, benchmark your application to see if you have gained an increase in performance. There's more... User-defined partitioning can be particularly useful where you have multiple tables and a join. For example, if you had a table of Areas within Cities consisting of an ID field (primary key, auto increment, and default partitioning field) and then a City ID, you would likely find an enormous number of queries that select all of the locations within a certain city and also select the relevant city row. It would therefore make sense to keep: all of the rows with the same City value inside the Areas table together on one node each of these groups of City values inside the Areas table on the same node as the relevant City row in the City table This can be achieved by configuring both tables to use the City field as a partitioning function, as described earlier in the Population field.
Read more
  • 0
  • 0
  • 1639

article-image-mysql-cluster-management-part-1
Packt
10 May 2010
10 min read
Save for later

MySQL Cluster Management : Part 1

Packt
10 May 2010
10 min read
(Read more interesting articles on MySQL High Availability here.) Configuring multiple management nodes Every MySQL Cluster must have a management node to start and also to carry out critical tasks such as allowing other nodes to restart, running online backups, and monitoring the status of the cluster. However, it is strongly recommended for a production cluster to ensure that a management node is always available, and this requires more than one node. In this recipe, we will discuss the minor complications that more than one management node will bring before showing the configuration of a new cluster with two management nodes. Finally, the modification of an existing cluster to add a second management node will be shown. Getting ready In a single management node cluster, everything is simple. Nodes connect to the management node, get a node ID, and join the cluster. When the management node starts, it reads the config.ini file, starts and prepares to give the cluster information contained within the config.ini file out to the cluster nodes as and when they join. This process can become slightly more complicated when there are multiple management nodes, and it is important that each management node takes a different ID. Therefore, the first additional complication is that it is an extremely good idea to specify node IDs and ensure that the HostName parameter is set for each management node in the config.ini file. It is technically possible to start two management nodes with different cluster configuration files in a cluster with multiple management nodes. It is not difficult to see that this can cause all sorts of bizarre behavior including a likely cluster shutdown in the case of the primary management node failing. Ensure that every time the config.ini file is changed, the change is correctly replicated to all management nodes. You should also ensure that all management nodes are always using the same version of the config.ini file. It is possible to hold the config.ini file on a shared location such as a NFS share, although to avoid introducing complexity and a single point of failure, the best practice would be to store the configuration file in a configuration management system such as Puppet (http://www.puppetlabs.com/) or Cfengine (http://www.cfengine.org/). How to do it... The following process should be followed to configure a cluster for multiple management nodes. In this recipe, High Availability with MySQL Cluster. Initially, this recipe will cover the procedure to be followed in order to configure a new cluster with two management nodes. Thereafter, the procedure for adding a second management node to an already running single management node cluster will be covered. The first step is to define two management nodes in the global configuration file config.ini on both management nodes. In this example, we are using IP addresses 10.0.0.5 and 10.0.0.6 for the two management nodes that require the following two entries of [ndb_mgmd] in the config.ini file: [ndb_mgmd]Id=1HostName=10.0.0.5DataDir=/var/lib/mysql-cluster[ndb_mgmd]Id=2HostName=10.0.0.6DataDir=/var/lib/mysql-cluster Update the [mysqld] section of each SQL node's /etc/my.cnf to point to both management nodes: [mysqld]ndb-connectstring=10.0.0.5,10.0.0.6 Update the [mysqld] section of each SQL node's /etc/my.cnf to point to both management nodes: [mysqld]ndb-connectstring=10.0.0.5,10.0.0.6 Now, prepare to start both the management nodes. Install the management node on both nodes, if it does not already exist. Before proceeding, ensure that you have copied the updated config.ini file to both management nodes. Start the first management node by changing to the correct directory and running the management node binary (ndb_mgmd) with the following flags: --initial: Deletes the local cache of the config.ini file and updates it (you must do this every time the config.ini file is changed). --ndb-nodeid=X: Tells the node to connect as this nodeid, as we specified in the config.ini file. This is technically unnecessary if there is no ambiguity as to which nodeid this particular node may connect to (in this case, both nodes have a HostName defined). However, defining it reduces the possibility of confusion. --config-file=config.ini: This is used to specify the configuration file. In theory, passing a value of the config.ini file in the local directory is unnecessary because it is the default value. But in certain situations, it seems that passing this in any case avoids issues, and again this reduces the possibility of confusion. [root@node6 mysql-cluster]# cd /usr/local/mysql-cluster[root@node6 mysql-cluster]# ndb_mgmd --config-file=config.ini --initial --ndb-nodeid=22009-08-15 20:49:21 [MgmSrvr] INFO -- NDB Cluster Management Server. mysql-5.1.34 ndb-7.0.62009-08-15 20:49:21 [MgmSrvr] INFO -- Reading cluster configuration from 'config.ini' Repeat this command on the other node using the correct node ID: [root@node5 mysql-cluster]# cd /usr/local/mysql-cluster[root@node5 mysql-cluster]# ndb_mgmd --config-file=config.ini --initial --ndb-nodeid=1 Now, start each storage node in turn. Use the storage management client's show command to show that both management nodes are connected and that all storage nodes have been reconnected: ndb_mgm> showConnected to Management Server at: 10.0.0.5:1186Cluster Configuration---------------------[ndbd(NDB)] 4 node(s)id=3 @10.0.0.1 (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0, Master)id=4 @10.0.0.2 (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0)id=5 @10.0.0.3 (mysql-5.1.34 ndb-7.0.6, Nodegroup: 1)id=6 @10.0.0.4 (mysql-5.1.34 ndb-7.0.6, Nodegroup: 1)[ndb_mgmd(MGM)] 2 node(s)id=1 @10.0.0.5 (mysql-5.1.34 ndb-7.0.6)id=2 @10.0.0.6 (mysql-5.1.34 ndb-7.0.6)[mysqld(API)] 4 node(s)id=11 @10.0.0.1 (mysql-5.1.34 ndb-7.0.6)id=12 @10.0.0.2 (mysql-5.1.34 ndb-7.0.6)id=13 @10.0.0.3 (mysql-5.1.34 ndb-7.0.6)id=14 @10.0.0.4 (mysql-5.1.34 ndb-7.0.6) Finally, restart all SQL nodes (mysqld processes). On RedHat-based systems, this can be achieved using the service command: [root@node1 ~]# service mysqld restart Congratulations! Your cluster is now configured with multiple management nodes. Test that failover works by killing a management node, in turn, the remaining management nodes should continue to work. There's more... It is sometimes necessary to add a management node to an existing cluster if for example, due to a lack of hardware or time, an initial cluster only has a single management node. Adding a management node is simple. Firstly, install the management client on the new node . Secondly, modify the config.ini file, as shown earlier in this recipe for adding the new management node, and copy this new config.ini file to both management nodes. Finally, stop the existing management node and start the new one using the following commands: For the existing management node, type: [root@node6 mysql-cluster]# killall ndb_mgmd [root@node6 mysql-cluster]# ndb_mgmd --config-file=config.ini --initial --ndb-nodeid=22009-08-15 21:29:53 [MgmSrvr] INFO -- NDB Cluster Management Server. mysql-5.1.34 ndb-7.0.62009-08-15 21:29:53 [MgmSrvr] INFO -- Reading cluster configuration from 'config.ini' Then type the following command for the new management node: [root@node5 mysql-cluster]# ndb_mgmd --config-file=config.ini --initial --ndb-nodeid=12009-08-15 21:29:53 [MgmSrvr] INFO -- NDB Cluster Management Server. mysql-5.1.34 ndb-7.0.62009-08-15 21:29:53 [MgmSrvr] INFO -- Reading cluster configuration from 'config.ini'torage node one at a time. Ensure that you only stop one node per nodegroup at a time and wait for it to fully restart before taking another node in the nodegroup, when offline, in order to avoid any downtime. See also Look at the section for the online addition of storage nodes (discussed later in this article) for further details on restarting storage nodes one at a time. Obtaining usage information This recipe explains how to monitor the usage of a MySQL Cluster, looking at the memory, CPU, IO, and network utilization on storage nodes. Getting ready MySQL Cluster is extremely memory-intensive. When a MySQL Cluster starts, the storage nodes will start using the entire DataMemory and IndexMemory allocated to them. In a production cluster with a large amount of RAM, it is likely that this will include a large proportion of the physical memory on the server. How to do it... An essential part of managing a MySQL Cluster is looking into what is happening inside each storage node. In this section, we will cover the vital commands used to monitor a cluster. To monitor the memory (RAM) usage of the nodes within the cluster, execute the &ltnodeid> REPORT MemoryUsage command within the management client as follows: ndb_mgm> 3 REPORT MemoryUsageNode 3: Data usage is 0%(21 32K pages of total 98304)Node 3: Index usage is 0%(13 8K pages of total 131104) This command can be executed for all storage nodes rather than just one by using ALL nodeid: ndb_mgm> ALL REPORT MemoryUsageNode 3: Data usage is 0%(21 32K pages of total 98304)Node 3: Index usage is 0%(13 8K pages of total 131104)Node 4: Data usage is 0%(21 32K pages of total 98304)Node 4: Index usage is 0%(13 8K pages of total 131104)Node 5: Data usage is 0%(21 32K pages of total 98304)Node 5: Index usage is 0%(13 8K pages of total 131104)Node 6: Data usage is 0%(21 32K pages of total 98304)Node 6: Index usage is 0%(13 8K pages of total 131104) This information shows that these nodes are actually using 0% of their DataMemory and IndexMemory. Memory allocation is important and unfortunately a little more complicated than a percentage used on each node. There is more detail about this in the How it works... section of this recipe, but the vital points to remember are: It is a good idea never to go over 80 percent of memory usage (particularly not for DataMemory) In the case of a cluster with a very high memory usage, it is possible that a cluster will not restart correctly MySQL Cluster storage nodes make extensive use of disk storage unless specifically configured not to, regardless of whether a cluster is using disk-based tables. It is important to ensure the following: There is sufficient storage available There is sufficient IO bandwidth for the storage node and the latency is not too high To confirm the disk usage on Linux, use the command df –h as follows: [root@node1 mysql-cluster]# df -hFilesystem Size Used Avail Use% Mounted on/dev/mapper/system-root7.6G 2.0G 5.3G 28% //dev/xvda1 99M 21M 74M 22% /boottmpfs 2.0G 0 2.0G 0% /dev/shm/dev/mapper/system-ndb_data2.0G 83M 1.8G 5% /var/lib/mysql-cluster/dev/mapper/system-ndb_backups2.0G 68M 1.9G 4% /var/lib/mysql-cluster/BACKUPS In this example, the cluster data directory and backup directory are on different logical volumes. This provides the following benefits: It is easy to see their usage (5% for data and 4% for backups) Each volume is isolated from other partitions or logical volumes—it means that they are protected from, let's say, a logfile growing in the logs directory To confirm the rate at which the kernel is writing to and reading from the disk, use the vmstat command: [root@node1 ~]# vmstat 1procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------r b swpd free buff cache si so bi bo in cs ussy id wa st0 0 0 2978804 324784 353856 0 0 1 121 39 15 00 100 0 03 0 0 2978804 324784 353856 0 0 0 0 497 620 00 99 0 10 0 0 2978804 324784 353856 0 0 0 172 529 665 00 100 0 0 The bi and bo columns represent the blocks read from a disk and blocks written to a disk, respectively. The first line can be ignored (it's the average since boot), and the number passed to the command, in this case, the refresh rate in seconds. By using a tool such as bonnie (refer to the See also section at the end of this recipe) to establish the potential of each block device, you can then check to see the maximum proportion of each block device is currently being used. At times of high stress, like during a hot backup, if the disk utilization is too high it is potentially possible that the storage node will start spending a lot of time in the iowait state—this will reduce performance and should be avoided. One way to avoid this is by using a separate block device (that is, disk or raid controller) for the backups mount point.
Read more
  • 0
  • 0
  • 2689
Modal Close icon
Modal Close icon