
How-To Tutorials - Data

1210 Articles

Events in Oracle 11g Database

Packt
20 Oct 2009
5 min read
Generally, jobs run immediately upon being enabled, or when we call the run_job procedure of the dbms_scheduler package. Many jobs are time-based; they are controlled by a schedule based on some kind of calendar. However, not everything in real life can be controlled by a calendar. Many things need an action on an ad hoc basis, depending on the occurrence of some other thing. This is called event-based scheduling. Events also exist as the outcome of a job. We can define a job to raise an event in several ways—when it ends, when it ends in an error, or when it does not end within the expected runtime. Let's start with creating job events in order to make job monitoring a lot easier for you.

Monitoring job events

Most of the time, when jobs just do their work as expected, there is not much to monitor. In most cases, the job controller has to fix application-specific problems (for example, sometimes file systems or tablespaces get filled up). To make this easier, we can incorporate events. We can make jobs raise events when something unexpected happens, and we can have the Scheduler generate events when a job runs for too long. This gives us tremendous power. We can also use this to make chains a little easier to maintain.

Events in chains

A chain consists of steps that depend on each other. In many cases, it does not make sense to continue to step 2 when step 1 fails. For example, when a create table fails, why try to load data into the nonexistent table? So it is logical to terminate the job if no other independent steps can be performed. One way to handle this is to implement error steps in the chain. This might be a good idea, but the disadvantage is that it quickly doubles the number of steps in the chain, most of which—hopefully—will never be executed. Another disadvantage is that the chain becomes less maintainable: it is a lot of extra code, and more code (mostly) gives us less oversight. If a job chain has to be terminated because of a failure, creating an event handler that reacts to a Scheduler event is recommended instead of adding extra steps that try to work out which error occurred. This makes event notification a lot simpler because it is all in separate code and not mixed up with the application code.

Another situation is when the application logic has to take care of steps that fail, and has well-defined countermeasures that make the total outcome of the job a success. An example is a situation that starts with a test for the existence of a file. If the test fails, get the file by FTP; if this succeeds, load it into the database. In this case, the first step can fail and fall through to the step that gets the file. As there is no other action possible when the FTP action fails, this should raise a Scheduler event that triggers—for example—a notification action. The same should happen when the load fails. In other third-party scheduling packages, I have seen these notification actions implemented as part of the chain definitions because those packages lack a Scheduler event queue; messages are sent by mail in extra chain steps. In the Oracle Scheduler, this queue is present and is very useful for us. Compared to 10g, nothing has changed in 11g. An event monitoring package can dequeue from the SCHEDULER$_EVENT_QUEUE queue into a variable of the sys.scheduler$_event_info type. The definition is shown in the following screenshot:

What you can do with an event handler is up to your imagination.
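As a minimal sketch of such a handler, the following PL/SQL block dequeues a single event from the Scheduler event queue and prints it. It assumes a subscriber agent has already been registered with dbms_scheduler.add_event_queue_subscriber('JOB_EVENT_AGENT') (the agent name and wait time are illustrative) and that the session has dequeue privileges on the queue.

```sql
-- Minimal sketch: receive one Scheduler event and print it.
-- Assumes dbms_scheduler.add_event_queue_subscriber('JOB_EVENT_AGENT') was
-- called once beforehand; the agent name below must match that subscriber.
declare
  l_dequeue_options    dbms_aq.dequeue_options_t;
  l_message_properties dbms_aq.message_properties_t;
  l_message_handle     raw(16);
  l_event              sys.scheduler$_event_info;
begin
  l_dequeue_options.consumer_name := 'JOB_EVENT_AGENT';
  l_dequeue_options.wait          := 30;  -- seconds to wait for an event
  dbms_aq.dequeue(
    queue_name         => 'sys.scheduler$_event_queue',
    dequeue_options    => l_dequeue_options,
    message_properties => l_message_properties,
    payload            => l_event,
    msgid              => l_message_handle);
  dbms_output.put_line('job '     || l_event.object_name ||
                       ' event '  || l_event.event_type  ||
                       ' log_id ' || l_event.log_id);
  commit;
end;
/
```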
The following DB Console screenshot shows the interface that can be used to specify which events to raise. It is easy to generate an event for every possible event listed above and have the handler decide (by rules defined in tables) what to do. This does sound a little creepy, but it is not very different from having a table that can be used as a lookup for the job found in the event message, where—most of the time—a notification mail is sent, or not sent. Sometimes a user wants to get a message when a job starts running; most of the time, they want a message when a job ends.

In a chain, it is especially important to be able to tell in which step the event happened and what that step was supposed to do. In the event message, only the job name is present, so you have to search a bit to find the name of the step that failed. For this, we can use the LOG_ID to find the step name in the *_SCHEDULER_JOB_LOG views (USER_, DBA_, or ALL_SCHEDULER_JOB_LOG), where the step name is listed as JOB_SUBNAME. The following query can be used to find the step name from the ALL_SCHEDULER_JOB_LOG view, assuming that the event message is received in msg:

select job_subname
  from all_scheduler_job_log
 where log_id = msg.log_id;

To enable the delivery of all the events a job can generate, we can set the raise_events attribute to a value of:

dbms_scheduler.job_started + dbms_scheduler.job_succeeded +
dbms_scheduler.job_failed + dbms_scheduler.job_broken +
dbms_scheduler.job_completed + dbms_scheduler.job_stopped +
dbms_scheduler.job_sch_lim_reached + dbms_scheduler.job_disabled +
dbms_scheduler.job_chain_stalled

Or, in short, we can set it to dbms_scheduler.job_all_events.

There are many things that can be called events. In the job system, there are basically two types of events: events caused by jobs (which we already discussed) and events that make a job execute.
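To switch on these job events for an existing job, the raise_events attribute described above can be set with dbms_scheduler.set_attribute; a minimal sketch follows (the job name TEST_JOB is illustrative).

```sql
-- Minimal sketch: ask the Scheduler to raise every event type for one job.
-- The job name TEST_JOB is illustrative.
begin
  dbms_scheduler.set_attribute(
    name      => 'TEST_JOB',
    attribute => 'raise_events',
    value     => dbms_scheduler.job_all_events);
end;
/
```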


Customizing Page Management in Liferay Portal 5.2 Systems Development

Packt
20 Oct 2009
5 min read
Customizing page management with more features

The Ext Manage Pages portlet not only clones the out-of-the-box Manage Pages portlet, it also extends the model and service to support customized data, for example Keywords. We can make these Keywords localized too.

Adding the localized feature

Liferay portal is designed to handle as many languages as you want to support. By default, it supports up to 22 languages. When a page is loading, the portal will detect the language, pull up the corresponding language file, and display the text in the correct language. We want the Keywords to be localized too. For example, the default language is English (United States) and the localized language is Deutsch (Deutschland). Thus, you have the ability to enter not only the Name and HTML Title in German, but also the Keywords in German. As shown in the following screenshot, when you change the language of the page to German using the language portlet, you will see the entire web site change to German, including the portlet title and input fields. For example, the title of the portlet now has the value Ext Seiteneinstellungen and the Keywords field now becomes Schlüsselwörter.

How do we implement this feature? In other words, how do we customize the language display in page management? Let's add the localized feature to the Ext Manage Pages portlet.

Extending the model for locale

First of all, we need to extend the model and its implementation in order to support the localized feature. For the ExtLayout model, let's add the locale methods first. Locate the ExtLayout.java file in the com.ext.portlet.layout.model package in the /ext/ext-service/src folder and open it. Add the following lines before the last } in ExtLayout.java and save it:

public String getKeywords(Locale locale);
public String getKeywords(String localeLanguageId);
public String getKeywords(Locale locale, boolean useDefault);
public String getKeywords(String localeLanguageId, boolean useDefault);
public void setKeywords(String keywords, Locale locale);

As shown in the code above, this adds getter and setter methods for the Keywords field with locale support.

Now let's add the implementation for the ExtLayout model. Locate the ExtLayoutImpl.java file in the com.ext.portlet.layout.model.impl package in the /ext/ext-impl/src folder and open it. Add the following lines before the last } in ExtLayoutImpl.java and save it:

public String getKeywords(Locale locale) {
    String localeLanguageId = LocaleUtil.toLanguageId(locale);
    return getKeywords(localeLanguageId);
}

public String getKeywords(String localeLanguageId) {
    return LocalizationUtil.getLocalization(getKeywords(), localeLanguageId);
}

public String getKeywords(Locale locale, boolean useDefault) {
    String localeLanguageId = LocaleUtil.toLanguageId(locale);
    return getKeywords(localeLanguageId, useDefault);
}

public String getKeywords(String localeLanguageId, boolean useDefault) {
    return LocalizationUtil.getLocalization(
        getKeywords(), localeLanguageId, useDefault);
}

public void setKeywords(String keywords, Locale locale) {
    String localeLanguageId = LocaleUtil.toLanguageId(locale);
    if (Validator.isNotNull(keywords)) {
        setKeywords(LocalizationUtil.updateLocalization(
            getKeywords(), "keywords", keywords, localeLanguageId));
    }
    else {
        setKeywords(LocalizationUtil.removeLocalization(
            getKeywords(), "keywords", localeLanguageId));
    }
}

As shown in the code above, this adds the implementation for the get and set methods of the ExtLayout model.
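As a hedged usage sketch, the locale-aware getter defined above could be consumed from portlet code like this. The helper class and method name are ours, purely for illustration, and are not part of Liferay or the Ext environment.

```java
import java.util.Locale;

import com.ext.portlet.layout.model.ExtLayout;

// Illustrative helper only: shows how the new locale-aware getter might be
// called once an ExtLayout instance has been retrieved via the Ext services.
public class ExtLayoutKeywordsHelper {

    public static String localizedKeywords(ExtLayout extLayout, Locale locale) {
        // useDefault = true: fall back to the default-language Keywords value
        // when the requested locale has no translation yet
        return extLayout.getKeywords(locale, true);
    }
}
```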
Customizing language properties

Language files have locale-specific definitions. By default, Language.properties (at /portal/portal-impl/src/content) contains English phrase variations further defined for the United States, while Language_de.properties (at /portal/portal-impl/src/content) contains German phrase variations further defined for Germany. In Ext, Language-ext.properties (available at /ext/ext-impl/src/content) contains the English phrases, while Language-ext_de.properties (which should be available at /ext/ext-impl/src/content) contains the German phrases.

First, let's add a message in Language-ext.properties, by using the following steps: Locate the Language-ext.properties file in the /ext/ext-impl/src/content folder and open it. Add the following line after the line view-reports=View Reports for Books and save it:

keywords=Keywords

This specifies the keywords message key with the value Keywords in English.

Then we need to add the German language feature in Language-ext_de.properties as follows: Create a language file Language-ext_de.properties in the /ext/ext-impl/src/content folder and open it. Add the following lines at the beginning and save it:

## Portlet names
javax.portlet.title.EXT_1=Berichte
javax.portlet.title.jsp_portlet=JSP Portlet
javax.portlet.title.book_reports=Berichte für das Buch
javax.portlet.title.extLayoutManagement=Ext Seiteneinstellungen
javax.portlet.title.extCommunities=Ext Communities

## Messages
view-reports=Ansicht-Berichte für Bücher
keywords=Schlüsselwörter

## Category titles
category.book=Buch

## Model resources
model.resource.com.ext.portlet.reports.model.ReportsEntry=Buch

## Action names
action.ADD_BOOK=Fügen Sie Buch hinzu

As shown in the code above, this specifies the same keys as Language-ext.properties, but all the values are given in German instead of English. For example, the keywords message has the value Schlüsselwörter in German.

In addition, you can set German as the default language and Germany as the default country if required. Here are the simple steps to do so: Locate the system-ext.properties file in the /ext/ext-impl/src folder and open it. Add the following lines at the end of system-ext.properties and save it:

user.country=DE
user.language=de

The code above sets the default locale: the language German (Deutsch) and the country Germany (Deutschland).

In general, there are many language files, for example Language-ext.properties and Language-ext_de.properties, and some language files overwrite others at runtime. For example, Language-ext_de.properties will overwrite Language-ext.properties when the language is set to German. These are the three simple rules that determine the priority of these language files: The ext versions take precedence over the non-ext versions. The language-specific versions, for example _de, take precedence over the non-language-specific versions. The location-specific versions, such as -ext_de, take precedence over the non-location-specific versions. For instance, the following is the ranking for the German language, from highest to lowest priority:

Language-ext_de.properties
Language_de.properties
Language-ext.properties
Language.properties
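A hedged sketch of how such a key is resolved in portlet code follows. It assumes the LanguageUtil.get(Locale, String) signature from Liferay's kernel API is available in this environment; the helper class is illustrative only.

```java
import java.util.Locale;

import com.liferay.portal.kernel.language.LanguageUtil;

// Illustrative only: resolves the "keywords" message key for a given locale.
// With a German locale this should return "Schlüsselwörter" from
// Language-ext_de.properties, and "Keywords" otherwise, following the
// language-file priority rules described above.
public class KeywordsLabel {

    public static String label(Locale locale) {
        return LanguageUtil.get(locale, "keywords");
    }
}
```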


Query Performance Tuning in Microsoft Analysis Services: Part 1

Packt
20 Oct 2009
41 min read
In this two-part article by Chris Webb, we will cover query performance tuning, including how to design aggregations and partitions and how to write efficient MDX. The first part covers performance-specific design features, along with the concepts of partitions and aggregations.

One of the main reasons for building Analysis Services cubes as part of a BI solution is that it should mean you get better query performance than if you were querying your relational database directly. While it's certainly the case that Analysis Services is very fast, it would be naive to think that all of our queries, however complex, will return in seconds without any tuning being necessary. This article will describe the steps you'll need to go through in order to ensure your cube is as responsive as possible.

How Analysis Services processes queries

Before we start to discuss how to improve query performance, we need to understand what happens inside Analysis Services when a query is run. The two major parts of the Analysis Services engine are: the Formula Engine, which processes MDX queries, works out what data is needed to answer them, requests that data from the Storage Engine, and then performs all calculations needed for the query; and the Storage Engine, which handles all reading and writing of data, fetching the data that the Formula Engine requests when a query is run and aggregating it to the required granularity. When you run an MDX query, then, that query goes first to the Formula Engine where it is parsed; the Formula Engine then requests all of the raw data needed to answer the query from the Storage Engine, performs any calculations on that data that are necessary, and then returns the results in a cellset back to the user. There are numerous opportunities for performance tuning at all stages of this process, as we'll see.

Performance tuning methodology

When doing performance tuning there are certain steps you should follow to allow you to measure the effect of any changes you make to your cube, its calculations or the query you're running. Wherever possible, test your queries in an environment that is identical to your production environment; otherwise, ensure that the size of the cube and the server hardware you're running on are at least comparable, and that you are running the same build of Analysis Services. Make sure that no one else has access to the server you're running your tests on; you won't get reliable results if someone else starts running queries at the same time as you. Make sure that the queries you're testing with are equivalent to the ones that your users want to have tuned; as we'll see, you can use Profiler to capture the exact queries your users are running against the cube. Whenever you test a query, run it twice: first on a cold cache, and then on a warm cache. Make sure you keep a note of the time each query takes to run and what you changed on the cube or in the query for that run.

Clearing the cache is a very important step: queries that run for a long time on a cold cache may be instant on a warm cache. When you run a query against Analysis Services, some or all of the results of that query (and possibly other data in the cube, not required for the query) will be held in cache so that the next time a query is run that requests the same data it can be answered from cache much more quickly. To clear the cache of an Analysis Services database, you need to execute a ClearCache XMLA command.
To do this in SQL Management Studio, open up a new XMLA query window and enter the following:

<Batch>
  <ClearCache>
    <Object>
      <DatabaseID>Adventure Works DW 2008</DatabaseID>
    </Object>
  </ClearCache>
</Batch>

Remember that the ID of a database may not be the same as its name; you can check this by right-clicking on a database in the SQL Management Studio Object Explorer and selecting Properties. Alternatives to this method also exist: the MDX Studio tool allows you to clear the cache with a menu option, and the Analysis Services Stored Procedure Project (http://tinyurl.com/asstoredproc) contains code that allows you to clear the Analysis Services cache and the Windows File System cache directly from MDX. Clearing the Windows File System cache is interesting because it allows you to compare the performance of the cube on a warm and cold file system cache as well as a warm and cold Analysis Services cache: when the Analysis Services cache is cold or can't be used for some reason, a warm file system cache can still have a positive impact on query performance.

After the cache has been cleared, before Analysis Services can answer a query it needs to recreate the calculated members, named sets and other objects defined in a cube's MDX Script. If you have any reasonably complex named set expressions that need to be evaluated, you'll see some activity in Profiler relating to these sets being built, and it's important to be able to distinguish between this and activity that's related to the queries you're actually running. All MDX Script-related activity occurs between the Execute MDX Script Begin and Execute MDX Script End events; these are fired after the Query Begin event but before the Query Cube Begin event for the query run after the cache has been cleared. When looking at a Profiler trace you should either ignore everything between the Execute MDX Script Begin and End events or run a query that returns no data at all to trigger the evaluation of the MDX Script, for example:

SELECT {} ON 0
FROM [Adventure Works]

Designing for performance

Many of the recommendations for designing cubes we've given so far have been made on the basis that they will improve query performance, and in fact the performance of a query is intimately linked to the design of the cube it's running against. For example, dimension design, especially optimizing attribute relationships, can have a significant effect on the performance of all queries—at least as much as any of the optimizations described in this article. As a result, we recommend that if you've got a poorly performing query the first thing you should do is review the design of your cube to see if there is anything you could do differently. There may well be some kind of trade-off needed between usability, manageability, time-to-develop, overall "elegance" of the design and query performance, but since query performance is usually the most important consideration for your users it will take precedence. To put it bluntly, if the queries your users want to run don't run fast, your users will not want to use the cube at all!

Performance-specific design features

Once you're sure that your cube design is as good as you can make it, it's time to look at two features of Analysis Services that are transparent to the end user but have an important impact on performance and scalability: measure group partitioning and aggregations.
Both of these features relate to the Storage Engine and allow it to answer requests for data from the Formula Engine more efficiently.

Partitions

A partition is a data structure that holds some or all of the data held in a measure group. When you create a measure group, by default that measure group contains a single partition that contains all of the data. The Enterprise Edition of Analysis Services allows you to divide a measure group into multiple partitions; Standard Edition is limited to one partition per measure group, and the ability to partition is one of the main reasons why you would want to use Enterprise Edition over Standard Edition.

Why partition?

Partitioning brings two important benefits: better manageability and better performance. Partitions within the same measure group can have different storage modes and different aggregation designs, although in practice they usually don't differ in these respects; more importantly, they can be processed independently, so for example when new data is loaded into a fact table, you can process only the partitions that should contain the new data. Similarly, if you need to remove old or incorrect data from your cube, you can delete or reprocess a partition without affecting the rest of the measure group.

Partitioning can also improve both processing performance and query performance significantly. Analysis Services can process multiple partitions in parallel, and this can lead to much more efficient use of CPU and memory resources on your server while processing is taking place. Analysis Services can also fetch and aggregate data from multiple partitions in parallel when a query is run, and again this can lead to more efficient use of CPU and memory and result in faster query performance. Lastly, Analysis Services will only scan the partitions that contain data necessary for a query, and since this reduces the overall amount of IO needed, this can also make queries faster.

Building partitions

You can view, create and delete partitions on the Partitions tab of the Cube Editor in BIDS. When you run the New Partition Wizard or edit the Source property of an existing partition, you'll see you have two options for controlling what data is used in the partition. Table Binding means that the partition contains all of the data in a table or view in your relational data source, or a named query defined in your DSV; you can choose the table you wish to bind to on the Specify Source Information step of the New Partition Wizard, or in the Partition Source dialog if you choose Table Binding from the Binding Type drop-down box. Query Binding allows you to specify an SQL SELECT statement to filter the rows you want from a table; BIDS will automatically generate part of the SELECT statement for you, and all you'll need to do is supply the WHERE clause. If you're using the New Partition Wizard, this is the option that will be chosen if you check the Specify a query to restrict rows checkbox on the second step of the wizard; in the Partition Source dialog you can choose this option from the Binding Type drop-down box.

It might seem like query binding is the easiest way to filter your data, and while it's the most widely-used approach it does have one serious shortcoming: since it involves hard-coding an SQL SELECT statement into the definition of the partition, changes to your fact table such as the deletion or renaming of a column can mean the SELECT statement errors when it is run if that column is referenced in it.
This in turn will cause the partition processing to fail. If you have a lot of partitions in your measure group—and it's not unusual to have over one hundred partitions on a large cube—altering the query used for each one is somewhat time-consuming. Instead, table-binding each partition to a view in your relational database will make this kind of maintenance much easier, although you do of course now need to generate one view for each partition. Alternatively, if you're building query-bound partitions from a single view on top of your fact table (which means you have complete control over the columns the view exposes), you could use a simple SELECT * FROM that view in each partition's definition.

It's very important that you check the queries you're using to filter your fact table for each partition. If the same fact table row appears in more than one partition, or if fact table rows don't appear in any partition, your cube will display incorrect measure values.

On the Processing and Storage Locations step of the wizard you have the chance to create the partition on a remote server instance, functionality that is called Remote Partitions. This is one way of scaling out Analysis Services: you can have a cube and measure group on one server but store some of the partitions for the measure group on a different server, something like a linked measure group but at a lower level. This can be useful for improving processing performance in situations when you have a very small time window available for processing, but in general we recommend that you do not use remote partitions. They have an adverse effect on query performance and they make management of the cube (especially backup) very difficult. Also on the same step you have the chance to store the partition at a location other than the default of the Analysis Services data directory. Spreading your partitions over more than one volume may make it easier to improve the IO performance of your solution, although again it can complicate database backup and restore.

After assigning an aggregation design to the partition (we'll talk about aggregations in detail next), the last important property to set on a partition is Slice. The Slice property takes the form of an MDX member, set or tuple—MDX expressions returning members, sets or tuples are not allowed, however—and indicates what data is present in a partition. While you don't have to set it, we strongly recommend that you do so, even for MOLAP partitions, for the following reasons: While Analysis Services does automatically detect what data is present in a partition during processing, it doesn't always work as well as you'd expect and can result in unwanted partition scanning taking place at query time in a number of scenarios; the following blog entry on the SQLCat team site explains why in detail: http://tinyurl.com/partitionslicing. It also acts as a useful safety mechanism to ensure that you only load the data you're expecting into a partition: if, while processing, Analysis Services finds that data is being loaded into the partition that conflicts with what's specified in the Slice property, then processing will fail. More detail on how to set the Slice property can be found in Mosha Pasumansky's blog entry on the subject here: http://tinyurl.com/moshapartition

Planning a partitioning strategy

We now know why we should be partitioning our measure groups and what to do to create a partition; the next question is: how should we split the data in our partitions?
We need to find some kind of happy medium between the manageability and performance aspects of partitioning—we need to split our data so that we do as little processing as possible, but also so that as few partitions as possible are scanned by our users' queries. Luckily, if we partition by our Time dimension we can usually meet both needs very well: it's usually the case that when new data arrives in a fact table it's for a single day, week or month, and it's also the case that the most popular way of slicing a query is by a time period. Therefore, it's almost always the case that when measure groups are partitioned they are partitioned by time. It's also worth considering, though, whether it's a good idea to partition by time and another dimension: for example, in an international company you might have a Geography dimension and a Country attribute, and users may always be slicing their queries by Country too, in which case it might make sense to partition by Country as well.

Measure groups that contain measures with the Distinct Count aggregation type require their own specific partitioning strategy. While you should still partition by time, you should also partition by non-overlapping ranges of values within the column you're doing the distinct count on. A lot more detail on this is available in the following white paper: http://tinyurl.com/distinctcountoptimize

It's worth looking at the distribution of data over partitions for dimensions we're not explicitly slicing by, as there is often a dependency between data in these dimensions and the Time dimension: for example, a given Product may only have been sold in certain Years or in certain Countries. You can see the distribution of member DataIDs (the internal key values that Analysis Services creates for all members on a hierarchy) for a partition by querying the Discover_Partition_Dimension_Stat DMV, for example:

SELECT *
FROM SystemRestrictSchema($system.Discover_Partition_Dimension_Stat
  ,DATABASE_NAME = 'Adventure Works DW 2008'
  ,CUBE_NAME = 'Adventure Works'
  ,MEASURE_GROUP_NAME = 'Internet Sales'
  ,PARTITION_NAME = 'Internet_Sales_2003')

The following screenshot shows what the results of this query look like. There's also a useful Analysis Services stored procedure that shows the same data and any partition overlaps, included in the Analysis Services Stored Procedure Project (a free, community-developed set of sample Analysis Services stored procedures): http://tinyurl.com/partitionhealth. This blog entry describes how you can take this data and visualise it in a Reporting Services report: http://tinyurl.com/viewpartitionslice

We also need to consider what size our partitions should be. In general, between 5 and 20 million rows per partition, or up to around 3GB, is a good size. If you have a measure group with a single partition of below 5 million rows then don't worry, it will perform very well, but it's not worth dividing it into smaller partitions; it's equally possible to get good performance with partitions of 50-60 million rows. It's also best to avoid having too many partitions: if you have more than a few hundred, SQL Management Studio and BIDS may become slow to respond, and it may be worth creating fewer, larger partitions, assuming these partitions stay within the size limits for a single partition we've just given.
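A related DMV, Discover_Partition_Stat, can be queried in the same way to list the aggregations stored in a partition and their sizes, which can help when judging whether partitions and aggregations are staying within sensible size limits. The sketch below reuses the same illustrative database, cube, measure group and partition names as the query above.

```sql
SELECT *
FROM SystemRestrictSchema($system.Discover_Partition_Stat
  ,DATABASE_NAME = 'Adventure Works DW 2008'
  ,CUBE_NAME = 'Adventure Works'
  ,MEASURE_GROUP_NAME = 'Internet Sales'
  ,PARTITION_NAME = 'Internet_Sales_2003')
```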
Automatically generating large numbers of partitions

When creating a measure group for the first time, it's likely you'll already have a large amount of data and may need to create a correspondingly large number of partitions for it. Clearly the last thing you'll want to do is create tens or hundreds of partitions manually, and it's worth knowing some tricks to create these partitions automatically. One method involves taking a single partition, scripting it out to XMLA and then pasting and manipulating this in Excel, as detailed here: http://tinyurl.com/generatepartitions. The Analysis Services Stored Procedure Project also contains a set of functions for creating partitions automatically based on MDX set expressions: http://tinyurl.com/autopartition

Unexpected partition scans

Even when you have configured your partitions properly, it's sometimes the case that Analysis Services will scan partitions that you don't expect it to be scanning for a particular query. If you see this happening, the first thing to determine is whether these extra scans are making a significant contribution to your query times. If they aren't, then it's probably not worth worrying about; if they are, there are some things to try to stop it happening. The extra scans could be the result of a number of factors, including the following. The way you have written MDX for queries or calculations: in most cases it will be very difficult to rewrite the MDX to stop the scans, but the following blog entry describes how it is possible in one scenario: http://tinyurl.com/moshapart. The LastNonEmpty measure aggregation type may result in multiple partition scans; if you can restructure your cube so you can use the LastChild aggregation type, Analysis Services will only scan the last partition containing data for the current time period. In some cases, even when you've set the Slice property, Analysis Services has trouble working out which partitions should be scanned for a query; changing the attributes mentioned in the Slice property may help, but not always. The section on Related Attributes and Almost Related Attributes in the following blog entry discusses this in more detail: http://tinyurl.com/mdxpartitions. Analysis Services may also decide to retrieve more data than is needed for a query to make answering future queries more efficient. This behavior is called prefetching and can be turned off by setting the following connection string properties: Disable Prefetch Facts=True; Cache Ratio=1. More information on this can be found in the section on Prefetching and Request Ordering in the white paper Identifying and Resolving MDX Query Performance Bottlenecks, available from http://tinyurl.com/mdxbottlenecks. Note that setting these connection string properties can have other, negative effects on query performance.

You can set connection string properties in SQL Management Studio when you open a new MDX Query window. Just click the Options button on the Connect to Analysis Services dialog, then go to the Additional Connection Parameters tab. Note that in the RTM version of SQL Management Studio there is a problem with this functionality, so that when you set a connection string property it will continue to be set for all connections, even though the textbox on the Additional Connection Parameters tab is blank, until SQL Management Studio is closed down or until you set the same property differently.
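Equivalently, a client application could append the same two properties to an ordinary OLAP connection string; a sketch follows, in which the server and catalog names are illustrative only.

```text
Data Source=localhost;Catalog=Adventure Works DW 2008;Disable Prefetch Facts=True;Cache Ratio=1
```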
Aggregations

An aggregation is simply a pre-summarised data set, similar to the result of an SQL SELECT statement with a GROUP BY clause, that Analysis Services can use when answering queries. The advantage of having aggregations built in your cube is that they reduce the amount of aggregation the Analysis Services Storage Engine has to do at query time, and building the right aggregations is one of the most important things you can do to improve query performance. Aggregation design is an ongoing process that should start once your cube and dimension designs have stabilised and will continue throughout the lifetime of the cube as its structure and the queries you run against it change; in this section we'll talk about the steps you should go through to create an optimal aggregation design.

Creating an initial aggregation design

The first stage in creating an aggregation design should be to create a core set of aggregations that will be generally useful for most queries run against your cube. This should take place towards the end of the development cycle, when you're sure that your cube and dimension designs are unlikely to change much, because any changes are likely to invalidate your aggregations and mean this step has to be repeated. It can't be stressed enough that good dimension design is the key to getting the most out of aggregations: removing unnecessary attributes, setting AttributeHierarchyEnabled to False where possible, building optimal attribute relationships and building user hierarchies will all make the aggregation design process faster, easier and more effective. You should also take care to update the EstimatedRows property of each measure group and partition, and the EstimatedCount of each attribute, before you start, as these values are used by the aggregation design process. BIDS Helper adds a new button to the toolbar in the Partitions tab of the Cube Editor to update all of these count properties with one click.

To build this initial set of aggregations we'll be running the Aggregation Design Wizard, which can be run by clicking the Design Aggregations button on the toolbar of the Aggregations tab of the Cube Editor. This wizard will analyse the structure of your cube and dimensions, look at various property values you've set, and try to come up with a set of aggregations that it thinks should be useful. The one key piece of information it doesn't have at this point is what queries you're running against the cube, so some of the aggregations it designs may not prove to be useful in the long run, but running the wizard is extremely useful for creating a first draft of your aggregation designs.

You can only design aggregations for one measure group at a time; if you have more than one partition in the measure group you've selected, the first step of the wizard asks you to choose which partitions you want to design aggregations for. An aggregation design can be associated with many partitions in a measure group, and a partition can be associated with just one aggregation design or none at all. We recommend that, in most cases, you have just one aggregation design for each measure group for the sake of simplicity. However, if processing time is limited and you need to reduce the overall time spent building aggregations, or if query patterns are different for different partitions within the same measure group, then it may make sense to apply different aggregation designs to different partitions.
The next step of the wizard asks you to review the AggregationUsage property of all the attributes on all of the cube dimensions in your cube; this property can also be set on the Cube Structure tab of the Cube Editor. The following figure shows the Aggregation Design Wizard. The AggregationUsage property controls how dimension attributes are treated in the aggregation design process, and can be set to the following values:

Full: the attribute, or an attribute at a lower granularity directly related to it by an attribute relationship, will be included in every single aggregation the wizard builds. We recommend that you use this value sparingly, for at most one or two attributes in your cube, because it can significantly reduce the number of aggregations that get built. You should set it for attributes that will almost always be used in queries. For example, if the vast majority of your queries are at the Month granularity, it makes sense that all of your aggregations include the Month attribute from your Time dimension.

None: the attribute will not be included in any aggregation that the wizard designs. Don't be afraid of using this value for any attributes that you don't think will be used often in your queries; it can be a useful way of ensuring that the attributes that are used often get good aggregation coverage. Note that attributes with AttributeHierarchyEnabled set to False will have no aggregations designed for them anyway.

Unrestricted: the attribute may be included in the aggregations designed, depending on whether the algorithm used by the wizard considers it to be useful or not.

Default: a complex set of rules is applied. The granularity attribute (usually the key attribute, unless you specified otherwise on the Dimension Usage tab) is treated as Unrestricted. All attributes on dimensions involved in many-to-many relationships, unmaterialised referenced relationships, and data mining dimensions are treated as None; aggregations may still be built at the root granularity, that is, the intersection of every All Member on every attribute. All attributes that are used as levels in natural user hierarchies are treated as Unrestricted. Attributes with IsAggregatable set to False are treated as Full. All other attributes are treated as None.

The next step in the wizard asks you to verify the EstimatedRows and EstimatedCount values we've already talked about, and gives the option of setting a similar property that shows the estimated number of members from an attribute that appear in any one partition. This can be an important property to set: if you are partitioning by month, although you may have 36 members on your Month attribute, a partition will only contain data for one of them.

On the Set Aggregation Options step you finally reach the point where some aggregations can be built. Here you can apply one last set of restrictions on the set of aggregations that will be built, choosing one of the following options:

Estimated Storage Reaches: build aggregations until they fill a given amount of disk space.

Performance Gain Reaches: the most useful option. It does not mean that all queries will run n% faster, nor does it mean that a query that hits an aggregation directly will run n% faster.
Think of it like this: if the wizard built all the aggregations it thought were useful to build (note that this is not the same thing as all of the possible aggregations that could be built on the cube) then, in general, performance would be better. Some queries would not benefit from aggregations, some would be slightly faster, and some would be a lot faster; and some aggregations would be used more often than others. So if you set this property to 100%, the wizard would build all the aggregations that it could, and you'd get 100% of the performance gain possible from building aggregations. Setting this property to 30%, the default and recommended value, will build the aggregations that give you 30% of this possible performance gain—not 30% of the possible aggregations, usually a much smaller number. As you can see from the screenshot below, the graph drawn on this step plots the size of the aggregations built versus overall performance gain, and the shape of the curve shows that a few, smaller aggregations usually provide the majority of the performance gain.

I Click Stop: carry on building aggregations until you click the Stop button. Designing aggregations can take a very long time, especially on more complex cubes, because there may literally be millions or billions of possible aggregations that could be built. In fact, it's not unheard of for the Aggregation Design Wizard to run for several days before it's stopped!

Do Not Design Aggregations: skip designing aggregations.

The approach we suggest taking here is to first select I Click Stop and then click the Start button. On some measure groups this will complete very quickly, with only a few small aggregations built. If that's the case, click Next; otherwise, if it's taking too long or too many aggregations are being built, click Stop and then Reset, then select Performance Gain Reaches, enter 30% and Start again. This should result in a reasonable selection of aggregations being built; in general, around 50-100 aggregations is the maximum number you should be building for a measure group, and if 30% leaves you short of this, try increasing the number by 10% until you feel comfortable with what you get. On the final step of the wizard, enter a name for your aggregation design and save it. It's a good idea to give the aggregation design a name that includes the name of the measure group, to make it easier to find if you ever need to script it to XMLA.

It's quite common that Analysis Services cube developers stop thinking about aggregation design at this point. This is a serious mistake: just because you have run the Aggregation Design Wizard does not mean you have built all the aggregations you need, or indeed any useful ones at all! Doing usage-based optimisation and/or building aggregations manually is absolutely essential.

Usage-based optimisation

We now have some aggregations designed, but the chances are that despite our best efforts many of them will not prove to be useful. To a certain extent we might be able to pick out these aggregations by browsing through them; really, though, we need to know what queries our users are going to run before we can build aggregations to make them run faster. This is where usage-based optimisation comes in: it allows us to log the requests for data that Analysis Services makes when a query is run and then feed this information into the aggregation design process. To be able to do usage-based optimisation, you must first set up Analysis Services to log these requests for data.
This involves specifying a connection string to a relational database in the server properties of your Analysis Services instance and allowing Analysis Services to create a log table in that database. The white paper Configuring the Analysis Services Query Log contains more details on how to do this (it's written for Analysis Services 2005 but is still relevant for Analysis Services 2008), and can be downloaded from http://tinyurl.com/ssasquerylog

The query log is a misleading name because, as you'll see if you look inside it, it doesn't actually contain the text of the MDX queries run against the cube. When a user runs an MDX query, Analysis Services decomposes it into a set of requests for data at a particular granularity, and it's these requests that are logged; we'll look at how to interpret this information in the next section. A single query can result in no requests for data, or it can result in as many as hundreds or thousands of requests, especially if it returns a lot of data and a lot of MDX calculations are involved. When setting up the log you also have to specify the percentage of all data requests that Analysis Services actually logs with the QueryLogSampling property. In some cases, if it logged every single request you would end up with a very large amount of data very quickly, but on the other hand if you set this value too low you may end up not seeing certain important long-running requests. We recommend that you start by setting this property to 100, but monitor the size of the log closely and reduce the value if you find that the number of requests logged is too high.

Once the log has been set up, let your users start querying the cube. Explain to them what you're doing and that some queries may not perform well at this stage. Given access to a new cube, it will take them a little while to understand what data is present and what data they're interested in; if they're new to Analysis Services, it's also likely they'll need some time to get used to whatever client tool they're using. Therefore you'll need to have logging enabled for at least a month or two before you can be sure that your query log contains enough useful information. Remember that if you change the structure of the cube while you're logging, the existing contents of the log will no longer be usable.

Last of all, you'll need to run the Usage-Based Optimisation Wizard to build new aggregations using this information. The Usage-Based Optimisation Wizard is very similar to the Aggregation Design Wizard, with the added option to filter the information in the query log by date, user and query frequency before it's used to build aggregations. It's a good idea to do this filtering carefully: you should probably exclude any queries you've run yourself, for example, since they're unlikely to be representative of what the users are doing, and make sure that the most important users' queries are over-represented. Once you've done this you'll have a chance to review what data is actually going to be used before you build the aggregations.

On the last step of the wizard you have the choice of either creating a new aggregation design or merging the aggregations that have just been created with an existing aggregation design. We recommend the latter: what you've just done is optimise queries that ran slowly on the existing aggregation design, and if you abandon the aggregations you've already got, it's possible that queries which previously had been quick would be slow afterwards.
This exercise should be repeated at regular intervals throughout the cube's lifetime to ensure that you build any new aggregations that become necessary as the queries your users run change. Query logging can, however, have an impact on query performance, so it's not a good idea to leave logging running all the time.

Processing aggregations

When you've created or edited the aggregations on one or more partitions, you don't need to do a full process on the partitions. All you need to do is deploy your changes and then run a ProcessIndex, which is usually fairly quick, and once you've done that queries will be able to use the new aggregations. When you run a ProcessIndex, Analysis Services does not need to run any SQL queries against the relational data source if you're using MOLAP storage.

Monitoring partition and aggregation usage

Having created and configured your partitions and aggregations, you'll naturally want to be sure that when you run a query Analysis Services is using them as you expect. You can do this very easily by running a trace with SQL Server Profiler or by using MDX Studio (a free MDX editor that can be downloaded from http://tinyurl.com/mdxstudio). To use Profiler, start it and then connect to your Analysis Services instance to create a new trace. On the Trace Properties dialog choose the Blank template, go to the Events Selection tab and check the following events:

Progress Reports: Progress Report Begin
Progress Reports: Progress Report End
Queries Events: Query Begin
Queries Events: Query End
Query Processing: Execute MDX Script Begin
Query Processing: Execute MDX Script End
Query Processing: Query Cube Begin
Query Processing: Query Cube End
Query Processing: Get Data From Aggregation
Query Processing: Query Subcube Verbose

Then clear the cache and click Run to start the trace. Once you've done this you can either open up your Analysis Services client tool or start running MDX queries in SQL Management Studio. When you do this you'll notice that Profiler starts to show information about what Analysis Services is doing internally to answer these queries. The following screenshot shows what you might typically see.

Interpreting the results of a Profiler trace is a complex task and well outside the scope of this article, but it's very easy to pick out some useful information relating to aggregation and partition usage. Put simply: the Query Subcube Verbose events represent individual requests for data from the Formula Engine to the Storage Engine, which can be answered either from cache, an aggregation or base-level partition data. Each of these requests is at a single granularity, meaning that all of the data in the request comes from a single distinct combination of attributes; we refer to these granularities as "subcubes". The TextData column for this event shows the granularity of data that is being requested in human-readable form; the Query Subcube event displays exactly the same data, but in the less friendly format that the usage-based optimisation query log uses. Pairs of Progress Report Begin and Progress Report End events show that data is being read from disk, either from an aggregation or a partition. The TextData column gives more information, including the name of the object being read; however, if you have more than one object (for example, an aggregation) with the same name, you need to look at the contents of the ObjectPath column to see exactly which object is being queried.
The Get Data From Aggregation event is fired when data is read from an aggregation, in addition to any Progress Report events. The Duration column shows how long each of these operations takes in milliseconds.

At this point in the cube optimisation process you should be seeing in Profiler that when your users run queries they hit as few partitions as possible and hit aggregations as often as possible. If you regularly see slow queries that scan all the partitions in your cube or that do not use any aggregations at all, you should consider going back to the beginning of the process, rethinking your partitioning strategy and rerunning the aggregation design wizards. In a production system many queries will be answered from cache and therefore be very quick, but you should always try to optimise for the worst-case scenario of a query running on a cold cache.

Building aggregations manually

However good the aggregation designs produced by the wizards are, it's very likely that at some point you'll have to design aggregations manually for particular queries. Even after running the Usage-Based Optimisation Wizard you may find that it still does not build some potentially useful aggregations: the algorithm the wizards use is very complex and something of a black box, so for whatever reason (perhaps because it thinks it would be too large) it may decide not to build an aggregation that, when built manually, turns out to have a significant positive impact on the performance of a particular query.

Before we can build aggregations manually we need to work out which aggregations we need to build. To do this, we once again need to use Profiler and look at either the Query Subcube or the Query Subcube Verbose events. These events, remember, display the same thing in two different formats—requests for data made to the Analysis Services Storage Engine during query processing—and the contents of the Duration column in Profiler show how long in milliseconds each of these requests took. A good rule of thumb is that any Query Subcube event that takes longer than half a second (500 ms) would benefit from having an aggregation built for it; you can expect that a Query Subcube event that requests data at the same granularity as an aggregation will execute almost instantaneously. The following screenshot shows an example of a trace on an MDX query that takes 700 ms. The single Query Subcube Verbose event is highlighted, and we can see that the duration of this event is the same as that of the query itself, so if we want to improve the performance of the query we need to build an aggregation for this particular request. Also, in the lower half of the screen we can see the contents of the TextData column. This shows a list of all the dimensions and attributes from which data is being requested—the granularity of the request—and the simple rule to follow here is that whenever you see anything other than a zero next to an attribute, the granularity of the request includes that attribute. We need to make a note of all of the attributes that have anything other than a zero next to them and then build an aggregation using them; in this case it's just the Product Category attribute of the Product dimension.
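For context, a query at this kind of granularity might look something like the sketch below; the measure and hierarchy names are taken from the Adventure Works sample referenced elsewhere in this article and are purely illustrative.

```mdx
SELECT
  {[Measures].[Internet Sales Amount]} ON 0,
  [Product].[Product Categories].[Category].MEMBERS ON 1
FROM [Adventure Works]
```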
The white paper Identifying and Resolving MDX Query Performance Bottlenecks (again, written for Analysis Services 2005 but still relevant for Analysis Services 2008), available from http://tinyurl.com/mdxbottlenecks, includes more detailed information on how to interpret the information given by the Query Subcube Verbose event.

So now that we know what aggregation we need to build, we need to go ahead and build it. We have a choice of tools: we can either use the functionality built into BIDS, or we can use some of the excellent functionality that BIDS Helper provides. In BIDS, to view and edit aggregations you need to go to the Aggregations tab in the Cube Editor. On the Standard View you only see a list of partitions and which aggregation designs they have associated with them; if you switch to the Advanced View by pressing the appropriate button on the toolbar, you can view the aggregations in each aggregation design for each measure group. If you right-click in the area where the aggregations are displayed you can create a new aggregation, and once you've done that you can specify the granularity of the aggregation by checking and unchecking the attributes on each dimension. For our particular query we only need to check the box next to the Product Categories attribute, as follows:

The small tick at the top of the list of dimensions in the Status row shows that this aggregation has passed the built-in validation rules that BIDS applies to make sure this is a useful aggregation. If you see an amber warning triangle here, hover over it with your mouse and the tooltip will show a list of reasons why the aggregation has failed its status check. If we then deploy and run a ProcessIndex, we can rerun our original query and watch it use the new aggregation, running much faster as a result.

The problem with the native BIDS aggregation design functionality is that it becomes difficult to use when you have complex aggregations to build and edit. The functionality present in BIDS Helper, while it looks less polished, is far more usable and offers many benefits over the native BIDS functionality. For example: the BIDS Helper Aggregation Design interface displays the aggregation granularity in the same way (that is, using 1s and 0s, as seen in the screenshot below) as the Query Subcube event in Profiler does, making it easier to cross-reference between the two. It also shows attribute relationships when it displays the attributes on each dimension when you're designing an aggregation, as seen on the right-hand side in the screenshot that follows; this is essential to being able to build optimal aggregations. It also shows whether an aggregation is rigid or flexible. It has functionality to remove duplicate aggregations and ones containing redundant attributes (see below), and to search for similar aggregations. It allows you to create new aggregations based on the information stored in the query log. It also allows you to delete unused aggregations based on information from a Profiler trace. Finally, it has some very comprehensive functionality to allow you to test the performance of the aggregations you build (see http://tinyurl.com/testaggs). Unsurprisingly, if you need to do any serious work designing aggregations manually we recommend using BIDS Helper over the built-in functionality.
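The deploy-and-ProcessIndex step mentioned above can also be issued directly as an XMLA command; a minimal sketch follows, assuming the Adventure Works objects used earlier in this article. XMLA processing works on object IDs, which may differ from the display names shown here, so treat the IDs below as illustrative.

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <!-- Object IDs are illustrative; check the real IDs in the object properties -->
      <DatabaseID>Adventure Works DW 2008</DatabaseID>
      <CubeID>Adventure Works</CubeID>
      <MeasureGroupID>Internet Sales</MeasureGroupID>
      <PartitionID>Internet_Sales_2003</PartitionID>
    </Object>
    <Type>ProcessIndexes</Type>
  </Process>
</Batch>
```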
Common aggregation design issues

Several features of your cube design must be borne in mind when designing aggregations, because they can influence how Analysis Services storage engine queries are made and therefore which aggregations will be used. These include:
- There's no point building aggregations above the granularity you are slicing your partitions by. Aggregations are built on a per-partition basis, so for example if you're partitioning by month there's no value in building an aggregation at the Year granularity since no aggregation can contain more than one month's worth of data. It won't hurt if you do it, it just means that an aggregation at month will be the same size as one at year but useful to more queries. It follows from this that it might be possible to over-partition data and reduce the effectiveness of aggregations, but we have anecdotal evidence from people who have built very large cubes that this is not an issue.
- For queries involving a dimension with a many-to-many relationship to a measure group, aggregations must not be built using any attributes from the many-to-many dimension, but instead must be built at the granularity of the attributes with a regular relationship to the intermediate measure group. When a query is run using the Sales Reason dimension, Analysis Services first works out which Sales Orders relate to each Sales Reason, and then queries the main measure group for these Sales Orders. Therefore, only aggregations at the Sales Order granularity on the main measure group can be used. As a result, in most cases it's not worth building aggregations for queries on many-to-many dimensions since the granularity of these queries is often close to that of the original fact table.
- Queries involving measures which have semi-additive aggregation types are always resolved at the granularity attribute of the time dimension, so you need to include that attribute in all aggregations.
- Queries involving measures with measure expressions require aggregations at the common granularity of the two measure groups involved.
- You should not build aggregations including a parent/child attribute; instead you should use the key attribute in aggregations.
- No aggregation should include an attribute which has AttributeHierarchyEnabled set to False.
- No aggregation should include an attribute that is below the granularity attribute of the dimension for the measure group.
- Any attributes which have a default member that is anything other than the All Member, or which have IsAggregatable set to False, should also be included in all aggregations.
- Aggregations and indexes are not built for partitions with fewer than 4096 rows. This threshold is set by the IndexBuildThreshold property in msmdsrv.ini; you can change it but it's not a good idea to do so.
- Aggregations should not include redundant attributes, that is to say attributes from the same 'chain' of attribute relationships. For example, if you had a chain of attribute relationships going from month to quarter to year, you should not build an aggregation including month and quarter—it should just include month. This will increase the chance that the aggregation can be used by more queries, as well as reducing the size of the aggregation.

Summary

In this part of the article we covered performance-specific design features such as partitions and aggregations. In the next part, we will cover MDX calculation performance and caching.
Query Performance Tuning in Microsoft Analysis Services: Part 2

Packt
20 Oct 2009
21 min read
MDX calculation performance

Optimizing the performance of the Storage Engine is relatively straightforward: you can diagnose performance problems easily and you only have two options—partitioning and aggregation—for solving them. Optimizing the performance of the Formula Engine is much more complicated because it requires knowledge of MDX, diagnosing performance problems is difficult because the internal workings of the Formula Engine are hard to follow, and solving the problem is reliant on knowing tips and tricks that may change from service pack to service pack.

Diagnosing Formula Engine performance problems

If you have a poorly-performing query, and if you can rule out the Storage Engine as the cause of the problem, then the issue is with the Formula Engine. We've already seen how we can use Profiler to check the performance of Query Subcube events, to see which partitions are being hit and to check whether aggregations are being used; if you subtract the sum of the durations of all the Query Subcube events from the duration of the query as a whole, you'll get the amount of time spent in the Formula Engine. You can use MDX Studio's Profile functionality to do the same thing much more easily—here's a screenshot of what it outputs when a calculation-heavy query is run:

The following blog entry describes this functionality in detail: http://tinyurl.com/mdxtrace; but what this screenshot displays is essentially the same thing that we'd see if we ran a Profiler trace when running the same query on a cold and warm cache, but in a much more easy-to-read format. The column to look at here is the Ratio to Total, which shows the ratio of the duration of each event to the total duration of the query. We can see that on both a cold cache and a warm cache the query took almost ten seconds to run but none of the events recorded took anywhere near that amount of time: the highest ratio to parent is 0.09%. This is typical of what you'd see with a Formula Engine-bound query.

Another hallmark of a query that spends most of its time in the Formula Engine is that it will only use one CPU, even on a multiple-CPU server. This is because the Formula Engine, unlike the Storage Engine, is single-threaded. As a result, if you watch CPU usage in Task Manager while you run a query you can get a good idea of what's happening internally: high usage of multiple CPUs indicates work is taking place in the Storage Engine, while high usage of one CPU indicates work is taking place in the Formula Engine.

Calculation performance tuning

Having worked out that the Formula Engine is the cause of a query's poor performance, the next step is, obviously, to try to tune the query. In some cases you can achieve impressive performance gains (sometimes of several hundred percent) simply by rewriting a query and the calculations it depends on; the problem is knowing how to rewrite the MDX and working out which calculations contribute most to the overall query duration. Unfortunately Analysis Services doesn't give you much information to use to solve this problem and there are very few tools out there which can help either, so doing this is something of a black art.

There are three main ways you can improve the performance of the Formula Engine: tune the structure of the cube it's running on, tune the algorithms you're using in your MDX, and tune the implementation of those algorithms so they use functions and expressions that Analysis Services can run efficiently.
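The calculation-heavy query behind the MDX Studio screenshot above isn't reproduced in the text. As an assumed illustration only (written against the Adventure Works sample cube, so the measure and hierarchy names are assumptions), a query of the following shape tends to be Formula Engine-bound, because the calculated measure has to be evaluated for a very large number of cells:

WITH MEMBER MEASURES.AVGDAILYSALES AS
  Avg(
    Descendants([Date].[Calendar].CurrentMember, [Date].[Calendar].[Date]),
    [Measures].[Internet Sales Amount]
  )
SELECT
  MEASURES.AVGDAILYSALES ON COLUMNS,
  [Customer].[Customer].[Customer].MEMBERS ON ROWS
FROM [Adventure Works]

On a query like this, most of the duration shows up in the Formula Engine rather than in Query Subcube events, and Task Manager will typically show only one CPU busy while it runs.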
We've already talked in depth about how the overall cube structure is important for the performance of the Storage Engine and the same goes for the Formula Engine; the only thing to repeat here is the recommendation that if you can avoid doing a calculation in MDX by doing it at an earlier stage, for example in your ETL or in your relational source, and do so without compromising functionality, you should do so. We'll now go into more detail about tuning algorithms and implementations.

Mosha Pasumansky's blog, http://tinyurl.com/moshablog, is a goldmine of information on this subject. If you're serious about learning MDX we recommend that you subscribe to it and read everything he's ever written.

Tuning algorithms used in MDX

Tuning an algorithm in MDX is much the same as tuning an algorithm in any other kind of programming language—it's more a matter of understanding your problem and working out the logic that provides the most efficient solution than anything else. That said, there are some general techniques that can be used often in MDX and which we will walk through here.

Using named sets to avoid recalculating set expressions

Many MDX calculations involve expensive set operations, a good example being rank calculations where the position of a tuple within an ordered set needs to be determined. The following query includes a calculated member that displays Dates on the Rows axis of a query, and on columns shows a calculated measure that returns the rank of that date within the set of all dates based on the value of the Internet Sales Amount measure:

WITH
MEMBER MEASURES.MYRANK AS
  Rank(
    [Date].[Date].CurrentMember,
    Order(
      [Date].[Date].[Date].MEMBERS,
      [Measures].[Internet Sales Amount],
      BDESC
    )
  )
SELECT
  MEASURES.MYRANK ON 0,
  [Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works]

It runs very slowly, and the problem is that every time the calculation is evaluated it has to evaluate the Order function to return the set of ordered dates. In this particular situation, though, you can probably see that the set returned will be the same every time the calculation is called, so it makes no sense to do the ordering more than once. Instead, we can create a named set to hold the ordered set and refer to that named set from within the calculated measure, so:

WITH
SET ORDEREDDATES AS
  Order(
    [Date].[Date].[Date].MEMBERS,
    [Measures].[Internet Sales Amount],
    BDESC
  )
MEMBER MEASURES.MYRANK AS
  Rank(
    [Date].[Date].CurrentMember,
    ORDEREDDATES
  )
SELECT
  MEASURES.MYRANK ON 0,
  [Date].[Date].[Date].MEMBERS ON 1
FROM [Adventure Works]

This version of the query is many times faster, simply as a result of improving the algorithm used; the problem is explored in more depth in this blog entry: http://tinyurl.com/mosharank

Since normal named sets are only evaluated once they can be used to cache set expressions in some circumstances; however, the fact that they are static means they can be too inflexible to be useful most of the time. Note that normal named sets defined in the MDX Script are only evaluated once, when the MDX Script executes and not in the context of any particular query, so it wouldn't be possible to change the example above so that the set and calculated measure were defined on the server. Even named sets defined in the WITH clause are evaluated only once, in the context of the WHERE clause, so it wouldn't be possible to crossjoin another hierarchy on columns and use this approach, because for it to work the set would have to be reordered once for each column.
The introduction of dynamic named sets in Analysis Services 2008 improves the situation a little, and other more advanced techniques can be used to work around these issues, but in general named sets are less useful than you might hope. For further reading on this subject see the following blog posts:
http://tinyurl.com/chrisrank
http://tinyurl.com/moshadsets
http://tinyurl.com/chrisdsets

Using calculated members to cache numeric values

In the same way that you can avoid unnecessary re-evaluations of set expressions by using named sets, you can also rely on the fact that the Formula Engine can (usually) cache the result of a calculated member to avoid recalculating expressions which return numeric values. What this means in practice is that anywhere in your code you see an MDX expression that returns a numeric value repeated across multiple calculations, you should consider abstracting it to its own calculated member; not only will this help performance, but it will improve the readability of your code. For example, take the following slow query which includes two calculated measures:

WITH
MEMBER [Measures].TEST1 AS
  [Measures].[Internet Sales Amount]
  /
  Count(
    TopPercent(
      {
        [Scenario].[Scenario].&[1],
        [Scenario].[Scenario].&[2]
      }
      * [Account].[Account].[Account].MEMBERS
      * [Date].[Date].[Date].MEMBERS,
      10,
      [Measures].[Amount]
    )
  )
MEMBER [Measures].TEST2 AS
  [Measures].[Internet Tax Amount]
  /
  Count(
    TopPercent(
      {
        [Scenario].[Scenario].&[1],
        [Scenario].[Scenario].&[2]
      }
      * [Account].[Account].[Account].MEMBERS
      * [Date].[Date].[Date].MEMBERS
      * [Department].[Departments].[Department Level 02].MEMBERS,
      10,
      [Measures].[Amount]
    )
  )
SELECT
  {
    [Measures].TEST1,
    [Measures].TEST2
  } ON 0,
  [Customer].[Gender].[Gender].MEMBERS ON 1
FROM [Adventure Works]

A quick glance over the code shows that a large section of it occurs twice in both calculations—everything inside the Count function. If we remove that code to its own calculated member as follows:

WITH
MEMBER [Measures].Denominator AS
  Count(
    TopPercent(
      {
        [Scenario].[Scenario].&[1],
        [Scenario].[Scenario].&[2]
      }
      * [Account].[Account].[Account].MEMBERS
      * [Date].[Date].[Date].MEMBERS,
      10,
      [Measures].[Amount]
    )
  )
MEMBER [Measures].TEST1 AS
  [Measures].[Internet Sales Amount] / [Measures].Denominator
MEMBER [Measures].TEST2 AS
  [Measures].[Internet Tax Amount] / [Measures].Denominator
SELECT
  {
    [Measures].TEST1,
    [Measures].TEST2
  } ON 0,
  [Customer].[Gender].[Gender].MEMBERS ON 1
FROM [Adventure Works]

The query runs much faster, simply because instead of evaluating the count twice for each of the two visible calculated measures, we evaluate it once, cache the result in the calculated measure Denominator and then reference this in the other calculated measures. It's also possible to find situations where you can rewrite code to avoid evaluating a calculation that always returns the same result over different cells in the multidimensional space of the cube. This is much more difficult to do effectively though; the following blog entry describes how to do it in detail: http://tinyurl.com/fecache

Tuning the implementation of MDX

Like just about any other software product, Analysis Services is able to do some things more efficiently than others. It's possible to write the same query or calculation using the same algorithm but using different MDX functions and see a big difference in performance; as a result, we need to know which are the functions we should use and which ones we should avoid. Which ones are these though?
Luckily MDX Studio includes functionality to analyse MDX code and flag up such problems—to do this you just need to click the Analyze button—and there's even an online version of MDX Studio that allows you to do this too, available at: http://mdx.mosha.com/. We recommend that you run any MDX code you write through this functionality and take its suggestions on board. Mosha walks through an example of using MDX Studio to optimise a calculation on his blog here: http://tinyurl.com/moshaprodvol

Block computation versus cell-by-cell

When the Formula Engine has to evaluate an MDX expression for a query it can basically do so in one of two ways. It can evaluate the expression for each cell returned by the query, one at a time, an evaluation mode known as "cell-by-cell"; or it can try to analyse the calculations required for the whole query and find situations where the same expression would need to be calculated for multiple cells and instead do it only once, an evaluation mode known variously as "block computation" or "bulk evaluation". Block computation is only possible in some situations, depending on how the code is written, but is often many times more efficient than cell-by-cell mode. As a result, we want to write MDX code in such a way that the Formula Engine can use block computation as much as possible, and when we talk about using efficient MDX functions or constructs then this is what we in fact mean.

Given that different calculations in the same query, and different expressions within the same calculation, can be evaluated using block computation and cell-by-cell mode, it's very difficult to know which mode is used when. Indeed in some cases Analysis Services can't use block mode anyway, so it's hard to know whether we have written our MDX in the most efficient way possible. One of the few indicators we have is the Perfmon counter MDX\Total Cells Calculated, which basically returns the number of cells in a query that were calculated in cell-by-cell mode; if a change to your MDX increments this value by a smaller amount than before, and the query runs faster, you're doing something right.

The list of rules that MDX Studio applies is too long to list here, and in any case it is liable to change in future service packs or versions; another good guide for Analysis Services 2008 best practices exists in the Books Online topic Performance Improvements for MDX in SQL Server 2008 Analysis Services, available online here: http://tinyurl.com/mdximp. However, there are a few general rules that are worth highlighting:
- Don't use the Non_Empty_Behavior calculation property in Analysis Services 2008, unless you really know how to set it and are sure that it will provide a performance benefit. It was widely misused with Analysis Services 2005 and most of the work that went into the Formula Engine for Analysis Services 2008 was to ensure that it wouldn't need to be set for most calculations. This is something that needs to be checked if you're migrating an Analysis Services 2005 cube to 2008.
- Never use late binding functions such as LookupCube, or StrToMember or StrToSet without the Constrained flag, inside calculations since they have a serious negative impact on performance. It's almost always possible to rewrite calculations so they don't need to be used; in fact, the only valid use for StrToMember or StrToSet in production code is when using MDX parameters. The LinkMember function suffers from a similar problem but is less easy to avoid using.
- Use the NonEmpty function wherever possible; it can be much more efficient than using the Filter function or other methods. Never use NonEmptyCrossjoin either: it's deprecated, and everything you can do with it you can do more easily and reliably with NonEmpty.
- Lastly, don't assume that whatever worked best for Analysis Services 2000 or 2005 is still best practice for Analysis Services 2008. In general, you should always try to write the simplest MDX code possible initially, and then only change it when you find performance is unacceptable. Many of the tricks that existed to optimise common calculations for earlier versions now perform worse on Analysis Services 2008 than the straightforward approaches they were designed to replace.

Caching

We've already seen how Analysis Services can cache the values returned in the cells of a query, and how this can have a significant impact on the performance of a query. Both the Formula Engine and the Storage Engine can cache data, but may not be able to do so in all circumstances; similarly, although Analysis Services can share the contents of the cache between users there are several situations where it is unable to do so. Given that in most cubes there will be a lot of overlap in the data that users are querying, caching is a very important factor in the overall performance of the cube and as a result ensuring that as much caching as possible is taking place is a good idea.

Formula cache scopes

There are three different cache contexts within the Formula Engine, which relate to how long data can be stored within the cache and how that data can be shared between users:
- Query Context, which means that the results of calculations can only be cached for the lifetime of a single query and so cannot be reused by subsequent queries or by other users.
- Session Context, which means the results of calculations are cached for the lifetime of a session and can be reused by subsequent queries in the same session by the same user.
- Global Context, which means the results of calculations are cached until the cache has to be dropped because data in the cube has changed (usually when some form of processing takes place on the server). These cached values can be reused by subsequent queries run by other users as well as the user who ran the original query.

Clearly the Global Context is the best from a performance point of view, followed by the Session Context and then the Query Context; Analysis Services will always try to use the Global Context wherever possible, but it is all too easy to accidentally write queries or calculations that force the use of the Session Context or the Query Context. Here's a list of the most important situations when that can happen:
- If you define any calculations (not including named sets) in the WITH clause of a query, even if you do not use them, then Analysis Services can only use the Query Context (see http://tinyurl.com/chrisfecache for more details).
- If you define session-scoped calculations but do not define calculations in the WITH clause, then the Session Context must be used.
- Using a subselect in a query will force the use of the Query Context (see http://tinyurl.com/chrissubfe).
- Use of the CREATE SUBCUBE statement will force the use of the Session Context.
- When a user connects to a cube using a role that uses cell security, then the Query Context will be used.
- When calculations are used that contain non-deterministic functions (functions which could return different results each time they are called), for example the Now() function that returns the system date and time, the Username() function or any Analysis Services stored procedure, then this forces the use of the Query Context.

Other scenarios that restrict caching

Apart from the restrictions imposed by cache context, there are other scenarios where caching is either turned off or restricted. When arbitrary-shaped sets are used in the WHERE clause of a query, no caching at all can take place in either the Storage Engine or the Formula Engine. An arbitrary-shaped set is a set of tuples that cannot be created by a crossjoin, for example:

(
  {
    ([Customer].[Country].&[Australia], [Product].[Category].&[1]),
    ([Customer].[Country].&[Canada], [Product].[Category].&[3])
  }
)

If your users frequently run queries that use arbitrary-shaped sets then this can represent a very serious problem, and you should consider redesigning your cube to avoid it. The following blog entries discuss this problem in more detail:
http://tinyurl.com/tkarbset
http://tinyurl.com/chrisarbset

Even within the Global Context, the presence of security can affect the extent to which cache can be shared between users. When dimension security is used the contents of the Formula Engine cache can only be shared between users who are members of roles which have the same permissions. Worse, the contents of the Formula Engine cache cannot be shared between users who are members of roles which use dynamic security at all, even if those users do in fact share the same permissions.

Cache warming

Since we can expect many of our queries to run instantaneously on a warm cache, and the majority at least to run faster on a warm cache than on a cold cache, it makes sense to preload the cache with data so that when users come to run their queries they will get warm-cache performance. There are two basic ways of doing this: running CREATE CACHE statements and automatically running batches of queries.

Create Cache statement

The CREATE CACHE statement allows you to load a specified subcube of data into the Storage Engine cache. Here's an example of what it looks like:

CREATE CACHE FOR [Adventure Works] AS
(
  {[Measures].[Internet Sales Amount]},
  [Customer].[Country].[Country].MEMBERS,
  [Date].[Calendar Year].[Calendar Year].MEMBERS
)

More detail on this statement can be found here: http://tinyurl.com/createcache

CREATE CACHE statements can be added to the MDX Script of the cube so they execute every time the MDX Script is executed, although if the statements take a long time to execute (as they often do) then this might not be a good idea; they can also be run after processing has finished from an Integration Services package using an Execute SQL task or through ASCMD, and this is a much better option because it means you have much more control over when the statements actually execute—you wouldn't want them running every time you cleared the cache, for instance.

Running batches of queries

The main drawback of the CREATE CACHE statement is that it can only be used to populate the Storage Engine cache, and in many cases it's warming the Formula Engine cache that makes the biggest difference to query performance.
The only way to do this is to find a way to automate the execution of large batches of MDX queries (potentially captured by running a Profiler trace while users go about their work) that return the results of calculations and so will warm the Formula Engine cache. This automation can be done in a number of ways, for example by using the ASCMD command line utility which is part of the sample code for Analysis Services that Microsoft provides (available for download here: http://tinyurl.com/sqlprodsamples); another common option is to use an Integration Services package to run the queries, as described in the following blog entries: http://tinyurl.com/chriscachewarm and http://tinyurl.com/allancachewarm

This approach is not without its own problems, though: it can be very difficult to make sure that the queries you're running return all the data you want to load into cache, and even when you have done that, user query patterns change over time so ongoing maintenance of the set of queries is important.

Scale-up and scale-out

Buying better or more hardware should be your last resort when trying to solve query performance problems: it's expensive and you need to be completely sure that it will indeed improve matters. Adding more memory will increase the space available for caching but nothing else; adding more or faster CPUs will lead to faster queries but you might be better off investing time in building more aggregations or tuning your MDX. Scaling up as much as your hardware budget allows is a good idea, but may have little impact on the performance of individual problem queries unless you badly under-specified your Analysis Services server in the first place.

If your query performance degenerates as the number of concurrent users running queries increases, consider scaling out by implementing what's known as an OLAP farm. This architecture is widely used in large implementations and involves multiple Analysis Services instances on different servers, and using network load balancing to distribute user queries between these servers. Each of these instances needs to have the same database on it and each of these databases must contain exactly the same data in it for queries to be answered consistently. This means that, as the number of concurrent users increases, you can easily add new servers to handle the increased query load. It also has the added advantage of removing a single point of failure, so if one Analysis Services server fails then the others take on its load automatically.

Making sure that data is the same across all servers is a complex operation and you have a number of different options for doing this: you can either use the Analysis Services database synchronisation functionality, copy and paste the data from one location to another using a tool like Robocopy, or use the new Analysis Services 2008 shared scalable database functionality. The following white paper from the SQLCat team describes how the first two options can be used to implement a network load-balanced solution for Analysis Services 2005: http://tinyurl.com/ssasnlb.

Shared scalable databases have a significant advantage over synchronisation and file-copying in that they don't need to involve any moving of files at all.
They can be implemented using the same approach described in the white paper above, but instead of copying the databases between instances you process a database (attached in ReadWrite mode) on one server, detach it from there, and then attach it in ReadOnly mode to one or more user-facing servers for querying while the files themselves stay in one place. You do, however, have to ensure that your disk subsystem does not become a bottleneck as a result.

Summary

In this article we covered MDX calculation performance and caching, and also how to write MDX to ensure that the Formula Engine works as efficiently as possible. We've also seen how important caching is to overall query performance and what we need to do to ensure that we can cache data as often as possible, and we've discussed how to scale out Analysis Services using network load balancing to handle large numbers of concurrent users.
Oracle Wallet Manager

Packt
16 Oct 2009
9 min read
The Oracle Wallet Manager

Oracle Wallet Manager is a password protected stand-alone Java application tool used to maintain security credentials and store SSL related information such as authentication and signing credentials, private keys, certificates, and trusted certificates. OWM uses the Public Key Cryptographic Standards (PKCS) #12 specification for the Wallet format and PKCS #10 for certificate requests. Oracle Wallet Manager stores X.509 v3 certificates and private keys in industry-standard PKCS #12 formats, and generates certificate requests according to the PKCS #10 specification. This makes the Oracle Wallet structure interoperable with supported third party PKI applications, and provides Wallet portability across operating systems. Additionally, Oracle Wallet Manager Wallets can be enabled to store credentials on hardware security modules that use APIs compliant with the PKCS #11 specification.

The OWM creates Wallets, generates certificate requests, accesses Public Key interface-based services, saves credentials into cryptographic hardware such as smart cards, uploads and unloads Wallets to LDAP directories, and imports Wallets in PKCS #12 format.

In a Windows environment, Oracle Wallet Manager can be accessed from the start menu. The following screenshot shows the Oracle Wallet Manager Properties:

In a Unix-like environment, OWM can be accessed directly from the command line with the owm shell script located at $ORACLE_HOME/bin/owm; it requires a graphical environment so it can be launched.

Creating the Oracle Wallet

If this is the first time the Wallet has been opened, then a Wallet file does not yet exist. A Wallet is physically created in a specified directory. The user can declare the path where the Oracle Wallet file should be created. The user may either specify a default location or declare a particular directory. A file named ewallet.p12 will be created in the specified location.

Enabling Auto Login

The Oracle Wallet Manager Auto Login feature creates an obfuscated copy of the Wallet and enables PKI-based access to the services without a password. When this feature is enabled, only the user who created the Wallet will have access to it. By default, Single Sign-On (SSO) access to a different database is disabled. The auto login feature must be enabled in order for you to have access to multiple databases using SSO. Checking and unchecking the Auto Login option will enable and disable this feature.

mkwallet, the CLI OWM version

Besides the Java client, there is a command line interface version of the Wallet, which can be accessed by means of the mkwallet utility. This can also be used to generate a Wallet and have it configured in Auto Login mode. This is a fully featured tool that allows you to create Wallets, and to view and modify their content.
The options provided by the mkwallet tool are shown in the following table (Option : Meaning):

-R rootPwd rootWrl DN keySize expDate : Create the root Wallet
-e pwd wrl : Create an empty Wallet
-r pwd wrl DN keySize certReqLoc : Create a certificate request, add it to the Wallet and export it to certReqLoc
-c rootPwd rootWrl certReqLoc certLoc : Create a certificate for a certificate request
-i pwd wrl certLoc NZDST_CERTIFICATE | NZDST_CLEAR_PTP : Install a certificate | trusted point
-d pwd wrl DN : Delete a certificate with matching DN
-s pwd wrl : Store sso Wallet
-p pwd wrl : Dump the content of the Wallet
-q certLoc : Dump the content of the certificate
-Lg pwd wrl crlLoc nextUpdate : Generate CRL
-La pwd wrl crlLoc certtoRevoke : Revoke certificate
-Ld crlLoc : Display CRL
-Lv crlLoc cacert : Verify CRL signature
-Ls crlLoc cert : Check certificate revocation status
-Ll oidHostname oidPortNumber cacert : Fetch CRL from LDAP directory
-Lc cert : Fetch CRL from CRLDP in cert
-Lb b64CrlLoc derCrlLoc : Convert CRL from B64 to DER format
-Pw pwd wrl pkcs11Lib tokenPassphrase : Create an empty Wallet. Store PKCS11 info in it.
-Pq pwd wrl DN keysize certreqLoc : Create cert request. Generate key pair on pkcs11 device.
-Pl pwd wrl : Test pkcs11 device login using Wallet containing PKCS11 info.
-Px pwd wrl pkcs11Lib tokenPassphrase : Create a Wallet with pkcs11 info from a software Wallet.

Managing Wallets with orapki

A CLI-based tool, orapki, is used to manage Public Key Infrastructure components such as Wallets and revocation lists. This tool eases the procedures related to PKI management and maintenance by allowing the user to include it in batch scripts. This tool can be used to create and view signed certificates for testing purposes, create Oracle Wallets, add and remove certificates and certificate requests, and manage Certification Revocation Lists (CRLs)—renaming them and managing them against the Oracle Internet Directory.

The syntax for this tool is:

orapki module command -parameter <value>

module can have these values:
wallet: Oracle Wallet
crl: Certificate Revocation List
cert: The PKI Certificate

To create a Wallet you can issue this command:

orapki wallet create -wallet <Path to Wallet>

To create a Wallet with the auto login feature enabled, you can issue the command:

orapki wallet create -wallet <Path to Wallet> -autologin

To add a certificate request to the Wallet you can use the command:

orapki wallet add -wallet <wallet_location> -dn <user_dn> -keySize <512|1024|2048>

To add a user certificate to an Oracle Wallet:

orapki wallet add -wallet <wallet_location> -user_cert -cert <certificate_location>

The options and values available for the orapki tool depend on the module to be configured (orapki Action : Description and Syntax):

orapki cert create : Creates a signed certificate for testing purposes.
  orapki cert create [-wallet <wallet_location>] -request <certificate_request_location> -cert <certificate_location> -validity <number_of_days> [-summary]
orapki cert display : Displays details of a specific certificate.
  orapki cert display -cert <certificate_location> [-summary|-complete]
orapki crl delete : Deletes CRLs from Oracle Internet Directory.
  orapki crl delete -issuer <issuer_name> -ldap <hostname:ssl_port> -user <username> [-wallet <wallet_location>] [-summary]
orapki crl display : Displays specific CRLs that are stored in Oracle Internet Directory.
  orapki crl display -crl <crl_location> [-wallet <wallet_location>] [-summary|-complete]
orapki crl hash : Generates a hash value of the certificate revocation list (CRL) issuer to identify the location of the CRL in your file system for certificate validation.
  orapki crl hash -crl <crl_filename|URL> [-wallet <wallet_location>] [-symlink|-copy] <crl_directory> [-summary]
orapki crl list : Displays a list of CRLs stored in Oracle Internet Directory.
  orapki crl list -ldap <hostname:ssl_port>
orapki crl upload : Uploads CRLs to the CRL subtree in Oracle Internet Directory.
  orapki crl upload -crl <crl_location> -ldap <hostname:ssl_port> -user <username> [-wallet <wallet_location>] [-summary]
orapki wallet add : Adds certificate requests and certificates to an Oracle Wallet.
  orapki wallet add -wallet <wallet_location> -dn <user_dn> -keySize <512|1024|2048>
orapki wallet create : Creates an Oracle Wallet or sets auto login on for an Oracle Wallet.
  orapki wallet create -wallet <wallet_location> [-auto_login]
orapki wallet display : Displays the certificate requests, user certificates, and trusted certificates in an Oracle Wallet.
  orapki wallet display -wallet <wallet_location>
orapki wallet export : Exports certificate requests and certificates from an Oracle Wallet.
  orapki wallet export -wallet <wallet_location> -dn <certificate_dn> -cert <certificate_filename>

Oracle Wallet Manager CSR generation

Oracle Wallet Manager generates a certificate request in PKCS #10 format. This certificate request can be sent to a certificate authority of your choice. The procedure to generate this certificate request is as follows: from the main menu choose the Operations menu and then select the Add Certificate Request submenu. As shown in the following screenshot, a form will be displayed where you can capture specific information.

The parameters used to request a certificate are described next:
- Common Name: This parameter is mandatory. This is the user's name or entity's name. If you are using a user's name, then enter it using the first name, last name format.
- Organization Unit: This is the name of the identity's organization unit. It could be the name of the department where the entity belongs (optional parameter).
- Organization: This is the company's name (optional).
- Location/City: The location and the city where the entity resides (optional).
- State/Province: This is the full name of the state where the entity resides. Do not use abbreviations (optional).
- Country: This parameter is mandatory. It specifies the country where the entity is located.
- Key Size: This parameter is mandatory. It defines the key size used when a public/private key pair is created. The key size can be as small as 512 bits and up to 4096 bits.
- Advanced: When the parameters are introduced a Distinguished Name (DN) is assembled. If you want to customize this DN, then you can use the advanced DN configuration mode.

Once the Certificate Request form has been completed, a PKCS #10 format certificate request is generated. The information that appears between the BEGIN and END keywords must be used to request a certificate from a Certificate Authority (CA); there are several well known certificate authorities, and depending on the usage you plan for your certificate, you could address the request to a known CA (from the browser perspective) so that when end users access your site they aren't warned about the site's identity.
If the certificate is targeted at a local community that doesn't mind the certificate warning, then you may generate your own certificate or ask a CA to issue a certificate for you. For demonstration purposes, we used the Oracle Certificate Authority (OCA) included with the Oracle Application Server. OCA will provide Certificate Authority capabilities to your site and it can issue standard certificates, suitable for intranet users. If you are planning to use OCA then you should review the license agreements to determine if you are allowed to use it.
Measures and Measure Groups in Microsoft Analysis Services: Part 2

Packt
15 Oct 2009
20 min read
Measure groups

All but the simplest data warehouses will contain multiple fact tables, and Analysis Services allows you to build a single cube on top of multiple fact tables through the creation of multiple measure groups. These measure groups can contain different dimensions and be at different granularities, but so long as you model your cube correctly, your users will be able to use measures from each of these measure groups in their queries easily and without worrying about the underlying complexity.

Creating multiple measure groups

To create a new measure group in the Cube Editor, go to the Cube Structure tab and right-click on the cube name in the Measures pane and select 'New Measure Group'. You'll then need to select the fact table to create the measure group from and then the new measure group will be created; any columns that aren't used as foreign key columns in the DSV will automatically be created as measures, and you'll also get an extra measure of aggregation type Count. It's a good idea to delete any measures you are not going to use at this stage.

Once you've created a new measure group, BIDS will try to set up relationships between it and any existing dimensions in your cube based on the relationships you've defined in your DSV. Since doing this manually can be time-consuming, this is another great reason for defining relationships in the DSV. You can check the relationships that have been created on the Dimension Usage tab of the Cube Editor:

In Analysis Services 2005, it was true in some cases that query performance was better on cubes with fewer measure groups, and that breaking a large cube with many measure groups up into many smaller cubes with only one or two measure groups could result in faster queries. This is no longer the case in Analysis Services 2008. Although there are other reasons why you might want to consider creating separate cubes for each measure group, this is still something of a controversial subject amongst Analysis Services developers.

The advantages of a single cube approach are:
- All of your data is in one place. If your users need to display measures from multiple measure groups, or you need to create calculations that span measure groups, everything is already in place.
- You only have one cube to manage security and calculations on; with multiple cubes the same security and calculations might have to be duplicated.

The advantages of the multiple cube approach are:
- If you have a complex cube but have to use Standard Edition, you cannot use Perspectives to hide complexity from your users. In this case, creating multiple cubes might be a more user-friendly approach.
- Depending on your requirements, security might be easier to manage with multiple cubes. It's very easy to grant or deny a role access to a cube; it's much harder to use dimension security to control which measures and dimensions in a multi-measure group cube a role can access.
- If you have complex calculations, especially MDX Script assignments, it's too easy to write a calculation that has an effect on part of the cube you didn't want to alter. With multiple cubes, the chances of this happening are reduced.

Creating measure groups from dimension tables

Measure groups don't always have to be created from fact tables. In many cases, it can be useful to build measure groups from dimension tables too.
One common scenario where you might want to do this is when you want to create a measure that counts the number of days in the currently selected time period, so if you had selected a year on your Time dimension's hierarchy, the measure would show the number of days in the year. You could implement this with a calculated measure in MDX, but it would be hard to write code that worked in all possible circumstances, such as when a user multi-selects time periods. In fact, it's a better idea to create a new measure group from your Time dimension table containing a new measure with AggregateFunction Count, so you're simply counting the number of days as the number of rows in the dimension table. This measure will perform faster and always return the values you expect. This post on Mosha Pasumansky's blog discusses the problem in more detail: http://tinyurl.com/moshadays

MDX formulas vs pre-calculating values

If you can somehow model a calculation into the structure of your cube, or perform it in your ETL, you should do so in preference to doing it in MDX only so long as you do not compromise the functionality of your cube. A pure MDX approach will be the most flexible and maintainable since it only involves writing code, and if calculation logic needs to change, then you just need to redeploy your updated MDX Script; doing calculations upstream in the ETL can be much more time-consuming to implement and if you decide to change your calculation logic, then it could involve reloading one or more tables. However, an MDX calculation, even one that is properly tuned, will of course never perform as well as a pre-calculated value or a regular measure.

The day count measure, discussed in the previous paragraph, is a perfect example of where a cube-modeling approach trumps MDX. If your aim was to create a measure that showed average daily sales, though, it would make no sense to try to pre-calculate all possible values since that would be far too time-consuming and would result in a non-aggregatable measure. The best solution here would be a hybrid: create real measures for sales and day count, and then create an MDX calculated measure that divided the former by the latter. However, it's always necessary to consider the type of calculation, the volume of data involved and the chances of the calculation algorithm changing in the future before you can make an informed decision on which approach to take.

Handling different dimensionality

When you have different measure groups in a cube, they are almost always going to have different dimensions associated with them; indeed, if you have measure groups that have identical dimensionality, you might consider combining them into a single measure group if it is convenient to do so. As we've already seen, the Dimension Usage tab shows us which dimensions have relationships with which measure groups. When a dimension has a relationship with a measure group it goes without saying that making a selection on that dimension will affect the values that are displayed for measures on that measure group. But what happens to measures when you make a selection on a dimension that has no relationship with a measure group?
In fact, you have two options here, controlled by the IgnoreUnrelatedDimensions property of a measure group:
- IgnoreUnrelatedDimensions=False displays a null value for all members below the root (the intersection of all of the All Members or default members on every hierarchy) of the dimension, except the Unknown member, or
- IgnoreUnrelatedDimensions=True repeats the value displayed at the root of the dimension for every member on every hierarchy of the dimension. This is the default state.

The screenshot below shows what happens for two otherwise identical measures from measure groups which have IgnoreUnrelatedDimensions set to True and to False when they're displayed next to a dimension they have no relationship with:

It's usually best to keep IgnoreUnrelatedDimensions set to True since if the users are querying measures from multiple measure groups, then they don't want some of their selected measures suddenly returning null if they slice by a dimension that has a regular relationship with their other selected measures.

Handling different granularities

Even when measure groups share the same dimensions, they may not share the same granularity. For example, we may hold sales information in one fact table down to the day level, but also hold sales quotas in another fact table at the quarter level. If we created measure groups from both these fact tables, then they would both have regular relationships with our Time dimension but at different granularities.

Normally, when you create a regular relationship between a dimension and a measure group, Analysis Services will join the columns specified in the KeyColumns property of the key attribute of the dimension with the appropriate foreign key columns of the fact table (note that during processing, Analysis Services won't usually do the join in SQL, it does it internally). However, when you have a fact table of a higher granularity, you need to change the granularity attribute property of the relationship to choose the attribute from the dimension you do want to join on instead:

In the previous screenshot, we can see an amber warning triangle telling us that by selecting a non-key attribute, the server may have trouble aggregating measure values. What does this mean exactly? Let's take a look at the attribute relationships defined on our Time dimension again:

If we're loading data at the Quarter level, what do we expect to see at the Month and Date level? We can only expect to see useful values at the level of the granularity attribute we've chosen, and for only those attributes whose values can be derived from that attribute; this is yet another good reason to make sure your attribute relationships have been optimized. Below the granularity attribute, we've got the same options regarding what gets displayed as we had with dimensions that have no relationship at all with a measure group: either repeated values or null values. The IgnoreUnrelatedDimensions property is again used to control this behavior. Unfortunately, the default True setting for IgnoreUnrelatedDimensions is usually not the option you want to use in this scenario (users usually prefer to see nulls below the granularity of a measure in our experience) and this may conflict with how we want to set IgnoreUnrelatedDimensions to control the behavior of dimensions which have no relationship with a measure group.
There are ways of resolving this conflict such as using MDX Script assignments to set cell values to null or by using the ValidMeasure() MDX function, but none are particularly elegant.

Non-aggregatable measures: a different approach

We've already seen how we can use parent/child hierarchies to load non-aggregatable measure values into our cube. However, given the problems associated with using parent/child hierarchies and knowing what we now know about measure groups, let's consider a different approach to solving this problem.

A non-aggregatable measure will have, by its very nature, data stored for many different granularities of a dimension. Rather than storing all of these different granularities of values in the same fact table, we could create multiple fact tables for each granularity of value. Having built measure groups from these fact tables, we would then be able to join our dimension to each of them with a regular relationship but at different granularities. We'd then be in the position of having multiple measures representing the different granularities of a single, logical measure. What we actually want is a single non-aggregatable measure, and we can get this by using MDX Script assignments to combine the different granularities.

Let's say we have a regular (non-parent/child) dimension called Employee with three attributes Manager, Team Leader and Sales Person, and a logical non-aggregatable measure called Sales Quota appearing in three measure groups as three measures called Sales Amount Quota_Manager, Sales Amount Quota_TeamLead and Sales Amount Quota for each of these three granularities. Here's a screenshot showing what a query against this cube would show at this stage:

We can combine the three measures into one like this:

SCOPE([Measures].[Sales Amount Quota]);
  SCOPE([Employee].[Salesperson].[All]);
    THIS = [Measures].[Sales Amount Quota_TeamLead];
  END SCOPE;
  SCOPE([Employee].[Team Lead].[All]);
    THIS = [Measures].[Sales Amount Quota_Manager];
  END SCOPE;
END SCOPE;

This code takes the lowest granularity measure Sales Amount Quota, and then overwrites it twice: the first assignment replaces all of the values above the Sales Person granularity with the value of the measure containing Sales Amount Quota for Team Leaders; the second assignment then replaces all of the values above the Team Leader granularity with the value of the measure containing Sales Quotas for Managers. Once we've set Visible=False for the Sales Amount Quota_TeamLead and Sales Amount Quota_Manager measures, we're left with just the Sales Amount Quota measure visible, thus displaying the non-aggregatable values that we wanted. The user would then see this:

Using linked dimensions and measure groups

Creating linked dimensions and measure groups allows you to share the same dimensions and measure groups across separate Analysis Services databases, and the same measure group across multiple cubes. To do this, all you need to do is to run the 'New Linked Object' wizard from the Cube Editor, either by clicking on the button in the toolbar on the Cube Structure or Dimension Usage tabs, or by selecting it from the right-click menu in the Measures pane of the Cube Structure tab. Doing this has the advantage of reducing the amount of processing and maintenance needed: instead of having many identical dimensions and measure groups to maintain and keep synchronized, all of which need processing separately, you can have a single object which only needs to be changed and processed once.
At least that's the theory—in practice, linked objects are not as widely used as they could be because there are a number of limitations in their use:
- Linked objects represent a static snapshot of the metadata of the source object, and any changes to the source object are not passed through to the linked object. So for example, if you create a linked dimension and then add an attribute to the source dimension, you then have to delete and recreate the linked dimension—there's no option to refresh a linked object.
- You can also import the calculations defined in the MDX Script of the source cube using the wizard. However, you can only import the entire script and this may include references to objects present in the source cube that aren't in the target cube, and which may need to be deleted to prevent errors. The calculations that remain will also need to be updated manually when those in the source cube are changed, and if there are a lot, this can add an unwelcome maintenance overhead.
- A linked measure group can only be used with dimensions from the same database as the source measure group. This isn't a problem when you're sharing measure groups between cubes in the same database, but could be if you wanted to share measure groups across databases.
- As you would expect, when you query a linked measure group, your query is redirected to the source measure group. If the source measure group is on a different server, this may introduce some latency and hurt query performance. Analysis Services does try to mitigate this by doing some caching on the linked measure group's database, though. By default, it will cache data on a per-query basis, but if you change the RefreshPolicy property from ByQuery to ByInterval you can specify a time limit for data to be held in cache.

Linked objects can be useful when cube development is split between multiple development teams, or when you need to create multiple cubes containing some shared data, but, in general, we recommend against using them widely because of these limitations.

Role-playing dimensions

It's also possible to add the same dimension to a cube more than once, and give each instance a different relationship to the same measure group. For example, in our Sales fact table, we might have several different foreign key columns that join to our Time dimension table: one which holds the date an order was placed on, one which holds the date it was shipped from the warehouse, and one which holds the date the order should arrive with the customer. In Analysis Services, we can create a single physical Time dimension in our database, which is referred to as a database dimension, and then add it three times to the cube to create three 'cube dimensions', renaming each cube dimension to something like Order Date, Ship Date and Due Date. These three cube dimensions are referred to as role-playing dimensions: the same dimension is playing three different roles in the same cube.

Role playing dimensions are a very useful feature. They reduce maintenance overheads because you only need to edit one dimension, and unlike linked dimensions, any changes made to the underlying database dimension are propagated to all of the cube dimensions that are based on it. They also reduce processing time because you only need to process the database dimension once.
However, there is one frustrating limitation with role-playing dimensions and that is that while you can override certain properties of the database dimension on a per-cube dimension basis, you can't change the name of any of the attributes or hierarchies of a cube dimension. So if you have a user hierarchy called 'Calendar' on your database dimension, all of your cube dimensions will also have a user hierarchy called 'Calendar', and your users might find it difficult to tell which hierarchy is which in certain client tools (Excel 2003 is particularly bad in this respect) or in reports. Unfortunately, we have seen numerous cases where this problem alone meant role-playing dimensions couldn't be used.

Dimension/measure group relationships

So far we've seen dimensions either having no relationship with a measure group or having a regular relationship, but that's not the whole story: there are many different types of relationships that a dimension can have with a measure group. Here's the complete list:
- No relationship
- Regular
- Fact
- Referenced
- Many-to-Many
- Data Mining

Fact relationships

Fact or degenerate dimensions are dimensions that are built directly from columns in a fact table, not from a separate dimension table. From an Analysis Services dimension point of view, they are no different from any other kind of dimension, except that there is a special fact relationship type that a dimension can have with a measure group. There are in fact very few differences between a fact relationship and a regular relationship, and they are:
- A fact relationship will result in marginally more efficient SQL being generated when the fact dimension is used in ROLAP drillthrough.
- Fact relationships are visible to client tools in the cube's metadata, so client tools may choose to display fact dimensions differently.
- A fact relationship can only be defined on dimensions and measure groups that are based on the same table in the DSV.
- A measure group can only have a fact relationship with one database dimension. It can have more than one fact relationship, but all of them have to be with cube dimensions based on the same database dimension.

It still makes sense though to define relationships as fact relationships when you can. Apart from the reasons given above, the functionality might change in future versions of Analysis Services and fact relationship types might be further optimized in some way.

Referenced relationships

A referenced relationship is where a dimension joins to a measure group through another dimension. For example, you might have a Customer dimension that includes geographic attributes up to and including a customer's country; also, your organization might divide the world up into international regions such as North America, Europe, Middle East and Africa (EMEA), Latin America (LATAM) and Asia-Pacific and so on for financial reporting, and you might build a dimension for this too. If your sales fact table only contained a foreign key for the Customer dimension, but you wanted to analyze sales by international region, you would be able to create a referenced relationship from the Region dimension through the Customer dimension to the Sales measure group.
When setting up a referenced relationship in the Define Relationship dialog in the Dimension Usage tab, you're asked to first choose the dimension that you wish to join through and then which attribute on the reference dimension joins to which attribute on the intermediate dimension: When the join is made between the attributes you've chosen on the reference dimension, once again it's the values in the columns that are defined in the KeyColumns property of each attribute that you're in fact joining on. The Materialize checkbox is automatically checked, and this ensures maximum query performance by resolving the join between the dimensions at processing time, which can lead to a significant decrease in processing performance. Unchecking this box means that no penalty is paid at processing time but query performance may be worse. The question you may well be asking yourself at this stage is: why bother to use referenced relationships at all? It is in fact a good question to ask, because, in general, it's better to include all of the attributes you need in a single Analysis Services dimension built from multiple tables rather than use a referenced relationship. The single dimension approach will perform better and is more user-friendly: for example, you can't define user hierarchies that span a reference dimension and its intermediate dimension. That said, there are situations where referenced relationships are useful because it's simply not feasible to add all of the attributes you need to a dimension. You might have a Customer dimension, for instance, that has a number of attributes representing dates—the date of a customer's first purchase, the date of a customer's tenth purchase, the date of a customer's last purchase and so on. If you had created these attributes with keys that matched the surrogate keys of your Time dimension, you could create multiple, referenced (but not materialized) role-playing Time dimensions joined to each of these attributes that would give you the ability to analyze each of these dates. You certainly wouldn't want to duplicate all of the attributes from your Time dimension for each of these dates in your Customer dimension. Another good use for referenced relationships is when you want to create multiple parent/child hierarchies from the same dimension table Data mining relationships The data mining functionality of Analysis Services is outside the scope of this article, so we won't spend much time on the data mining relationship type. Suffice to say that when you create an Analysis Services mining structure from data sourced from a cube, you have the option of using that mining structure as the source for a special type of dimension, called a data mining dimension. The wizard will also create a new cube containing linked copies of all of the dimensions and measure groups in the source cube, plus the new data mining dimension, which then has a data mining relationships with the measure groups. Summary In this part, we focused on how to create new measure groups and handle the problems of different dimensionality and granularity, and looked at the different types of relationships that are possible between dimensions and measure groups.

Measures and Measure Groups in Microsoft Analysis Services: Part 1

Packt
15 Oct 2009
12 min read
In this two-part article by Chris Webb, we will look at measures and measure groups, ways to control how measures aggregate up, and how dimensions can be related to measure groups. In this part, will cover useful properties of measures, along with built-in measure aggregation types and dimension calculations. Measures and aggregation Measures are the numeric values that our users want to aggregate, slice, dice and otherwise analyze, and as a result, it's important to make sure they behave the way we want them to. One of the fundamental reasons for using Analysis Services is that, unlike a relational database it allows us to build into our cube design business rules about measures: how they should be formatted, how they should aggregate up, how they interact with specific dimensions and so on. It's therefore no surprise that we'll spend a lot of our cube development time thinking about measures. Useful properties of Measures Apart from the AggregateFunction property of a measure, which we'll come to next, there are two other important properties we'll want to set on a measure, once we've created it. Format string The Format String property of a measure specifies how the raw value of the measure gets formatted when it's displayed in query results. Almost all client tools will display the formatted value of a measure, and this allows us to ensure consistent formatting of a measure across all applications that display data from our cube. A notable exception is Excel 2003 and earlier versions, which can only display raw measure values and not formatted values. Excel 2007 will display properly formatted measure values in most cases, but not all. For instance, it ignores the fourth section of the Format String which controls formatting for nulls. Reporting Services can display formatted values in reports, but doesn't by default; this blog entry describes how you can make it do so:  http://tinyurl.com/gregformatstring. There are a number of built-in formats that you can choose from, and you can also build your own by using syntax very similar to the one used by Visual BASIC for Applications (VBA) for number formatting. The Books Online topic FORMAT_STRING Contents gives a complete description of the syntax used. Here are some points to bear in mind when setting the Format String property: If you're working with percentage values, using the % symbol will display your values multiplied by one hundred and add a percentage sign to the end. Note that only the display value gets multiplied by hundred—the real value of the measure will not be, so although your user might see a value of 98% the actual value of the cell would be 0.98. If you have a measure that returns null values in some circumstances and you want your users to see something other than null, don't try to use a MDX calculation to replace the nulls—this will cause severe query performance problems. You can use the fourth section of the Format String property to do this instead—for example, the following: #,#.00;#,#.00;0;NA will display the string NA for null values, while keeping the actual cell value as null without affecting performance. Be careful while using the Currency built-in format: it will format values with the currency symbol for the locale specified in the Language property of the cube. 
This combination of the Currency format and the Language property is frequently recommended for formatting measures that contain monetary values, but setting this property will also affect the way number formats are displayed: for example, in the UK and the USA, the comma is used as a thousands separator, but in continental Europe it is used as a decimal separator. As a result, if you wanted to display a currency value to a user in a locale that didn't use that currency, then you could end up with confusing results. The value €100,101 would be interpreted as a value just over one hundred Euros to a user in France, but in the UK, it would be interpreted as a value of just over one hundred thousand Euros. You can use the desired currency symbol in a Format String instead, for example '$#,#.00', but this will not have an effect on the thousands and decimal separators used, which will always correspond to the Language setting. You can find an example of how to change the language property using a scoped assignment in the MDX Script here: http://tinyurl.com/gregformatstring. Similarly, while Analysis Services 2008 supports the translation of captions and member names for users in different locales, unlike in previous versions, it will not translate the number formats used. As a result, if your cube might be used by users in different locales, you need to ensure they understand whichever number format the cube is using. Display folders Many cubes have a lot of measures on them, and as with dimension hierarchies, it's possible to group measures together into folders to make it easier for your users to find the one they want. Most, but not all, client tools support display folders, so it may be worth checking whether the one you intend to use does. By default, each measure group in a cube will have its own folder containing all of the measures on the measure group; these top level measure group folders cannot be removed and it's not possible to make a measure from one measure group appear in a folder under another measure group. By entering a folder name in a measure's Display Folder property, you'll make the measure appear in a folder underneath its measure group with that name; if there isn't already a folder with that name, then one will be created, and folder names are case-sensitive. You can make a measure appear under multiple folders by entering a semi-colon delimited list of names as follows: Folder One; Folder Two. You can also create a folder hierarchy by entering either a forward-slash / or back-slash delimited list (the documentation contradicts itself on which is meant to be used—most client tools that support display folders support both) of folder names as follows: Folder One; Folder Two\Folder Three. Calculated measures defined in the MDX Script can also be associated with a measure group, through the Associated_Measure_Group property, and with a display folder through the Display_Folder property. These properties can be set either in code or in Form View in the Calculations tab in the Cube Editor: If you don't associate a calculated measure with a measure group, but do put it in a folder, the folder will appear at the same level as the folders created for each measure group. Built-in measure aggregation types The most important property of a measure is AggregateFunction; it controls how the measure aggregates up through each hierarchy in the cube.
When you run an MDX query, you can think of it as being similar to a SQL SELECT statement with a GROUP BY clause—but whereas in SQL you have to specify an aggregate function to control how each column's values get aggregated, in MDX you specify this for each measure when the cube is designed. Basic aggregation types Anyone with a passing knowledge of SQL will understand the four basic aggregation types available when setting the AggregateFunction property: Sum is the commonest aggregation type to use, probably the one you'll use for 90% of all the measures. It means that the values for this measure will be summed up. Count is another commonly used property value and aggregates either by counting the overall number of rows from the fact table that the measure group is built from (when the Binding Type property, found on the Measure Source dialog that appears when you click on the ellipses button next to the Source property of a measure, is set to Row Binding), or by counting non-null values from a specific measure column (when Binding Type property is set to Column Binding). Min and Max return the minimum and maximum measure values. There isn't a built-in Average aggregation type—as we'll soon see, AverageOfChildren does not do a simple average—but it's very easy to create a calculated measure that returns an average by dividing a measure with AggregateFunction Sum by one with AggregateFunction Count, for example:
CREATE MEMBER CURRENTCUBE.[Measures].[Average Measure Example] AS
  IIF([Measures].[Count Measure]=0, NULL,
      [Measures].[Sum Measure]/[Measures].[Count Measure]);
Distinct Count The DistinctCount aggregation type counts the number of distinct values in a column in your fact table, similar to a Count(Distinct) in SQL. It's generally used in scenarios where you're counting some kind of key, for example, finding the number of unique Customers who bought a particular product in a given time period. This is, by its very nature, an expensive operation for Analysis Services and queries that use DistinctCount measures can perform worse than those which use additive measures. It is possible to get distinct count values using MDX calculations but this almost always performs worse; it is also possible to use many-to-many dimensions to get the same results and this may perform better in some circumstances; see the section on "Distinct Count" in the "Many to Many Revolution" white paper, available at http://tinyurl.com/m2mrev. When you create a new distinct count measure, BIDS will create a new measure group to hold it automatically. Each distinct count measure needs to be put into its own measure group for query performance reasons, and although it is possible to override BIDS and create a distinct count measure in an existing measure group with measures that have other aggregation types, we strongly recommend that you do not do this. None The None aggregation type simply means that no aggregation takes place on the measure at all. Although it might seem that a measure with this aggregation type displays no values at all, that's not true: it only contains values at the lowest possible granularity in the cube, at the intersection of the key attributes of all the dimensions. It's very rarely used, and only makes sense for values such as prices that should never be aggregated. If you ever find that your cube seems to contain no data even though it has processed successfully, check to see if you have accidentally deleted the Calculate statement from the beginning of your MDX Script.
Without this statement, no aggregation will take place within the cube and you'll only see data at the intersection of the leaves of every dimension, as if every measure had AggregateFunction None. Semi-additive aggregation types The semi-additive aggregation types are: AverageOfChildren FirstChild LastChild FirstNonEmpty LastNonEmpty They behave the same as measures with aggregation type Sum on all dimensions except Time dimensions. In order to get Analysis Services to recognize a Time dimension, you'll need to have set the dimension's Type property to Time in the Dimension Editor. Sometimes you'll have multiple, role-playing Time dimensions in a cube, and if you have semi-additive measures, they'll be semi-additive for just one of these Time dimensions. In this situation, Analysis Services 2008 RTM uses the first Time dimension in the cube that has a relationship with the measure group containing the semi-additive measure. You can control the order of dimensions in the cube by dragging and dropping them in the Dimensions pane in the bottom left-hand corner of the Cube Structure tab of the Cube Editor; the following blog entry describes how to do this in more detail: http://tinyurl.com/gregsemiadd. However, this behavior has changed between versions in the past and may change again in the future. Semi-additive aggregation is extremely useful when you have a fact table that contains snapshot data. For example, if you had a fact table containing information on the number of items in stock in a warehouse, then it would never make sense to aggregate these measures over time: if you had ten widgets in stock on January 1, eleven in stock on January 2, eight on January 3 and so on, the value you would want to display for the whole of January would never be the sum of the number of items in stock on each day in January. The value you do display depends on your organization's business rules. Let's take a look at what each of the semi-additive measure values actually do: AverageOfChildren displays the average of all the values at the lowest level of granularity on the Time dimension. So, for example, if Date was the lowest level of granularity, when looking at a Year value, then Analysis Services would display the average value for all days in the year. FirstChild displays the value of the first time period at the lowest level of granularity, for example, the first day of the year. LastChild displays the value of the last time period at the lowest level of granularity, for example, the last day of the year. FirstNonEmpty displays the value of the first time period at the lowest level of granularity that is not empty, for example the first day of the year that has a value. LastNonEmpty displays the value of the last time period at the lowest level of granularity that is not empty, for example the last day of the year that has a value. This is the most commonly used semi-additive type; a good example of its use would be where the measure group contains data about stock levels in a warehouse, so when you aggregated along the Time dimension what you'd want to see is the amount of stock you had at the end of the current time period. The following screenshot of an Excel pivot table illustrates how each of these semi-additive aggregation types works: Note that the semi-additive measures only have an effect above the lowest level of granularity on a Time dimension. 
For dates like July 17th in the screenshot above, where there is no data for the Sum measure, the LastNonEmpty measure still returns null and not the value of the last non-empty date.
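Pulling several of these ideas together, here is a hedged sketch of a calculated measure defined in the MDX Script that returns an average, uses the fourth section of the Format String to show NA instead of a null, and is placed in a display folder; the measure names, folder name and measure group name are assumptions for illustration only:

CREATE MEMBER CURRENTCUBE.[Measures].[Average Sale Value] AS
  IIF([Measures].[Sales Count] = 0, NULL,
      [Measures].[Sales Amount] / [Measures].[Sales Count]),
  FORMAT_STRING = '#,#.00;#,#.00;0;NA',
  DISPLAY_FOLDER = 'Averages',
  ASSOCIATED_MEASURE_GROUP = 'Sales';

Because the calculation itself returns null when the count is zero, query performance is not affected; the NA text is purely a formatting effect.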

Deployment of Reports with BIRT

Packt
15 Oct 2009
4 min read
Everything in this article uses utilities from the BIRT Runtime installation package, available from the BIRT homepage at http://www.eclipse.org/birt. BIRT Viewer The BIRT Viewer is a J2EE application that is designed to demonstrate how to implement the Report Engine API to execute reports in an online web application. For most basic uses—such as for small to medium size Intranet applications—this is an appropriate approach. The point to keep in mind about the BIRT Web Viewer is that it is an example application. It can be used as a baseline for more sophisticated web applications that will implement the BIRT Report Engine API. Installation of the BIRT Viewer is documented at a number of places. The Eclipse BIRT website has some great tutorials at: http://www.eclipse.org/birt/phoenix/deploy/viewerSetup.php http://wiki.eclipse.org/BIRT/FAQ/Deployment This is also documented on my website in a series of articles introducing people to BIRT: http://digiassn.blogspot.com/2005/10/birt-report-server-pt-2.html I won't go into the details about installing Apache Tomcat as this is covered in depth in other locations, but I will cover how to install the Viewer in a Tomcat environment. For the most part these instructions can be used in other J2EE containers, such as WebSphere. In some cases a WAR package is used instead. I prefer Tomcat because it is a widely used open-source J2EE environment. Under the BIRT Runtime package is a folder containing an example Web Viewer application. The Web Viewer is a useful application as you require basic report viewing capabilities, such as parameter passing, pagination, and export capabilities to formats such as Word, Excel, RTF, and CSV. For this example, I have Apache Tomcat 5.5 installed into a folder at C:apache-tomcat-5.5.25. To install the Web Viewer, I simply need to copy the WebViewerExample folder from the BIRT Runtime to the web application folder at C:apache-tomcat-5.5.25webapps. Accessing the BIRT Web Viewer is as simple as calling the WebViewerExample Context. When copying the WebViewerExample folder, you can rename this folder to anything you want. Obviously WebViewerExample is not a good name for an online web application. So in the following screenshot, I renamed the WebViewerExample folder to birtViewer, and am accessing the BIRT Web Viewer test report. Installing Reports into the Web Viewer Once the BIRT Viewer is set up, Deploying reports is as simple as copying the report design files, Libraries, or report documents into the application's Context, and calling it with the appropriate URL parameters. For example, we will install the reports from the Classic Cars – With Library folder into the BIRT Web Viewer at birtViewer. In order for these reports to work, all dependent Libraries need to be installed with the reports. In the case of the example application, we currently have the report folder set to the Root of the web application folder. Accessing Reports in the Web Viewer Accessing reports is as simple as passing the correct parameters to the Web Viewer. In the BIRT Web Viewer, there are seven servlets that you can call to run reports, which are as follows: frameset run preview download parameter document output Out of these, you will only need frameset and run as the other servlets are for Engine-related purposes, such as the preview for the Eclipse designer, the parameter Dialog, and the download of report documents. 
Of these two servlets, frameset is the one that is typically used for user interaction with reports, as it provides the pagination options, parameter Dialogs, table of contents viewing, and export and print Dialogs. The run servlet only provides report output. There are a few URL parameters for the BIRT Web Viewer, such as:
__format: the output format, either HTML or PDF.
__isnull: sets a Report Parameter to null, taking the parameter name as its value.
__locale: the report's locale.
__report: the report design file to run.
__document: the report document file to open.
Any remaining URL parameter will be treated as a Report Parameter. In the following image, I am running the Employee_Sales_Percentage.rptdesign file with the startDate and endDate parameters set.
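As a concrete illustration, assuming Tomcat is listening on port 8080, the viewer context has been renamed to birtViewer and the report design sits in the root of the web application, a frameset URL might look like the one below. The date values are placeholders; the format that startDate and endDate actually accept depends on how those Report Parameters are defined in the design file:

http://localhost:8080/birtViewer/frameset?__report=Employee_Sales_Percentage.rptdesign&__format=pdf&startDate=2009-01-01&endDate=2009-12-31

Swapping frameset for run in the same URL returns just the report output, without the toolbar, pagination, or export dialogs.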

Database/Data Model Round-Trip Engineering with MySQL

Packt
15 Oct 2009
3 min read
Power*Architect—from SQL Power—is a free software data modeling tool, which you can download from its website www.sqlpower.ca and use it under GPLv3 license. Reverse Engineering To reverse engineer is to create the data model of an existing database. To reverse engineer an existing database in Power*Architect, we need to connect to the database. Figure 1 shows the Power*Architect's connection window where we define (create) our connection to the MySQL sales database that we'd like to reengineer. Figure 1: Creating a database connection By adding the conn_packt connection, the sales database objects are now available in Power*Architect. Figure 2: Adding a database connection By expanding the sales database, you can see all the objects that you need to create its data model. Figure 3: Database objects You create the ER diagram of the sales data model by dragging the sales object into the canvas (called playpen in Power*Architect) Note that the objects in the model (those in the diagram) are now in the PlayPen Database. Figure 4: Database objects in the PlayPen Now that you have created the data model, you might want to save it. Figure 5: Saving the data model (project) Figure 6: Saving sales.architect data model (project) You have completed the sales database reverse-engineering. Updating the Data Model Let's now add two new tables (hardware and software) and relate them to the product table. You add a table by clicking the New Table tool and dropping your cursor on the white space of the canvas. Figure 7: New Table tool Type in the name of the table, and then click OK. Figure 8: Adding hardware table We now add a column to the hardware table by right-clicking the table and selecting New Column. Figure 9: New Column menu selection Type in the name of the column (model), select VARCHAR data type (and its length), then click OK. Figure 10: The model column After adding the two tables and their columns, our ER diagram will look like in Figure 11. Figure 11: The hardware and software tables Our last update is relating the hardware and software tables to the product table. Select the New Identifying Relationship tool; click it to the product and then the software. Figure 12: New Identifying Relationship tool The software table is now related to the product table. Note that the product's primary key is migrated to the software table as a primary key. Figure 13: software and product tables are related
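Power*Architect can also forward-engineer the updated model back into MySQL DDL. The exact script it generates will differ from this, but for the new hardware table it would be along the lines of the following sketch; the product_id column name and the VARCHAR length are assumptions, since they depend on how the product table was originally defined:

CREATE TABLE hardware (
  product_id INT NOT NULL,
  model      VARCHAR(50),
  PRIMARY KEY (product_id),
  CONSTRAINT hardware_product_fk
    FOREIGN KEY (product_id) REFERENCES product (product_id)
);

Because we used an identifying relationship, the parent table's primary key is migrated into the child table and becomes part of its primary key, which is exactly what the generated DDL reflects.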

Essbase ASO (Aggregate Storage Option)

Packt
14 Oct 2009
5 min read
Welcome to the exciting world of Essbase Analytics known as the Aggregate Storage Option (ASO). Well, now you're ready to take everything one step further. You see, the BSO architecture used by Essbase is the original database architecture as the behind the scenes method of data storage in an Essbase database. The ASO method is entirely different. What is ASO ASO is Essbase's alternative to the sometimes cumbersome BSO method of storing data in an Essbase database. In fact, it is BSO that is exactly what makes Essbase a superior OLAP analytical tool but it is also the BSO that can occasionally be a detriment to the level of system performance demanded in today's business world. In a BSO database, all data is stored, except for dynamically calculated members. All data consolidations and parent-child relationships in the database outline are stored as well. While the block storage method is quite efficient from a data to size ratio perspective, a BSO database can require large amounts of overhead to deliver the retrieval performance demanded by the business customer. The ASO database efficiently stores not only zero level data, but can also store aggregated hierarchical data with the understandings that stored hierarchies can only have the no-consolidation (~) or the addition (+) operator assigned to them and the no-consolidation (~) operator can only be used underneath Label Only members. Outline member consolidations are performed on the fly using dynamic calculations and only at the time of the request for data. This is the main reason why ASO is a valuable option worth consideration when building an Essbase system for your customer. Because of the simplified levels of data stored in the ASO database, a more simplified method of storing the physical data on the disk can also be used. It is this simplified storage method which can help result in higher performance for the customer. Your choice of one database type over the other will always depend on balancing the customer's needs with the server's physical capabilities, along with the volume of data. These factors must be given equal consideration. Creating an aggregate storage Application|Database Believe it or not, creating an ASO Essbase application and database is as easy as creating a BSO application and database. All you need to do is follow these simple steps: Right-click on the server name in your EAS console for the server on which you want to create your ASO application. Select Create application | Using aggregate storage as shown in the following screenshot: Click on Using aggregate storage and that's it. The rest of the steps are easy to follow and basically the same as for a BSO application. To create an ASO application and database, you follow virtually the same steps as you do to create a BSO application and database. However, there are some important differences, and here we list a few: A BSO database outline can be converted into an Aggregate Storage database outline, but an Aggregate Storage database outline cannot be converted into a Block Storage database outline.Steps to convert a BSO application into an ASO application: Open the BSO outline that you wish to convert, select the Essbase database and click on the File | Wizards | Aggregate Storage Outline Conversion option. You will see the first screen Select Source Outline. The source of the outline can be in a file system or on the Essbase Server. 
In this case, we have selected the OTL from the Essbase Server and then click Next as shown in the following screenshot: In the Next screen, the conversion wizard will verify the conversion and display a message that the conversion has completed successfully. Click Next. Here, Essbase prompts you to select the destination of the ASO outline. If you have not yet created an ASO application, you can click on the Create Aggregate Storage Application on the bottom-right corner of the screen as shown in the next screenshot: Enter the Application and the Database name and click on OK. Your new ASO application is created, now click on Finish. Your BSO application is now converted into an ASO application. You may still need to tweak the ASO application settings and outline members to be the best fit for your needs. In an ASO database, all dimensions are Sparse so there is no need to try to determine the best Dense/Sparse settings as you would do with a BSO database. Although Essbase recommends that you only have one Essbase database in an Essbase application, you can create more than one database per application when you are using the BSO. When you create an ASO application, Essbase will only allow one database per application. There is quite a bit to know about ASO but have no fear, with all that you know about Essbase and how to design and build an Essbase system, it will seem easy for you. Keep reading for more valuable information on the ASO for things like, when it is a good time to use ASO, or how do you query ASO databases effectively, or even what are the differences between ASO and BSO. If you understand the differences, you can then understand the benefits.
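If you prefer scripting to clicking through EAS, an aggregate storage application can usually also be created from a MaxL session. The application and database names below are just examples, and you should check the MaxL reference for your Essbase release, as the exact clauses can vary between versions:

create application 'ASOSales' using aggregate_storage;
create database 'ASOSales'.'Sales';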

Data Access with ADO.NET Data Services

Packt
14 Oct 2009
3 min read
In essence, ADO.NET Data Services serves up data as a web service over HTTP using the objects in the Entity Model of the data and conforms to the principles of REST (Representational State Transfer), wherein everything is a kind of resource with a name that can be manipulated using the well-known HTTP verbs: Get, Put, Post and Delete. The data can be returned in a number of formats including RSS and ATOM. According to W3C, "Atom" is an XML-based document format that describes lists of related information known as "feeds". Feeds are composed of a number of items, known as "entries", each with an extensible set of attached "metadata". Create an ASP.NET Web Application Create an ASP.NET Web application as shown below and make sure the target platform is 3.5. Create the Entity Model Right-click the Application in Solution Explorer and click on Add New Item. In the Add New Item window, click on ADO.NET Entity Data Model and change the default name Model1.edmx to one of your choice. Here it is MyFirst.edmx. Then click on the Add button. This brings up the Entity Data Model Wizard as shown. Read the notes on this page. Now click on Generate from Database and click Next. This opens the 'Choose Your Data Connection' page of the wizard with a default connection, as shown. Change it over to the TestNorthwind database on SQL Server 2008 Enterprise Edition (RTM); TestNorthwind is a copy of the Northwind database. Then click Next. Of course, you can create a new connection as well. The connection is saved to the Web.Config file. Click on the Next button. The wizard changes page so that you can 'Choose Your Database Objects'. This may take some time as the program accesses the database, and the following page is displayed. The model uses the relational data in the database. Place a check mark for the Tables to include all the tables in the database. The TestNorthwindModel namespace gets created. Click on the Finish button. This brings up the Model Browser browsing the MyFirst.edmx file as shown. The MyFirst.edmx file contents are shown in the next figure. By right-clicking inside MyFirst.edmx, you can choose the Zoom level for the display. The connected items shown are the tables with relationships in the database. Those standing alone are just tables that are not a part of the original database and can be removed by right-clicking and deleting them. The tables retained finally for the model are as shown. Make use of the Zoom controls to adjust to your comfort level. As seen above, each table has both Scalar Properties as well as Navigation Properties. The Navigation Properties show the relationship with other tables. In the OrderDetails table shown, the connections extend to the Orders table and the Products table. On the other hand, the Orders table is connected to Customers, Employees, Order_Details and Shippers. In the Solution Explorer, the MyFirst.edmx file consists of the TestNorthwindModel and the TestNorthwindModel.store folders. The model folder consists of the Entities, the Associations (relationships between entities), and the EntityContainer, as shown.
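Once a data service endpoint (an .svc file) is added on top of this Entity Model, which is the next step after building the model, the entity sets can be addressed with REST-style URIs. The host, port and service name below are placeholders, and the entity set names assume the defaults generated from the Northwind tables:

http://localhost:1234/NorthwindService.svc/Customers
http://localhost:1234/NorthwindService.svc/Customers('ALFKI')/Orders
http://localhost:1234/NorthwindService.svc/Products?$filter=UnitPrice gt 20&$top=5

The query options such as $filter and $top are part of the addressing conventions the service understands, so slicing the data requires nothing more than building the right URL.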

Migrating from MS SQL Server 2008 to EnterpriseDB

Packt
09 Oct 2009
2 min read
With many database vendor products in the market and data intensive applications using them, it is often required to port the application to use the data or, migrate the data so that the application can use it. Migration of data is therefore one of the realities of the IT Industry. Some of the author's previous articles on migration can be found at this link. You may find more if you do a search on his blog site. Table to be migrated in SQL Server 2008 The following figure shows the Categories table in the Microsoft SQL Server 2008's Management Studio that will be migrated to the Postgres database. Creating a database in Postgres Studio Right click Databases node in the Advanced Server 8.3 and click on New Database... menu as shown. The New Database... window gets displayed as shown. Create an empty database PGNorthwind in Postgres Studio by entering information shown in the next figure. This creates a new database and related objects as shown in the next figure. This also creates the script in the Properties pane as shown. Review the properties. The database may be dropped using the Drop Database statement. Starting the Migration Studio Click on Start | All Programs | Postgres Advanced Server 8.3 to display the drop-down menu as shown. Click on the Migration Studio drop-down item. This opens the EnterpriseDB Migration Studio 8.3(Migration Studio for the rest of the tutorial) with a modal form with the title Edit Server Advanced Server 8.3(localhost:5432). This is the server we installed in the previous tutorial. Enter the User Name and Password and click OK. You will get a message displaying the result as shown in the next figure. Click OK to both the open windows and the EnterpriseDB Migration Studio shows up as shown here. Click on the File in the main menu on the Migration Studio and pick Add Server. This brings up the Add Server window with a default as shown.
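The New Database... window generates the corresponding SQL for you (visible in the Properties pane), but if you prefer to create or remove the empty database from a SQL prompt instead, a minimal sketch looks like this, with owner, encoding and tablespace options omitted:

CREATE DATABASE "PGNorthwind";
-- and when the database is no longer needed:
DROP DATABASE "PGNorthwind";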

Remote Job Agent in Oracle 11g Database with Oracle Scheduler

Packt
09 Oct 2009
7 min read
Oracle Scheduler in Oracle 10g is a very powerful tool. However, Oracle 11g has many added advantages that give you more power. In this article by Ronald Rood, we will get our hands on the most important addition to the Scheduler—the remote job agent . This is a whole new kind of process, which allows us to run jobs on machines that do not have a running database. However, they must have Oracle Scheduler Agent installed, as this agent is responsible for executing the remote job. This gives us a lot of extra power and also solves the process owner's problem that exists in classical local external jobs. In classical local external jobs, the process owner is by default nobody and is controlled by $ORACLE_HOME/rdbms/admin/externaljob.ora. This creates problems in installation, where the software is shared between multiple databases because it is not possible to separate the processes. In this article, we will start by installing the software, and then see how we can make good use of it. After this, you will want to get rid of the classical local external jobs as soon as possible because you will want to embrace all the improvements in the remote job agent over the old job type. Security Anything that runs on our database server can cause havoc to our databases. No matter what happens, we want to be sure that our databases cannot be harmed. As we have no control over the contents of scripts that can be called from the database, it seems logical not to have these scripts run by the same operating system user who also owns the Oracle database files and processes. This is why, by default, Oracle chose the user nobody as the default user to run the classical local external jobs. This can be adjusted by editing the contents of $ORACLE_HOME/rdbms/admin/externaljob.ora. On systems where more databases are using the same $ORACLE_HOME directory, this automatically means that all the databases run their external jobs using the same operating system account. This is not very flexible. Luckily for us, Oracle has changed this in the 11g release where remote external jobs are introduced. In this release, Oracle decoupled the job runner process and the database processes. The job runner process, that is the job agent, now runs as a remote process and is contacted using a host:port combination over TCP/IP. The complete name for the agent is remote job agent, but this does not mean the job agent can be installed only remotely. It can be installed on the same machine where the database runs, and where it can easily replace the old-fashioned remote jobs. As the communication is done by TCP/IP, this job agent process can be run using any account on the machine. Oracle has no recommendations for the account, but this could very well be nobody. The operating system user who runs the job agent does need some privileges in the $ORACLE_HOME directory of the remote job agent, namely, an execution privilege on $ORACLE_HOME/bin/* as well as read privileges on $ORACLE_HOME/lib/*. At the end of the day, the user has to be able to use the software. The remote job agent should also have the ability to write its administration (log) in a location that (by default) is in $ORACLE_HOME/data, but it can be configured to a different location by setting the EXECUTION_AGENT_DATA environment variable. In 11g, Oracle also introduced a new object type called CREDENTIAL. We can create credentials using dbms_scheduler.create_credential. This allows us to administrate which operating system user is going to run our jobs in the database. 
This also allows us to have control over who can use this credential. To see which credentials are defined, we can use the *_SCHEDULER_CREDENTIAL views. We can grant access to a credential by granting execute privilege on the credential. This adds lots more control than we ever had in Oracle 10gR2. Currently, the Scheduler Agent can only use a username-password combination to authenticate against the operating system. The jobs scheduled on the remote job agent will run using the account specified in the credential that we use in the job definition. Check the Creating job section to see how this works. This does introduce a small problem in maintenance. On many systems, customers are forced to use security policies such as password aging. When combining with credentials, this might cause a credential to become invalid. Any change in the password of a job runtime account needs to be reflected in the credential definition that uses the account. As we get much more control over who executes a job, it is strongly recommend to use the new remote job agent in favor of the classical local external jobs, even locally. The classical external job type will soon become history. A quick glimpse with a wireshark, a network sniffer, does not reveal the credentials in the clear text, so it looks like it's secure by default. However, the job results do pass in clear text. The agent and the database communicate using SSL and because of this, a certificate is installed in the ${EXECUTION_AGENT_DATA}/agent.key.  You can check this certificate using Firefox. Just point your browser to the host:port where the Scheduler Agent is running and use Firefox to examine the certificate. There is a bug in 11.1.0.6 that generates a certificate with an expiration date of 90 days past the agent's registration date. In such a case, you will start receiving certificate validation errors when trying to launch a job. Stopping the agent can solve this. Just remove the agent.key and re-register the agent with the database. The registration will be explained in this article shortly. Installation on Windows We need to get the software before the installation can take place. The Scheduler Agent can be found on the Transparent Gateways disk, which can be downloaded from Oracle technet at http://www.oracle.com/technology/software/products/database/index.html. There's no direct link to this software, so find a platform of your choice and click on See All to get the complete list of database software products for that platform. Then download the Oracle Database Gateways CD. Unzip the installation CD, and then navigate to the setup program found in the top level folder and start it. The following screenshot shows the download directory where you run the setup file: After running the setup, the following Welcome screen will appear. The installation process is simple. Click on the Next button to continue to the product selection screen. Select Oracle Scheduler Agent 11.1.0.6.0 and click on the Next button to continue. Enter Name and Path for ORACLE_HOME (we can keep the default values). Now click on Next to reach the screen where we can choose a port on which the database can contact the agent. I chose 15021. On Unix systems, pick a port above 1023 because the lower ports require root privileges to open. The port should be unused and easily memorizable, and should not be used by the database's listener process. If possible, keep all the remote job agents registered to the same database and the same port. 
Also, don't forget to open the firewall for that port. Hitting the Next button brings us to the following Summary screen: We click on the Install button to complete the installation. If everything goes as expected, the End of Installation screen pops up as follows: Click on the Exit button and confirm the exit. We can find Oracle Execution Agent in the services control panel. Make sure it is running when you want to use the agent to run jobs.
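Once the agent has been installed and registered with the database (registration is covered shortly), a credential and a remote external job can be created along the following lines. This is only a sketch: the username, password, host name and script path are placeholders, and port 15021 is simply the port chosen during the installation above:

BEGIN
  DBMS_SCHEDULER.create_credential(
    credential_name => 'JOBS_CRED',
    username        => 'jobuser',
    password        => 'secret'
  );
  DBMS_SCHEDULER.create_job(
    job_name   => 'remote_backup',
    job_type   => 'EXECUTABLE',
    job_action => '/home/jobuser/backup.sh',
    auto_drop  => FALSE,
    enabled    => FALSE
  );
  -- point the job at the credential and at the host:port where the agent listens
  DBMS_SCHEDULER.set_attribute('remote_backup', 'credential_name', 'JOBS_CRED');
  DBMS_SCHEDULER.set_attribute('remote_backup', 'destination', 'agenthost:15021');
  DBMS_SCHEDULER.enable('remote_backup');
END;
/

The credential, not a file in $ORACLE_HOME, now decides which operating system account runs the script on the remote machine.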

Debugging the Scheduler in Oracle 11g Databases

Packt
08 Oct 2009
5 min read
Unix—all releases Something that has not been made very clear in the Oracle Scheduler documentation is that redirection cannot be used in jobs (<, >, >>, |, &&, ||). Therefore, many developers have tried to use it. So, let's keep in mind that we cannot use redirection, not in 11g as well as older releases of the database. The scripts must be executable, so don't forget to set the execution bits. This might seem like knocking down an open door, but it's easily forgotten. The user (who is the process owner of the external job and is nobody:nobody by default) should be able to execute the $ORACLE_HOME/bin/extjob file. In Unix, this means that the user should have execution permissions on all the parent directories of this file. This is not something specific to Oracle; it's just the way a Unix file system works. Really! Check it out. Since 10gR1, Oracle does not give execution privileges to others. A simple test for this is to try starting SQL*Plus as a user who is neither the Oracle installation user, nor a member of the DBA group—but a regular user. If you get all kinds of errors, then it implies that the permissions are not correct, assuming that the environment variables (ORACLE_HOME and PATH) are set up correctly. The $ORACLE_HOME/install/changePerm.sh script can fix the permissions within ORACLE_HOME (for 10g). In Oracle 11g, this again changed and is no longer needed. The Scheduler interprets the return code of external jobs and records it in the *_scheduler_job_run_details view. This interpretation can be very misleading, especially when using your own exit codes. For example, when you code your script to check the number of arguments, and code an exit 1 when you find the incorrect number of arguments, the error number is translated to ORA-27301: OS failure message:No such file or directory by Oracle using the code in errno.h. In 11g, the Scheduler also records the return code in the error# column. This lets us recognize the error code better and find where it is raised in the script that ran, when the error codes are unique within the script. When Oracle started with Scheduler, there were some quick changes. Here are the most important changes listed that could cause us problems when the definitions of the mentioned files are not exactly as listed: 10.2.0.1: $ORACLE_HOME/bin/extjob should be owned by the user who runs the jobs (process owner) and have 6550 permissions (setuid process owner). In a regular notation, that is what ls –l shows, and the privileges should be -r-sr-s---. 10.2.0.2: $ORACLE_HOME/rdbms/admin/externaljob.ora should be owned by root. This file is owned by the Oracle user (the user who installed Oracle) and the Oracle install group with 644 permissions or -rw-r—-r--, as shown by ls –l. This file controls which operating system user is going to be the process owner, or which user is going to run the job. The default contents of this file are as shown in the following screenshot: $ORACLE_HOME/bin/extjob must be the setuid root (permissions 4750 or -rwsr-x---) and executable for the Oracle install group, where the setuid root means that the root should be the owner of the file. This also means that while executing this binary, we temporarily get root privileges on the system. $ ORACLE_HOME/bin/extjobo should have normal 755 or -rwxr-xr-x permissions,and be owned by the normal Oracle software owner and group. If this file is missing, just copy it from $ORACLE_HOME/bin/extjob. On AIX, this is the first release that has external job support. 
11g release: In 11g, the same files as in 10.2.0.2 exist with the same permissions. But $ORACLE_HOME/bin/jssu is owned by root and the Oracle install group with the setuid root (permissions 4750 or -rwsr-x---). It is undoubtedly best to stop using the old 10g external jobs and migrate to the 11g external jobs with credentials as soon as possible. The security of the remote external jobs is better because of the use of credentials instead of falling back to the contents of a single file in $ORACLE_HOME/, and the flexibility is much better. In 11g, the process owner of the remote external jobs is controlled by the credential and not by a file.
Windows usage On Windows, this is a little easier with regard to file system security. The OracleJobscheduler service must exist in a running state, and the user who runs this service should have the Logon as batch job privilege. A .bat file cannot be run directly, but should be called as an argument of cmd.exe, for example:
--/
BEGIN
  DBMS_SCHEDULER.create_job(
    job_name            => 'env_windows',
    job_type            => 'EXECUTABLE',
    number_of_arguments => 2,
    job_action          => 'C:\windows\system32\cmd.exe',
    auto_drop           => FALSE,
    enabled             => FALSE
  );
  DBMS_SCHEDULER.set_job_argument_value('env_windows', 1, '/c');
  DBMS_SCHEDULER.set_job_argument_value('env_windows', 2, 'd:\temp\test.bat');
end;
/
This job named env_windows calls cmd.exe, which eventually runs the script named test.bat that we created in d:\temp. When the script we want to call needs arguments, they should be listed from argument number 3 onwards.
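When a job like this misbehaves, the quickest place to look is the run details view mentioned earlier; a simple query such as the following (run as the job owner) shows the status, the return code and any additional error text:

SELECT log_date, job_name, status, error#, additional_info
FROM   user_scheduler_job_run_details
ORDER  BY log_date DESC;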

Time Dimension in Oracle Warehouse Builder 11g

Packt
05 Oct 2009
4 min read
Let's discuss briefly what a Time dimension is, and then we'll dive right into the Warehouse Builder Design Center and create one. A Time dimension is a key part of most data warehouses. It provides the time series information to describe our data. A key feature of data warehouses is being able to analyze data from several time periods and compare results between them. The Time dimension is what provides us the means to retrieve data by time period. Do not be confused by the use of the word Time to refer to this dimension. In this case, it does not refer to the time of day but to time in general which can span days, weeks, months, and so on. We are using it because the Warehouse Builder uses the word Time for this type of dimension to signify a time period. So when referring to a Time dimension here, we will be talking about our time period dimension that we will be using to store the date. We will give the name Date to be clear about what information it contains. Every dimension, whether time or not, has four characteristics that have to be defined in OWB: Levels Dimension Attributes Level Attributes Hierarchies The Levels are for defining the levels where aggregations will occur, or to which data can be summed. We must have at least two levels in our Time dimension. While reporting on data from our data warehouse, users will want to see totals summed up by certain time periods such as per day, per month, or per year. These become the levels. A multidimensional implementation includes metadata to enable aggregations automatically at those levels, if we use the OLAP feature. The relational implementation can make use of those levels in queries to sum the data. The Warehouse Builder has the following Levels available for the Time dimension: Day Fiscal week Calendar week Fiscal month Calendar month Fiscal quarter Calendar quarter Fiscal year Calendar year The Dimension Attributes are individual pieces of information we're going to store in the dimension that can be found at more than one level. Each level will have an ID that identifies that level, a start and an end date for the time period represented at that level, a time span that indicates the number of days in the period, and a description of the level. Each level has Level Attributes associated with it that provide descriptive information about the value in that level. The dimension attributes found at that level and additional attributes specific to the level are included. For example, if we're talking about the Month level, we will find attributes that describe the value for the month such as the month of the year it represents, or the month in the calendar quarter. These would be numbers indicating which month of the year or which month of the quarter it is. The Oracle Warehouse Builder Users' Guide contains a more complete list of all the attributes that are available. OWB tracks which of these attributes are applicable to which level and allows the setting of a separate description that identifies the attribute for that level. Toward the end of the chapter, when we look at the Data Object Editor, we'll see the feature provided by the Warehouse Builder to view details about objects such as dimensions and cubes. We must also define at least one Hierarchy for our Time dimension. A hierarchy is a structure in our dimension that is composed of certain levels in order; there can be one or more hierarchies in a dimension. Calendar month, calendar quarter, and calendar year can be a hierarchy. 
We could view our data at each of these levels, and the next level up would simply be a summation of all the lower-level data within that period. A calendar quarter sum would be the sum of all the values in the calendar month level in that quarter, and the multidimensional implementation includes the metadata to facilitate these kinds of calculations. This is one of the strengths of a multidimensional implementation. The good news is that the Warehouse Builder contains a wizard that will do all the work for us—create our Time dimension and define the above four characteristics—just by asking us a few questions.
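To make the idea of levels a little more concrete, this is roughly the kind of query a relational implementation ends up running when data is summed to the calendar month level; the table and column names are purely illustrative and not what the Warehouse Builder will generate for you:

SELECT d.calendar_month_desc,
       SUM(f.sales_amount) AS sales_for_month
FROM   sales_fact f
JOIN   date_dim d ON d.date_key = f.date_key
GROUP  BY d.calendar_month_desc
ORDER  BY d.calendar_month_desc;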