
How-To Tutorials - Data

1210 Articles

Excel 2010 Financials: Identifying the Profitability of an Investment

Packt
12 Jul 2011
5 min read
Excel 2010 Financials Cookbook: Powerful techniques for financial organization, analysis, and presentation in Microsoft Excel

Calculating the depreciation of assets

Assets within a business are extremely important for a number of reasons. Assets can become investments for growth or investments in another line of business, and they can take many forms, such as computer equipment, vehicles, furniture, buildings, and land. Assets are not only important within the business in which they are used; they are also a means of reducing the tax burden on a business. As a financial manager, you are tasked with calculating the depreciation expense for a laptop computer with a useful life of five years. In this recipe, you will learn to calculate the depreciation of an asset over its life.

Getting ready

There are several different methods of depreciation. A business may use straight-line depreciation, declining-balance depreciation, double-declining-balance depreciation, or a variation of these methods. Excel can calculate each of these methods with a slight variation of the function; in this recipe, we will use straight-line depreciation, which spreads the reduction in an asset's value evenly over its life.

We will first need to set up the Excel worksheet to hold the depreciation values for the laptop computer.

How to do it...

1. In cell A5, type Year, and in cell B5, enter 1. This accounts for the depreciation in the year the asset was purchased. Continue the list down column B until all five years (1 through 5) are listed.
2. In cell B2, enter the purchase price of the laptop computer; the purchase price is $2,500.
3. In cell B3, enter the salvage value of the asset. The salvage value is the estimated resale value of the asset once its useful life, as determined by generally accepted accounting principles, has elapsed. Enter $500.
4. In cell C5, enter the formula =SLN($B$2,$B$3,5) and press Enter.
5. Copy the formula from cell C5 and paste it down through cell C9.

Excel now lists the straight-line depreciation expense for each of the five years. As you can see in this schedule, the depreciation expense remains consistent through each year of the asset's useful life.

How it works...

Straight-line depreciation takes the purchase price minus the salvage value and divides the remainder evenly across the useful life. The function accepts inputs as follows: =SLN(purchase price, salvage value, useful life).

There's more...

Excel also provides functions for other depreciation methods, such as DB (declining balance), DDB (double-declining balance), and SYD (sum-of-years' digits).
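If you want to sanity-check the schedule outside Excel, the straight-line calculation is simple enough to reproduce. The following minimal Python sketch (not part of the original recipe; the helper name is arbitrary) mirrors what =SLN($B$2,$B$3,5) does for the laptop example, using the figures from above:

    # Straight-line depreciation: (purchase price - salvage value) / useful life,
    # the same calculation Excel's SLN function performs.
    def straight_line_depreciation(purchase_price, salvage_value, useful_life_years):
        annual_expense = (purchase_price - salvage_value) / useful_life_years
        # Every year of the asset's life gets the same expense.
        return [annual_expense] * useful_life_years

    schedule = straight_line_depreciation(2500, 500, 5)
    for year, expense in enumerate(schedule, start=1):
        print(f"Year {year}: ${expense:,.2f}")   # $400.00 each year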
Calculating the future versus current value of your money

When working within finance, accounting, or general business, it is important to know how much money you have. However, knowing how much money you have now is only a portion of the whole financial picture; you must also know how much your money will be worth in the future. Knowing the future value tells you what your money is truly worth, and with this knowledge you can decide what to do with it.

As a financial manager, you must provide feedback on whether to introduce a new product line. As with any new venture, there will be several related costs, including start-up costs, operational costs, and more. Initially, you must spend $20,000 to cover most start-up costs and, for the sake of the example, you will earn a profit of $5,500 per year for five years. You also know that, due to expenditures, you expect your cost of capital to be 10%. In this recipe, you will learn to use Excel functions to calculate the future value of the venture and whether it proves to be profitable.

How to do it...

We will first need to enter all known values and variables into the worksheet:

1. In cell B2, enter the initial cost of the business venture.
2. In cell B3, enter the discount rate, or cost of capital, of 10%.
3. In cells B4 through B8, enter the five years' worth of expected net profit from the business venture.
4. In cell B10, calculate the net present value: enter the formula =NPV(B3,B4:B8) and press Enter. We now see that, accounting for future inflows, the net present value of the business venture is $20,849.33.
5. Our last step is to account for the initial start-up costs and determine the overall profitability. In cell B11, enter the formula =B10/B2 and press Enter.

As a financial manager, we now see that for every $1 invested in this venture, you will receive $1.04 in present value inflows.

How it works...

NPV, or net present value, is calculated in Excel using all of the inflow information entered across the estimated period. For the five years used in this recipe, the venture shows a profit of $5,500 each year. This number cannot be used directly, because there is a cost of making money; in this instance, the cost pertains to taxes and other expenditures. We did not include the initial start-up cost in the NPV formula because that cost is not subject to the cost of capital; however, it must be used at the end of the recipe to compare the outflow against the inflows.

There's more...

The $1.04 value calculated at the end of this recipe is also known as the profitability index. When this index is greater than one, the venture is said to be a positive investment.
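For readers who want to verify the $20,849.33 and the 1.04 profitability index without Excel, here is a small Python sketch (an illustration only; the excel_npv helper is mine and simply mirrors Excel's convention of discounting the first cash flow by one full period):

    def excel_npv(rate, cash_flows):
        # NPV = sum(cf_t / (1 + rate)**t) for t = 1..n, like Excel's NPV function.
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

    initial_cost = 20_000
    inflows = [5_500] * 5          # expected net profit, years 1 through 5
    npv_inflows = excel_npv(0.10, inflows)

    print(f"NPV of inflows:      ${npv_inflows:,.2f}")                # about $20,849.33
    print(f"Profitability index: {npv_inflows / initial_cost:.2f}")   # about 1.04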


Building Financial Functions into Excel 2010

Packt
07 Jul 2011
5 min read
Excel 2010 Financials Cookbook: Powerful techniques for financial organization, analysis, and presentation in Microsoft Excel

Until now, in the previous articles, we have focused on manipulating data within and outside of Excel in order to prepare to make financial decisions. Now that the data has been prepared, rearranged, or otherwise adjusted, we can leverage the functions within Excel to make actual decisions. Utilizing these functions and the individual scenarios, we will be able to effectively eliminate the uncertainty that comes from poor analysis.

Since this article uses financial scenarios to demonstrate the various functions, it is important to note that these scenarios take certain "unknowns" for granted and make a number of assumptions in order to minimize the complexity of the calculations. Real-world scenarios will require a greater focus on calculating and accounting for all variables.

Determining standard deviation for assessing risk

In the recipes mentioned so far, we have shown the importance of monitoring and analyzing frequency to determine the likelihood that an event will occur. Standard deviation now allows us to analyze frequency in a different manner, specifically through variance. With standard deviation, we can determine the basic top and bottom thresholds of the data, and plot general movement within that threshold to determine the variance within the data range. This variance allows us to quantify the risk within investments.

As a financial manager, you must determine the risk associated with investing capital in order to gain a return. In this particular instance, you will invest in stock. To minimize loss of investment capital, you must compare the risk of investing in two different stocks, Stock A and Stock B. In this recipe, we will use standard deviation to determine which stock, A or B, presents the higher risk, and hence the greater risk of loss.

How to do it...

1. Begin by entering the selling prices of Stock A and Stock B in columns A and B, respectively. At first glance we can see that Stock B has a higher selling price; its opening and selling prices over the course of 52 weeks almost always remain above those of Stock A. As an investor looking for a higher return, we might choose Stock B based on this cursory review; however, a high selling price does not negate the need for consistency.
2. In cell C2, enter the formula =STDEV(A2:A53) and press Enter.
3. In cell C3, enter the formula =STDEV(B2:B53) and press Enter.

We can see from the calculation of standard deviation that Stock B has a deviation of over $20, whereas Stock A's deviation is just over $9. Given this information, we can determine that Stock A presents a lower risk than Stock B. If we invest in Stock A, at any given time, based on past performance, our average risk of loss is $9, whereas with Stock B we have an average risk of $20.

How it works...

Excel's STDEV function treats the supplied numbers as a sample drawn from a larger population and measures how far the values typically spread from their average; this spread is your standard deviation. Excel also includes the STDEVP function, which treats the data as the complete population. Use STDEV when you are working with a subset of the data (for example, six months out of an entire year), and STDEVP when the data represents everything you intend to measure.
If we translate these numbers into a line graph with standard deviation bars for Stock A, you can see the selling prices of the stock and how they travel within the deviation range. Doing the same for Stock B shows its selling prices and how they travel within its deviation range. The bars shown on the graphs represent the standard deviation as calculated by Excel. Visually, we can see that not only does Stock B represent a greater risk with its larger deviation, but many of its prices also fall below the deviation band, representing further risk to the investor. With funds to invest as a finance manager, Stock A represents the lower-risk investment.

There's more...

Standard deviation can be calculated for almost any data set. For this recipe, we calculated deviation over the course of one year; however, if we expand the data to include multiple years, we can assess long-term risk as well. While Stock B represents high short-term risk, in a long-term analysis it may turn out to be the less risky investment. Combining standard deviation with a five-number summary analysis gives us further risk and performance information.
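As a quick aside (not part of the original recipe), the same comparison can be reproduced outside Excel. The sketch below uses made-up weekly prices and shows the sample versus population calculations that correspond to STDEV and STDEVP:

    import statistics

    # Hypothetical weekly closing prices; the real recipe uses 52 weeks per stock.
    stock_a = [22.1, 23.4, 21.8, 24.0, 22.7, 23.1, 21.5, 24.4]
    stock_b = [55.0, 71.2, 48.9, 80.5, 62.3, 44.1, 77.8, 58.6]

    # statistics.stdev  -> sample standard deviation, like Excel's STDEV
    # statistics.pstdev -> population standard deviation, like Excel's STDEVP
    for name, prices in (("Stock A", stock_a), ("Stock B", stock_b)):
        print(f"{name}: sample stdev = {statistics.stdev(prices):.2f}, "
              f"population stdev = {statistics.pstdev(prices):.2f}")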


Designing User Security for Oracle PeopleSoft Applications

Packt
04 Jul 2011
12 min read
Understanding user security

Before we get into discussing PeopleSoft security, let's spend some time setting the context for user security. Whenever we think of a complex system like PeopleSoft Financial applications, with potentially hundreds of users, the following are but a few of the questions that face us:

- Should a user working in the billing group have access to transactions, such as vouchers and payments, in Accounts Payable?
- Should a user who is part of the North America business unit have access to data belonging to the Europe business unit?
- Should a user whose job involves entering vouchers be able to approve and pay them as well?
- Should a data entry clerk be able to view departmental budgets for the organization?

These questions barely scratch the surface of the complex security considerations of an organization. Of course, there is no right or wrong answer to such questions, as every organization has its own unique security policies. What is more important is that we need a mechanism that can segregate access to system features. We need to enforce appropriate controls to ensure users can access only the features they need.

Implementation challenge

Global Vehicles' Accounts Payable department has three different types of users: Mary, the AP Manager; Amy, the AP Supervisor; and Anton, the AP Clerk. These users need the following access to PeopleSoft features:

User type | Access to feature | Description
AP Clerk | Voucher entry | A clerk should be able to access system pages to enter vouchers from various vendors.
AP Supervisor | Voucher entry, Voucher approval, Voucher posting | A supervisor also needs access to enter vouchers. He/she should review and approve each voucher entered by the clerk. The supervisor should also be able to execute the Voucher Post process that creates accounting entries for vouchers.
AP Manager | Pay cycle, Voucher approval | The AP Manager should be the only one who can execute the Pay Cycle (a process that prints checks to issue payments to vendors). The manager (in addition to the supervisor) should also have the authority to approve vouchers.

Note that this is an extremely simplified scenario that does not include all the features required in Accounts Payable.

Solution

We will design a security matrix that uses distinct security roles. We'll configure permission lists, roles, and user profiles to limit user access to the required system features. The PeopleSoft security matrix is a three-level structure consisting of Permission lists (at the bottom), Roles (in the middle), and User profiles (at the top).

We need to create a User Profile for each user of the system. This user profile can have as many Roles as needed. For example, a user can have the roles of Billing Supervisor and Accounts Receivable Payment Processor if he/she approves customer invoices as well as processes customer payments. Thus, the number of roles a user is granted depends entirely on his/her job responsibilities. Each role can have multiple permission lists. A Permission list determines which features can be accessed by a Role: we can specify which pages can be accessed, and the mode in which they can be accessed (read only/add/update), in a permission list. In a nutshell, we can think of a Role as a grouping of system features that a user needs to access, while a Permission list defines the nature of access to those system features.
Expert tip

Deciding how many roles should be created, and which ones, needs quite a bit of thought. How easy will it be to maintain them in future? Think of a scenario where a function needs to be removed from one role and added to another: how easy would it be to do so? As a rule of thumb, you should map system features to roles in such a way that they are non-overlapping. Similarly, map system pages to permission lists so that they are mutually exclusive. This simplifies user security maintenance to a large extent. Note that although this is advisable, it may not always be possible; organizational requirements are sometimes too complicated. However, a PeopleSoft professional should try to build a modular security design to the extent possible.

Now, let's try to design our security matrix for the hypothetical scenario presented previously and test our rule of thumb of mutually exclusive roles and permission lists. What do you observe as far as the required system features for each role are concerned? You are absolutely right if you noticed that some system features (such as Voucher entry) are common across roles. Which roles should we design for this situation? Going by the principle of mutually exclusive roles, we can map system features to permission lists (and in turn roles) without overlapping them. We'll denote our roles with the prefix 'RL' and permission lists with the prefix 'PL'. The mapping may look something like this:

Role | Permission list | System feature
RL_Voucher_Entry | PL_Voucher_Entry | Voucher Entry
RL_Voucher_Approval | PL_Voucher_Approval | Voucher Approval
RL_Voucher_Posting | PL_Voucher_Posting | Voucher Posting
RL_Pay_Cycle | PL_Pay_Cycle | Pay Cycle

So, now we have created the required roles and permission lists, and attached the appropriate permission lists to each role. In this example, due to the simple scenario, each role has only a single permission list assigned to it. As the final step, we'll assign the appropriate roles to each user's User Profile:

User | Roles | System features accessed
Mary | RL_Voucher_Approval, RL_Pay_Cycle | Voucher Approval, Pay Cycle
Amy | RL_Voucher_Entry, RL_Voucher_Approval, RL_Voucher_Posting | Voucher Entry, Voucher Approval, Voucher Posting
Anton | RL_Voucher_Entry | Voucher Entry

Now, as you can see, each user has access to the appropriate system features through the roles, and in turn the permission lists, attached to their user profile. Can you see the advantage of this approach? Let's say that a few months down the line, it is decided that Mary (AP Manager) should be the only one approving vouchers, while Amy (AP Supervisor) should also be able to execute pay cycles and issue payments to vendors. How can we accommodate this change? It's quite simple, really: we remove the role RL_Voucher_Approval from Amy's user profile and add the role RL_Pay_Cycle to it. The security matrix now looks like this:

User | Roles | System features accessed
Mary | RL_Voucher_Approval, RL_Pay_Cycle | Voucher Approval, Pay Cycle
Amy | RL_Voucher_Entry, RL_Pay_Cycle, RL_Voucher_Posting | Voucher Entry, Pay Cycle, Voucher Posting
Anton | RL_Voucher_Entry | Voucher Entry

Thus, security maintenance becomes less cumbersome when we design roles and permission lists with non-overlapping system features. Of course, this approach comes with a downside as well: it results in a large number of roles and permission lists, increasing the initial configuration effort. The solution you actually design for an organization needs to balance these two objectives.
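To make the three-level structure concrete, here is a small illustrative Python model (not PeopleSoft code or its API, just a sketch of the idea) showing how user profiles resolve to system features through roles and permission lists, and how the later change for Amy is a one-line role swap:

    # Illustrative model only: permission lists grant features, roles group
    # permission lists, and user profiles collect roles.
    permission_lists = {
        "PL_Voucher_Entry":    {"Voucher Entry"},
        "PL_Voucher_Approval": {"Voucher Approval"},
        "PL_Voucher_Posting":  {"Voucher Posting"},
        "PL_Pay_Cycle":        {"Pay Cycle"},
    }

    roles = {
        "RL_Voucher_Entry":    ["PL_Voucher_Entry"],
        "RL_Voucher_Approval": ["PL_Voucher_Approval"],
        "RL_Voucher_Posting":  ["PL_Voucher_Posting"],
        "RL_Pay_Cycle":        ["PL_Pay_Cycle"],
    }

    user_profiles = {
        "Mary":  ["RL_Voucher_Approval", "RL_Pay_Cycle"],
        "Amy":   ["RL_Voucher_Entry", "RL_Voucher_Approval", "RL_Voucher_Posting"],
        "Anton": ["RL_Voucher_Entry"],
    }

    def features_for(user):
        # Union of every feature reachable through the user's roles.
        return {feature
                for role in user_profiles[user]
                for pl in roles[role]
                for feature in permission_lists[pl]}

    print(features_for("Amy"))   # Voucher Entry, Voucher Approval, Voucher Posting

    # The later requirement change is just a role swap on Amy's profile:
    user_profiles["Amy"] = ["RL_Voucher_Entry", "RL_Pay_Cycle", "RL_Voucher_Posting"]
    print(features_for("Amy"))   # Voucher Entry, Pay Cycle, Voucher Posting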
Expert tip

Having too many permission lists assigned to a User Profile can adversely affect system performance. PeopleSoft recommends 10-20 permission lists per user.

Configuring permission lists

Follow this navigation to configure permission lists: PeopleTools | Security | Permissions & Roles | Permission Lists

The General tab of the Permission List setup page is where we specify the permission list description. We'll go through some of the important configuration activities for the Voucher Entry permission list discussed previously:

- Can Start Application Server?: Selecting this field enables a user with this permission list to start PeopleSoft application servers (all batch processes are executed on this server). A typical system user does not need this option.
- Allow Password to be Emailed?: Selecting this field enables users to receive forgotten passwords through e-mail. Leave the field unchecked to prevent unencrypted passwords from being sent in e-mails.
- Time-out Minutes (Never Time-out and Specific Time-out): These fields determine the number of minutes of inactivity after which the system automatically logs out a user with this permission list.

The Pages tab of the permission list page is the most important place: here we specify the menus, components, and ultimately the pages that a user can access. In PeopleSoft parlance, a component is a collection of pages that are related to each other; for example, the pages used to configure permission lists (General, PeopleTools, Process, and so on) together constitute a component. A menu is a collection of various components, typically related to a system feature, such as 'Enter Voucher Information'. Thus, to grant access to a page, we need to enter its component and menu details.

Expert tip

To find out the component and menu for a page, press CTRL+J while the page is open in the internet browser.

- Menu Name: This is where we specify all the menus to which a user needs access. Note that a permission list can grant access to multiple menus, but our simple example includes only one system feature (Voucher entry) and, in turn, only one menu. Click the + button to add and select more menus.
- Edit Components hyperlink: Once we select a menu, we need to specify which components under it the permission list should be able to access. Click the Edit Components hyperlink; the system opens the Component Permissions page, which lists all components under the selected menu. The voucher entry pages for which we need to grant access exist under a component named VCHR_EXPRESS.
- Edit Pages hyperlink: Click this hyperlink to grant access to specific pages under a given component. The system opens the Page Permissions page, which lists the pages under the Voucher Entry component.
- Panel Item Name: This field shows the page name to which access is to be granted.
- Authorized?: Select this checkbox to enable a user to access the page. In our example, we have authorized this permission list to access six pages in this component.
- Display Only: Selecting this checkbox gives the user read-only access to the given page. He/she cannot make any changes to the data on this page.
- Add: Selecting this checkbox enables the user to add a new transaction (in this case, new vouchers).
- Update/Display: Selecting this checkbox enables the user to retrieve the current effective-dated row. He/she can also add future effective-dated rows, in addition to modifying them.
- Update/Display All: This option gives the user all of the privileges of the Update/Display option; in addition, he/she can retrieve past effective-dated rows as well.
- Correction: This option enables the user to perform all possible operations, that is, to retrieve, modify, or add past, current, and future effective-dated rows.

Effective dates are usually relevant for master data setups. An effective date drives when a particular value comes into effect. A vendor may be set up with an effective date of 1/1/2001, so that it comes into effect from that date. Now assume that its name is slated to change on 1/1/2012: we can add a new vendor row with the new name and the appropriate effective date, and the system automatically starts using the new name from 1/1/2012. Note that there can be multiple future effective-dated rows, but only one current row.
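Effective-dated rows are easier to reason about with a tiny example. The sketch below (illustrative Python only, with a hypothetical vendor, not PeopleSoft code) picks the single current row, the latest row whose effective date is not in the future, while future-dated rows wait their turn:

    from datetime import date

    # Hypothetical vendor rows keyed by effective date.
    vendor_rows = [
        {"effdt": date(2001, 1, 1), "name": "Acme Supplies Inc."},
        {"effdt": date(2012, 1, 1), "name": "Acme Global Supplies Inc."},  # future-dated change
    ]

    def current_row(rows, as_of):
        # The current row has the latest effective date on or before the
        # as-of date; only one row can be current at a time.
        eligible = [r for r in rows if r["effdt"] <= as_of]
        return max(eligible, key=lambda r: r["effdt"]) if eligible else None

    print(current_row(vendor_rows, date(2011, 6, 30))["name"])  # Acme Supplies Inc.
    print(current_row(vendor_rows, date(2012, 1, 1))["name"])   # Acme Global Supplies Inc.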
The next tab, PeopleTools, contains configuration options that are more relevant for technical developers. As we are concentrating on business users of the system, we will not discuss them here.

The Process tab is used to configure options for process groups. A process group is a collection of batch processes belonging to a specific internal department or business process. For example, PeopleSoft delivers a process group ARALL that includes all Accounts Receivable batch processes. PeopleSoft delivers various pre-configured process groups; however, we can create our own process groups depending on the organization's requirements. Click the Process Group Permissions hyperlink; the system opens a new page where we can select as many process groups as needed. When a process group is selected for a permission list, users with that permission list can execute the batch and online processes that belong to it.

The Sign-on Times tab controls the time spans during which a user with this permission list can sign on to the system. We can enforce specific days, or specific time spans within a day, when users can sign on. In the case of our permission list, there are no such limits, and users with this permission list will be able to sign on at any time on all days of the week.

The next tab on this page is Component Interface. A component interface is a PeopleSoft utility that automates bulk data entry into PeopleSoft pages. We can select component interface values on this tab so that users with this permission list have the right to use them. Due to the highly technical nature of the activities involved, we will not discuss the Web Libraries, Web Services, Mass Change, Links, and Audit tabs. Oracle offers exhaustive documentation on PeopleSoft applications.

The next important tab on the permission list page is Query. On this tab, the system shows two hyperlinks: Access Group Permissions and Query Profile. Click the Access Group Permissions hyperlink; the system opens the Permission List Access Groups page. This page controls which database tables users can access when creating database queries with the PeopleSoft tool called Query Manager.

- Tree Name: A tree is a hierarchical structure of database tables. The tree QUERY_TREE_AP groups all AP tables.
- Access Group: Each tree has multiple nodes called access groups, which are simply logical groups of tables within a tree. For example, VOUCHERS is a group of voucher-related tables within the AP table tree. With this configuration, users will be able to create database queries on voucher tables in AP.

Using the Query Profile hyperlink, we can set various options that control how users can create queries (such as whether the user can use joins, unions, and so on in queries, how many database rows can be fetched by a query, and whether the user can copy the query output to Microsoft Excel).


An Overview of Oracle PeopleSoft Commitment Control

Packt
29 Jun 2011
9 min read
Oracle PeopleSoft Enterprise Financial Management 9.1 Implementation: An exhaustive resource for PeopleSoft Financials application practitioners to understand core concepts, configurations, and business processes

Understanding commitment control

Before we proceed further, let's take some time to understand the basic concepts of commitment control. Commitment control can be used to track expenses against pre-defined control budgets, as well as to track recognized revenue against revenue estimate budgets. In this article, we'll concentrate on the expense side of commitment control, as it is more widely used.

Defining control budgets

An organization may draw up budgets for the different countries in which it operates or for its various departments. Going further, it may then define budget amounts for different areas of spending, such as IT hardware, construction of buildings, and so on. Finally, it will also need to specify the time periods to which the budget applies, such as a month, a quarter, six months, or a year. In other words, a budget needs the following components:

- An Account, to specify expense areas such as hardware expenses, construction expenses, and so on
- One or more chartfields, to specify the level of the budget, such as Operating Unit, Department, Product, and so on
- A time period, to specify whether the budgeted amount applies to a month, a quarter, a year, and so on

Let's take a simple example to understand how control budgets are defined. An organization defines budgets for each of its departments. Budgets are defined for different expense categories, such as the purchase of computers and the purchase of office supplies (notepads, pens, and so on), and are set up for each month of the year. Assume that the following chartfield values are used by the organization:

Department | Description | Account | Description
700 | Sales | 135000 | Hardware expenses
900 | Manufacturing | 335000 | Stationery expenses

Now, budgets are defined for each period and combination of chartfields as follows:

Period | Department | Account | Budget amount
January 2012 | 700 | 135000 | $100,000
January 2012 | 700 | 335000 | $10,000
January 2012 | 900 | 135000 | $120,000
January 2012 | 900 | 335000 | $15,000
February 2012 | 700 | 135000 | $200,000
February 2012 | 700 | 335000 | $40,000
February 2012 | 900 | 135000 | $150,000
February 2012 | 900 | 335000 | $30,000

Thus, $100,000 has been allocated for hardware purchases for the Sales department for January 2012. Purchases will be allowed until this budget is exhausted; if a purchase exceeds the available amount, it will be prevented from taking place.

Tracking expense transactions

Commitment control spending transactions are classified into Pre-encumbrance, Encumbrance, and Expenditure categories. To understand this, we'll consider a simple procurement example involving the PeopleSoft Purchasing and Accounts Payable modules. In an organization, a department manager may decide that he/she needs three new computers for newly recruited staff. The manager creates a purchase requisition to request the purchase of these computers. Once it is approved by the appropriate authority, it is passed on to the procurement group. This group may consult the procurement policies, inquire with various vendors about prices, and decide to buy the computers from a particular vendor. The procurement group then creates a purchase order containing the quantity, configuration, price, and so on, and sends it to the vendor. Once the vendor delivers the computers, the organization receives the invoice and creates a voucher to process the vendor payment.
Voucher creation takes place in the Accounts Payable module, while the requisition and purchase order are created in the Purchasing module. In commitment control terms, Pre-encumbrance is an amount that may be spent in the future but that there is no obligation to spend. In the previous example, the requisition constitutes the pre-encumbrance amount: the requisition is an internal document that may or may not get approved, so there is no obligation to spend the money on the computers. Encumbrance is an amount that there is a legal obligation to spend in the future. In the previous example, the purchase order sent to the vendor constitutes the encumbrance amount, as we have asked the vendor to deliver the goods. Finally, when a voucher is created, it indicates the Expenditure amount that is actually being spent. A voucher indicates that we have already received the goods and, in accounting terms, the expense has been recognized.

To understand how PeopleSoft handles this, think of four different buckets: Budget amount, Pre-encumbrance amount, Encumbrance amount, and Expenditure amount.

Step 1: Budget definition is the first step in commitment control. Let's say that an organization has budgeted $50,000 for the purchase of IT hardware at the beginning of the year 2011. At that time, the buckets show the following amounts:

Budget | Pre-encumbrance | Encumbrance | Expenditure | Available budget amount
$50,000 | 0 | 0 | 0 | $50,000

The available budget amount is calculated using the following formula:

Available budget amount = Budget amount - (Pre-encumbrance + Encumbrance + Expenditure)

Step 2: When the requisition for three computers (costing a total of $3,000) is created, it is checked against the available budget. It will be approved, as the entire $50,000 budget amount is available. Once approved, the requisition amount of $3,000 is recorded as pre-encumbrance and the available budget is reduced accordingly:

Budget | Pre-encumbrance | Encumbrance | Expenditure | Available budget amount
$50,000 | $3,000 | 0 | 0 | $47,000

Step 3: A purchase order can be created only after a requisition is successfully budget checked. When the purchase order is created (again for $3,000), it is once again checked against the available budget and will pass, as sufficient budget is available. Once approved, the purchase order amount of $3,000 is recorded as encumbrance, while the pre-encumbrance is eliminated. In other words, the pre-encumbrance amount is converted into an encumbrance amount, as there is now a legal obligation to spend it. A purchase order can be sent to the vendor only after it is successfully budget checked. The amounts are now as follows:

Budget | Pre-encumbrance | Encumbrance | Expenditure | Available budget amount
$50,000 | 0 | $3,000 | 0 | $47,000

Step 4: When the voucher is created (for $3,000), it is once again checked against the available budget and will pass, as the available budget is sufficient. Once approved, the voucher amount of $3,000 is recorded as expenditure, while the encumbrance is eliminated. In other words, the encumbrance amount is converted into an actual expenditure amount:

Budget | Pre-encumbrance | Encumbrance | Expenditure | Available budget amount
$50,000 | 0 | 0 | $3,000 | $47,000

The process of eliminating the previous encumbrance or pre-encumbrance amount is known as liquidation. Thus, whenever a document (purchase order or voucher) is budget checked, the amount for the previous document is liquidated.
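The bucket arithmetic above is simple enough to capture in a few lines of code. The following Python sketch (purely illustrative, not the PeopleSoft budget processor; class and method names are mine) replays the requisition, purchase order, and voucher lifecycle and prints the available budget after each budget check:

    class ControlBudget:
        """Toy model of the four commitment-control buckets."""
        def __init__(self, budget):
            self.budget = budget
            self.pre_encumbrance = 0
            self.encumbrance = 0
            self.expenditure = 0

        @property
        def available(self):
            # Available = Budget - (Pre-encumbrance + Encumbrance + Expenditure)
            return self.budget - (self.pre_encumbrance + self.encumbrance + self.expenditure)

        def budget_check(self, doc_type, amount):
            # Liquidate the predecessor document's amount first, then check availability.
            if doc_type == "purchase_order":
                self.pre_encumbrance -= amount
            elif doc_type == "voucher":
                self.encumbrance -= amount
            if amount > self.available:
                raise ValueError(f"Budget exception on {doc_type}: insufficient budget")
            # Record the new amount in the appropriate bucket.
            bucket = {"requisition": "pre_encumbrance",
                      "purchase_order": "encumbrance",
                      "voucher": "expenditure"}[doc_type]
            setattr(self, bucket, getattr(self, bucket) + amount)
            print(f"{doc_type:14s} -> available = ${self.available:,}")

    b = ControlBudget(50_000)
    b.budget_check("requisition", 3_000)     # available = $47,000
    b.budget_check("purchase_order", 3_000)  # available = $47,000
    b.budget_check("voucher", 3_000)         # available = $47,000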
Thus, a transaction moves successively through these three stages, with the system checking at each stage whether the available budget is sufficient to process it. Whenever a transaction encounters insufficient budget, it is flagged as an exception.

So, the obvious question now is: how do we implement this in PeopleSoft? Put simply, we need at least the following building blocks:

- Ledgers to record the budget, pre-encumbrance, encumbrance, and expenditure amounts
- Batch processes to budget check the various transactions

Of course, there are other configurations involved as well. We'll discuss them in the upcoming section.

Commitment control configurations

In this section, we'll go through the following important configurations needed to use the commitment control feature:

- Enabling commitment control
- Setting up the system-level commitment control options
- Defining the commitment control ledgers and ledger groups
- Defining the budget period calendar
- Configuring the control budget definition
- Linking the commitment control ledger group with the actual transaction ledger group
- Defining commitment control transactions

Enabling commitment control

Before using the commitment control feature for a PeopleSoft module, it needs to be enabled on the Installation Options page. Follow this navigation to enable or disable commitment control for an individual module: Setup Financials / Supply Chain | Install | Installation Options | Products

The Installation Options - Products page allows a system administrator to activate any installed PeopleSoft module as well as to enable the commitment control feature for a module. The PeopleSoft Products section lists all PeopleSoft modules; select the checkbox next to each module implemented by the organization. The Enable Commitment Control section shows the PeopleSoft modules for which commitment control can be enabled; select the checkbox next to a module to activate commitment control and validate its transactions against the defined budgets.

Setting up system-level commitment control options

After enabling commitment control for the desired modules, we need to set up some processing options at the system level. Follow this navigation: Setup Financials / Supply Chain | Install | Installation Options | Commitment Control

Default Budget Date: This field specifies the default budget date that is populated on requisitions, purchase orders, and vouchers. The available options are Accounting Date Default (use the document's accounting date as the budget date) and Predecessor Doc Date Default (use the budget date from the predecessor document; for example, the purchase order inherits the requisition's budget date, and the voucher inherits the purchase order's budget date).

Reversal Date Option: There are situations when changes are made to an already budget-checked transaction. Whenever this happens, the document needs to be budget checked again, and this field determines how the changed transactions are posted. Consider a case where a requisition for $1,000 is created and successfully budget checked in January, and in February the requisition has to be increased to $1,200, requiring another budget check.
The available options are Prior Date (this option completely reverses the January pre-encumbrance entries for $1,000 and creates new entries for $1,200 in February) and Current Date (this option creates additional entries for $200 in February, while leaving the $1,000 January entries unchanged).

BP Liquidation Option: We have already seen that the system liquidates the pre-encumbrance and encumbrance amounts when budget checking purchase orders and vouchers. This field determines the period in which the previous amount is liquidated, if the transactions occur in different periods. The available options are Current Document Budget Period (liquidate the amounts in the current period) and Prior Document Budget Period (liquidate the amounts in the period when the source document was created).

Enable Budget Pre-Check: This is a useful option for testing expense transactions without actually committing the amounts (pre-encumbrance, encumbrance, and expenditure) to the respective ledgers. We may budget check a transaction only to find that it fails validation; rather than budget checking and then handling the exception, it is much more efficient to simply run a budget pre-check first. The system shows all the potential error messages, which can help us correct the transaction data appropriately. Select the checkbox next to a module to enable this feature.


Excel 2010 Financials: Adding Animations to Excel Graphs

Packt
28 Jun 2011
3 min read
Excel 2010 Financials Cookbook: Powerful techniques for financial organization, analysis, and presentation in Microsoft Excel

Getting ready

We will start this recipe with a single-column graph demonstrating profitability. The dataset for this graph is cell A1. Cell A1 has the formula =A2/100, and cell A2 contains the number 1000.

How to do it...

1. Press Alt + F11 on the keyboard to open the Excel Visual Basic Editor (VBE).
2. Once in the VBE, choose Insert | Module from the file menu.
3. Enter the following code:

    Sub Animate()
        Dim x As Integer
        x = 0
        Range("A2").Value = x
        While x < 1000
            x = x + 1
            Range("A2").Value = Range("A2").Value + 1
            For y = 1 To 500000
            Next
            DoEvents
        Wend
    End Sub

4. Save your work and close the VBE.
5. Back at the worksheet, from the Ribbon, choose View | Macros | View Macros.
6. Select the Animate macro and choose Run.

The graph for profitability will now drop to 0 and slowly rise to account for the full profitability.

How it works...

The graph was set to use the data within cell A1 as its dataset. While there is a value within the cell, the graph shows the column and its corresponding value. If you were to manually change the value of cell A1 and press Enter, the graph would also change to the new value. It is this update behavior that we harness within the VBA macro.

    Sub Animate()
        Dim x As Integer
        x = 0

Here we first declare and set the variables we will need. The value of x will hold the overall profit.

        Range("A2").Value = x
        While x < 1000
            x = x + 1
            Range("A2").Value = Range("A2").Value + 1
            For y = 1 To 500000
            Next
            DoEvents
        Wend

Next, we set the value of cell A2 to x (which begins at 0). Since cell A1 is defined as A2/100, A1, which is the graph's dataset, is now zero. Using a While loop, we add 1 to x and to cell A2, pause for a moment using the empty For loop and DoEvents to allow Excel to update the graph, and then repeat. This continues until x reaches 1000, which, divided by 100 as in the formula for A1, becomes 10. So the profitability we end with in our animated graph is 10.

    End Sub

There's more...

We can change the final height of the profitability graph by simply changing the 1000 in the While x < 1000 line. Setting this number to 900 means the graph will only grow to a profit level of nine.


Excel 2010 Financials: Using Graphs for Analysis

Packt
28 Jun 2011
6 min read
Introduction

Graphing is one of the most effective ways to display datasets, financial scenarios, and statistical functions in a way that users can easily understand. If you give someone a list of 40 different numbers and ask him or her to draw a conclusion, it is not only difficult, it may be impossible without additional functions. However, if you provide the same individual with a graph of the numbers, they will most likely be able to notice trends, dataset size, frequency, and so on. Despite the effectiveness of graphing and visual modeling, financial and statistical graphing is often overlooked in Excel 2010 due to its difficulty or the lack of native functions. In this article, you will learn not only how to add reusable methods to automate graph production, but also how to create graphs and graphing sets that are not native to Excel. You will learn to use box and whisker plots, histograms to demonstrate frequency, stem and leaf plots, and other methods to graph financial ratios and scenarios.

Charting financial frequency trending with a histogram

Frequency calculations are used throughout financial analysis, statistics, and other mathematical representations to determine how often an event has occurred. Determining the frequency of financial events in a transaction list can help establish the popularity of an item or product, the future likelihood of an event recurring, or the frequency of profitability of an organization. Excel, however, does not create histograms by default. In this recipe, you will learn how to use several features, including bar charts and the FREQUENCY function, to create a histogram frequency chart within Excel and determine the profitability of an entity.

Getting ready

When plotting histogram frequency, we use frequency and charting to visually determine the continued likelihood of an event from past data. The past data can be flexible in terms of what we are trying to determine; in this instance, we will use the daily net profit (sales income versus gross expenses) for a retail store. The daily net profit numbers for one month are as follows:

$150, $237, -$94.75, $1,231, $876, $455, $349, -$173, -$34, -$234, $110, $83, -$97, -$129, $34, $456, $1,010, $878, $211, -$34, -$142, -$87, $312

How to do it...

1. Using the profit numbers above, begin by adding the data to the Excel worksheet: enter the daily net profit numbers into column A, starting on row 2, until all the data has been entered.
2. We must now choose the boundaries to be used for the histogram. The boundary numbers are the highest and lowest thresholds that will be included in the graph. The boundaries used in this instance will be $1,500 and -$1,500. These boundaries encompass our entire dataset and leave padding on the higher and lower ends to allow for variation when plotting larger datasets covering multiple months or years of profit.
3. We must now create the bins that we will chart the frequency against. The bins are the individual data points that we want to count frequency for. For instance, one bin will be $1,500, and we will want to know how often the net profit of the retail location falls within the $1,500 range. The smaller the bins chosen, the larger the chart area. Choose a bin size that accurately reflects the frequency of your dataset without creating large blank spaces; bin size will change depending on the range of data to be graphed.
4. Enter the chosen bin numbers into column C of the worksheet, each bin $150 apart from the previous one. The bins need to cover a range wide enough to illustrate the full expanse of the dataset, so begin with the largest negative number and increment up to the largest positive number.
5. The last component of the frequency histogram actually determines the frequency of net profit within the designated bins. For this, we use the Excel function FREQUENCY. Select cells D2 through D22 and enter the following formula: =FREQUENCY(A:A,C2:C22)
6. After entering the formula, press Shift + Ctrl + Enter to finalize it as an array formula. Excel now displays the frequency of the net profit for each of the designated bins.
7. The information for the histogram is now ready, and we can create the actual histogram graph. Select cells C2 through D22. With the cells selected, choose Insert | Column from the Excel ribbon.
8. From the Column drop-down, choose the Clustered Column chart option. Excel will now attempt to determine the plot area and will present you with a bar chart. The chart Excel creates does not accurately display the plot area, because Excel is unable to determine the correct data range.
9. Select the chart. From the Ribbon, choose Select Data.
10. In the Select Data Source window that appears, change the Horizontal Axis to include the Bins column (column C), and the Legend Series to include the Frequency column (column D). Excel will also create two series entries within the Legend Series panel; remove Series2 and select OK. Excel now presents the histogram graph of net profit frequency.
11. Using the Format graph options on the Excel Ribbon, reduce the bar spacing and adjust the horizontal label to reduce the chart area to your specifications.

Using the histogram for financial analysis, we can now quickly and easily see the major trend for the retail location: it keeps its net profit mostly within the $0 to $150 range while maintaining a positive net profit over the month.

How it works...

While a histogram chart is not native to Excel, we were able to use a bar/column chart to plot the frequency of net profit within specific bin ranges. The FREQUENCY function of Excel has the following format: =FREQUENCY(data range to plot, bins to plot within). It is important to note that for the data range, we chose A:A. This range includes all of the data within column A; if arbitrary or unnecessary data unrelated to the histogram were added to column A, this range would include it as well. Either do not allow unnecessary data to be added to column A, or use a limited range such as A1:A5 if your data occupies only the first five cells of the column. We entered the formula by pressing Shift + Ctrl + Enter in order to submit it as an array formula, which allows Excel to calculate individual results across a whole array of data. The graph modifications let the vertical axis show frequency, that is, how many times a value fell into a specific bin range, while the horizontal axis displays the actual bins.

There's more...

The amount of data for this histogram was limited; however, its usefulness is already evident. When this same charting recipe is used to chart a large dataset (for example, multiple years of data), the histogram becomes even more useful for displaying trends.
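If you want to double-check the bin counts Excel returns, the same tally can be reproduced in a few lines. The following Python sketch (an illustration, not part of the recipe; the frequency helper is mine) counts how many daily net-profit values fall into each $150 bin, mirroring the core of =FREQUENCY(A:A,C2:C22):

    profits = [150, 237, -94.75, 1231, 876, 455, 349, -173, -34, -234, 110,
               83, -97, -129, 34, 456, 1010, 878, 211, -34, -142, -87, 312]

    # Bins every $150 from -$1,500 up to $1,500, like column C in the recipe.
    bins = list(range(-1500, 1501, 150))

    def frequency(data, bins):
        # Count values greater than the previous boundary and at most the current
        # one. (Excel's FREQUENCY also returns one extra overflow count for values
        # above the last bin; that is omitted here.)
        counts = []
        previous = float("-inf")
        for upper in bins:
            counts.append(sum(1 for v in data if previous < v <= upper))
            previous = upper
        return counts

    for upper, count in zip(bins, frequency(profits, bins)):
        if count:
            print(f"up to {upper:>6}: {count}")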

Pentaho Data Integration 4: working with complex data flows

Packt
27 Jun 2011
7 min read
Joining two or more streams based on given conditions

There are occasions when you will need to join two datasets. If you are working with databases, you could use SQL statements to perform this task, but for other kinds of input (XML, text, Excel), you will need another solution. Kettle provides the Merge Join step to join data coming from any kind of source.

Let's assume that you are building a house and want to track and manage the costs of building it. Before starting, you prepared an Excel file with the estimated costs for the different parts of your house. Now, you are given a weekly file with the progress and the real costs, and you want to compare the two to see the progress.

Getting ready

To run this recipe, you will need two Excel files, one for the budget and another with the real costs. The budget.xls file has the estimated starting date, estimated end date, and cost for the planned tasks. The costs.xls file has the real starting date, end date, and cost for the tasks that have already started. You can download the sample files from here.

How to do it...

Carry out the following steps:

1. Create a new transformation.
2. Drop two Excel input steps onto the canvas. Use one step for reading the budget information (the budget.xls file) and the other for reading the costs information (the costs.xls file).
3. Under the Fields tab of these steps, click on the Get fields from header row... button to populate the grid automatically. Apply the format dd/MM/yyyy to the fields of type Date, and $0.00 to the fields with costs.
4. Add a Merge Join step from the Join category, and create a hop from each Excel input step toward this step.
5. Configure the Merge Join step, joining the two streams on the task field and setting the Join Type to LEFT OUTER JOIN. If you do a preview on this step, you will obtain the result of the two Excel files merged.
6. To keep the columns better organized, add a Select values step from the Transform category. In this new step, select the fields in this order: task, starting date (est.), starting date, end date (est.), end date, cost (est.), cost.

Doing a preview on the last step, you will obtain the merged data with the columns of both Excel files interspersed.

How it works...

In this example, you saw how to use the Merge Join step to join data coming from two Excel files; you can use this step to join any other kind of input. In the Merge Join step, you set the names of the incoming steps and the fields to use as the keys for joining them. In the recipe, you joined the streams on just a single field: the task field. The rows are expected to be sorted in ascending order on the specified key fields.

There's more...

In the example, you set the Join Type to LEFT OUTER JOIN. The available join types follow the usual join semantics: a LEFT OUTER JOIN keeps every row from the first stream and attaches matching rows from the second where they exist, an INNER JOIN keeps only rows whose keys appear in both streams, and the RIGHT OUTER and FULL OUTER options keep all rows from the second stream or from both streams, respectively.
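Outside Kettle, the same left outer join over two sorted, task-keyed datasets can be sketched in a few lines. The following Python snippet is illustrative only, with made-up rows standing in for budget.xls and costs.xls, and mirrors what the Merge Join step does when keyed on task:

    budget = [
        {"task": "Excavation", "cost (est.)": 5000},
        {"task": "Foundation", "cost (est.)": 12000},
        {"task": "Framing",    "cost (est.)": 20000},
    ]
    costs = [
        {"task": "Excavation", "cost": 5400},
        {"task": "Foundation", "cost": 11800},
    ]

    def left_outer_join(left, right, key):
        # Index the right-hand stream by key, then keep every left-hand row,
        # attaching the matching right-hand fields when they exist.
        lookup = {row[key]: row for row in right}
        joined = []
        for row in left:
            match = lookup.get(row[key], {})
            joined.append({**row, **{k: v for k, v in match.items() if k != key}})
        return joined

    for row in left_outer_join(budget, costs, "task"):
        print(row)   # Framing has no real cost yet, so no 'cost' field is attached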
Interspersing new rows between existing rows

In most Kettle datasets, all rows share a common meaning; they represent the same kind of entity, for example:

- In a dataset of sold items, each row has data about one item
- In a dataset with the mean temperature for a range of days in five different regions, each row has the mean temperature for a different day in one of those regions
- In a dataset of people ordered by age range (0-10, 11-20, 21-40, and so on), each row has data about one person

Sometimes, there is a need to intersperse new rows between your current rows. Taking the previous examples, imagine the following situations:

- In the sold items dataset, every 10 items you have to insert a row with the running quantity of items and the running sold price from the first line up to that line
- In the temperature dataset, you have to order the data by region, and the last row for each region has to hold the average temperature for that region
- In the people dataset, for each age range, you have to insert a header row just before the rows of people in that range

In general, the rows you need to intersperse can contain fixed data, subtotals of the numbers in previous rows, headers for the rows coming next, and so on. What they have in common is that they have a different structure or meaning compared to the rows in your dataset. Interspersing these rows is not a complicated task, but it is a tricky one. In this recipe, you will learn how to do it.

Suppose that you have to create a list of products by category, and for each category you have to insert a header row with the category description and the number of products in that category.

Getting ready

This recipe uses an outdoor database with the structure shown in Appendix, Data Structures (Download here). As the source, you can use a database like this or any other source, for example a text file with the same structure.

How to do it...

Carry out the following steps:

1. Create a transformation, drag a Table input step into the canvas, and select the connection to the outdoor database, or create it if it doesn't exist. Then enter the following statement:

    SELECT category, desc_product
    FROM products p, categories c
    WHERE p.id_category = c.id_category
    ORDER BY category

2. Do a preview of this step. You already have the product list! Now you have to create and intersperse the header rows.
3. To create the headers, add a Group by step from the Statistics category and group the rows by category, counting the number of products in each category into a field named qty_product.
4. From the Scripting category, add a User Defined Java Expression step, and use it to add two fields: a String named desc_product with the value ("Category: " + category).toUpperCase(), and an Integer field named order with the value 1.
5. Use a Select values step to reorder the fields as category, desc_product, qty_product, and order. Do a preview on this step to see the header rows.
6. Those are the headers. The next step is mixing all the rows in the proper order. Drag an Add constants step and a Sort rows step into the canvas, and link them to the other steps.
7. Use the Add constants step to add two Integer fields: qty_product and order. As the Value, leave the first field empty and type 2 for the second field.
8. Use the Sort rows step to sort by category, order, and desc_product.
9. Select the last step and do a preview. You should see the product rows grouped by category, each group preceded by its header row.
How it works...

When you have to intersperse rows between existing rows, there are four main tasks to do:

1. Create a secondary stream that will be used for creating the new rows; in this case, the rows with the category headers.
2. In each stream, add a field that will help you intersperse the rows in the proper order; in this case, the key field named order.
3. Before joining the two streams, add, remove, and reorder the fields in each stream to make sure that the output fields in both streams have the same metadata.
4. Join the streams and sort by the fields you consider appropriate, including the field created earlier. In this case, you sorted by category, then within each category by the field named order, and finally by the product description.

Note that in this case you created a single secondary stream. You could create more if needed, for example, if you need both a header and a footer for each category.
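The order-key trick is easy to see in miniature. The Python sketch below (illustrative data and field names only, not Kettle code) builds the header rows in a secondary list, gives headers order 1 and detail rows order 2, and then sorts, just as the transformation does:

    from collections import Counter

    products = [
        {"category": "kitchen", "desc_product": "Swedish Firesteel - Army Model"},
        {"category": "kitchen", "desc_product": "Pre-Seasoned Round Griddle"},
        {"category": "tents",   "desc_product": "4-Person Dome Tent"},
    ]

    # Secondary stream: one header row per category, with the product count.
    counts = Counter(row["category"] for row in products)
    headers = [{"category": cat,
                "desc_product": f"CATEGORY: {cat.upper()}",
                "qty_product": qty,
                "order": 1}
               for cat, qty in counts.items()]

    # Main stream: same metadata, with an empty qty_product and order 2.
    details = [{**row, "qty_product": None, "order": 2} for row in products]

    # Join both streams and sort by category, order, and product description.
    merged = sorted(headers + details,
                    key=lambda r: (r["category"], r["order"], r["desc_product"]))
    for row in merged:
        print(row)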


Pentaho Data Integration 4: Understanding Data Flows

Packt
24 Jun 2011
12 min read
Pentaho Data Integration 4 Cookbook: Over 70 recipes to solve ETL problems using Pentaho Kettle

This article by Adrián Sergio Pulvirenti and María Carina Roldán, authors of Pentaho Data Integration 4 Cookbook, focuses on the different ways of combining, splitting, or manipulating streams or flows of data using Kettle transformations. In this article, we will cover:

- Splitting a stream into two or more streams based on a condition
- Merging rows from two streams with the same or different structure
- Comparing two streams and generating differences
- Generating all possible pairs formed from two datasets

Introduction

The main purpose of Kettle transformations is to manipulate data in the form of a dataset; this task is done by the steps of the transformation. When a transformation is launched, all of its steps are started. During the execution, the steps work simultaneously, reading rows from the incoming hops, processing them, and delivering them to the outgoing hops. When there are no more rows left, the execution of the transformation ends. The dataset that flows from step to step is nothing more than a set of rows, all having the same structure or metadata: all rows have the same number of columns, and the columns in all rows have the same type and name.

Suppose that you have a single stream of data and you apply the same transformations to all rows, that is, you have all steps connected in a row, one after the other. In other words, you have the simplest of transformations from the point of view of structure. In this case, you don't have to worry much about the structure of your data stream, nor about the origin or destination of the rows. The interesting part comes when you face other situations, for example:

- You want a step to start processing rows only after another given step has processed all of its rows
- You have more than one stream and you have to combine them into a single stream
- You have to inject rows in the middle of your stream, and those rows don't have the same structure as the rows in your dataset

With Kettle, you can actually do all of this, but you have to be careful, because it is easy to end up doing the wrong thing and getting unexpected results or, even worse, undesirable errors. The first example does not happen by default, due to the parallel nature of transformations explained earlier. There are, however, two steps that can help, both found in the Flow category:

- Blocking Step: This step blocks processing until all incoming rows have been processed.
- Block this step until steps finish: This step blocks processing until the selected steps finish.

This article and the next one, on working with complex data flows, focus on the other two examples and some similar use cases, explaining the different ways of combining, splitting, or manipulating streams of data.

Splitting a stream into two or more streams based on a condition

In this recipe, you will learn to use the Filter rows step in order to split a single stream into different, smaller streams. In the There's more section, you will also see alternative and more efficient ways of doing the same thing in different scenarios.
Let's assume that you have a set of outdoor products in a text file, and you want to differentiate the tents from other kind of products, and also create a subclassification of the tents depending on their prices. Let's see a sample of this data: id_product,desc_product,price,category1,"Swedish Firesteel - Army Model",19,"kitchen"2,"Mountain House #10 Can Freeze-Dried Food",53,"kitchen"3,"Lodge Logic L9OG3 Pre-Seasoned 10-1/2-Inch RoundGriddle",14,"kitchen"... Getting ready To run this recipe, you will need a text file named outdoorProducts.txt with information about outdoor products. The file contains information about the category and price of each product. How to do it... Carry out the following steps: Create a transformation. Drag into the canvas a Text file input step and fill in the File tab to read the file named outdoorProducts.txt. If you are using the sample text file, type , as the Separator. Under the Fields tab, use the Get Fields button to populate the grid. Adjust the entries so that the grid looks like the one shown in the following screenshot: Now, let's add the steps to manage the flow of the rows. To do this, drag two Filter rows steps from the Flow category. Also, drag three Dummy steps that will represent the three resulting streams. Create the hops, as shown in the following screenshot. When you create the hops, make sure that you choose the options according to the image: Result is TRUE for creating a hop with a green icon, and Result is FALSE for creating a hop with a red icon in it. Double-click on the first Filter rows step and complete the condition, as shown in the following screenshot: Double-click on the second Filter rows step and complete the condition with price < 100. You have just split the original dataset into three groups. You can verify it by previewing each Dummy step. The first one has products whose category is not tents; the second one, the tents under 100 US$; and the last group, the expensive tents; those whose price is over 100 US$. The preview of the last Dummy step will show the following: How it works... The main objective in the recipe is to split a dataset with products depending on their category and price. To do this, you used the Filter rows step. In the Filter rows setting window, you tell Kettle where the data flows to depending on the result of evaluating a condition for each row. In order to do that, you have two list boxes: Send 'true' data to step and Send 'false' data to step. The destination steps can be set by using the hop properties as you did in the recipe. Alternatively, you can set them in the Filter rows setting dialog by selecting the name of the destination steps from the available drop-down lists. You also have to enter the condition. The condition has the following different parts: The upper textbox on the left is meant to negate the condition. The left textbox is meant to select the field that will be used for comparison. Then, you have a list of possible comparators to choose from. On the right, you have two textboxes: The upper textbox for comparing against a field and the bottom textbox for comparing against a constant value. Also, you can include more conditions by clicking on the Add Condition button on the right. If you right-click on a condition, a contextual menu appears to let you delete, edit, or move it. In the first Filter rows step of the recipe, you typed a simple condition: You compared a field (category) with a fixed value (tents) by using the equal (=) operator. 
You did this to separate the tents products from the others. The second filter had the purpose of differentiating the expensive and the cheap tents. There's more... You will find more filter features in the following subsections. Avoiding the use of Dummy steps In the recipe, we assumed that you wanted all three groups of products for further processing. Now, suppose that you only want the cheapest tents and you don't care about the rest. You could use just one Filter rows step with the condition category = tents AND price < 100, and send the 'false' data to a Dummy step, as in shown in the following diagram: The rows that don't meet the condition will end at the Dummy step. Although this is a very commonly used solution for keeping just the rows that meet the conditions, there is a simpler way to implement it. When you create the hop from the Filter rows toward the next step, you are asked for the kind of hop. If you choose Main output of step, the two options Send 'true' data to step and Send 'false' data to step will remain empty. This will cause two things: Only the rows that meet the condition will pass. The rest will be discarded. Comparing against the value of a Kettle variable The recipe above shows you how to configure the condition in the Filter rows step to compare a field against another field or a constant value, but what if you want to compare against the value of a Kettle variable? Let's assume, for example, you have a named parameter called categ with kitchen as Default Value. As you might know, named parameters are a particular kind of Kettle variable.   You create the named parameters under the Parameter tab from the Settings option of the Edit menu. To use this variable in a condition, you must add it to your dataset in advance. You do this as follows: Add a Get Variables step from the Job category. Put it in the stream after the Text file input step and before the Filter Rows step; use it to create a new field named categ of String type with the value ${categ} in the Variable column. Now, the transformation looks like the one shown in the following screenshot: After this, you can set the condition of the first Filter rows step to category = categ, selecting categ from the listbox of fields to the right. This way, you will be filtering the kitchen products. If you run the transformation and set the parameter to tents, you will obtain similar results to those that were obtained in the main recipe. Avoiding the use of nested Filter Rows steps Suppose that you want to compare a single field against a discrete and short list of possible values and do different things for each value in that list. In this case, you can use the Switch / Case step instead of nested Filter rows steps. Let's assume that you have to send the rows to different steps depending on the category. The best way to do this is with the Switch / Case step. This way you avoid adding one Filter row step for each category. In this step, you have to select the field to be used for comparing. You do it in the Field name to switch listbox. In the Case values grid, you set the Value—Target step pairs. 
The following screenshot shows how to fill in the grid for our particular problem: The following are some considerations about this step: You can have multiple values directed to the same target step You can leave the value column blank to specify a target step for empty values You have a listbox named Default target step to specify the target step for rows that do not match any of the case values You can only compare with an equal operator If you want to compare against a substring of the field, you could enable the Use string contains option and as Case Value, type the substring you are interested in. For example, if for Case Value, you type tent_ then all categories containing tent_ such as tent_large, tent_small, or best_tents will be redirected to the same target step. Overcoming the difficulties of complex conditions There will be situations where the condition is too complex to be expressed in a single Filter rows step. You can nest them and create temporary fields in order to solve the problem, but it would be more efficient if you used the Java Filter or User Defined Java Expression step as explained next. You can find the Java Filter step in the Flow category. The difference compared to the Filter Rows step is that in this step, you write the condition using a Java expression. The names of the listboxes—Destination step for matching rows (optional) and Destination step for non-matching rows (optional)—differ from the names in the Filter rows step, but their purpose is the same. As an example, the following are the conditions you used in the recipe rewritten as Java expressions: category.equals("tents") and price < 100. These are extremely simple, but you can write any Java expression as long as it evaluates to a Boolean result. If you can't guarantee that the category will not be null, you'd better invert the first expression and put "tents".equals(category) instead. By doing this, whenever you have to check if a field is equal to a constant, you avoid an unexpected Java error. Finally, suppose that you have to split the streams simply to set some fields and then join the streams again. For example, assume that you want to change the category as follows: Doing this with nested Filter rows steps leads to a transformation like the following: You can do the same thing in a simpler way: Replace all the steps but the Text file input with a User Defined Java Expression step located in the Scripting category. In the setting window of this step, add a row in order to replace the value of the category field: As New field and Replace value type category. As Value type select String. As Java expression, type the following: (category.equals("tents"))?(price<100?"cheap_tents":"expensive_tents"):category The preceding expression uses the Java ternary operator ?:. If you're not familiar with the syntax, think of it as shorthand for the if-then-else statement. For example, the inner expression price<100?"cheap_tents":"expensive_tents" means if (price<100) then return "cheap_tents" else return "expensive_tents". Do a preview on this step. You will see something similar to the following:
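The nested ternary expression above can be hard to read at a glance. The following short C++ sketch spells out the same classification logic as plain if/else statements; it is an illustration only (the field names category and price and the labels cheap_tents and expensive_tents come from the recipe, but the function itself is not part of Kettle):

#include <string>

// Illustrative only: mirrors the Java expression used in the
// User Defined Java Expression step of the recipe.
std::string classifyCategory(const std::string& category, double price) {
    if (category == "tents") {
        // Tents are split by price, exactly as the inner ternary does
        return (price < 100) ? "cheap_tents" : "expensive_tents";
    }
    // Any other category is left unchanged
    return category;
}

In other words, only tents are relabeled; every other row keeps its original category value.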

Segmenting images in OpenCV

Packt
21 Jun 2011
7 min read
  OpenCV 2 Computer Vision Application Programming Cookbook Over 50 recipes to master this library of programming functions for real-time computer vision         Read more about this book       OpenCV (Open Source Computer Vision) is an open source library containing more than 500 optimized algorithms for image and video analysis. Since its introduction in 1999, it has been largely adopted as the primary development tool by the community of researchers and developers in computer vision. OpenCV was originally developed at Intel by a team led by Gary Bradski as an initiative to advance research in vision and promote the development of rich, vision-based CPU-intensive applications. In the previous article by Robert Laganière, author of OpenCV 2 Computer Vision Application Programming Cookbook, we took a look at image processing using morphological filters. In this article we will see how to segment images using watersheds and GrabCut algorithm. (For more resources related to this subject, see here.) Segmenting images using watersheds The watershed transformation is a popular image processing algorithm that is used to quickly segment an image into homogenous regions. It relies on the idea that when the image is seen as a topological relief, homogeneous regions correspond to relatively flat basins delimitated by steep edges. As a result of its simplicity, the original version of this algorithm tends to over-segment the image which produces multiple small regions. This is why OpenCV proposes a variant of this algorithm that uses a set of predefined markers which guide the definition of the image segments. How to do it... The watershed segmentation is obtained through the use of the cv::watershed function. The input to this function is a 32-bit signed integer marker image in which each non-zero pixel represents a label. The idea is to mark some pixels of the image that are known to certainly belong to a given region. From this initial labeling, the watershed algorithm will determine the regions to which the other pixels belong. In this recipe, we will first create the marker image as a gray-level image, and then convert it into an image of integers. We conveniently encapsulated this step into a WatershedSegmenter class: class WatershedSegmenter { private: cv::Mat markers;public: void setMarkers(const cv::Mat& markerImage) { // Convert to image of ints markerImage.convertTo(markers,CV_32S); } cv::Mat process(const cv::Mat &image) { // Apply watershed cv::watershed(image,markers); return markers; } The way these markers are obtained depends on the application. For example, some preprocessing steps might have resulted in the identification of some pixels belonging to an object of interest. The watershed would then be used to delimitate the complete object from that initial detection. In this recipe, we will simply use the binary image used in the previous article (OpenCV: Image Processing using Morphological Filters) in order to identify the animals of the corresponding original image. Therefore, from our binary image, we need to identify pixels that certainly belong to the foreground (the animals) and pixels that certainly belong to the background (mainly the grass). Here, we will mark foreground pixels with label 255 and background pixels with label 128 (this choice is totally arbitrary, any label number other than 255 would work). The other pixels, that is the ones for which the labeling is unknown, are assigned value 0. 
As it is now, the binary image includes too many white pixels belonging to various parts of the image. We will therefore severely erode this image in order to retain only pixels belonging to the important objects:

// Eliminate noise and smaller objects
cv::Mat fg;
cv::erode(binary,fg,cv::Mat(),cv::Point(-1,-1),6);

The result is the following image:

Note that a few pixels belonging to the background forest are still present. Let's simply keep them; they will be considered to correspond to an object of interest. Similarly, we also select a few pixels of the background by a large dilation of the original binary image:

// Identify image pixels without objects
cv::Mat bg;
cv::dilate(binary,bg,cv::Mat(),cv::Point(-1,-1),6);
cv::threshold(bg,bg,1,128,cv::THRESH_BINARY_INV);

The resulting black pixels correspond to background pixels. This is why the thresholding operation immediately after the dilation assigns these pixels the value 128. The following image is then obtained:

These images are combined to form the marker image:

// Create markers image
cv::Mat markers(binary.size(),CV_8U,cv::Scalar(0));
markers= fg+bg;

Note how we used the overloaded operator+ here in order to combine the images. This is the image that will be used as input to the watershed algorithm:

The segmentation is then obtained as follows:

// Create watershed segmentation object
WatershedSegmenter segmenter;
// Set markers and process
segmenter.setMarkers(markers);
segmenter.process(image);

The marker image is then updated such that each zero pixel is assigned one of the input labels, while the pixels belonging to the found boundaries have value -1. The resulting image of labels is:

The boundary image is:

How it works...

As we did in the preceding recipe, we will use the topological map analogy to describe the watershed algorithm. In order to create a watershed segmentation, the idea is to progressively flood the image starting at level 0. As the level of "water" progressively increases (to levels 1, 2, 3, and so on), catchment basins are formed. The size of these basins gradually increases and, consequently, the water of two different basins will eventually merge. When this happens, a watershed is created in order to keep the two basins separated. Once the water has reached its maximum level, the sets of created basins and watersheds form the watershed segmentation.

As one can expect, the flooding process initially creates many small individual basins. When all of these are merged, many watershed lines are created, which results in an over-segmented image. To overcome this problem, a modification of this algorithm has been proposed in which the flooding process starts from a predefined set of marked pixels. The basins created from these markers are labeled in accordance with the values assigned to the initial marks. When two basins having the same label merge, no watershed is created, thus preventing over-segmentation.

This is what happens when the cv::watershed function is called. The input marker image is updated to produce the final watershed segmentation. Users can input a marker image with any number of labels, with pixels of unknown labeling left at value 0. The marker image has been chosen to be an image of 32-bit signed integers in order to be able to define more than 255 labels. It also allows the special value -1 to be assigned to pixels associated with a watershed. This is what is returned by the cv::watershed function.
To facilitate displaying the result, we have introduced two special methods. The first one returns an image of the labels (with watersheds at value 0). This is easily done through thresholding:

// Return result in the form of an image
cv::Mat getSegmentation() {
  cv::Mat tmp;
  // all segments with a label higher than 255
  // will be assigned value 255
  markers.convertTo(tmp,CV_8U);
  return tmp;
}

Similarly, the second method returns an image in which the watershed lines are assigned the value 0, and the rest of the image is at 255. This time, the cv::convertTo method is used to achieve this result:

// Return watershed in the form of an image
cv::Mat getWatersheds() {
  cv::Mat tmp;
  // Each pixel p is transformed into
  // 255p+255 before conversion
  markers.convertTo(tmp,CV_8U,255,255);
  return tmp;
}

The linear transformation that is applied before the conversion allows -1 pixels to be converted into 0 (since -1*255+255=0). Pixels with a value greater than 255 are assigned the value 255. This is due to the saturation operation that is applied when signed integers are converted into unsigned chars.

See also

The article The viscous watershed transform by C. Vachier and F. Meyer, Journal of Mathematical Imaging and Vision, volume 22, issue 2-3, May 2005, for more information on the watershed transform.

The next recipe, which presents another image segmentation algorithm that can also segment an image into background and foreground objects.
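Putting the pieces of this recipe together, a minimal driver program could look like the following sketch. It assumes the WatershedSegmenter class shown above (extended with the getSegmentation and getWatersheds methods), and the file names animals.jpg and binary.bmp are placeholders for the original image and its thresholded binary version:

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

int main() {
    // Original color image and the binary image from the previous article
    cv::Mat image  = cv::imread("animals.jpg");
    cv::Mat binary = cv::imread("binary.bmp", 0);
    if (image.empty() || binary.empty()) return 1;

    // Foreground markers (255): keep only pixels that surely belong to the objects
    cv::Mat fg;
    cv::erode(binary, fg, cv::Mat(), cv::Point(-1,-1), 6);

    // Background markers (128): dilate, then invert-threshold
    cv::Mat bg;
    cv::dilate(binary, bg, cv::Mat(), cv::Point(-1,-1), 6);
    cv::threshold(bg, bg, 1, 128, cv::THRESH_BINARY_INV);

    // Combine into a single marker image (0 = unknown)
    cv::Mat markers(binary.size(), CV_8U, cv::Scalar(0));
    markers = fg + bg;

    // Run the watershed segmentation
    // (WatershedSegmenter is the class defined earlier in this recipe)
    WatershedSegmenter segmenter;
    segmenter.setMarkers(markers);
    segmenter.process(image);

    cv::imshow("Segmentation", segmenter.getSegmentation());
    cv::imshow("Watersheds", segmenter.getWatersheds());
    cv::waitKey();
    return 0;
}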

Getting Started with OpenSSO

Packt
13 Jun 2011
8 min read
OpenAM Written and tested with OpenAM Snapshot 9—the Single Sign-On (SSO) tool for securing your web applications in a fast and easy way History of OpenSSO Back in early 2000, Sun Microsystems Inc. started the Directory Server Access Management Edition project to develop a product that would solve the web Single Sign-On (SSO) problems. The initial scope was to just provide authentication and authorization as part of the SSO using a proprietary protocol. Over the years it evolved to become a lightweight pure Java application providing a comprehensive set of security features to protect the enterprise resources. After undergoing a lot of changes in terms of product name, features, among other things, it ended up with OpenSSO. As part of Sun Microsystems software business strategy, they have open sourced most of their commercial enterprise software products including Sun Java Enterprise System (JES), Java SDK, and Communication suite of products. OpenSSO is the term coined by the Sun's access management product team as part of their strategy to open source Sun Java System Access Manager, Sun Java System SAML v2 Plugin for Federation Services, and Sun Java System Federation Manager. OpenSSO is an integrated product that includes the features of both Access and Federation manager products. The goal was to amalgamate the access, federation, and web services features. This goal was achieved when Sun released the OpenSSO Enterprise 8.0 that received strong market traction and analysts coverage. Sun's OpenSSO product made it to the leadership quadrant in 2008 in the Access Management category which was published by the Gartner Analyst group. As part of the open source initiative, a lot of code re-factoring and feature alignments occurred that changed the product's outlook. It removed all the native dependencies that were required to build and deploy earlier. OpenSSO became the pure Java web application that enabled the customers and open source community to take advantage of the automatic deployment by just dropping the web archive on any Java servlet container to deploy it. The build and check-in processes were highly simplified which attracted the open source community to contribute to the code and quality of the product. OpenSSO had a very flexible release model wherein new features or bug fixes could easily be implemented in the customer environment by picking up the nightly, express, or enterprise build. OpenSSO Enterprise 8.0 was a major release by Sun that was built from the open source branch. After this release, there were two other express releases. Those were very feature-rich and introduced Secure Token Service (STS) and OAuth functionality. Express build 9 was not released in the binary form by Oracle but the source code has been made available to the open source community. You can download the OpenAM express build, built using the express build 9 branch from the Forgerock site. As part of the acquisition of Sun Microsystems Inc. by Oracle Corporation that happened back in early 2010, the release and support models have been changed for OpenSSO. If you are interested in obtaining a support contract for the enterprise version of the product, you should call up the Oracle support team or the product management team. Oracle continues its support for the OpenSSO enterprise's existing customers. For the OpenSSO open source version (also known as OpenAM) you can approach the Forgerock team to obtain support. OpenSSO vs. 
OpenAM OpenSSO was the only open source product in the access management segment that had production level quality. Over eight thousands test cases were executed on twelve different Java servlet containers. OpenSSO is supported by a vibrant community that includes engineers, architects, and solution developers. If you have any questions, just send a mail to users@opensso.dev.java.net, and you are likely get the answer to what you want to know. Recently Forgerock (http://www.forgerock.com) undertook an initiative to keep the community and product strong. They periodically fix the bugs in the open source branch. Their version of OpenSSO is called OpenAM, but the code base is the same as OpenSSO. There may be incompatibilities in future if OpenAM code base deviates a lot from the OpenSSO express build 9 code base. Note that the Oracle Open SSO Enterprise 8.0 update releases are based on the OpenSSO Enterprise release 8.0 code base, whereas the open source version OpenAM is continuing its development from the express build 9 code base. OpenSSO—an overview OpenSSO is a freely available feature-rich access management product; it can be downloaded from http://www.forgerock.com/openam.html. It integrates authentication and authorization services, SSO, and open standards-based federation protocols to provide SSO among disparate business domains, entitlement services, and web services security. Overall, customers will be able to build a comprehensive solution for protecting their network resources by preventing unauthorized access to web services, applications, web content, and securing identity data. OpenSSO offers a complete solution for securing both web applications and web services. You can enforce a comprehensive security policy for web applications and services across the enterprise, rather than relying on developers to come up with ad hoc ways to secure services as they develop them. OpenSSO is a pure Java application that makes it easy to deploy on any operating system platform or container as it supports a broad range of operating systems and servlet containers. OpenSSO services All the services provided by the OpenSSO are exposed over HTTP protocol. The clients access them using appropriate interfaces. OpenSSO exposes a rich library of Application Programming Interfaces (APIs) and Service Provider Interfaces (SPIs) using which, customers can achieve the desired functionality. These services developed for OpenSSO generally contain both a server component and a client component. The server component is a simple Java servlet developed to receive XML requests and return XML responses. The opensso.war web application encompasses all the services and associated configuration items that are required to deliver the OpenSSO functionality. The client component is provided as Java API, and in some cases, C API. This allows remote applications and other OpenSSO services to communicate with and consume the particular functionality. Each core service uses its own framework to retrieve customer and service data and to provide it to other OpenSSO services. The OpenSSO framework integrates all of these service frameworks to form a layer that is accessible to all product components and plugins as shown in the following diagram: There are certain core services that are not covered due to the scope of this article. Just to make you aware of the breadth of features provided by the OpenSSO, in the next few sections, some of the prominent features that are not covered will be briefly introduced. 
Federation services

Typically, in web access management, Single Sign-On happens within a single company, inside the same Domain Name Service (DNS) domain. Most of the time this works for small companies or in B2C-type scenarios, whereas in a B2B scenario a DNS domain-based SSO will not work because the cookie is not forwarded to other DNS domains. Besides, there are privacy and security concerns in performing SSO across multiple businesses using this approach. So how do we solve these kinds of problems, where customers want to seamlessly sign on to services even though the services are provided by a third party? Federation is the solution.

So, what is federation? Federation is a process that establishes a standards-based method for sharing and managing identity data and establishing Single Sign-On across security domains and organizations. It allows an organization to offer a variety of external services to trusted business partners, as well as corporate services to internal departments and divisions. Forming trust relationships across security domains allows an organization to integrate applications offered by different departments or divisions within the enterprise, as well as engage in relationships with co-operating business partners that offer complementary services.

Towards federation, or solving SSO across multiple domains, multiple industry standards are supported, such as those developed by the Organization for the Advancement of Structured Information Standards (OASIS) and the Liberty Alliance Project. OpenSSO provides an open and extensible framework for identity federation and associated web services that resolves the problems of identity-enabling web services, web service discovery and invocation, security, and privacy. Federation services are built on the following standards:

Liberty Alliance Project Identity Federation Framework (Liberty ID-FF) 1.1 and 1.2
OASIS Security Assertion Markup Language (SAML) 1.0 and 1.1
OASIS Security Assertion Markup Language (SAML) 2.0
WS-Federation (Passive Requestor Profile)

SAML 2.0 is becoming the de facto standard for federation SSO, as many vendors and service providers support the SAML 2.0 protocol. For instance, Google Apps and Salesforce support SAML 2.0 as their protocol of choice for SSO.

Learn computer vision applications in Open CV

Packt
07 Jun 2011
16 min read
  OpenCV 2 Computer Vision Application Programming Cookbook Over 50 recipes to master this library of programming functions for real-time computer vision         Read more about this book       OpenCV (Open Source Computer Vision) is an open source library containing more than 500 optimized algorithms for image and video analysis. Since its introduction in 1999, it has been largely adopted as the primary development tool by the community of researchers and developers in computer vision. OpenCV was originally developed at Intel by a team led by Gary Bradski as an initiative to advance research in vision and promote the development of rich, vision-based CPU-intensive applications. In this article by Robert Laganière, author of OpenCV 2 Computer Vision Application Programming Cookbook, we will cover: Calibrating a camera Computing the fundamental matrix of an image pair Matching images using random sample consensus Computing a homography between two images (For more resources related to the article, see here.) Introduction Images are generally produced using a digital camera that captures a scene by projecting light onto an image sensor going through its lens. The fact that an image is formed through the projection of a 3D scene onto a 2D plane imposes the existence of important relations between a scene and its image, and between different images of the same scene. Projective geometry is the tool that is used to describe and characterize, in mathematical terms, the process of image formation. In this article, you will learn some of the fundamental projective relations that exist in multi-view imagery and how these can be used in computer vision programming. But before we start the recipes, let's explore the basic concepts related to scene projection and image formation. Image formation Fundamentally, the process used to produce images has not changed since the beginning of photography. The light coming from an observed scene is captured by a camera through a frontal aperture and the captured light rays hit an image plane (or image sensor) located on the back of the camera. Additionally, a lens is used to concentrate the rays coming from the different scene elements. This process is illustrated by the following figure: Here, do is the distance from the lens to the observed object, di is the distance from the lens to the image plane, and f is the focal length of the lens. These quantities are related by the so-called thin lens equation: In computer vision, this camera model can be simplified in a number of ways. First, we can neglect the effect of the lens by considering a camera with an infinitesimal aperture since, in theory, this does not change the image. Only the central ray is therefore considered. Second, since most of the time we have do>>di, we can assume that the image plane is located at the focal distance. Finally, we can notice from the geometry of the system, that the image on the plane is inverted. We can obtain an identical but upright image by simply positioning the image plane in front of the lens. Obviously, this is not physically feasible, but from a mathematical point of view, this is completely equivalent. This simplified model is often referred to as the pin-hole camera model and it is represented as follows: From this model, and using the law of similar triangles, we can easily derive the basic projective equation: The size (hi) of the image of an object (of height ho) is therefore inversely proportional to its distance (do) from the camera which is naturally true. 
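The two equations referred to in the preceding paragraphs were originally displayed as images and are missing here. They are the standard thin-lens and pin-hole relations and can be reconstructed as follows (stated from first principles rather than copied from the lost figures):

\[
\frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f} \qquad \text{(thin lens equation)}
\]

\[
h_i = f\,\frac{h_o}{d_o} \qquad \text{(basic projective equation)}
\]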
This relation allows the position of the image of a 3D scene point to be predicted onto the image plane of a camera. Calibrating a camera From the introduction of this article, we learned that the essential parameters of a camera under the pin-hole model are its focal length and the size of the image plane (which defines the field of view of the camera). Also, since we are dealing with digital images, the number of pixels on the image plane is another important characteristic of a camera. Finally, in order to be able to compute the position of an image's scene point in pixel coordinates, we need one additional piece of information. Considering the line coming from the focal point that is orthogonal to the image plane, we need to know at which pixel position this line pierces the image plane. This point is called the principal point. It could be logical to assume that this principal point is at the center of the image plane, but in practice, this one might be off by few pixels depending at which precision the camera has been manufactured. Camera calibration is the process by which the different camera parameters are obtained. One can obviously use the specifications provided by the camera manufacturer, but for some tasks, such as 3D reconstruction, these specifications are not accurate enough. Camera calibration will proceed by showing known patterns to the camera and analyzing the obtained images. An optimization process will then determine the optimal parameter values that explain the observations. This is a complex process but made easy by the availability of OpenCV calibration functions. How to do it... To calibrate a camera, the idea is show to this camera a set of scene points for which their 3D position is known. You must then determine where on the image these points project. Obviously, for accurate results, we need to observe several of these points. One way to achieve this would be to take one picture of a scene with many known 3D points. A more convenient way would be to take several images from different viewpoints of a set of some 3D points. This approach is simpler but requires computing the position of each camera view, in addition to the computation of the internal camera parameters which fortunately is feasible. OpenCV proposes to use a chessboard pattern to generate the set of 3D scene points required for calibration. This pattern creates points at the corners of each square, and since this pattern is flat, we can freely assume that the board is located at Z=0 with the X and Y axes well aligned with the grid. In this case, the calibration process simply consists of showing the chessboard pattern to the camera from different viewpoints. Here is one example of a calibration pattern image: The nice thing is that OpenCV has a function that automatically detects the corners of this chessboard pattern. You simply provide an image and the size of the chessboard used (number of vertical and horizontal inner corner points). The function will return the position of these chessboard corners on the image. If the function fails to find the pattern, then it simply returns false: // output vectors of image pointsstd::vector<cv::Point2f> imageCorners;// number of corners on the chessboardcv::Size boardSize(6,4);// Get the chessboard cornersbool found = cv::findChessboardCorners(image, boardSize, imageCorners); Note that this function accepts additional parameters if one needs to tune the algorithm, which are not discussed here. 
There is also a function that draws the detected corners on the chessboard image with lines connecting them in sequence: //Draw the cornerscv::drawChessboardCorners(image, boardSize, imageCorners, found); // corners have been found The image obtained is seen here: The lines connecting the points shows the order in which the points are listed in the vector of detected points. Now to calibrate the camera, we need to input a set of such image points together with the coordinate of the corresponding 3D points. Let's encapsulate the calibration process in a CameraCalibrator class: class CameraCalibrator { // input points: // the points in world coordinates std::vector<std::vector<cv::Point3f>> objectPoints; // the point positions in pixels std::vector<std::vector<cv::Point2f>> imagePoints; // output Matrices cv::Mat cameraMatrix; cv::Mat distCoeffs; // flag to specify how calibration is done int flag; // used in image undistortion cv::Mat map1,map2; bool mustInitUndistort; public: CameraCalibrator() : flag(0), mustInitUndistort(true) {}; As mentioned previously, the 3D coordinates of the points on the chessboard pattern can be easily determined if we conveniently place the reference frame on the board. The method that accomplishes this takes a vector of the chessboard image filename as input: // Open chessboard images and extract corner pointsint CameraCalibrator::addChessboardPoints( const std::vector<std::string>& filelist, cv::Size & boardSize) { // the points on the chessboard std::vector<cv::Point2f> imageCorners; std::vector<cv::Point3f> objectCorners; // 3D Scene Points: // Initialize the chessboard corners // in the chessboard reference frame // The corners are at 3D location (X,Y,Z)= (i,j,0) for (int i=0; i<boardSize.height; i++) { for (int j=0; j<boardSize.width; j++) { objectCorners.push_back(cv::Point3f(i, j, 0.0f)); } } // 2D Image points: cv::Mat image; // to contain chessboard image int successes = 0; // for all viewpoints for (int i=0; i<filelist.size(); i++) { // Open the image image = cv::imread(filelist[i],0); // Get the chessboard corners bool found = cv::findChessboardCorners( image, boardSize, imageCorners); // Get subpixel accuracy on the corners cv::cornerSubPix(image, imageCorners, cv::Size(5,5), cv::Size(-1,-1), cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 30, // max number of iterations 0.1)); // min accuracy //If we have a good board, add it to our data if (imageCorners.size() == boardSize.area()) { // Add image and scene points from one view addPoints(imageCorners, objectCorners); successes++; } } return successes;} The first loop inputs the 3D coordinates of the chessboard, which are specified in an arbitrary square size unit here. The corresponding image points are the ones provided by the cv::findChessboardCorners function. This is done for all available viewpoints. Moreover, in order to obtain a more accurate image point location, the function cv::cornerSubPix can be used and as the name suggests, the image points will then be localized at sub-pixel accuracy. The termination criterion that is specified by the cv::TermCriteria object defines a maximum number of iterations and a minimum accuracy in sub-pixel coordinates. The first of these two conditions that is reached will stop the corner refinement process. 
When a set of chessboard corners has been successfully detected, these points are added to our vector of image and scene points: // Add scene points and corresponding image pointsvoid CameraCalibrator::addPoints(const std::vector<cv::Point2f>&imageCorners, const std::vector<cv::Point3f>& objectCorners) { // 2D image points from one view imagePoints.push_back(imageCorners); // corresponding 3D scene points objectPoints.push_back(objectCorners);} The vectors contains std::vector instances. Indeed, each vector element being a vector of points from one view. Once a sufficient number of chessboard images have been processed (and consequently a large number of 3D scene point/2D image point correspondences are available), we can initiate the computation of the calibration parameters: // Calibrate the camera// returns the re-projection errordouble CameraCalibrator::calibrate(cv::Size &imageSize){ // undistorter must be reinitialized mustInitUndistort= true; //Output rotations and translations std::vector<cv::Mat> rvecs, tvecs; // start calibration return calibrateCamera(objectPoints, // the 3D points imagePoints, // the image points imageSize, // image size cameraMatrix,// output camera matrix distCoeffs, // output distortion matrix rvecs, tvecs,// Rs, Ts flag); // set options} In practice, 10 to 20 chessboard images are sufficient, but these must be taken from different viewpoints at different depths. The two important outputs of this function are the camera matrix and the distortion parameters. The camera matrix will be described in the next section. For now, let's consider the distortion parameters. So far, we have mentioned that with the pin-hole camera model, we can neglect the effect of the lens. But this is only possible if the lens used to capture an image does not introduce too important optical distortions. Unfortunately, this is often the case with lenses of lower quality or with lenses having a very short focal length. You may have already noticed that in the image we used for our example, the chessboard pattern shown is clearly distorted. The edges of the rectangular board being curved in the image. It can also be noticed that this distortion becomes more important as we move far from the center of the image. This is a typical distortion observed with fish-eye lens and it is called radial distortion. The lenses that are used in common digital cameras do not exhibit such a high degree of distortion, but in the case of the lens used here, these distortions cannot certainly be ignored. It is possible to compensate for these deformations by introducing an appropriate model. The idea is to represent the distortions induced by a lens by a set of mathematical equations. Once established, these equations can then be reverted in order to undo the distortions visible on the image. Fortunately, the exact parameters of the transformation that will correct the distortions can be obtained together with the other camera parameter during the calibration phase. 
Once this is done, any image from the newly calibrated camera can be undistorted: // remove distortion in an image (after calibration)cv::Mat CameraCalibrator::remap(const cv::Mat &image) { cv::Mat undistorted; if (mustInitUndistort) { // called once per calibration cv::initUndistortRectifyMap( cameraMatrix, // computed camera matrix distCoeffs, // computed distortion matrix cv::Mat(), // optional rectification (none) cv::Mat(), // camera matrix to generate undistorted image.size(), // size of undistorted CV_32FC1, // type of output map map1, map2); // the x and y mapping functions mustInitUndistort= false; } // Apply mapping functions cv::remap(image, undistorted, map1, map2, cv::INTER_LINEAR); // interpolation type return undistorted;} Which results in the following image: As you can see, once the image is undistorted, we obtain a regular perspective image. How it works... In order to explain the result of the calibration, we need to go back to the figure in the introduction which describes the pin-hole camera model. More specifically, we want to demonstrate the relation between a point in 3D at position (X,Y,Z) and its image (x,y) on a camera specified in pixel coordinates. Let's redraw this figure by adding a reference frame that we position at the center of the projection as seen here: Note that the Y-axis is pointing downward to get a coordinate system compatible with the usual convention that places the image origin at the upper-left corner. We learned previously that the point (X,Y,Z) will be projected onto the image plane at (fX/Z,fY/Z). Now, if we want to translate this coordinate into pixels, we need to divide the 2D image position by, respectively, the pixel width (px) and height (py). We notice that by dividing the focal length f given in world units (most often meters or millimeters) by px, then we obtain the focal length expressed in (horizontal) pixels. Let's then define this term as fx. Similarly, fy =f/py is defined as the focal length expressed in vertical pixel unit. The complete projective equation is therefore: Recall that (u0,v0) is the principal point that is added to the result in order to move the origin to the upper-left corner of the image. These equations can be rewritten in matrix form through the introduction of homogeneous coordinates in which 2D points are represented by 3-vectors, and 3D points represented by 4-vectors (the extra coordinate is simply an arbitrary scale factor that need to be removed when a 2D coordinate needs to be extracted from a homogeneous 3-vector). Here is the projective equation rewritten: The second matrix is a simple projection matrix. The first matrix includes all of the camera parameters which are called the intrinsic parameters of the camera. This 3x3 matrix is one of the output matrices returned by the cv::calibrateCamera function. There is also a function called cv::calibrationMatrixValues that returns the value of the intrinsic parameters given a calibration matrix. More generally, when the reference frame is not at the projection center of the camera, we will need to add a rotation (a 3x3 matrix) and a translation vector (3x1 matrix). These two matrices describe the rigid transformation that must be applied to the 3D points in order to bring them back to the camera reference frame. Therefore, we can rewrite the projection equation in its most general form: Remember that in our calibration example, the reference frame was placed on the chessboard. 
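The projection equations discussed in this How it works... section were also lost as images. Under the conventions just described (focal lengths fx and fy expressed in pixels, principal point (u0, v0)), they are the standard pin-hole relations and can be reconstructed as:

\[
x = f_x\,\frac{X}{Z} + u_0, \qquad y = f_y\,\frac{Y}{Z} + v_0
\]

In matrix form, using homogeneous coordinates (s is the arbitrary scale factor):

\[
s\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]

And in its most general form, with the rotation R and translation t that bring the 3D points back into the camera reference frame:

\[
s\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R & t \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
\]

The [R | t] block is the rigid transformation discussed in the next paragraph.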
Therefore, there is a rigid transformation (rotation and translation) that must be computed for each view. These are in the output parameter list of the cv::calibrateCamera function. The rotation and translation components are often called the extrinsic parameters of the calibration and they are different for each view. The intrinsic parameters remain constant for a given camera/lens system. The intrinsic parameters of our test camera obtained from a calibration based on 20 chessboard images are fx=167, fy=178, u0=156, v0=119. These results are obtained by cv::calibrateCamera through an optimization process aimed at finding the intrinsic and extrinsic parameters that will minimize the difference between the predicted image point position, as computed from the projection of the 3D scene points, and the actual image point position, as observed on the image. The sum of this difference for all points specified during the calibration is called the re-projection error. To correct the distortion, OpenCV uses a polynomial function that is applied to the image point in order to move them at their undistorted position. By default, 5 coefficients are used; a model made of 8 coefficients is also available. Once these coefficients are obtained, it is possible to compute 2 mapping functions (one for the x coordinate and one for the y) that will give the new undistorted position of an image point on a distorted image. This is computed by the function cv::initUndistortRectifyMap and the function cv::remap remaps all of the points of an input image to a new image. Note that because of the non-linear transformation, some pixels of the input image now fall outside the boundary of the output image. You can expand the size of the output image to compensate for this loss of pixels, but you will now obtain output pixels that have no values in the input image (they will then be displayed as black pixels). There's more... When a good estimate of the camera intrinsic parameters are known, it could be advantageous to input them to the cv::calibrateCamera function. They will then be used as initial values in the optimization process. To do so, you just need to add the flag CV_CALIB_USE_INTRINSIC_GUESS and input these values in the calibration matrix parameter. It is also possible to impose a fixed value for the principal point (CV_CALIB_ FIX_PRINCIPAL_POINT), which can often be assumed to be the central pixel. You can also impose a fixed ratio for the focal lengths fx and fy (CV_CALIB_FIX_RATIO) in which case you assume pixels of square shape.
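To make the calling sequence of this recipe concrete, the sketch below strings the class methods together. The image file names, the 6x4 board size, and the output file name are assumptions made for illustration; the CameraCalibrator class is the one built up throughout the recipe:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>

int main() {
    // Hypothetical list of calibration images: chessboard1.jpg ... chessboard20.jpg
    std::vector<std::string> filelist;
    for (int i = 1; i <= 20; i++) {
        std::stringstream name;
        name << "chessboard" << i << ".jpg";
        filelist.push_back(name.str());
    }

    // 6x4 inner corners, as in the recipe
    cv::Size boardSize(6, 4);

    // CameraCalibrator is the class defined earlier in this recipe
    CameraCalibrator calibrator;
    calibrator.addChessboardPoints(filelist, boardSize);

    // Calibrate using the size of one of the input images
    cv::Mat image = cv::imread(filelist[0], 0);
    cv::Size imageSize = image.size();
    double reprojError = calibrator.calibrate(imageSize);
    std::cout << "Re-projection error: " << reprojError << std::endl;

    // Undistort an image with the estimated parameters
    cv::Mat undistorted = calibrator.remap(image);
    cv::imwrite("undistorted.jpg", undistorted);

    return 0;
}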

Exporting SAP BusinessObjects Dashboards into Different Environments

Packt
02 Jun 2011
7 min read
  SAP BusinessObjects Dashboards 4.0 Cookbook Introduction First, the visual model is compiled to a SWF file format. Compiling to a SWF file format ensures that the dashboard plays smoothly on different screen sizes and across different platforms. It also ensures that the users aren't given huge 10+ megabyte files. After compilation of the visual model to a SWF file, developers can then publish it to a format of their choice. The following are the available choices—Flash (SWF), AIR, SAP BusinessObjects Platform, HTML, PDF, PPT, Outlook, and Word. Once publishing is complete, the dashboard is ready to share! Exporting to a standard SWF, PPT, PDF, and so on After developing a Visual Model on Dashboard Design, we will need to somehow share it with users. We want to put it into a format that everyone can see on their machines. The simplest way is to export to a standard SWF file. One of the great features Dashboard Design has is to be able to embed dashboards into different office file formats. For example, a presenter could have a PowerPoint deck, and in the middle of the presentation, have a working dashboard that presents an important set of data values to the audience. Another example could be an executive level user who is viewing a Word document created by an analyst. The analyst could create a written document in Word and then embed a working dashboard with the most updated data to present important data values to the executive level user. You can choose to embed a dashboard in the following file types: PowerPoint Word PDF Outlook HTML Getting ready Make sure your visual model is complete and ready for sharing. How to do it... In the menu toolbar, go to File | Export | Flash (SWF). Select the directory in which you want the SWF to go to and the name of your SWF file How it works... Xcelsius compiles the visual model into an SWF file that everyone is able to see. Once the SWF file has been compiled, the dashboard will then be ready for sharing. It is mandatory that anyone viewing the dashboard have Adobe Flash installed. If not, they can download and install it from http://www.adobe.com/products/flashplayer/. If we export to PPT, we can then edit the PowerPoint file however we desire. If you have an existing PowerPoint presentation deck and want to append the dashboard to it, the easiest way is to first embed the dashboard SWF to a temporary PowerPoint file and then copy that slide to your existing PowerPoint file. There's more... Exporting to an SWF file makes it very easy for distribution, thus making the presentation of mockups great at a business level. Developers are able to work very closely with the business and iteratively come up with a visual model closest to the business goals. It is important though, when distributing SWF files, that everyone viewing the dashboards has the same version, otherwise confusion may occur. Thus, as a best practice, versioning every SWF that is distributed is very important. It is important to note that when the much anticipated Adobe Flash 10.1 was released, there were problems with embedding Dashboard Design dashboards in DOC, PPT, PDF, and so on. However, with the 10.1.82.76 Adobe Flash Player update, this has been fixed. Thus, it is important that if users have Adobe Flash Player 10.1+ installed, the version is higher than or equal to 10.1.82.76. When exporting to PDF, please take the following into account: In Dashboard Design 2008, the default format for exporting to PDF is Acrobat 9.0 (PDF 1.8). 
If Acrobat Reader 8.0 is installed, the default exported PDF cannot be opened. If using Acrobat Reader 8.0 or older, change the format to "Acrobat 6.0 (PDF 1.5)" before exporting to PDF. Exporting to SAP Business Objects Enterprise After Dashboard Design became a part of BusinessObjects, it was important to be able to export dashboards into the BusinessObjects Enterprise system. Once a dashboard is exported to BusinessObjects Enterprise, users can then easily access their dashboards through InfoView (now BI launch pad). On top of that, administrators are able control dashboard security. Getting ready Make sure your visual model is complete and ready for sharing. How to do it... From the menu toolbar, go to File | Export | Export to SAP BusinessObjects Platform. Enter your BusinessObjects login credentials and then select the location in the SAP BusinessObjects Enterprise system, where you want to store the SWF file, as shown in the following screenshot: Log into BI launch pad (formerly known as InfoView) and verify that you can access the dashboard. (Move the mouse over the image to enlarge.) How it works... When we export a dashboard to SAP BusinessObjects Enterprise, we basically place it in the SAP BusinessObjects Enterprise content management system. From there, we can control accessibility to the dashboard and make sure that we have one source of truth instead of sending out multiple dashboards through e-mail and possibly getting mixed up with what is the latest version. When we log into BI launch pad (formerly known as Infoview), it also passes the login token to the dashboard, so we don't have to enter our credentials again when connecting to SAP BusinessObjects Enterprise data. This is important because we don't have to manually create and pass any additional tokens once we have logged in. There's more... To give a true website type feel, developers can house their dashboards in a website type format using Dashboard Builder. This in turn provides a better experience for users, as they don't have to navigate through many folders in order to access the dashboard that they are looking for. Publishing to SAP BW This recipe shows you how to publish Dashboard Design dashboards to a SAP BW system. Once a dashboard is saved to the SAP BW system, it can be published within a SAP Enterprise Portal iView and made available for the users. Getting ready For this recipe, you will need an Dashboard Design dashboard model. This dashboard does not necessarily have to include a data connection to SAP BW. How to do it... Select Publish in the SAP menu. If you want to save the Xcelsius model with a different name, select the Publish As... option. If you are not yet connected to the SAP BW system, a pop up will appear. Select the appropriate system and fill in your username and password in the dialog box. If you want to disconnect from the SAP BW system and connect to a different system, select the Disconnect option from the SAP menu. Enter the Description and Technical Name of the dashboard. Select the location you want to save the dashboard to and click on Save. The dashboard is now published to the SAP BW system. To launch the dashboard and view it from the SAP BW environment, select the Launch option from the SAP menu. You will be asked to log in to the SAP BW system before you are able to view the dashboard. How it works... As we have seen in this recipe, the publishing of an Dashboard Design dashboard to SAP BW is quite straightforward. 
As the dashboard is part of the SAP BW environment after publishing, the model can be transported between SAP BW systems like all other SAP BW objects. There is more... After launching step 5, the Dashboard Design dashboard will load in your browser from the SAP BW server. You can add the displayed URL to an SAP Enterprise Portal iView to make the dashboard accessible for portal users.  

SAP BusinessObjects: Customizing the Dashboard

Packt
27 May 2011
4 min read
SAP BusinessObjects Dashboards 4.0 Cookbook Over 90 simple and incredibly effective recipes for transforming your business data into exciting dashboards with SAP BusinessObjects Dashboards 4.0 Xcelsius Introduction In this article, we will go through certain techniques on how you can utilize the different cosmetic features Dashboard Design provides, in order to improve the look of your dashboard. Dashboard Design provides a powerful way to capture the audience versus other dashboard tools. It allows developers to build dashboards with the important 'wow' factor that other tools lack. Let's take, for example, two dashboards that have the exact same functionality, placement of charts, and others. However, one dashboard looks much more attractive than the other. In general, people looking at the nicer looking dashboard will be more interested and thus get more value of the data that comes out of it. Thus, not only does Dashboard Design provide a powerful and flexible way of presenting data, but it also provides the 'wow' factor to capture a user's interest. Changing the look of a chart This recipe will run through changing the look of a chart. Particularly, it will go through each tab in the appearance icon of the chart properties. We will then make modifications and see the resulting changes. Getting ready Insert a chart object onto the canvas. Prepare some data and bind it to the chart. How to do it... Double-click/right-click on the chart object on the canvas/object properties window to go into Chart Properties. In the Layout tab, uncheck Show Chart Background. (Move the mouse over the image to enlarge.) In the Series tab, click on the colored square box circled in the next screenshot to change the color of the bar to your desired color. Then change the width of each bar; click on the Marker Size area and change it to 35. Click on the colored boxes circled in red in the Axes tab and choose dark blue to modify the horizontal and vertical axes separately. Uncheck Show Minor Gridlines at the bottom so that we remove all the horizontal lines in between each of the major gridlines. Next, go to the Text and Color tabs, where you can make changes to all the different text areas of the chart. How it works... As you can see, the default chart looks plain and the bars are skinny so it's harder to visualize things. It is a good idea to remove the chart background if there is an underlying background so that the chart blends in better. In addition, the changes to the chart colors and text provide additional aesthetics that help improve the look of the chart. Adding a background to your dashboard This recipe shows the usefulness of backgrounds in the dashboard. It will show how backgrounds can help provide additional depth to objects and help to group certain areas together for better visualization. Getting ready Make sure you have all your objects such as charts and selectors ready on the canvas. Here's an example of the two charts before the makeover. Bind some data to the charts if you want to change the coloring of the series How to do it... Choose Background4 from the Art and Backgrounds tab of the Components window. Stretch the background so that it fills the size of the canvas. Make sure that ordering of the backgrounds is before the charts. To change the ordering of the background, go to the object browser, select the background object and then press the "-" key until the background object is behind the chart. 
Select Background1 from the Art and Backgrounds tab and place two of them under the charts, as shown in the following screenshot.
When the backgrounds are in the proper place, open the properties window for each background and set the background color to your desired color. In this example we picked turquoise blue for each background.

How it works...

As the before and after pictures show, backgrounds can make a huge difference in terms of aesthetics. The objects are much more pleasant to look at, and the charts gain a real sense of depth. The best way to choose backgrounds that fit your dashboard is to experiment with the different background objects and their colors. If you are not very artistic, you can put together a handful of variations and demonstrate them to the business user to see which one they prefer.

OpenCV: Image Processing using Morphological Filters

Packt
25 May 2011
6 min read
OpenCV 2 Computer Vision Application Programming Cookbook: Over 50 recipes to master this library of programming functions for real-time computer vision. Read more about this book.

Morphological filtering is a theory developed in the 1960s for the analysis and processing of discrete images. It defines a series of operators which transform an image by probing it with a predefined shape element. The way this shape element intersects the neighborhood of a pixel determines the result of the operation. This article presents the most important morphological operators. It also explores the problem of image segmentation using algorithms that work on the image morphology.

Eroding and dilating images using morphological filters

Erosion and dilation are the most fundamental morphological operators, so we present them in this first recipe.

The fundamental instrument in mathematical morphology is the structuring element. A structuring element is simply defined as a configuration of pixels (a shape) on which an origin is defined (also called the anchor point). Applying a morphological filter consists of probing each pixel of the image using this structuring element. When the origin of the structuring element is aligned with a given pixel, its intersection with the image defines a set of pixels on which a particular morphological operation is applied. In principle, the structuring element can be of any shape, but most often a simple shape such as a square, circle, or diamond with the origin at the center is used (mainly for efficiency reasons).

Getting ready

As morphological filters usually work on binary images, we will use a binary image produced through thresholding. However, since the convention in morphology is to have foreground objects represented by high (white) pixel values and the background by low (black) pixel values, we have negated the image. (A minimal sketch of how such an input can be prepared appears after the recipe steps below.)

How to do it...

Erosion and dilation are implemented in OpenCV as the simple functions cv::erode and cv::dilate. Their use is straightforward:

// Read input image
cv::Mat image= cv::imread("binary.bmp");
// Erode the image
cv::Mat eroded; // the destination image
cv::erode(image,eroded,cv::Mat());
// Display the eroded image
cv::namedWindow("Eroded Image");
cv::imshow("Eroded Image",eroded);
// Dilate the image
cv::Mat dilated; // the destination image
cv::dilate(image,dilated,cv::Mat());
// Display the dilated image
cv::namedWindow("Dilated Image");
cv::imshow("Dilated Image",dilated);

The two images produced by these function calls are seen in the following screenshots. Erosion is shown first, followed by the dilation result.
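As mentioned in the Getting ready section, the binary input is obtained by thresholding a gray-level image and then negating the result. The following is a minimal sketch of how such an input could be prepared; the source filename and the threshold value of 128 are assumptions for illustration, not part of the original recipe.

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>

int main() {
    // Load the source image as a single-channel gray-level image
    // ("building.jpg" is a placeholder filename).
    cv::Mat gray = cv::imread("building.jpg", 0);
    if (gray.empty()) return 1;

    // Threshold and negate in one step so that foreground objects end up
    // white (255) and the background black (0), as the morphological
    // convention requires.
    cv::Mat binary;
    cv::threshold(gray, binary, 128, 255, cv::THRESH_BINARY_INV);

    // Save the result as the input used by the erosion/dilation recipe.
    cv::imwrite("binary.bmp", binary);
    return 0;
}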
How it works...

As with all other morphological filters, the two filters of this recipe operate on the set of pixels (or neighborhood) around each pixel, as defined by the structuring element. Recall that when applied to a given pixel, the anchor point of the structuring element is aligned with this pixel location, and all pixels intersecting the structuring element are included in the current set. Erosion replaces the current pixel with the minimum pixel value found in the defined pixel set. Dilation is the complementary operator, and it replaces the current pixel with the maximum pixel value found in the defined pixel set. Since the input binary image contains only black (0) and white (255) pixels, each pixel is replaced by either a white or a black pixel.

A good way to picture the effect of these two operators is to think in terms of background (black) and foreground (white) objects. With erosion, if the structuring element, when placed at a given pixel location, touches the background (that is, one of the pixels in the intersecting set is black), then this pixel is sent to the background. With dilation, if the structuring element placed on a background pixel touches a foreground object, then this pixel is assigned a white value. This explains why the size of the objects has been reduced in the eroded image. Observe how some of the very small objects (which can be considered "noisy" background pixels) have also been completely eliminated. Similarly, the dilated objects are now larger and some of the "holes" inside them have been filled.

By default, OpenCV uses a 3x3 square structuring element. This default structuring element is obtained when an empty matrix (that is, cv::Mat()) is specified as the third argument in the function call, as was done in the preceding example. You can also specify a structuring element of the size (and shape) you want by providing a matrix in which the non-zero elements define the structuring element. In the following example, a 7x7 structuring element is applied:

cv::Mat element(7,7,CV_8U,cv::Scalar(1));
cv::erode(image,eroded,element);

The effect is obviously much more destructive in this case.

Another way to obtain the same result is to apply the same structuring element on an image repetitively. The two functions have an optional parameter to specify the number of repetitions:

// Erode the image 3 times.
cv::erode(image,eroded,cv::Mat(),cv::Point(-1,-1),3);

The origin argument cv::Point(-1,-1) means that the origin is at the center of the matrix (the default); it can be defined anywhere on the structuring element. The image obtained will be identical to the one we obtained with the 7x7 structuring element. Indeed, eroding an image twice is like eroding it with a structuring element dilated with itself. This also applies to dilation.

Finally, since the notion of background/foreground is arbitrary, we can make the following observation, which is a fundamental property of the erosion/dilation operators: eroding foreground objects with a structuring element can be seen as a dilation of the background part of the image. More formally:

The erosion of an image is equivalent to the complement of the dilation of the complement image.
The dilation of an image is equivalent to the complement of the erosion of the complement image.

There's more...

It is important to note that even though we applied our morphological filters on binary images here, they can also be applied on gray-level images with the same definitions.

Also note that the OpenCV morphological functions support in-place processing. This means you can use the input image as the destination image, so you can write:

cv::erode(image,image,cv::Mat());

OpenCV creates the required temporary image for you so that this works properly.
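The duality stated above is easy to check in code. The following is a quick sketch that builds a 7x7 elliptical structuring element with cv::getStructuringElement (the shape and size are arbitrary choices), erodes the image directly, and then obtains the same result by dilating the complement image and complementing the output. With the default border handling, the pixel-wise difference should come out empty.

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>

int main() {
    // Same input image as in the recipe, loaded as a single-channel image.
    cv::Mat image = cv::imread("binary.bmp", 0);
    if (image.empty()) return 1;

    // A 7x7 elliptical structuring element instead of the default 3x3 square.
    cv::Mat element = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(7, 7));

    // Direct erosion.
    cv::Mat eroded;
    cv::erode(image, eroded, element);

    // Erosion obtained via the duality property:
    // complement -> dilate -> complement.
    cv::Mat complement, dilatedComplement, erodedViaDuality;
    cv::bitwise_not(image, complement);
    cv::dilate(complement, dilatedComplement, element);
    cv::bitwise_not(dilatedComplement, erodedViaDuality);

    // Count differing pixels; this should report 0.
    cv::Mat diff;
    cv::absdiff(eroded, erodedViaDuality, diff);
    std::cout << "differing pixels: " << cv::countNonZero(diff) << std::endl;
    return 0;
}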


Sphinx Search FAQs

Packt
23 May 2011
9 min read
Got questions on Sphinx, the open source search engine? Not sure if it's the right tool for you? You're in the right place: we've put together an FAQ on Sphinx that should help you make the right decision about the software that powers your search. If you've got questions about the other kind of Sphinx, we recommend you look elsewhere.

What is Sphinx?

Sphinx is a full-text search engine (generally standalone) which provides fast, relevant, and efficient full-text search functionality to third-party applications. It was especially created to facilitate searches on SQL databases and integrates very well with scripting languages such as PHP, Python, Perl, Ruby, and Java.

What are the major features of Sphinx?

Some of the major features of Sphinx include:

High indexing speed (up to 10 MB/sec on modern CPUs)
High search speed (average query is under 0.1 sec on 2 to 4 GB of text collection)
High scalability (up to 100 GB of text, up to 100 million documents on a single CPU)
Support for distributed searching (since v.0.9.6)
Native support for MySQL (both MyISAM and InnoDB tables) and PostgreSQL
Support for phrase searching
Support for phrase proximity ranking, providing good relevance
Support for English and Russian stemming
Support for any number of document fields (weights can be changed on the fly)
Support for document groups
Support for stopwords, that is, it skips a given list of common words during indexing so that only the most relevant terms are indexed
Support for different search modes ("match extended", "match all", "match phrase", and "match any" as of v.0.9.5)
A generic XML interface which greatly simplifies custom integration
A pure-PHP (that is, no module compiling and so on) search client API

Which operating systems does Sphinx run on?

Sphinx was developed and tested mostly on UNIX-based systems. All modern UNIX-based operating systems with an ANSI-compliant compiler should be able to compile and run Sphinx without any issues. However, Sphinx has also been found running on the following operating systems without any issues:

Linux (kernel 2.4.x and 2.6.x of various distributions)
Microsoft Windows 2000 and XP
FreeBSD 4.x, 5.x, 6.x
NetBSD 1.6, 3.0
Solaris 9, 11
Mac OS X

What does the configure command do?

The configure command gets the details of our machine and also checks for all dependencies. If any of the dependencies are missing, it throws an error.

Which are the various options for the configure command?

There are many options that can be passed to the configure command, but we will take a look at a few important ones:

--prefix=/path: This option specifies the path to install the Sphinx binaries.
--with-mysql=/path: Sphinx needs to know where to find MySQL's include and library files. It auto-detects this most of the time, but if for any reason it fails, you can supply the path here.
--with-pgsql=/path: Same as --with-mysql, but for PostgreSQL.

What is full-text search?

Full-text search is one of the techniques for searching a document or database stored on a computer. While searching, the search engine goes through and examines all of the words stored in the document and tries to match the search query against those words. A complete examination of all the words (text) stored in the document is undertaken, hence the name full-text search. Full-text search excels at searching large volumes of unstructured text quickly and effectively, returning pages based on how well they match the user's query.

What are the advantages of full-text search?
The following points are some of the major advantages of full-text search:

It is quicker than traditional searches because it benefits from an index of words that is used to look up records instead of doing a full table scan
It returns results that can be sorted by relevance to the searched phrase or term, with sophisticated ranking capabilities to find the best documents or records
It performs very well on huge databases with millions of records
It skips common words such as "the", "an", "for", and so on

When should you use full-text search?

You should use full-text search when:

There is a high volume of free-form text data to be searched
There is a need for highly optimized search results
There is a demand for flexible search querying

Why use Sphinx for full-text search?

If you're looking for a good Database Management System (DBMS), there are plenty of options available with support for full-text indexing and searches, such as MySQL, PostgreSQL, and SQL Server. There are also external full-text search engines, such as Lucene and Solr. Let's see the advantages of using Sphinx over a DBMS's full-text searching capabilities and over other external search engines:

It has a higher indexing speed. It is 50 to 100 times faster than MySQL FULLTEXT and 4 to 10 times faster than other external search engines.
It also has a higher searching speed, which depends heavily on the mode (Boolean versus phrase) and any additional processing. It is up to 500 times faster than MySQL FULLTEXT in cases involving a large result set with GROUP BY, and more than two times faster in searching than other external search engines.
Relevancy is among the key features one expects when using a search engine, and Sphinx performs very well in this area. It has phrase-based ranking in addition to classic statistical BM25 ranking.
Last but not least, Sphinx has better scalability. It can be scaled vertically (utilizing many CPUs and many HDDs) or horizontally (utilizing many servers), and this comes out of the box. One of the biggest known Sphinx clusters holds over 3 billion records totaling more than 2 terabytes.

What are indexes?

Indexes in Sphinx are a bit different from the indexes we have in databases. The data that Sphinx indexes is a set of structured documents, and each document has the same set of fields. This is very similar to SQL, where each row in the table corresponds to a document and each column to a field.

Sphinx builds a special data structure that is optimized for answering full-text search queries. This structure is called an index, and the process of creating an index from the data is called indexing. The indexes in Sphinx can also contain attributes that are highly optimized for filtering. These attributes are not full-text indexed and do not contribute to matching, but they are very useful for filtering the results based on attribute values. There can be different types of indexes suited for different tasks. The index type that has been implemented in Sphinx is designed for maximum indexing and searching speed.

What are multi-value attributes (MVA)?

MVAs are a special type of attribute in Sphinx that make it possible to attach multiple values to every document. These attributes are especially useful in cases where each document can have multiple values for the same property (field).

How does weighting help?

Weighting decides which documents get priority over others and appear at the top of the results. In Sphinx, weighting depends on the search mode. Weight can also be referred to as ranking. There are two major parts used in the weighting functions:

Phrase rank: This is based on the length of the Longest Common Subsequence (LCS) of search words between the document body and the query phrase. This means that documents in which the queried phrase matches perfectly will have a higher phrase rank, and the weight will be equal to the query word count.
Statistical rank: This is based on the BM25 function, which takes only the frequency of the queried words into account. If a word appears only once in the whole document, its weight will be low; if a word appears many times, its weight will be higher. The BM25 weight is a floating-point number between 0 and 1.
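To make the statistical part more concrete, here is a small, self-contained sketch of the classic Okapi BM25 formula that statistical ranking is based on. This is the textbook formula for illustration only, not Sphinx's internal code: Sphinx normalizes its BM25 weight into the 0 to 1 range and combines it with the phrase rank, and the constants k1 and b as well as the sample statistics below are assumptions.

#include <cmath>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Classic Okapi BM25 score of one document against a query.
// tf        : term frequencies within the document
// df        : number of documents containing each term (collection statistics)
// totalDocs : number of documents in the collection
// docLen    : length of this document in words
// avgDocLen : average document length in the collection
double bm25(const std::unordered_map<std::string, int>& tf,
            const std::unordered_map<std::string, int>& df,
            int totalDocs, int docLen, double avgDocLen,
            const std::vector<std::string>& query,
            double k1 = 1.2, double b = 0.75)
{
    double score = 0.0;
    for (const std::string& term : query) {
        auto t = tf.find(term);
        if (t == tf.end()) continue;               // term not in this document
        auto d = df.find(term);
        int n = (d == df.end()) ? 0 : d->second;
        // Robust IDF variant that never goes negative.
        double idf = std::log(1.0 + (totalDocs - n + 0.5) / (n + 0.5));
        double freq = static_cast<double>(t->second);
        // Term frequency saturation and document length normalization.
        double norm = freq * (k1 + 1.0) /
                      (freq + k1 * (1.0 - b + b * docLen / avgDocLen));
        score += idf * norm;
    }
    return score;                                   // unbounded, unlike Sphinx's 0..1 weight
}

int main() {
    // Made-up statistics purely to exercise the function.
    std::unordered_map<std::string, int> tf = {{"sphinx", 3}, {"search", 5}};
    std::unordered_map<std::string, int> df = {{"sphinx", 12}, {"search", 240}};
    std::vector<std::string> query = {"sphinx", "search"};
    std::cout << bm25(tf, df, 1000, 120, 150.0, query) << std::endl;
    return 0;
}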
What is index merging?

Index merging is more efficient than indexing the data from scratch all over again. In this technique, we define a delta index in the Sphinx configuration file. The delta index always receives the new data to be indexed, while the main index acts as an archive and holds data that never changes.

What is SphinxQL?

Programmers normally issue search queries using one or more client libraries that relate to the database on which the search is to be performed. Some programmers may also find it easier to write an SQL query than to use the Sphinx client API library. SphinxQL is used to issue search queries in the form of SQL queries. These queries can be fired from any client of the database in question, and they return results just as a normal query would. Currently the MySQL binary network protocol is supported, which enables Sphinx to be accessed with the regular MySQL API.

What do you mean by geo-distance search?

In a geo-distance search, you can find geo coordinates near a base anchor point, so you can use this technique to find places close to a given location. It is useful in many applications such as hotel search, property search, restaurant search, and tourist destination search. Sphinx makes it very easy to perform a geo-distance search by providing an API method with which you can set the anchor point (provided you have latitude and longitude in your index); all searches performed thereafter will return results with a magic attribute "@geodist" holding the distance from the anchor point. You can then filter or sort your results based on this attribute.
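Because SphinxQL speaks the MySQL binary protocol, any regular MySQL client library can talk to searchd directly. The following is a minimal sketch using the MySQL C API; it assumes that searchd is listening for SphinxQL connections on port 9306 (set via the listen directive) and that an index named "articles" exists, both of which are assumptions for illustration rather than details from this article.

#include <mysql/mysql.h>
#include <cstdio>

int main() {
    MYSQL *conn = mysql_init(NULL);
    if (conn == NULL) return 1;

    // Connect to searchd's SphinxQL listener instead of a MySQL server.
    // Host, port, and (empty) credentials here are placeholders.
    if (!mysql_real_connect(conn, "127.0.0.1", "", "", NULL, 9306, NULL, 0)) {
        std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        return 1;
    }

    // "articles" is a hypothetical index name; MATCH() carries the full-text query.
    if (mysql_query(conn,
            "SELECT * FROM articles WHERE MATCH('full-text search') LIMIT 10")) {
        std::fprintf(stderr, "query failed: %s\n", mysql_error(conn));
        mysql_close(conn);
        return 1;
    }

    // Print whatever columns the index returns (document id plus attributes).
    MYSQL_RES *result = mysql_store_result(conn);
    if (result != NULL) {
        unsigned int fields = mysql_num_fields(result);
        MYSQL_ROW row;
        while ((row = mysql_fetch_row(result)) != NULL) {
            for (unsigned int i = 0; i < fields; ++i)
                std::printf("%s\t", row[i] ? row[i] : "NULL");
            std::printf("\n");
        }
        mysql_free_result(result);
    }

    mysql_close(conn);
    return 0;
}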