
How-To Tutorials

7019 Articles
Packt
29 Oct 2010
8 min read

Organizing, Clarifying and Communicating the R Data Analyses

Statistical Analysis with R

Take control of your data and produce superior statistical analysis with R.

An easy introduction for people who are new to R, with plenty of strong examples for you to work through
This book will take you on a journey to learn R as the strategist for an ancient Chinese kingdom!
A step-by-step guide to understanding R, its benefits, and how to use it to maximize the impact of your data analysis
A practical guide to conducting and communicating your data analysis with R in the most effective manner

Read more about this book

(For more resources on R, see here.)

Retracing and refining a complete analysis

For demonstration purposes, we will assume that a fire attack was chosen as the optimal battle strategy. Throughout this segment, we will retrace the steps that led us to this decision. Meanwhile, we will make sure to organize and clarify our analyses so they can be easily communicated to others.

Suppose we determined that our fire attack will take place 225 miles away in Anding, which houses 10,000 Wei soldiers. We will deploy 2,500 soldiers for a period of 7 days and assume that they are able to successfully execute the plans. Let us return to the beginning to develop this strategy with R in a clear and concise manner.

Time for action – first steps

To begin our analysis, we must first launch R and set our working directory:

Launch R. The R console will be displayed.

Set your R working directory using the setwd(dir) function. The following code is a hypothetical example. Your working directory should be a relevant location on your own computer.

> #set the R working directory using setwd(dir)
> setwd("/Users/johnmquick/rBeginnersGuide/")

Verify that your working directory has been set to the proper location using the getwd() command:

> #verify the location of your working directory
> getwd()
[1] "/Users/johnmquick/rBeginnersGuide/"

What just happened?
We prepared R to begin our analysis by launching the software and setting our working directory. At this point, you should be very comfortable completing these steps.

Time for action – data setup

Next, we need to import our battle data into R and isolate the portion pertaining to past fire attacks:

Copy the battleHistory.csv file into your R working directory. This file contains data from 120 previous battles between the Shu and Wei forces.

Read the contents of battleHistory.csv into an R variable named battleHistory using the read.table(...) command:

> #read the contents of battleHistory.csv into an R variable
> #battleHistory contains data from 120 previous battles between the Shu and Wei forces
> battleHistory <- read.table("battleHistory.csv", TRUE, ",")

Create a subset using the subset(data, ...) function and save it to a new variable named subsetFire:

> #use the subset(data, ...) function to create a subset of the battleHistory dataset
> #that contains data only from battles in which the fire attack strategy was employed
> subsetFire <- subset(battleHistory, battleHistory$Method == "fire")

Verify the contents of the new subset. Note that the console should return 30 rows, all of which contain fire in the Method column:

> #display the fire attack data subset
> subsetFire

What just happened?

We imported our dataset and then created a subset containing our fire attack data. However, we used a slightly different function, called read.table(...), to import our external data into R.

read.table(...)

Up to this point, we have always used the read.csv() function to import data into R. However, you should know that there are often many ways to accomplish the same objectives in R. For instance, read.table(...) is a generic data import function that can handle a variety of file types.
While it accepts several arguments, the following three are required to properly import a CSV file, like the one containing our battle history data:

file: the name of the file to be imported, along with its extension, in quotes
header: whether or not the file contains column headings; TRUE for yes, FALSE (default) for no
sep: the character used to separate values in the file, in quotes

Using these arguments, we were able to import the data in our battleHistory.csv file into R. Since our file contained headings, we used a value of TRUE for the header argument, and because it is a comma-separated values file, we used "," for our sep argument:

> battleHistory <- read.table("battleHistory.csv", TRUE, ",")

This is just one example of how a different technique can be used to achieve a similar outcome in R. We will continue to explore new methods in our upcoming activities.

Pop quiz

Suppose you wanted to import the following dataset, named newData, into R. Which of the following read.table(...) calls would be best to use?

4,55,96,12

a. read.table("newData", FALSE, ",")
b. read.table("newData", TRUE, ",")
c. read.table("newData.csv", FALSE, ",")
d. read.table("newData.csv", TRUE, ",")

Time for action – data exploration

To begin our analysis, we will examine the summary statistics and correlations of our data. These will give us an overview of the data and inform our subsequent analyses:

Generate a summary of the fire attack subset using summary(object):

> #generate a summary of the fire subset
> summaryFire <- summary(subsetFire)
> #display the summary
> summaryFire

Before calculating correlations, we will have to convert our nonnumeric data from the Method, SuccessfullyExecuted, and Result columns into numeric form.
Recode the Method column using as.numeric(data). Because as.numeric(data) returns a factor's internal level codes, which start at 1, we subtract 1 to obtain codes that start at 0. (In R 4.0.0 and later, read.table(...) imports text columns as character vectors unless stringsAsFactors = TRUE is passed, so such columns must first be converted with factor(...).)

> #represent categorical data numerically using as.numeric(data)
> #recode the Method column into Fire = 1
> numericMethodFire <- as.numeric(subsetFire$Method) - 1

Recode the SuccessfullyExecuted column using as.numeric(data):

> #recode the SuccessfullyExecuted column into N = 0 and Y = 1
> numericExecutionFire <- as.numeric(subsetFire$SuccessfullyExecuted) - 1

Recode the Result column using as.numeric(data):

> #recode the Result column into Defeat = 0 and Victory = 1
> numericResultFire <- as.numeric(subsetFire$Result) - 1

With the Method, SuccessfullyExecuted, and Result columns coded into numeric form, let us now add them back into our fire dataset.

Save the data in our recoded variables back into the original dataset:

> #save the data in the numeric Method, SuccessfullyExecuted, and Result columns
> #back into the fire attack dataset
> subsetFire$Method <- numericMethodFire
> subsetFire$SuccessfullyExecuted <- numericExecutionFire
> subsetFire$Result <- numericResultFire

Display the numeric version of the fire attack subset. Notice that all of the columns now contain numeric data; it will look like the following:

Having replaced our original text values in the Method, SuccessfullyExecuted, and Result columns with numeric data, we can now calculate all of the correlations in the dataset using the cor(data) function:

> #use cor(data) to calculate all of the correlations in the fire attack dataset
> cor(subsetFire)

Note that the error message and NA values in our correlation output result from the fact that our Method column contains only a single value. This is irrelevant to our analysis and can be ignored.

What just happened?

Initially, we calculated summary statistics for our fire attack dataset using the summary(object) function.
From this information, we can derive the following useful insights about our past battles:

The rating of the Shu army's performance in fire attacks has ranged from 10 to 100, with a mean of 45
Fire attack plans have been successfully executed 10 out of 30 times (33%)
Fire attacks have resulted in victory 8 out of 30 times (27%)
Successfully executed fire attacks have resulted in victory 8 out of 10 times (80%), while unsuccessful attacks have never resulted in victory
The number of Shu soldiers engaged in fire attacks has ranged from 100 to 10,000 with a mean of 2,052
The number of Wei soldiers engaged in fire attacks has ranged from 1,500 to 50,000 with a mean of 12,333
The duration of fire attacks has ranged from 1 to 14 days with a mean of 7

Next, we recoded the text values in our dataset's Method, SuccessfullyExecuted, and Result columns into numeric form. After adding the data from these variables back into our original dataset, we were able to calculate all of its correlations. This allowed us to learn even more about our past battle data:

The performance rating of a fire attack has been highly correlated with successful execution of the battle plans (0.92) and the battle's result (0.90), but not strongly correlated with the other variables.
The execution of a fire attack has been moderately negatively correlated with the duration of the attack (-0.46), such that longer attacks have been less likely to be executed successfully.
The numbers of Shu and Wei soldiers engaged are highly correlated with each other (0.74), but not strongly correlated with the other variables.

The insights gleaned from our summary statistics and correlations put us in a prime position to begin developing our regression model.

Pop quiz

Which of the following is a benefit of adding a text variable back into its original dataset after it has been recoded into numeric form?

a. Calculation functions can be executed on the recoded variable.
b. Calculation functions can be executed on the other variables in the dataset.
c. Calculation functions can be executed on the entire dataset.
d. There is no benefit.
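
For readers who would like to see what the recoding and cor(data) steps compute, here is a minimal sketch in Java rather than R. It is an illustration under stated assumptions, not the article's method: the class name, method names, and the six sample battles are all made up for the example and do not come from battleHistory.csv. It recodes two-valued text columns into 0 and 1, computes a Pearson correlation, and shows why a column holding a single value (like Method) cannot produce a usable correlation.

```java
import java.util.Arrays;

// Illustrative sketch only: the class, method names, and sample values are
// hypothetical and are not part of the article's R session.
final class RecodeAndCorrelate {

    // Recode a two-valued text column into 0/1 (for example N = 0 and Y = 1).
    // oneValue is the text value that should map to 1.
    static double[] recode(String[] column, String oneValue) {
        return Arrays.stream(column)
                .mapToDouble(v -> v.equals(oneValue) ? 1.0 : 0.0)
                .toArray();
    }

    // Pearson correlation of two equal-length numeric columns.
    // Evaluates to NaN when either column has zero variance.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("columns must be non-empty and of equal length");
        }
        double meanX = Arrays.stream(x).average().getAsDouble();
        double meanY = Arrays.stream(y).average().getAsDouble();
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        // Made-up values for six battles; not taken from battleHistory.csv.
        String[] method = {"fire", "fire", "fire", "fire", "fire", "fire"};
        String[] executed = {"Y", "N", "N", "Y", "N", "Y"};
        String[] result = {"Victory", "Defeat", "Defeat", "Victory", "Defeat", "Defeat"};
        double[] days = {3, 9, 12, 4, 10, 6};

        double[] x = recode(executed, "Y");
        double[] y = recode(result, "Victory");

        System.out.println("execution vs result:   " + pearson(x, y));
        System.out.println("execution vs duration: " + pearson(x, days));
        // A single-valued column has zero variance, so this prints NaN.
        System.out.println("method vs result:      " + pearson(recode(method, "fire"), y));
    }
}
```

The last line of output mirrors the NA values that cor(subsetFire) reports for the Method column: with only one distinct value there is no variation to correlate.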

Packt
29 Sep 2010
2 min read

Testing and Debugging Windows Workflow Foundation 4.0 (WF) Program

Microsoft Windows Workflow Foundation 4.0 Cookbook

Over 70 recipes with hands-on, ready to implement solutions for authoring workflows

Customize Windows Workflow 4.0 applications to suit your needs
A hands-on guide with real-world illustrations, screenshots, and step-by-step instructions
Explore various functions that you can perform using WF 4.0 with running code examples

Read more about this book

(For more resources on this subject, see here.)

Testing a WF program with unit test framework

In this task, we will create a Test Project to unit test a WF program.

How to do it...

Add a Test Project to the solution:
Add a Test Project to the Chapter01 solution and name the project UnitTestForWFProgram, as shown in the following screenshot:

Add a workflow file to the Test Project:
Add a workflow activity to this project. Right-click the newly created Test Project, then go to Add | New Items... | Workflow | Activity and name the activity WorkflowForTest.xaml. In the WF designer that opens, create an OutArgument named OutMessage. Next, drag an Assign activity to the Designer panel and assign the string "Test Message" to the OutMessage argument, as shown in the following screenshot:

In WF4, a workflow is actually an Activity class. We can think of "Workflow" as the high-level, big-picture concept and "Activity" as the development-level concept.

Create unit test code:
Open the UnitTest1.cs file and fill it with the following code:

code 33

Run it:
Set UnitTestForWorkflow as the Startup project. Press Ctrl+F5 to build and run the test without debugging, as shown in the following screenshot:

How it works...

In the preceding code snippet, [TestClass] indicates that the class is a unit test class, whereas [TestMethod] indicates a test method. When the Test Project runs, the test method will be executed automatically.

There's more...
In real application development, we can also create a separate Unit Test project and add a reference to the target project.

Packt
21 Dec 2016
9 min read

Say Hi to Tableau

In this article by Shweta Savale, the author of the book Tableau Cookbook – Recipes for Data Visualization, we will introduce My Tableau Repository and see how to connect to the sample data source.

(For more resources related to this topic, see here.)

Introduction to My Tableau Repository and connecting to the sample data source

Tableau is a very versatile tool and it is used across various industries, businesses, and organizations, such as government and non-profit organizations, the BFSI sector, consulting, construction, education, healthcare, manufacturing, retail, FMCG, software and technology, telecommunications, and many more. The good thing about Tableau is that it is industry and business vertical agnostic, and hence as long as we have data, we can analyze and visualize it.

Tableau can connect to a wide variety of data sources, and many of them are implemented as native connections in Tableau. This ensures that the connections are as robust as possible. To view the comprehensive list of data sources that Tableau connects to, we can visit the technical specification page on the Tableau website: http://www.tableau.com/products/desktop?qt-product_tableau_desktop=1#qt-product_tableau_desktops.

Getting ready

Tableau provides some sample datasets with the Desktop edition. In this article, we will frequently be using these sample datasets. We can find them in the Datasources directory in the My Tableau Repository folder, which gets created in our Documents folder when Tableau Desktop is installed on our machine. We can look for these data sources in the repository, or we can download them from the link mentioned and save them in a new folder called Tableau Cookbook data under Documents/My Tableau Repository/Datasources.
The link for downloading the sample datasets is as follows: https://1drv.ms/f/s!Av5QCoyLTBpngihFyZaH55JpI5BN There are two files that have been uploaded. They are as follows: Microsoft Excel data called Sample - Superstore.xls Microsoft Access data called Sample - Coffee Chain.mdb In the following section, we will see how to connect to the sample data source. We will be connecting to the Excel data called Sample - Superstore.xls. This Excel file contains transactional data for a retail store. There are three worksheets in this Excel workbook. The first sheet, which is called the Orders sheet, contains the transaction details; the Returns sheet contains the status of returned orders, and the People sheet contains the region names and the names of managers associated with those regions. Refer to the following screenshot to get a glimpse of how the Excel data is structured: Now that we have taken a look at the Excel data, let us see how to connect to this Excel data in the following recipe. To begin with, we will work on the Orders sheet of the Sample - Superstore.xls data. This worksheet contains the order details in terms of the products purchased, the name of the customer, Sales, Profits, Discounts offered, day of purchase, order shipment date, among many other transactional details. How to do it… Let’s open Tableau Desktop by double-clicking on the Tableau 10.0 icon on our Desktop. We can also right-click on the icon and select Open. We will see the start page of Tableau, as shown in the following screenshot: We will select the Excel option from under the Connect header on the left-hand side of the screen. Once we do that, we will have to browse the Excel file called Sample - Superstore.xls, which is saved in Documents/My Tableau Repository/Datasources/Tableau Cookbook data. 
Once we are able to establish a connection to the referred Excel file, we will get a view as shown in the following screenshot: Annotation 1 in the preceding screenshot is the data that we have connected to, and annotation 2 is the list of worksheets/tables/views in our data. Double-click on the Orders sheet or drag and drop the Orders sheet from the left-hand side section into the blank space that says Drag sheets here. Refer to annotation 3 in the preceding screenshot. Once we have selected the Orders sheet, we will get to see the preview of our data, as highlighted in annotation 1 in the following screenshot. We will see the column headers, their data type (#, Abc, and so on), and the individual rows of data: While connecting to a data source, we can also read data from multiple tables/sheets from that data source. However, this is something that we will explore a little later. Further moving ahead, we will need to specify what type of connection we wish to maintain with the data source. Do we wish to connect to our data directly and maintain a Live connectivity with it, or do we wish to import the data into Tableau's data engine by creating an Extract? Refer to annotation 2 in the preceding screenshot. We will understand these options in detail in the next section. However, to begin with, we will select the Live option. Next, in order to get to our Tableau workspace where we can start building our visualizations, we will click on the Go to Worksheet option/ Sheet 1. Refer to annotation 3 in the preceding screenshot. This is how we can connect to data in Tableau. In case we have a database to connect to, then we can select the relevant data source from the list and fill in the necessary information in terms of server name, username, password, and so on. 
Refer to the following screenshot to see what options we get when we connect to Microsoft SQL Server: How it works… Before we connect to any data, we need to make sure that our data is clean and in the right format. The Excel file that we connected to was stored in a tabular format where the first row of the sheet contains all the column headers and every other row is basically a single transaction in the data. This is the ideal data structure for making the best use of Tableau. Typically, when we connect to databases, we would get columnar/tabular data. However, flat files such as Excel can have data even in cross-tab formats. Although Tableau can read cross-tab data, we may face certain limitations in terms of options for viewing, aggregating, and slicing and dicing our data in Tableau. Having said that, there may be situations where we have to deal with such cross-tab or pre-formatted Excel files. These files will essentially need cleaning up before we pull into Tableau. Refer to the following article to understand more about how we can clean up these files and make them Tableau ready: http://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#data_tips.html In case it is a cross-tab file, then we will have to pivot it into normalized columns either at the data level or on the fly at Tableau level. We can do so by selecting multiple columns that we wish to pivot and then selecting the Pivot option from the dropdown that appears when we hover over any of the columns. Refer to the following screenshot: If the format of the data in our Excel file is not suitable for analysis in Tableau, then we can turn on the Data Interpreter option, which becomes available when Tableau detects any unique formatting or any extra information in our Excel file. For example, the Excel data may include some empty rows and columns, or extra headers and footers. 
Refer to the following screenshot:

Data Interpreter can remove that extra information to help prepare our Tableau data source for analysis. Refer to the following screenshot:

When we enable the Data Interpreter, the preceding view will change to what is shown in the following screenshot:

This is how the Data Interpreter works in Tableau.

There may also be situations where several data fields are combined in a single column. Refer to the following screenshot:

In the preceding screenshot, the highlighted column is a concatenated field that holds the Country, City, and State. For our analysis, we may want to break these apart and analyze each geographic level separately. To do so, we simply need to use the Split or Custom Split… option in Tableau. Refer to the following screenshot:

Once we do that, our view would be as shown in the following screenshot:

When preparing data for analysis, a list of fields is at times easier to work with than the preview of our data. The Metadata grid in Tableau gives us such a list, along with many other quick functions such as renaming fields, hiding columns, changing data types, changing aliases, creating calculations, splitting fields, merging fields, and also pivoting the data. Refer to the following screenshot:

After having established the initial connectivity by pointing to the right data source, we need to specify how we wish to maintain that connectivity. We can choose between the Live option and the Extract option.

The Live option helps us connect to our data directly and maintains a live connection with the data source. Using this option allows Tableau to leverage the capabilities of our data source, and in this case the speed of our data source will determine the performance of our analysis.

The Extract option, on the other hand, helps us import the entire data source into Tableau's fast data engine as an extract.
This option creates a .tde file, which stands for Tableau Data Extract. In case we wish to extract only a subset of our data, we can select the Edit option, as highlighted in the following screenshot. The Add link in the right corner helps us add filters while fetching the data into Tableau. Refer to the following screenshot:

A point to remember about Extract is that it is a snapshot of our data stored in a Tableau proprietary format. As opposed to a Live connection, changes in the original data won't be reflected in our dashboard until the extract is updated.

Please note that we will have to decide between Live and Extract on a case-by-case basis. Please refer to the following article for more clarity: http://www.tableausoftware.com/learn/whitepapers/memory-or-live-data

Summary

This article showed how to locate the sample data sources and connect to them from Tableau, which is the first step toward building effective dashboards in a business environment.

Resources for Article:

Further resources on this subject:

Getting Started with Tableau Public [article]
Data Modelling Challenges [article]
Creating your first heat map in R [article]

Packt
17 Jun 2011
5 min read

EJB 3.1: Controlling Security Programmatically Using JAAS

EJB 3.1 Cookbook

Build real world EJB solutions with a collection of simple but incredibly effective recipes

Readers are advised to refer to the first two recipes from the previous article, which cover handling security using annotations.

Getting ready

Programmatic security is effected by adding code within methods to determine who the caller is and then allowing certain actions to be performed based on their capabilities. There are two EJBContext interface methods available to support this type of security: getCallerPrincipal and isCallerInRole. The SessionContext object implements the EJBContext interface. The SessionContext's getCallerPrincipal method returns a Principal object which can be used to get the name or other attributes of the user. The isCallerInRole method takes a string representing a role and returns a Boolean value indicating whether the caller of the method is a member of the role or not.

The steps for controlling security programmatically involve:

Injecting a SessionContext instance
Using either of the above two methods to effect security

How to do it...

To demonstrate these two methods, we will modify the SecurityServlet to use the VoucherManager's approve method and then augment the approve method with code using these methods.

First, modify the SecurityServlet try block to use the following code. We create a voucher as usual and then follow with a call to the submit and approve methods.
    out.println("<html>");
    out.println("<head>");
    out.println("<title>Servlet SecurityServlet</title>");
    out.println("</head>");
    out.println("<body>");
    voucherManager.createVoucher("Susan Billings", "SanFrancisco", BigDecimal.valueOf(2150.75));
    voucherManager.submit();
    boolean voucherApproved = voucherManager.approve();
    if(voucherApproved) {
      out.println("<h3>Voucher was approved</h3>");
    } else {
      out.println("<h3>Voucher was not approved</h3>");
    }
    out.println("<h3>Voucher name: " + voucherManager.getName() + "</h3>");
    out.println("</body>");
    out.println("</html>");

Next, modify the VoucherManager EJB by injecting a SessionContext object using the @Resource annotation.

    public class VoucherManager {
      ...
      @Resource
      private SessionContext sessionContext;

Let's look at the getCallerPrincipal method first. This method returns a Principal object (java.security.Principal) which has only one method of immediate interest: getName. This method returns the name of the principal.

Modify the approve method so it uses the SessionContext object to get the Principal and then determines if the name of the principal is "mary" or not. If it is, then approve the voucher.

    public boolean approve() {
      Principal principal = sessionContext.getCallerPrincipal();
      System.out.println("Principal: " + principal.getName());
      if("mary".equals(principal.getName())) {
        voucher.setApproved(true);
        System.out.println("approve method returned true");
        return true;
      } else {
        System.out.println("approve method returned false");
        return false;
      }
    }

Execute the SecurityApplication using "mary" as the user. The application should approve the voucher with the output as shown in the following screenshot:

Execute the application again with a user of "sally". This execution will result in an exception.

    INFO: Access exception

The getCallerPrincipal method simply returns the principal. This frequently results in the need to explicitly include the name of a user in code. The hard coding of user names is not recommended.
Checking against each individual user can be time consuming. It is more efficient to check whether a user is in a role.

The isCallerInRole method allows us to determine whether the user is in a particular role or not. It returns a Boolean value indicating whether the user is in the role specified by the method's string argument. Rewrite the approve method to call the isCallerInRole method and pass the string "manager" to it. If the method returns true, approve the voucher.

    public boolean approve() {
      if(sessionContext.isCallerInRole("manager")) {
        voucher.setApproved(true);
        System.out.println("approve method returned true");
        return true;
      } else {
        System.out.println("approve method returned false");
        return false;
      }
    }

Execute the application using both "mary" and "sally". The results of the application should be the same as the previous example where the getCallerPrincipal method was used.

How it works...

The SessionContext class was used either to obtain a Principal object or to determine whether a user was in a particular role. This required the injection of a SessionContext instance and the addition of code to determine if the user was permitted to perform certain actions. This approach resulted in more code than the declarative approach. However, it provided more flexibility in controlling access to the application. These techniques give the developer choices as to how best to meet the needs of the application.

There's more...

It is possible to take different actions depending on the user's role using the isCallerInRole method. Let's assume we are using programmatic security with multiple roles.

    @DeclareRoles ({"employee", "manager","auditor"})

We can use a validateAllowance method to accept a travel allowance amount and determine whether it is appropriate based on the role of the user.
    public boolean validateAllowance(BigDecimal allowance) {
      if(sessionContext.isCallerInRole("manager")) {
        if(allowance.compareTo(BigDecimal.valueOf(2500)) <= 0) {
          return true;
        } else {
          return false;
        }
      } else if(sessionContext.isCallerInRole("employee")) {
        if(allowance.compareTo(BigDecimal.valueOf(1500)) <= 0) {
          return true;
        } else {
          return false;
        }
      } else if(sessionContext.isCallerInRole("auditor")) {
        if(allowance.compareTo(BigDecimal.valueOf(1000)) <= 0) {
          return true;
        } else {
          return false;
        }
      } else {
        return false;
      }
    }

The compareTo method compares two BigDecimal values and returns one of three values:

-1 – If the first number is less than the second number
0 – If the first and second numbers are equal
1 – If the first number is greater than the second number

The valueOf static method converts a number to a BigDecimal value. The value is then compared to allowance.

Summary

This article covered programmatic EJB security based upon the Java Authentication and Authorization Service (JAAS) API.

Further resources on this subject:

EJB 3.1: Introduction to Interceptors [Article]
EJB 3.1: Working with Interceptors [Article]
Hands-on Tutorial on EJB 3.1 Security [Article]
EJB 3 Entities [Article]
Developing an EJB 3.0 entity in WebLogic Server [Article]
Building an EJB 3.0 Persistence Model with Oracle JDeveloper [Article]
NetBeans IDE 7: Building an EJB Application [Article]
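
The role-based limits used by validateAllowance can be tried outside an EJB container with a small standalone sketch. The following is only an illustration under stated assumptions: the AllowanceSketch class and the Predicate parameter are hypothetical stand-ins for the injected SessionContext and its isCallerInRole method, and the limits are copied from the recipe. It keeps the same role order (manager, employee, auditor) and shows how compareTo is used for the "less than or equal to" test.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative sketch only: this class is hypothetical and runs without a container.
final class AllowanceSketch {

    // Role limits copied from the validateAllowance recipe.
    // LinkedHashMap keeps the manager, employee, auditor checking order.
    static final Map<String, BigDecimal> LIMITS = new LinkedHashMap<>();
    static {
        LIMITS.put("manager", BigDecimal.valueOf(2500));
        LIMITS.put("employee", BigDecimal.valueOf(1500));
        LIMITS.put("auditor", BigDecimal.valueOf(1000));
    }

    // isCallerInRole is a hypothetical stand-in for SessionContext.isCallerInRole.
    // The first role the caller holds decides the outcome, as in the if/else chain.
    static boolean validateAllowance(Predicate<String> isCallerInRole, BigDecimal allowance) {
        for (Map.Entry<String, BigDecimal> limit : LIMITS.entrySet()) {
            if (isCallerInRole.test(limit.getKey())) {
                // compareTo(...) <= 0 means the allowance does not exceed the limit.
                return allowance.compareTo(limit.getValue()) <= 0;
            }
        }
        return false; // caller holds none of the declared roles
    }

    public static void main(String[] args) {
        BigDecimal amount = BigDecimal.valueOf(2150.75);
        Predicate<String> manager = "manager"::equals;
        Predicate<String> auditor = "auditor"::equals;

        System.out.println(validateAllowance(manager, amount)); // true: 2150.75 <= 2500
        System.out.println(validateAllowance(auditor, amount)); // false: 2150.75 > 1000

        // compareTo ignores scale, whereas equals does not.
        System.out.println(new BigDecimal("2.0").compareTo(new BigDecimal("2.00"))); // 0
        System.out.println(new BigDecimal("2.0").equals(new BigDecimal("2.00")));    // false
    }
}
```

Using compareTo rather than equals is the safer choice for monetary comparisons, because two BigDecimal values that are numerically equal but differ in scale are not considered equal by equals.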

Packt
28 Dec 2009
5 min read

Working with Templates in Apache Roller 4.0

Your first template

In essence, a theme is a set of templates, and a template is composed of HTML and Velocity code. You can make your own templates to access your weblog's data and show this to your visitors in any way you want.

Creating and editing templates

In Apache Roller, you can create, edit, or delete templates via the Frontpage: Templates page. Let's see how to use this wonderful tool to create and edit your own templates!

Time for action – creating your first template

In this exercise, you'll learn to create and edit your first custom template via Roller's admin interface:

Open your web browser, log into Roller, and go to the Templates page, under the Design tab:

On the Add a new template panel, type mytemplate in the Name field, leave the default custom value in the Action field, and click on the Add button:

The mytemplate template you've just created will show up in the templates list:

Now click on the mytemplate link under the Name field, to open the mytemplate file for editing:

Leave the mytemplate value for the Name field, type mytemplate in the Link field, and type My First Template in Apache Roller! in the Description field:

Then replace the <html><body></body></html> line with the following HTML code:

<html><body>
Welcome to my blog, <b>$model.weblog.name</b> </br>
This is my first template </br>
My weblog's absolute URL is: <b>$url.absoluteSite</b> </br>
</body></html>

This is shown in the following screenshot:

Scroll down the page and click on the Save button to apply the changes to your new template. Roller will show the Template updated successfully message inside a green box to confirm that your changes were saved:

Now click on the [launch] link under the Link field to open a new tab in your web browser and see your template in action:

You can close this tab now, but leave the Frontpage: Templates window open for the next exercise.

What just happened?

Now you know how to create your own templates!
Although the previous example is very simple, you can use it as a starting point to create very complex templates. As I said before, templates are composed of HTML and Velocity code. The template we wrote in the previous exercise uses a few basic HTML elements, or tags:

HTML Tag | Definition | Tip
<html>, </html> | Defines the start/end of an HTML document. | You must write these tags at the beginning/end of each Roller template.
<body>, </body> | Defines the start/end of an HTML document's body. | All the code you write for your templates must go between the <body> and </body> tags.
<b>, </b> | Shows text in bold. | Example: <b>Hello</b> shows up as Hello in bold.
</br> | Indicates a line break. | Example: Hello</br>World shows up as Hello and World on separate lines. Browsers accept this form, although the standard tag is <br>.

The template also uses some elements from the Velocity Template Language, shown here along with an example from the previous exercise:

Velocity Element | Definition | Example
$model.weblog.name | Shows the name of your weblog. | <b>$model.weblog.name</b> shows up as Ibacsoft's Weblog
$url.absoluteSite | Shows the absolute URL of your weblog. | <b>$url.absoluteSite</b> shows up as http://alromero.no-ip.org/roller

These are just some of the basic HTML tags and Velocity elements you'll learn to use for your templates. In the following sections, we'll see some more, along with elements from the Velocity Template Language.

The Velocity template language

All templates in Roller use HTML tags, along with Velocity code. In the next subsections, you'll learn about some of the most widely used Velocity elements in your Roller templates.

Using Velocity macros in your Roller weblog

A macro in Velocity is a set of instructions that generates HTML code based on data from your weblog. Macros are very helpful when you need to do the same task more than once. In the following exercise, you'll learn to use some macros included in Roller in order to show your weblog data to your visitors.
Time for action – showing your weblog's blogroll and most recent entries

Now you will use the Velocity Template Language to show your weblog's bookmarks (blogroll) in your custom template, along with the most recent entries:

Go to your custom template editing page, and type the following code just above the </body></html> line:

</br>These are my favorite Web sites: </br>
#set($rootFolder = $model.weblog.getBookmarkFolder("/"))
#showBookmarkLinksList($rootFolder false false)
Packt
14 Oct 2009
8 min read
Anatomy of TYPO3 Extension

TYPO3 Extension Categories All TYPO3 extensions are classified into several predefined categories. These categories do not actually differentiate the extensions. They are more like hints for users about extension functionality. Often, it is difficult for the developer to decide which category an extension should belong to. The same extension can provide PHP code that fits into many categories. An extension can contain Frontend (FE) plugins, Backend (BE) modules, static data, and services, all at once. While it is not always the best solution to make such a monster extension, sometimes it is necessary. In this case, the extension author should choose the category that best fits the extension's purpose. For example, if an extension provides a reservation system for website visitors, it is probably FE related, even if it includes a BE module for viewing registrations. If an extension provides a service to log in users, it is most likely a service extension, even if it logs in FE users. It will be easier to decide where the extension fits after we review all the extension categories in this article. Choosing a category for an extension is mandatory. While the TYPO3 Extension Manager can still display extensions without a proper category, this may change and such extensions may be removed from TER (TYPO3 Extension Repository) in the future. The extension category is visible in several places. Firstly, extensions are sorted and grouped by category in the Extension Manager. Secondly, when an extension is clicked in the Extension Manager, its category is displayed in the extension details. If an extension's category is changed from one to another, it does not affect extension functionality. The Extension Manager will show the extension in a different category. So, categories are truly just hints for the user. They do not have any significant meaning in TYPO3. So, why do we care and talk about them? We do so because it is one of those things that make a good extension. 
If an extension developer starts making a new extension, they should do it properly from the very beginning. And one of the first things to do properly is to decide where an extension belongs. So, let's look into the various extension categories in more detail.

Category: Frontend

Extensions that belong to the Frontend category provide functionality related to the FE. It does not mean that they generate website output. Typically, extensions from the FE category extend FE functionality in other ways. For example, they can transform links from the standard /index.php?id=12345 to /news/new-typo3-book-is-out.htm. Or, they can filter output and clean it up, compress it, add or remove HTML comments, and so on. Often, these extensions use one or more hooks in the FE classes. For example, TSFE has hooks to process submitted data, or to post-filter content (and many others). Examples of FE extensions are source_optimization and realurl.

Category: Frontend plugins

Frontend plugins is possibly the most popular extension category. Extensions from this category typically generate content for the website. They provide new content objects, or extend existing types of content objects. Typical examples of extensions from the Frontend plugins category are tt_news, comments, ratings, etc.

Category: Backend

Extensions from the Backend category provide additional functionality for the TYPO3 Backend. Often, they are not seen inside the TYPO3 BE, but they still do some work. Examples of such extensions are various debugging extensions (such as rlmp_filedevlog) and extensions that add or change the pop-up menu in the BE (such as the extra_page_cm_options system extension). This category is rarely used because extensions belonging to it are very special.

Category: Backend module

Extensions from this category provide additional modules for the TYPO3 BE. Typical examples are system extensions such as beuser (provides the Tools | Users module) or tstemplate (provides the Web | Template module).
Category: Services

Services extend core TYPO3 functionality. The best-known and most popular service extensions are authentication services. The TYPO3 Extension Repository contains extensions to authenticate TYPO3 users against phpBB, vBulletin, or LDAP user databases. Services are somewhat special and will not be covered in this article. Extension developers who are interested in the development of services should consult the appropriate documentation on the typo3.org website.

Category: Examples

Extensions from this category provide examples. There are not many of them, and they are typically meant for beginners or for those who want to learn a specific feature of TYPO3, or features that another TYPO3 extension provides.

Category: Templates

Extensions from this category provide templates. Most often, they contain preformatted HTML and CSS files to be used with the templateautoparser extension or mapped with TemplaVoila. Sometimes, they also contain TypoScript templates, for example, the tmpl_andreas01 and tmpl_andreas09 extensions. Once installed, they provide pre-mapped TemplaVoila templates for any website, making it easy to have a website up and running within minutes.

Category: Documentation

Documentation extensions provide TYPO3 documentation. Normally, TYPO3 extensions contain their own documentation, though sometimes a document is too big to be shipped with an extension. In such cases, it is stored separately. There is an unofficial convention to start the extension key of such extensions with the doc_ prefix (for example, doc_indexed_search).

Category: Miscellaneous

Everything that does not fit into any other category goes here; typical examples are skins. But do not put your extension here just because you cannot decide where it fits. In all probability, it should go into one of the other categories, not into Miscellaneous.

Extension Files

TYPO3 extensions consist of several files. Some of these files have predefined names, and serve a predefined purpose.
Others provide code or data but also follow certain naming conventions. We will review all the predefined files in this article and see what purpose they serve. We will look into the files according to their logical grouping. While reading this section, you can take any extension from the typo3conf/ext/ directory of your TYPO3 installation and check the contents of each discussed file. Some files may be missing if the extension does not use them. There is only one file that is mandatory for any TYPO3 extension: ext_emconf.php. We will start our examination with this file.

Common Files

All files from this group have predefined names, and TYPO3 expects to find certain information in them. Hacking these files to serve another purpose or to have a different format usually results in incompatibility with other extensions or with TYPO3 itself. While it may work in one installation, it may fail in others. So, avoid doing anything non-standard with these files.

ext_emconf.php

This is the only required file for any TYPO3 extension. It is also a file that should be modified with great care: if it is corrupt, TYPO3 will not load any extension. This file contains information for the TYPO3 Extension Manager. This information tells the Extension Manager what the extension does, provides, requires, and conflicts with. It also contains a checksum for each file in the extension. This checksum is updated automatically when the extension is sent to TER (TYPO3 Extension Repository). The server administrator can easily check whether anyone has tampered with the extension files by looking into the extension details in the Extension Manager. The modified files are shown in red.

Here is a tip. If you (as an extension developer) send your own extension directly to the customer (bypassing TER upload), or plan to use it on your own server, always update the ext_emconf.php file using the Backup/Delete function of the Extension Manager.
This will ensure that TYPO3 shows up-to-date data in the Extension Manager. Here is an example of an ext_emconf.php file from the smoothuploader extension:

<?php
#############################################################
# Extension Manager/Repository config file for ext: "smoothuploader"
# Auto generated 29-02-2008 12:36
# Manual updates:
# Only the data in the array - anything else is removed by next write.
# "version" and "dependencies" must not be touched!
#############################################################
$EM_CONF[$_EXTKEY] = array(
    'title' => 'SmoothGallery Uploader',
    'description' => 'Uploads images to SmoothGallery',
    'category' => 'plugin',
    'author' => 'Dmitry Dulepov [Netcreators]',
    'author_email' => 'dmitry@typo3.org',
    'shy' => '',
    'dependencies' => 'rgsmoothgallery',
    'conflicts' => '',
    'priority' => '',
    'module' => '',
    'state' => 'beta',
    'internal' => '',
    'uploadfolder' => 0,
    'createDirs' => '',
    'modify_tables' => 'tx_rgsmoothgallery_image',
    'clearCacheOnLoad' => 0,
    'lockType' => '',
    'author_company' => 'Netcreators BV',
    'version' => '0.3.0',
    'constraints' => array(
        'depends' => array(
            'rgsmoothgallery' => '1.1.1-',
        ),
        'conflicts' => array(
        ),
        'suggests' => array(
        ),
    ),
    '_md5_values_when_last_written' => 'a:12:{s:9:...;}',
    'suggests' => array(
    ),
);
?>

The variable _md5_values_when_last_written is shortened in the listing above.
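The per-file checksums mentioned above are what make it possible to spot modified files. The following Python sketch illustrates the general technique of comparing recorded MD5 digests with the files on disk. It is an illustration under stated assumptions, not TYPO3's actual implementation: the function names, the file names, and the format of the recorded dictionary are all hypothetical, and the real format of _md5_values_when_last_written is not reproduced here.

```python
import hashlib
import tempfile
from pathlib import Path

def file_md5(path):
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def find_modified(ext_dir, recorded):
    """Compare recorded checksums with the files currently on disk.

    `recorded` maps a relative file name to the digest stored when the
    extension was last written (a hypothetical format). Returns the sorted
    names of files that are missing or whose contents no longer match.
    """
    modified = []
    for name, digest in recorded.items():
        path = Path(ext_dir) / name
        if not path.is_file() or file_md5(path) != digest:
            modified.append(name)
    return sorted(modified)

# Demo with a throwaway directory standing in for an extension folder.
with tempfile.TemporaryDirectory() as ext_dir:
    (Path(ext_dir) / "ext_emconf.php").write_text("<?php // config")
    (Path(ext_dir) / "class.demo.php").write_text("<?php // original")
    recorded = {name: file_md5(Path(ext_dir) / name)
                for name in ("ext_emconf.php", "class.demo.php")}
    unchanged = find_modified(ext_dir, recorded)   # nothing edited yet
    (Path(ext_dir) / "class.demo.php").write_text("<?php // edited by hand")
    changed = find_modified(ext_dir, recorded)     # one file now differs

print(unchanged, changed)
```

A file edited after the checksums were recorded no longer matches its stored digest, which is the kind of mismatch the Extension Manager highlights in red.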
Packt
29 Apr 2011
9 min read
Integrating Moodle 2.0 with Mahara and GoogleDocs for Business

Moodle 2.0 for Business Beginner's Guide: Implement Moodle in your business to streamline your interview, training, and internal communication processes

The Repository integration allows admins to set up external content management systems and use them to complement Moodle's own file management system. Using this integration, you can now manage content outside of Moodle and publish it to the system once the document or other content is ready. The Portfolio integration enables users to store their Moodle content in an external e-portfolio system to share with evaluators, peers, and others.

Using Google Docs as a repository for Moodle

A growing number of organizations are using Google Docs as their primary office suite. Moodle allows you to add Google Docs as a repository so your course authors can link to word processing, spreadsheet, presentation, and form documents on Google Docs.

Time for action - configuring the Google Docs plugin

To use Google Docs as a repository for Moodle, we first need to configure the plugin like we did with Alfresco.

Log in to Moodle as a site administrator.
From the Site Administration menu, select Plugins and then Repositories.
Select Manage Repositories from the Repositories menu.
Next to the Google Docs plugin, select Enabled and Visible from the Active menu.
On the Configure Google Docs plugin page, give the plugin a different name if you refer to Google Docs as something different in your organization.
Click on Save.

What just happened

You have now set up the Google Docs repository plugin. Each user will have access to their Google Docs account when they add content to Moodle.

Time for action - adding a Google Doc to your Moodle course

After you have configured the Google Docs plugin, you can add Google Docs to your course.

Log in to Moodle as a user with course editing privileges.
Turn on the editing mode and select File from the Add a resource.. menu in the course section where you want the link to appear.
Give the file a name. Remember the name will be the link the user selects to get the file, so be descriptive.
Add a description of the file.
In the Content section, click the Add.. button to bring up the file browser.
Click the Google Docs plugin in the File Picker pop-up window.
The first time you access Google Docs from Moodle, you will see a login button on the screen. Click the button and Moodle will take you to the Google Docs login page.
Log in to Google Docs. Docs will now display a security warning, letting you know an external application (Moodle) is trying to access your file repository. Click on the Grant Access button at the bottom of the screen.
Now you will be taken back to the File Picker. Select the file you want to link to your course.
If you want to rename the document when it is linked to Moodle, rename it in the Save As text box.
Then edit the Author field if necessary and choose a copyright license.
Click on Select this file.
Select the other options for the file as described in Getting Started with Moodle 2.0 for Business.
Click on Save and return to course.

What just happened

You have now added a Google Doc to your Moodle course. You can add any of the Google Doc types to your course and share them with Moodle users.

Google Docs File Formats

The Moodle Google Docs plugin makes a copy of the document in a standard office format (rtf, xls, or ppt). Because the copy is made when you save the file, any edits made to the Google document after you save it to Moodle will not be displayed in Moodle.

Have a go hero

Try importing the other Google Docs file formats into your Moodle course and test the download.

Time for reflection

Using Google Docs effectively requires clear goals, planning, integration with organizational workflows, and training. If you want to link Moodle with an external content repository, how will you ensure the implementation is successful? What business processes could you automate by using one of these content services?
Exporting content to e-portfolios Now that we've integrated Moodle with external content repositories it's time to turn our attention to exporting content from Moodle. The Moodle 2 portfolio system allows users to export Moodle content in standard formats, so they can share their work with other people outside of Moodle, or organize their work into portfolios aimed at a variety of audiences. In a corporate environment, portfolios can be used to demonstrate competency for promotion or performance measurement. They can also be used as a directory of expertise within a company, so others can find people they need for special projects. One of the more popular open source portfolio systems is called Mahara. Mahara is a dedicated e-portfolio system for creating collections of work and then creating multiple views on those collections for specific audiences. It also includes a blogging platform, resume builder, and social networking tools. In recent versions, Mahara has begun to incorporate social networking features to enable users to find others with similar interests or specific skill sets. To start, we'll briefly look at installing Mahara, then work through the integration of Moodle with Mahara. Once we've got the two systems talking to each other, we can look at how to export content from Moodle to Mahara and then display it in an e-portfolio. Time for action - installing Mahara Mahara is a PHP and MySQL application like Moodle. Mahara and Moodle share a very similar architecture, and are designed to be complementary in many respects. You can use the same server setup we've already created for Moodle in Getting Started with Moodle 2.0 for Business. However, we need to create a new database to house the Mahara data as well as ensure Mahara has its own space to operate. Go to http://mahara.org. There is a Download link on the right side of the screen. Download the latest stable version (version 1.3 as of this writing). 
You will need version 1.3 or later to fully integrate with Moodle 2.
For the best results, follow the instructions on the Installing Mahara wiki page, http://wiki.mahara.org/System_Administrator%27s_Guide/Installing_Mahara. If you are installing Mahara on the same personal machine as Moodle, be sure to put the Mahara folder at your web server's root level and keep it separate from Moodle. Your URL for Mahara should be similar to your URL for Moodle.

What just happened

You have now installed Mahara on your test system. Once you have Mahara up and running on your test server, you can begin to integrate Mahara with Moodle.

Time for action - configuring the networking and SSO

To begin the process of configuring Moodle and Mahara to work together, we need to enable Moodle Networking. You will need to make sure you have xmlrpc, curl, and openssl installed and configured in your PHP build. Networking allows Moodle to share users and authentication with another system. In this case, we are configuring Moodle to allow Moodle users to automatically log in to Mahara when they log in to Moodle. This will create a more seamless experience for the users and enable them to move back and forth between the systems. The steps to configure the Mahara portfolio plugin are as follows:

From the Site administration menu, select Advanced features.
Find the Networking option and set it to On. Select Save changes. The Networking option will then appear in the site admin menu.
Select Networking, then Manage Peers.
In the Add a new host form, copy the URL of your Mahara site into the hostname field and then select Mahara as the server type.
Open a new window and log in to your Mahara site as the site admin. Select the Site Admin tab.
On your Mahara site, select Configure Site. Then select Networking.
Copy the public key, from the BEGIN tag through the END CERTIFICATE tag, and paste it into the Public Key field in the Moodle networking form.
On the resulting page, select the Services tab to set up the services necessary to integrate the portfolio.
You will now need to configure the SSO services. Moodle and Mahara can make the following services available for the other system to consume.

Moodle/Mahara Services | Descriptions
Remote enrollment service | Publish: If you Publish the Remote Enrollment Service, Mahara admins will be able to enroll students in Moodle courses. To enable this, you must also Publish the Single Sign On Service Provider service. Subscribe: Subscribe allows you to remotely enroll students in courses on the remote server. It doesn't apply in the context of Mahara.
Portfolio Services | You must enable both Publish and Subscribe to allow users to send content to Mahara.
SSO (Identity Provider) | If you Publish the SSO service, users can go from Moodle to Mahara without having to log in again. If you Subscribe to this service, users can go from Mahara to Moodle without having to log in again.
SSO (Service Provider) | This is the converse of the Identity Provider service. If you enabled Publish previously, you must enable Subscribe here. If you enabled Subscribe previously, you must enable Publish here.

Click on Save changes.

What just happened

You have just enabled Single Sign-On between Moodle and Mahara. We are now halfway through the setup, and next we can configure Mahara to listen for Moodle users.

Have a go hero

Moodle Networking is also used to enable Moodle servers to communicate with each other. The Moodle Hub system is designed on top of Moodle networking to enable teachers to share courses with each other, and enable multiple Moodle servers to share users. How could you use this feature to spread Moodle within your organization? Could you create an internal and an external facing Moodle and have them talk to each other? Could different departments each use a Moodle and share access to courses using Moodle networking?
For your "have a go hero" activity, design a plan to use Moodle networking within your organization.
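The Publish/Subscribe pairing rules from the services table earlier in this section are easy to get wrong, so it can help to write them down as a small consistency check. The Python sketch below encodes only the rules stated in that table. It does not talk to Moodle or Mahara, and the service keys and the check_services function are hypothetical names chosen for illustration.

```python
def check_services(moodle):
    """Return a list of problems with the Moodle-side service settings.

    `moodle` maps a (hypothetical) service key to a set that may contain
    "publish" and/or "subscribe".
    """
    problems = []
    portfolio = moodle.get("portfolio", set())
    idp = moodle.get("sso_idp", set())
    sp = moodle.get("sso_sp", set())
    enrol = moodle.get("remote_enrolment", set())

    # Portfolio services need both directions to send content to Mahara.
    if portfolio != {"publish", "subscribe"}:
        problems.append("Portfolio Services need both Publish and Subscribe.")
    # The Service Provider settings are the converse of the Identity Provider.
    if "publish" in idp and "subscribe" not in sp:
        problems.append("SSO (Identity Provider) is published, so SSO (Service Provider) must be subscribed.")
    if "subscribe" in idp and "publish" not in sp:
        problems.append("SSO (Identity Provider) is subscribed, so SSO (Service Provider) must be published.")
    # Publishing remote enrollment also requires publishing the SSO Service Provider.
    if "publish" in enrol and "publish" not in sp:
        problems.append("Remote enrollment is published, so SSO (Service Provider) must be published.")
    return problems

ok = check_services({
    "portfolio": {"publish", "subscribe"},
    "sso_idp": {"publish"},
    "sso_sp": {"subscribe"},
})
broken = check_services({
    "portfolio": {"publish"},
    "sso_idp": {"publish"},
    "sso_sp": set(),
})
print(ok, broken)
```

An empty result means the chosen settings satisfy the rules in the table; each message in a non-empty result points at one checkbox to revisit on the Services tab.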
Packt
20 May 2014
14 min read
Data Warehouse Design

Most companies are establishing or planning to establish a Business Intelligence system and a data warehouse (DW). Knowledge of BI and data warehousing is in great demand in the job market. This article gives you an understanding of what Business Intelligence and a data warehouse are, what the main components of a BI system are, and what the steps to create the data warehouse are. This article focuses on the design of the data warehouse, which is the core of a BI system. A data warehouse is a database designed for analysis, and this definition indicates that designing a data warehouse is different from modeling a transactional database. Designing the data warehouse is also called dimensional modeling. In this article, you will learn about the concepts of dimensional modeling.

Understanding Business Intelligence

Gartner (http://www.gartner.com/it-glossary/business-intelligence-bi/) defines Business Intelligence as follows:

Business Intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.

As the definition states, the main purpose of a BI system is to help decision makers make proper decisions based on the results of data analysis provided by the BI system. Nowadays, there are many operational systems in each industry. Businesses use multiple operational systems to simplify, standardize, and automate their everyday jobs and requirements. Each of these systems may have its own database, some of which may work with SQL Server, some with Oracle. Some of the legacy systems may work with legacy databases or even file operations. There are also systems that work through the Web via web services and XML.
Operational systems are very useful in helping with day-to-day business operations, such as hiring a person in the human resources department, handling sales in a retail store, and processing financial transactions. The rising number of operational systems also adds another requirement, which is the integration of these systems. Business owners and decision makers not only need integrated data but also require an analysis of the integrated data. As an example, it is a common requirement for the decision makers of an organization to compare their hiring rate with the level of service provided by the business and the customer satisfaction based on that level of service. As you can see, this requirement deals with multiple operational systems such as CRM and human resources. The requirement might also need some data from sales and inventory if the decision makers want to bring sales and inventory factors into their decisions. As a supermarket owner or decision maker, it would be very important to understand which products were in higher demand in which branches. This kind of information helps you to provide enough products to cover demand, and you may even think about creating another branch in some regions. The requirement of integrating multiple operational systems in order to create consolidated reports and dashboards that help decision makers make a proper decision is the main directive for Business Intelligence.

Some organizations and businesses use ERP systems that are already integrated, so you may wonder whether there is still a requirement for integrating data, because consolidated reports can be produced easily from these systems. Do such organizations still require a BI solution? The answer in most cases is yes. Such companies might not require a separate BI system for the internal parts of their operations that are implemented through the ERP.
However, they might need to get some data from outside, for example, from another vendor's web service, or use many other protocols and channels to send and receive information. This indicates that there would be a requirement for consolidated analysis of such information, which brings the BI requirement back to the table.

The architecture and components of a BI system

After understanding what the BI system is, it's time to discover more about its components and understand how these components work with each other. There are also some BI tools that help to implement one or more components. The following diagram shows an illustration of the architecture and main components of the Business Intelligence system:

The BI architecture and components differ based on the tools, environment, and so on. The architecture shown in the preceding diagram contains components that are common in most BI systems. In the following sections, you will learn more about each component.

The data warehouse

The data warehouse is the core of the BI system. A data warehouse is a database built for the purpose of data analysis and reporting. This purpose changes the design of this database as well. As you know, operational databases are built on normalization standards, which are efficient for transactional systems, for example, to reduce redundancy. As you probably know, a 3NF-designed database for a sales system contains many tables related to each other. So, for example, a report on sales information may need more than 10 join conditions, which slows down the response time of the query and report. A data warehouse comes with a new design that reduces the response time and increases the performance of queries for reports and analytics. You will learn more about the design of a data warehouse (which is called dimensional modeling) later in this article.
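To illustrate why a normalized design needs more joins for a report, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical and the schema is far smaller than a real sales system; it only shows that the same report needs several joins against normalized tables and none against a flattened copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (transactional-style) tables; names are illustrative only.
CREATE TABLE country  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city     (id INTEGER PRIMARY KEY, name TEXT,
                       country_id INTEGER REFERENCES country(id));
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT,
                       city_id INTEGER REFERENCES city(id));
CREATE TABLE sale     (id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(id),
                       amount REAL);

INSERT INTO country  VALUES (1, 'New Zealand'), (2, 'Australia');
INSERT INTO city     VALUES (1, 'Auckland', 1), (2, 'Sydney', 2);
INSERT INTO customer VALUES (1, 'Devin', 1), (2, 'Peter', 1), (3, 'Lance', 2);
INSERT INTO sale     VALUES (1, 1, 23.0), (2, 2, 10.0), (3, 3, 40.0);

-- Flattened copy of the same data, shaped for reporting.
CREATE TABLE sale_flat AS
SELECT s.amount, cu.name AS customer, ci.name AS city, co.name AS country
FROM sale s
JOIN customer cu ON cu.id = s.customer_id
JOIN city ci     ON ci.id = cu.city_id
JOIN country co  ON co.id = ci.country_id;
""")

# Sales per country from the normalized tables: three joins.
normalized = conn.execute("""
    SELECT co.name, SUM(s.amount)
    FROM sale s
    JOIN customer cu ON cu.id = s.customer_id
    JOIN city ci     ON ci.id = cu.city_id
    JOIN country co  ON co.id = ci.country_id
    GROUP BY co.name ORDER BY co.name
""").fetchall()

# The same report from the flattened table: no joins at all.
flat = conn.execute("""
    SELECT country, SUM(amount)
    FROM sale_flat
    GROUP BY country ORDER BY country
""").fetchall()

print(normalized)
print(flat)
```

Both queries return the same totals. On a tiny in-memory database the difference in speed is not noticeable; the point is the shape of the query, which grows with every additional normalized table the report has to reach.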
Extract Transform Load

It is very likely that more than one system acts as the source of data required for the BI system. So there is a requirement for data consolidation that extracts data from different sources, transforms it into the shape that fits into the data warehouse, and finally, loads it into the data warehouse; this process is called Extract Transform Load (ETL). There are many challenges in the ETL process, some of which will be revealed (conceptually) later in this article. As the definition states, ETL is not just a data integration phase. Let's discover more about it with an example: in an operational sales database, you may have dozens of tables that provide sales transaction data. When you design that sales data into your data warehouse, you can denormalize it and build one or two tables for it. So, the ETL process should extract data from the sales database and transform it (combine, match, and so on) to fit it into the model of the data warehouse tables. There are some ETL tools on the market that perform the extract, transform, and load operations. The Microsoft solution for ETL is SQL Server Integration Services (SSIS), which is one of the best ETL tools on the market. SSIS can connect to multiple data sources such as Oracle, DB2, text files, XML, web services, SQL Server, and so on. SSIS also has many built-in transformations to transform the data as required.

Data model – BISM

A data warehouse is designed to be the source of analysis and reports, so it works much faster than operational systems for producing reports. However, a DW is not fast enough to cover all requirements because it is still a relational database, and databases have many constraints that slow down the response time of a query. The requirement for faster processing and a lower response time on one hand, and aggregated information on the other hand, leads to the creation of another layer in BI systems.
This layer, which we call the data model, contains a file-based or memory-based model of the data for producing very quick responses to reports. Microsoft's solution for the data model is split into two technologies: the OLAP cube and the in-memory tabular model. The OLAP cube is a file-based data storage that loads data from a data warehouse into a cube model. The cube contains descriptive information as dimensions (for example, customer and product) and cells (for example, facts and measures, such as sales and discount). The following diagram shows a sample OLAP cube:

In the preceding diagram, the illustrated cube has three dimensions: Product, Customer, and Time. Each cell in the cube shows a junction of these three dimensions. For example, if we store the sales amount in each cell, then the green cell shows that Devin paid $23 for a Hat on June 5. Aggregated data can be fetched easily as well within the cube structure. For example, the orange set of cells shows how much Mark paid on June 1 for all products. As you can see, the cube structure makes it easier and faster to access the required information.

Microsoft SQL Server Analysis Services 2012 comes with two different types of modeling: multidimensional and tabular. Multidimensional modeling is based on the OLAP cube and is fitted with measures and dimensions, as you can see in the preceding diagram. The tabular model is based on a new in-memory engine for tables. The in-memory engine loads all data rows from tables into memory and responds to queries directly from memory. This is very fast in terms of the response time. The BI semantic model (BISM) provided by Microsoft is a combination of the SSAS Tabular and Multidimensional solutions.

Data visualization

The frontend of a BI system is data visualization. In other words, data visualization is the part of the BI system that users can see.
There are different methods for visualizing information, such as strategic and tactical dashboards, Key Performance Indicators (KPIs), and detailed or consolidated reports. As you probably know, there are many reporting and visualizing tools on the market. Microsoft has provided a set of visualization tools to cover dashboards, KPIs, scorecards, and reports required in a BI application. PerformancePoint, as part of Microsoft SharePoint, is a dashboard tool that performs best when connected to SSAS Multidimensional OLAP cube. Microsoft's SQL Server Reporting Services (SSRS) is a great reporting tool for creating detailed and consolidated reports. Excel is also a great slicing and dicing tool especially for power users. There are also components in Excel such as Power View, which are designed to build performance dashboards. Master Data Management Every organization has a part of its business that is common between different systems. That part of the data in the business can be managed and maintained as master data. For example, an organization may receive customer information from an online web application form or from a retail store's spreadsheets, or based on a web service provided by other vendors. Master Data Management (MDM) is the process of maintaining the single version of truth for master data entities through multiple systems. Microsoft's solution for MDM is Master Data Services (MDS). Master data can be stored in the MDS entities and it can be maintained and changed through the MDS Web UI or Excel UI. Other systems such as CRM, AX, and even DW can be subscribers of the master data entities. Even if one or more systems are able to change the master data, they can write back their changes into MDS through the staging architecture. Data Quality Services The quality of data is different in each operational system, especially when we deal with legacy systems or systems that have a high dependence on user inputs. 
As the BI system is based on data, the better the quality of data, the better the output of the BI solution. Because of this fact, working on data quality is one of the components of the BI systems. As an example, Auckland might be written as "Auckland" in some Excel files or be typed as "Aukland" by the user in the input form. As a solution to improve the quality of data, Microsoft provided users with DQS. DQS works based on Knowledge Base domains, which means a Knowledge Base can be created for different domains, and the Knowledge Base will be maintained and improved by a data steward as time passes. There are also matching policies that can be used to apply standardization on the data.

Building the data warehouse

A data warehouse is a database built for analysis and reporting. In other words, a data warehouse is a database in which the only data entry point is through ETL, and its primary purpose is to cover reporting and data analysis requirements. This definition clarifies that a data warehouse is not like other transactional databases that operational systems write data into. When there is no operational system that works directly with a data warehouse, and when the main purpose of this database is for reporting, then the design of the data warehouse will be different from that of transactional databases. If you recall from the database normalization concepts, the main purpose of normalization is to reduce redundancy and dependency. The following table shows customers' data with their geographical information:

Customer First Name | Last Name | Suburb | City | State | Country
Devin | Batler | Remuera | Auckland | Auckland | New Zealand
Peter | Blade | Remuera | Auckland | Auckland | New Zealand
Lance | Martin | City Center | Sydney | NSW | Australia

Let's elaborate on this example. As you can see from the preceding table, the geographical information in the records is redundant. This redundancy makes it difficult to apply changes.
For example, in this structure, if Remuera, for any reason, is no longer part of Auckland city, then the change must be applied to every record that has Remuera as its suburb. The following screenshot shows the tables of geographical information:

So, a normalized approach is to move the geographical information out of the customer table and into another table. The customer table then holds only a key that points to that table. In this way, every time the value Remuera changes, only one record in the geographical region table changes and the key number remains unchanged. So, you can see that normalization is highly efficient in transactional systems.

This normalization approach is not that effective in analytical databases. If you consider a sales database with many tables related to each other and normalized at least up to the third normal form (3NF), then analytical queries on such a database may require more than 10 join conditions, which slows down the query response. In other words, from the point of view of reporting, it would be better to denormalize the data and flatten it in order to make it as easy as possible to query. This means the first design in the preceding table might be better for reporting.

However, the query and reporting requirements are not that simple, and the business domains in the database are not as small as two or three tables. Real-world problems are therefore solved with a special design method for the data warehouse called dimensional modeling. There are two well-known methods for designing the data warehouse: the Kimball and Inmon methodologies, named after their creators. Both are in use nowadays. The main difference between them is that Inmon is top-down and Kimball is bottom-up. In this article, we will explain the Kimball method. You can read more about the Inmon methodology in Building the Data Warehouse, William H.
Inmon, Wiley (http://www.amazon.com/Building-Data-Warehouse-W-Inmon/dp/0764599445), and about the Kimball methodology in The Data Warehouse Toolkit, Ralph Kimball, Wiley (http://www.amazon.com/The-Data-Warehouse-Toolkit-Dimensional/dp/0471200247). Both books are must-reads for BI and DW professionals and are reference books that belong on the bookshelf of every BI team. This article draws on The Data Warehouse Toolkit, so for a detailed discussion, read that book.

Dimensional modeling

To understand data warehouse design and dimensional modeling, it helps to learn the components and terminology of a DW. A DW consists of fact tables and dimensions. The relationship between a fact table and its dimensions is based on foreign and primary keys (the primary key of the dimension table is referenced in the fact table as a foreign key).

Summary

This article explains the first steps in thinking about and designing a BI system. As the first step, a developer needs to design the data warehouse (DW), which requires an understanding of the key design concepts and methodologies for creating the data warehouse.

Resources for Article:

Further resources on this subject:
• Self-service Business Intelligence, Creating Value from Data [Article]
• Oracle Business Intelligence : Getting Business Information from Data [Article]
• Business Intelligence and Data Warehouse Solution - Architecture and Design [Article]

Packt
13 Oct 2016
8 min read

Let's start with Extending Docker

In this article by Russ McKendrick, author of the book Extending Docker, we will discuss the following topics:

• Why Docker has been so widely accepted by the entire industry
• What does a typical container's life cycle look like?
• What will you need for the remainder of the chapters?

The rise of Docker

Not very often does a technology come along that is adopted so widely across an entire industry. Since its first public release in March 2013, Docker has gained the support not only of end users, like you and me, but also of industry leaders such as Amazon, Microsoft, and Google.

Docker is currently using the following sentence on their website to describe why you would want to use it:

Docker provides an integrated technology suite that enables development and IT operations teams to build, ship, and run distributed applications anywhere.

There is a meme, based on the disaster girl photo, which sums up why such a seemingly simple explanation is actually quite important:

So as simple as Docker's description sounds, for a number of years it has been a utopia for most developers and IT operations teams to have a tool that ensures that an application works consistently across the following three main stages of its life cycle:

• Development
• Staging and Preproduction
• Production

To illustrate why this used to be a problem before Docker arrived on the scene, let's look at how services were traditionally configured and deployed. People typically used a mixture of dedicated machines and virtual machines, so let's look at these in more detail.

While it is possible to use configuration management tools, such as Puppet, or orchestration tools, such as Ansible, to maintain consistency between server environments, it is difficult to enforce that consistency across both servers and a developer's workstation.
Dedicated machines

Traditionally, a dedicated machine is a single piece of hardware that has been configured to run your application. While the applications have direct access to the hardware, you are constrained by the binaries and libraries you can install on a dedicated machine, as they have to be shared across the entire machine.

To illustrate one potential problem Docker has fixed, let's say you had a single dedicated server that was running your PHP applications. When you initially deployed the dedicated machine, all three of the applications that make up your e-commerce website worked with PHP 5.6, so there was no problem with compatibility.

Your development team has been slowly working through the three PHP applications you have deployed on your host to make them work with PHP 7, as this will give them a good boost in performance. However, there is a single bug that they have not been able to resolve with App2, which means that it will not run under PHP 7 without crashing when a user adds an item to their shopping cart.

If you have a single host running your three applications, you will not be able to upgrade from PHP 5.6 to PHP 7 until your development team has resolved the bug with App2, unless you do one of the following:

• Deploy a new host running PHP 7 and migrate App1 and App3 to it; this could be both time consuming and expensive
• Deploy a new host running PHP 5.6 and migrate App2 to it; again this could be both time consuming and expensive
• Wait until the bug has been fixed; the performance improvements that the upgrade from PHP 5.6 to PHP 7 brings could increase sales, and there is no ETA for the fix

If you go for either of the first two options, you also need to ensure that the new dedicated machine either matches the developer's PHP 7 environment or is configured in exactly the same way as your existing environment; after all, you don't want to introduce further problems by having a poorly configured machine.
Virtual machines

One solution to the scenario detailed earlier would be to slice up your dedicated machine's resources and make them available to the applications by installing a hypervisor such as one of the following:

• KVM: http://www.linux-kvm.org/
• XenSource: http://www.xenproject.org/
• VMware vSphere: http://www.vmware.com/uk/products/vsphere-hypervisor/

Once the hypervisor is installed, you can install your binaries and libraries on each of the different virtual hosts and also install your applications on each one.

Going back to the scenario given in the dedicated machine section, you will be able to upgrade to PHP 7 on the virtual machines with App1 and App3 installed, while leaving App2 untouched and functional while the developers work on the fix.

Great, so what is the catch? From the developer's view, there is none, as they have their applications running with the PHP versions that work best for them; however, from an IT operations point of view:

• More CPU, RAM, and disk space: Each of the virtual machines requires additional resources, as the overhead of running three guest operating systems, as well as the three applications, has to be taken into account
• More management: IT operations now need to patch, monitor, and maintain four machines (the dedicated host machine along with three virtual machines), whereas before they only had a single dedicated host

As before, you also need to ensure that the configuration of the three virtual machines hosting your applications matches the configuration that the developers have been using during the development process; again, you do not want to introduce additional problems due to configuration and process drift between departments.

Dedicated versus virtual machines

The following diagram shows how a typical dedicated and virtual machine host would be configured:

As you can see, the biggest differences between the two are quite clear.
You are making a trade-off between resource utilization and being able to run your applications using different binaries/libraries.

Containers

Now that we have covered the way in which our applications have traditionally been deployed, let's look at what Docker adds to the mix.

Back to our scenario of the three applications running on a single host machine. Installing Docker on the host and then deploying each of the applications as a container on this host gives you the benefits of the virtual machine while vastly reducing the footprint, that is, removing the need for the hypervisor and guest operating system completely and replacing them with a slimline interface directly into the host machine's kernel.

The advantages this gives both the IT operations and development teams are as follows:

• Low overhead: As mentioned already, the resource and management overhead for the IT operations team is lower
• Developers provide the containers: Rather than relying on the IT operations team to configure each of the three application environments to match the development environment, developers can simply pass their containers on to be put into production

As you can see from the following diagram, the layers between the application and host operating system have been reduced:

This may seem too good to be true, and to be honest, there is a "but". For most web applications, or applications that are pre-compiled static binaries, you shouldn't have a problem.

However, as Docker shares resources with the underlying host machine, such as the kernel version, if your application needs to be compiled or relies on certain libraries that are only compatible with the shared resources, then you will have to deploy your containers on a like-for-like operating system and, in some cases, hardware.

Docker has tried to address this issue with the acquisition of a company called Unikernel Systems in January 2016.
At the time of writing this book, not a lot is known about how Docker is planning to integrate this technology into their core product, if at all. You can find out more about this technology at https://blog.docker.com/2016/01/unikernel/.

Summary

In this article, we saw that not very often does a technology come along that is adopted so widely across an entire industry. Since its first public release in March 2013, Docker has gained the support not only of end users, like you and me, but also of industry leaders such as Amazon, Microsoft, and Google. Docker is currently using the following sentence on their website to describe why you would want to use it: Docker provides an integrated technology suite that enables development and IT operations teams to build, ship, and run distributed applications anywhere.

Resources for Article:

Further resources on this subject:
• Introduction to Docker [article]
• CoreOS Networking and Flannel Internals [article]
• Virtualizing Hosts and Applications [article]

Packt
19 Sep 2013
5 min read

Angular Zen

Meet AngularJS

AngularJS is a client-side MVC framework written in JavaScript. It runs in a web browser and greatly helps us (developers) to write modern, single-page, AJAX-style web applications. It is a general-purpose framework, but it shines when used to write CRUD (Create Read Update Delete) type web applications.

Getting familiar with the framework

AngularJS is a recent addition to the list of client-side MVC frameworks, yet it has managed to attract a lot of attention, mostly due to its innovative templating system, ease of development, and very solid engineering practices. Indeed, its templating system is unique in many respects:

• It uses HTML as the templating language
• It doesn't require an explicit DOM refresh, as AngularJS is capable of tracking user actions, browser events, and model changes to figure out when and which templates to refresh
• It has a very interesting and extensible components subsystem, and it is possible to teach a browser how to interpret new HTML tags and attributes

The templating subsystem might be the most visible part of AngularJS, but make no mistake: AngularJS is a complete framework packed with several utilities and services typically needed in single-page web applications.

AngularJS also has some hidden treasures: dependency injection (DI) and a strong focus on testability. The built-in support for DI makes it easy to assemble a web application from smaller, thoroughly tested services. The design of the framework and the tooling around it promote testing practices at each stage of the development process.

Finding your way in the project

AngularJS is a relatively new actor on the client-side MVC frameworks scene; its 1.0 version was released only in June 2012. In reality, the work on this framework started in 2009 as a personal project of Miško Hevery, a Google employee.
The initial idea turned out to be so good that, at the time of writing, the project was officially backed by Google Inc., and there is a whole team at Google working full-time on the framework.

AngularJS is an open source project hosted on GitHub (https://github.com/angular/angular.js) and licensed by Google, Inc. under the terms of the MIT license.

The community

At the end of the day, no project would survive without people standing behind it. Fortunately, AngularJS has a great, supportive community. The following are some of the communication channels where one can discuss design issues and request help:

• angular@googlegroups.com mailing list (Google group)
• Google+ community at https://plus.google.com/u/0/communities/115368820700870330756
• #angularjs IRC channel
• [angularjs] tag at http://stackoverflow.com

The AngularJS team stays in touch with the community by maintaining a blog (http://blog.angularjs.org/) and being present on social media: Google+ (+AngularJS) and Twitter (@angularjs). There are also community meetups being organized around the world; if one happens to be hosted near where you live, it is definitely worth attending!

Online learning resources

AngularJS has its own dedicated website (http://www.angularjs.org) where we can find everything that one would expect from a respectable framework: conceptual overview, tutorials, developer's guide, API reference, and so on. Source code for all released AngularJS versions can be downloaded from http://code.angularjs.org.

People looking for code examples won't be disappointed, as the AngularJS documentation itself has plenty of code snippets. On top of this, we can browse a gallery of applications built with AngularJS (http://builtwith.angularjs.org). A dedicated YouTube channel (http://www.youtube.com/user/angularjs) has recordings from many past events as well as some very useful video tutorials.
Libraries and extensions

While the AngularJS core is packed with functionality, the active community keeps adding new extensions almost every day. Many of those are listed on a dedicated website: http://ngmodules.org.

Tools

AngularJS is built on top of HTML and JavaScript, two technologies that we've been using in web development for years. Thanks to this, we can continue using our favorite editors and IDEs, browser extensions, and so on without any issues. Additionally, the AngularJS community has contributed several interesting additions to the existing HTML/JavaScript toolbox.

Batarang

Batarang is a Chrome developer tool extension for inspecting AngularJS web applications. Batarang is very handy for visualizing and examining the runtime characteristics of AngularJS applications. We are going to use it extensively in this article to peek under the hood of a running application. Batarang can be installed from the Chrome Web Store (AngularJS Batarang) like any other Chrome extension.

Plunker and jsFiddle

Both Plunker (http://plnkr.co) and jsFiddle (http://jsfiddle.net) make it very easy to share live code snippets (JavaScript, CSS, and HTML). While those tools are not strictly reserved for use with AngularJS, they were quickly adopted by the AngularJS community to share small code examples, scenarios to reproduce bugs, and so on. Plunker deserves special mention as it was written in AngularJS and is a very popular tool in the community.

IDE extensions and plugins

Each one of us has a favorite IDE or editor. The good news is that there are existing plugins/extensions for several popular IDEs and editors such as Sublime Text 2 (https://github.com/angular-ui/AngularJS-sublime-package), JetBrains' products (http://plugins.jetbrains.com/plugin?pr=idea&pluginId=6971), and so on.
Packt
21 Oct 2009
10 min read

OSWorkflow and the Quartz Task Scheduler

Task Scheduling with Quartz

Both people-oriented and system-oriented BPM systems need a mechanism to execute tasks under an event or temporal constraint, for example, every time a state change occurs or every two weeks. BPM suites address these requirements with a job-scheduling component responsible for executing tasks at a given time.

OSWorkflow, the core of our open-source BPM solution, doesn't include these temporal capabilities by default. Thus, we can enhance OSWorkflow by adding the features present in the Quartz open-source project.

What is Quartz?

Quartz is a Java job-scheduling system capable of scheduling and executing jobs in a very flexible manner. The latest stable Quartz version is 1.6. You can download Quartz from http://www.opensymphony.com/quartz/download.action.

Installing

The only file you need in order to use Quartz out of the box is quartz.jar. It contains everything you need for basic usage. Quartz configuration is in the quartz.properties file, which you must put in your application's classpath.

Basic Concepts

The Quartz API is very simple and easy to use. The first concept that you need to be familiar with is the scheduler. The scheduler is the most important part of Quartz; as the name implies, it manages the scheduling and unscheduling of jobs and the firing of triggers.

A job is a Java class containing the task to be executed, and the trigger is the temporal specification of when to execute the job. A job can be associated with one or more triggers, and when a trigger fires, the scheduler executes the job associated with it. That's all you need to know to execute our Hello World job.

Integration with OSWorkflow

By complementing the features of OSWorkflow with the temporal capabilities of Quartz, our open-source BPM solution becomes much more useful. The Quartz-OSWorkflow integration can be done in two ways: Quartz calling OSWorkflow workflow instances, and OSWorkflow scheduling and unscheduling Quartz jobs.
We will cover the former first, by using trigger-functions, and the latter with the ScheduleJob function provider.

Creating a Custom Job

Jobs are built by implementing the org.quartz.Job interface, which declares the following method:

    public void execute(JobExecutionContext context) throws JobExecutionException;

The interface is very simple and concise, with just one method to be implemented. The Scheduler will invoke the execute method when the trigger associated with the job fires. The JobExecutionContext object passed as an argument has all the context and environment data for the job, such as the JobDataMap.

The JobDataMap is very similar to a Java map but provides strongly typed put and get methods. This JobDataMap is set in the JobDetail object before scheduling the job and can be retrieved later during the execution of the job via the JobExecutionContext's getJobDetail().getJobDataMap() method.

Trigger Functions

trigger-functions are a special type of OSWorkflow function designed specifically for job scheduling and external triggering. These functions are executed when the Quartz trigger fires, thus the name. trigger-functions are not associated with an action and they have a unique ID. You shouldn't execute a trigger-function in your code.

To define a trigger-function in the definition, put the trigger-functions declaration before the initial-actions element.

    ...
    <trigger-functions>
      <trigger-function id="10">
        <function type="beanshell">
          <arg name="script">
            propertySet.setString("triggered", "true");
          </arg>
        </function>
      </trigger-function>
    </trigger-functions>
    <initial-actions>
    ...

This XML definition fragment declares a trigger-function (having an ID of 10), which executes a BeanShell script. This script will put a named property inside the PropertySet of the instance, but you can define a trigger-function just like any other Java- or BeanShell-based function.
To invoke this trigger-function, you will need an OSWorkflow built-in function provider that executes trigger-functions and schedules a custom job: the ScheduleJob FunctionProvider.

More about Triggers

Quartz's triggers are of two types: the SimpleTrigger and the CronTrigger. The former, as its name implies, serves very simple purposes, while the latter is more complex and powerful; it allows much greater flexibility in specifying time periods.

SimpleTrigger

SimpleTrigger is best suited to firing a job at a specific point in time, such as Saturday 1st at 3.00 PM, or at a specific point in time and then repeatedly at fixed intervals. The properties of this trigger are shown in the following table:

Property          Description
Start time        The fire time of the trigger.
End time          The end time of the trigger. If it is specified, it overrides the repeat count.
Repeat interval   The interval between repetitions. It can be 0 or a positive integer. If it is 0, the repetitions happen in parallel.
Repeat count      How many times the trigger will fire. It can be 0, a positive integer, or SimpleTrigger.REPEAT_INDEFINITELY.

CronTrigger

The CronTrigger is based on the concept of the UN*X cron utility. It lets you specify complex schedules, like every Wednesday at 5.00 AM, or every twenty minutes, or every 5 seconds on Monday. Like the SimpleTrigger, the CronTrigger has a start time property and an optional end time.

A CronExpression is made of seven parts, each representing a time component. Counting the parts from left to right:

• 1 represents seconds
• 2 represents minutes
• 3 represents hours
• 4 represents the day-of-month
• 5 represents month
• 6 represents the day-of-week
• 7 represents year (optional field)

Here are a couple of examples of cron expressions:

• 0 0 6 ? * MON: This CronExpression means "Every Monday at 6 AM".
• 0 0 6 * * ?: This CronExpression means "Every day at 6 AM".
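The field order listed above can be made concrete with a small sketch. The helper below is hypothetical and is not part of Quartz: it only splits an expression on whitespace and labels each field by position, assuming six mandatory fields plus the optional year. It does not validate the values themselves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Labels the fields of a Quartz-style cron expression (illustrative helper, not Quartz API). */
class CronFieldLabeler {
    // Field order as listed in the text; the seventh field (year) is optional.
    private static final String[] NAMES = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    /** Splits the expression on whitespace and pairs each part with its field name. */
    static Map<String, String> label(String expression) {
        String[] parts = expression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException(
                "Expected 6 or 7 fields but found " + parts.length + ": " + expression);
        }
        // LinkedHashMap keeps the fields in their left-to-right order.
        Map<String, String> labeled = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            labeled.put(NAMES[i], parts[i]);
        }
        return labeled;
    }

    public static void main(String[] args) {
        System.out.println(label("0 0 6 ? * MON"));
    }
}
```

Run on the "Every Monday at 6 AM" example, this prints {seconds=0, minutes=0, hours=6, day-of-month=?, month=*, day-of-week=MON}, which makes it easy to see which position carries which time component.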
For more information about CronExpressions, refer to the following website: http://www.opensymphony.com/quartz/wikidocs/CronTriggers%20Tutorial.html.

Scheduling a Job

We will get a first taste of Quartz by executing a very simple job. The following snippet of code shows how easy it is to schedule a job.

    SchedulerFactory schedFact = new org.quartz.impl.StdSchedulerFactory();
    Scheduler sched = schedFact.getScheduler();
    sched.start();
    JobDetail jobDetail = new JobDetail("myJob", null, HelloJob.class);
    Trigger trigger = TriggerUtils.makeHourlyTrigger(); // fire every hour
    trigger.setStartTime(TriggerUtils.getEvenHourDate(new Date())); // start on the next even hour
    trigger.setName("myTrigger");
    sched.scheduleJob(jobDetail, trigger);

The preceding code assumes a HelloJob class exists. It is a very simple class that implements the Job interface and just prints a message to the console.

    package packtpub.osw;

    import org.quartz.Job;
    import org.quartz.JobExecutionContext;
    import org.quartz.JobExecutionException;

    /**
     * Hello world job.
     */
    public class HelloJob implements Job {
        public void execute(JobExecutionContext ctx) throws JobExecutionException {
            System.out.println("Hello Quartz world.");
        }
    }

The first three lines of the scheduling code create a SchedulerFactory, an object that creates Schedulers, and then proceed to create and start a new Scheduler.

    SchedulerFactory schedFact = new org.quartz.impl.StdSchedulerFactory();
    Scheduler sched = schedFact.getScheduler();
    sched.start();

This Scheduler will fire the trigger and subsequently the job associated with the trigger. After creating the Scheduler, we must create a JobDetail object that contains information about the job to be executed, the job group to which it belongs, and other administrative data.
    JobDetail jobDetail = new JobDetail("myJob", null, HelloJob.class);

This JobDetail tells the Scheduler to instantiate a HelloJob object when appropriate, has a null JobGroup, and has a Job name of "myJob". After defining the JobDetail, we must create and define the Trigger, that is, when the Job will be executed, how many times, and so on.

    Trigger trigger = TriggerUtils.makeHourlyTrigger(); // fire every hour
    trigger.setStartTime(TriggerUtils.getEvenHourDate(new Date())); // start on the next even hour
    trigger.setName("myTrigger");

TriggerUtils is a helper object used to simplify the trigger code. With its help, we create a trigger that will fire every hour. This trigger will start firing on the next even hour after it is registered with the Scheduler. The last line of this snippet gives the trigger a name for housekeeping purposes.

Finally, the last line of the scheduling code associates the trigger with the job and puts them under the control of the Scheduler.

    sched.scheduleJob(jobDetail, trigger);

When the next even hour arrives after this line of code is executed, the Scheduler will fire the trigger and execute the job by reading the JobDetail and instantiating HelloJob. This requires that the class implementing the Job interface has a no-argument constructor.

An alternative method is to use an XML file for declaring the jobs and triggers. This will not be covered in the book, but you can find more information about it in the Quartz documentation.

Scheduling from a Workflow Definition

The ScheduleJob FunctionProvider has two modes of operation, depending on whether you specify the jobClass parameter or not. If you declare the jobClass parameter, ScheduleJob will create a JobDetail with jobClass as the class implementing the Job interface.
    <pre-functions>
      <function type="class">
        <arg name="class.name">com.opensymphony.workflow.util.ScheduleJob</arg>
        <arg name="jobName">Scheduler Test</arg>
        <arg name="triggerName">SchedulerTestTrigger</arg>
        <arg name="triggerId">10</arg>
        <arg name="jobClass">packtpub.osw.SendMailIfActive</arg>
        <arg name="schedulerStart">true</arg>
        <arg name="local">true</arg>
      </function>
    </pre-functions>

This fragment will schedule a job based on the SendMailIfActive class with the current time as the start time. ScheduleJob, like any FunctionProvider, can be declared as a pre-function or a post-function.

On the other hand, if you don't declare the jobClass, ScheduleJob will use WorkflowJob.class as the class implementing the Job interface. When fired, this job executes a trigger-function on the instance that scheduled it.

    <pre-functions>
      <function type="class">
        <arg name="class.name">com.opensymphony.workflow.util.ScheduleJob</arg>
        <arg name="jobName">Scheduler Test</arg>
        <arg name="triggerName">SchedulerTestTrigger</arg>
        <arg name="triggerId">10</arg>
        <arg name="schedulerStart">true</arg>
        <arg name="local">true</arg>
      </function>
    </pre-functions>

This definition fragment will execute the trigger-function with ID 10 as soon as possible, because no CronExpression or start time arguments have been specified. This FunctionProvider has the arguments shown in the following table:

Packt
28 Nov 2012
4 min read

Basic Coding with HornetQ: Creating and Consuming Messages

Installing Eclipse on Windows

You can download the Eclipse IDE for Java EE Developers (in our case the ZIP file eclipse-jee-indigo-SR1-win32.zip) from http://www.eclipse.org/downloads/. Once it is downloaded, unzip the eclipse folder inside the archive to the destination folder so that you have a folder structure like the one illustrated in the following screenshot:

Now double-clicking the eclipse.exe file will start the first run of Eclipse.

Installing NetBeans on Windows

NetBeans is one of the most frequently used IDEs for Java development. It mimics the Eclipse plugin module's installation, so you can download the J2EE version from http://netbeans.org/downloads/. But remember that this version also comes with an integrated GlassFish application server and a Tomcat server. In this case, too, you only need to download the .exe file (java_ee_sdk-6u3-jdk7-windows.exe, in our case) and launch the installer. Once it has finished, you should be able to run the IDE by clicking on the NetBeans icon in your Windows Start menu.

Installing NetBeans on Linux

If you are using a Debian-based version of Linux like Ubuntu, installing Eclipse is nothing more than typing a command in the bash shell and waiting for the installation process to finish; NetBeans takes one extra step. As we are using Ubuntu Version 11, we will type the following command from a non-root user account to install Eclipse:

    sudo apt-get install eclipse

The NetBeans installation procedure is slightly different because the Ubuntu repositories do not have a package for NetBeans. So, to install NetBeans you have to download a script and then run it.
If you are using a non-root user account, you need to type the following commands in a terminal:

    sudo wget http://download.netbeans.org/netbeans/7.1.1/final/bundles/netbeans-7.1.1-ml-javaee-linux.sh
    sudo chmod +x netbeans-7.1.1-ml-javaee-linux.sh
    ./netbeans-7.1.1-ml-javaee-linux.sh

During the first run of the IDE, Eclipse will ask which default workspace new projects should be stored in. Choose the one suggested, and if you are not planning to change it, check the Use this as the default and do not ask again checkbox so that the question is not asked again, as shown in the following screenshot:

The same happens with NetBeans, but during the installation procedure.

Post installation

Both Eclipse and NetBeans have an integrated system for upgrading to the latest version, so once the first run has completed successfully, keep your IDE updated. For Eclipse, you can access the Update window by using the menu Help | Check for updates. This will pop up the window shown in this screenshot:

NetBeans has the same functionality, which can also be launched from the menu.

A 10,000 foot view of HornetQ

Before moving on to the coding phase, it is time to review some concepts that help both the user and the coder understand how HornetQ manages messages.

HornetQ is only a set of Plain Old Java Objects (POJOs) compiled and grouped into JAR files. As a consequence, HornetQ has no dependency on third-party libraries. It is possible to use and even start HornetQ from any Java class; this is a great advantage over other frameworks.

HornetQ deals internally only with its own set of classes, called the HornetQ core, avoiding any dependency on the JMS dialect and specifications. Nevertheless, a client that connects to the HornetQ server can speak the JMS language. So the HornetQ server also uses a translator from JMS to the HornetQ core API.
This means that when you send a JMS message to a HornetQ server, it is received as JMS and then translated into the core API dialect to be managed internally by HornetQ. The following figure illustrates this concept:

The core messaging concepts of HornetQ are somewhat simpler than those of JMS:

  • Message: This is a unit of data that can be sent/delivered from a producer to a consumer. Messages have various attributes; to cite just a few, a message can have durability, priority, expiry time, time, and dimension.
  • Address: HornetQ maintains an association between an address (IP address of the server) and the queues available at that address. So the message is bound to the address.
  • Queue: This is nothing more than a set of messages. Like messages, queues have attributes such as durability, being temporary, and filtering expressions.
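The relationship between messages, addresses, and queues can be pictured with a small toy model. The sketch below is plain Python and not the HornetQ API: the names Message, ToyServer, bind, send, and receive are invented for illustration, the default priority is a placeholder, and the sketch assumes that every queue bound to an address receives its own copy of a message sent to that address.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A unit of data carrying a few of the attributes named in the text."""
    body: str
    durable: bool = False
    priority: int = 4  # hypothetical default, chosen only for this sketch


class ToyServer:
    """Toy stand-in for a broker: keeps the address -> queues association."""

    def __init__(self):
        self._bindings = {}  # address -> list of queue names
        self._queues = {}    # queue name -> deque of messages

    def bind(self, address, queue_name):
        """Create a queue and bind it to an address."""
        self._queues[queue_name] = deque()
        self._bindings.setdefault(address, []).append(queue_name)

    def send(self, address, message):
        """Producer side: route the message to every queue bound to the address."""
        for queue_name in self._bindings.get(address, []):
            self._queues[queue_name].append(message)

    def receive(self, queue_name):
        """Consumer side: take the oldest message from a queue, or None if empty."""
        queue = self._queues[queue_name]
        return queue.popleft() if queue else None


server = ToyServer()
server.bind("orders", "orders.billing")
server.bind("orders", "orders.audit")
server.send("orders", Message("order #1", durable=True))
print(server.receive("orders.billing").body)
print(server.receive("orders.audit").body)
```

The producer only needs to know the address; which queues exist behind it is the server's concern, which is the point of keeping the association on the server side.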

Packt
25 Oct 2010
11 min read

Backup in PostgreSQL 9

Most people admit that backups are essential, though they also devote only a very small amount of time to thinking about the topic.

The first recipe is about understanding and controlling crash recovery. We need to understand what happens if the database server crashes, so we can understand when we might need to recover.

The next recipe is all about planning. That's really the best place to start before you go charging ahead to do backups.

Understanding and controlling crash recovery

Crash recovery is the PostgreSQL subsystem that saves us if the server should crash, or fail as a part of a system crash. It's good to understand a little about it and to do what we can to control it in our favor.

How to do it...

If PostgreSQL crashes there will be a message in the server log with severity-level PANIC. PostgreSQL will immediately restart and attempt to recover using the transaction log or Write Ahead Log (WAL).

The WAL consists of a series of files written to the pg_xlog subdirectory of the PostgreSQL data directory. Each change made to the database is recorded first in WAL, hence the name "write-ahead" log. When a transaction commits, the default and safe behavior is to force the WAL records to disk. If PostgreSQL should crash, the WAL will be replayed, which returns the database to the point of the last committed transaction, and thus ensures the durability of any database changes.

Note that the database changes themselves aren't written to disk at transaction commit. Those changes are written to disk sometime later by the "background writer" on a well-tuned server.

Crash recovery replays the WAL, though from what point does it start to recover? Recovery starts from points in the WAL known as "checkpoints". The duration of crash recovery depends upon the number of changes in the transaction log since the last checkpoint. A checkpoint is a known safe starting point for recovery, since at that time we write all currently outstanding database changes to disk.
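The write-ahead idea can be made concrete with a toy model. The following sketch is not how PostgreSQL is implemented: the ToyDatabase class and its methods are hypothetical, and it assumes a single key/value store in which every change commits immediately and data files are only updated at checkpoints. It shows why replaying the log from the last checkpoint restores changes that had not yet reached the data files.

```python
class ToyDatabase:
    """Toy write-ahead log: every change is logged before it is applied."""

    def __init__(self):
        self.wal = []            # durable log of (key, value) changes
        self.disk_pages = {}     # data files; updated only at checkpoints here
        self.memory_pages = {}   # in-memory state, lost in a crash
        self.checkpoint_pos = 0  # WAL position of the last checkpoint

    def commit(self, key, value):
        """Record the change in the WAL first, then apply it in memory."""
        self.wal.append((key, value))
        self.memory_pages[key] = value

    def checkpoint(self):
        """Write all outstanding changes to disk and note the WAL position."""
        self.disk_pages = dict(self.memory_pages)
        self.checkpoint_pos = len(self.wal)

    def crash(self):
        """Lose everything that was held only in memory."""
        self.memory_pages = {}

    def recover(self):
        """Start from the data files and replay WAL since the last checkpoint."""
        self.memory_pages = dict(self.disk_pages)
        replayed = 0
        for key, value in self.wal[self.checkpoint_pos:]:
            self.memory_pages[key] = value
            replayed += 1
        return replayed


db = ToyDatabase()
db.commit("a", 1)
db.checkpoint()
db.commit("b", 2)        # committed after the checkpoint, not yet in data files
db.crash()
print(db.recover())      # number of WAL records replayed
print(db.memory_pages)   # state after recovery
```

In this model the work done by recover() grows with the number of records written since the last checkpoint, which mirrors the statement that recovery time depends on the changes logged since that point.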
A checkpoint can become a performance bottleneck on busy database servers because of the number of writes required. There are a number of ways of tuning that, though please also understand the effect on crash recovery that those tuning options may cause. Two parameters control the amount of WAL that can be written before the next checkpoint. The first is checkpoint_segments, which controls the number of 16 MB files that will be written before a checkpoint is triggered. The second is time-based, known as checkpoint_timeout, and is the number of seconds until the next checkpoint. A checkpoint is called whenever either of those two limits is reached.

It's tempting to banish checkpoints as much as possible by setting the following parameters:

checkpoint_segments = 1000
checkpoint_timeout = 3600

If you do, though, give some thought to how long recovery will then take, and whether you want that.

Also, you should make sure that the pg_xlog directory is mounted on disks with enough disk space for at least 3 x 16 MB x checkpoint_segments. Put another way, you need roughly 47 GB (48,000 MB) of disk space for checkpoint_segments = 1000. If wal_keep_segments > 0 then the server can also use up to 16 MB x (wal_keep_segments + checkpoint_segments).

How it works...

Recovery continues until the end of the transaction log. We are writing this continually, so there is no defined end point; it is literally the last correct record. Each WAL record is individually CRC checked, so we know whether a record is complete and valid before trying to process it. Each record contains a pointer to the previous record, so we can tell that the record forms a valid link in the chain of actions recorded in WAL. As a result of that, recovery always ends with some kind of error reading the next WAL record. That is normal.

Recovery performance can be very fast, though it does depend upon the actions being recovered. The best way to test recovery performance is to set up a standby replication server.
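The pg_xlog sizing rules given in the How to do it... part of this recipe are simple arithmetic, and it can help to tabulate them for a few settings. The sketch below only restates the two formulas from the text; the function names are made up for this example, and it treats 1 GiB as 1,024 MB.

```python
WAL_SEGMENT_MB = 16  # size of one WAL file, as stated in the text


def pg_xlog_min_mb(checkpoint_segments):
    """Space to plan for: 3 x 16 MB x checkpoint_segments."""
    return 3 * WAL_SEGMENT_MB * checkpoint_segments


def pg_xlog_keep_mb(checkpoint_segments, wal_keep_segments):
    """Possible usage when wal_keep_segments > 0:
    16 MB x (wal_keep_segments + checkpoint_segments)."""
    return WAL_SEGMENT_MB * (wal_keep_segments + checkpoint_segments)


for segments in (3, 100, 1000):
    mb = pg_xlog_min_mb(segments)
    print(f"checkpoint_segments={segments}: {mb} MB ({mb / 1024:.1f} GiB)")
```

For checkpoint_segments = 1000 the three-times rule works out to 48,000 MB, so a setting that looks harmless in postgresql.conf can translate into a sizeable disk requirement.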
There's more...

It's possible for a problem to be caused by replaying the transaction log, and for the database server to fail to start. Some people's response to this is to use a utility named pg_resetxlog, which removes the current transaction log files and tidies up after that surgery has taken place.

pg_resetxlog destroys data changes and that means data loss. If you do decide to run that utility, make sure you take a backup of the pg_xlog directory first. My advice is to seek immediate assistance rather than do this. You don't know for certain that doing this will fix a problem, though once you've done it, you will have difficulty going backwards.

Planning backups

This section is all about thinking ahead and planning. If you're reading this section before you take a backup, well done.

The key thing to understand is that you should plan your recovery, not your backup. The type of backup you take influences the type of recovery that is possible, so you must give some thought to what you are trying to achieve beforehand. If you want to plan your recovery, then you need to consider the different types of failures that can occur. What type of recovery do you wish to perform? You need to consider the following main aspects:

  • Full/Partial database?
  • Everything or just object definitions only?
  • Point In Time Recovery
  • Restore performance

We need to look at the characteristics of the utilities to understand what our backup and recovery options are. It's often beneficial to have multiple types of backup to cover the different types of failure possible. Your main backup options are:

  • logical backup, using pg_dump
  • physical backup, that is, a file system backup

pg_dump comes in two main flavors: pg_dump and pg_dumpall. pg_dump has a -F option to produce backups in various file formats. The file format is very important when it comes to restoring from backup, so you need to pay close attention to that.
The following table shows the features available, depending upon the backup technique selected.

Table of Backup/Recovery options:

Feature | SQL dump to an archive file (pg_dump -F c) | SQL dump to a script file (pg_dump -F p or pg_dumpall) | Filesystem backup using pg_start_backup
Backup type | Logical | Logical | Physical
Recover to point in time? | No | No | Yes
Backup all databases? | One at a time | Yes (pg_dumpall) | Yes
All databases backed up at same time? | No | No | Yes
Selective backup? | Yes | Yes | No (Note 3)
Incremental backup? | No | No | Possible (Note 4)
Selective restore? | Yes | Possible (Note 1) | No (Note 5)
DROP TABLE recovery | Yes | Yes | Possible (Note 6)
DROP TABLESPACE recovery | Possible (Note 2) | Possible (Note 6) | Possible (Note 6)
Compressed backup files? | Yes | Yes | Yes
Backup is multiple files? | No | No | Yes
Parallel backup possible? | No | No | Yes
Parallel restore possible? | Yes | No | Yes
Restore to later release? | Yes | Yes | No
Standalone backup? | Yes | Yes | Yes (Note 7)
Allows DDL during backup | No | No | Yes

How to do it...

  • If you've generated a script with pg_dump or pg_dumpall and need to restore just a single object, then you're going to need to go deep. You will need to write a Perl script (or similar) to read the file and extract out the parts you want. It's messy and time-consuming, but probably faster than restoring the whole thing to a second server, and then extracting just the parts you need with another pg_dump.
  • See recipe Recovery of a dropped/damaged tablespace.
  • Selective backup with physical backup is possible, though will cause later problems when you try to restore.
  • Selective restore with physical backup isn't possible with currently supplied utilities.
  • See recipe for Standalone hot physical backup.

How it works...

To backup all databases, you may be told you need to use the pg_dumpall utility. I have four reasons why you shouldn't do that, which are as follows:

If you use pg_dumpall, then the only output produced is into a script file.
Script files can't use the parallel restore feature of pg_restore, so by taking your backup in this way you will be forcing the restore to be slower than it needs to be.

pg_dumpall produces dumps of each database, one after another. This means that:

  • pg_dumpall is slower than running multiple pg_dump tasks in parallel, one against each database.
  • The dumps of individual databases are not consistent to a particular point in time. If you start the dump at 04:00 and it ends at 07:00 then we're not sure exactly when the dump relates to; it is sometime between 04:00 and 07:00.

Options for pg_dumpall are similar in many ways to pg_dump, though not all of them exist, so some things aren't possible.

In summary, pg_dumpall is slow to back up, slow to restore, and gives you less control over the dump. I suggest you don't use it for those reasons. If you have multiple databases, then I suggest you take your backup by doing either of the following:

  • Dump global information for the database server using pg_dumpall -g. Then dump all databases in parallel using a separate pg_dump for each database, taking care to check for errors if they occur.
  • Use the physical database backup technique instead.

Hot logical backup of one database

Logical backup makes a copy of the data in the database by dumping out the contents of each table.

How to do it...

The command to do this is simple and as follows:

pg_dump -F c > dumpfile

or

pg_dump -F c -f dumpfile

You can also do this through pgAdmin3 as shown in the following screenshot:

How it works...

pg_dump produces a single output file. The output file can be separated into multiple pieces with the split(1) command if required.

pg_dump into the custom format is lightly compressed by default. Compression can be removed or made more aggressive.

pg_dump runs by executing SQL statements against the database to unload data. When PostgreSQL runs an SQL statement we take a "snapshot" of currently running transactions, which freezes our viewpoint of the database.
We can't (yet) share that snapshot across multiple sessions, so we cannot run an exactly consistent pg_dump in parallel in one database, nor across many databases.

The time of the snapshot is the only time we can recover to; we can't recover to a time either before or after that time. Note that the snapshot time is the start of the backup, not the end.

When pg_dump runs, it holds the very lowest kind of lock on the tables being dumped. Those are designed to prevent DDL from running against the tables while the dump takes place. If a dump is run while other DDL is already running, then the dump will sit and wait. If you want to limit the waiting time you can do that by setting the --lock-wait-timeout option.

pg_dump allows you to make a selective backup of tables. The -t option also allows you to specify views and sequences. There's no way to dump other object types individually using pg_dump. You can use some supplied functions to extract individual snippets of information, available at the following website: https://www.postgresql.org/docs/9.0/static/functions-info.html#FUNCTIONS-INFO-CATALOG-TABLE

pg_dump works against earlier releases of PostgreSQL, so it can be used to migrate data between releases.

pg_dump doesn't generally handle included modules very well. pg_dump isn't aware of additional tables that have been installed as part of an additional package, such as PostGIS or Slony, so it will dump those objects as well. That can cause difficulties if you then try to restore from the backup, as the additional tables may have been created as part of the software installation process in an empty server.

There's more...

What time was the pg_dump taken? The snapshot for a pg_dump is taken at the beginning of a run. The file modification time will tell you when the dump finished. The dump is consistent at the time of the snapshot, so you may want to know that time.
If you are making a script dump, you can do a verbose dump as follows:

pg_dump -v

which then adds the time to the top of the script. Custom dumps store the start time as well and that can be accessed using the following:

pg_restore --schema-only -v dumpfile | head | grep Started
-- Started on 2010-06-03 09:05:46 BST

See also

Note that pg_dump does not dump the roles (such as users/groups) and tablespaces. Those two things are only dumped by pg_dumpall; see the next recipes for more detailed descriptions.
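The earlier suggestion to dump global information with pg_dumpall -g and then run a separate pg_dump for each database, checking for errors, could be scripted along the following lines. This is a minimal sketch under stated assumptions rather than a recommended tool: the function names, the output file layout, and the worker count are all hypothetical, connection settings are assumed to come from the environment, and only the pg_dumpall -g and pg_dump -F c -f invocations come from the text.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def globals_command():
    """pg_dumpall -g dumps only the global objects (roles and tablespaces)."""
    return ["pg_dumpall", "-g"]


def dump_command(dbname, outdir):
    """Custom-format dump of a single database, written with -f."""
    return ["pg_dump", "-F", "c", "-f", f"{outdir}/{dbname}.dump", dbname]


def dump_globals(outdir):
    """Write the globals script; check=True raises if pg_dumpall fails."""
    with open(f"{outdir}/globals.sql", "wb") as out:
        subprocess.run(globals_command(), stdout=out, check=True)


def dump_database(dbname, outdir):
    """Dump one database; check=True raises CalledProcessError on failure."""
    subprocess.run(dump_command(dbname, outdir), check=True)
    return dbname


def backup_all(databases, outdir, workers=4):
    """Dump globals first, then dump every database in parallel."""
    dump_globals(outdir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(dump_database, db, outdir) for db in databases]
        # result() re-raises any error from a worker, so a failed dump
        # cannot pass silently.
        return [future.result() for future in futures]


# Hypothetical usage, with made-up database names and directory:
# backup_all(["sales", "inventory"], "/var/backups/pg")
```

Because each pg_dump takes its own snapshot, the per-database dumps produced this way are each internally consistent but are not consistent with one another to a single point in time.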
Aarthi Kumaraswamy
11 May 2018
3 min read

This week on Packt Hub – 11 May 2018

May continues on a high note. Plenty of big announcements and major new releases came out of three of the biggest events in the tech world: Google I/O, Microsoft Build, and PyCon. Read about them and more in our tech news section. Here’s what you may have missed in the last 7 days: tech news, insights and tutorials…

Tech news

Conferences in focus this week
  • Top 5 Google I/O 2018 conference Day 1 Highlights
  • What we learned from Qlik Qonnections 2018
  • Microsoft Build 2018 Day 1: Azure meets Artificial Intelligence

Data news in depth
  • Microsoft Open Sources ML.NET, a cross-platform machine learning framework
  • Linux Foundation launches the Acumos AI Project to make AI accessible
  • Nvidia’s Volta Tensor Core GPU hits performance milestones. But is it the best?

Development & programming news in depth
  • Google’s Android Things, developer preview 8: First look
  • Put your game face on! Unity 2018.1 is now available
  • What’s new in Vapor 3, the popular Swift based web framework
  • Xamarin Forms 3, the popular cross-platform UI Toolkit, is here!
  • Windows 10 IoT Core: What you need to know
  • Google Daydream powered Lenovo Mirage solo hits the market
  • GCC 8.1 Standards released!
  • Google open sources Seurat to bring high precision graphics to Mobile VR

Cloud & networking news in depth
  • What to expect from vSphere 6.7
  • What’s new in Wireshark 2.6?
  • Get DevOps eBooks and videos while supporting charity
  • Microsoft’s Azure Container Service (ACS) is now Azure Kubernetes Services (AKS)
  • Red Hat Enterprise Linux 7.5 (RHEL 7.5) now generally available
  • Kali Linux 2018.2 released

Tutorials

Data tutorials
  • Getting Started with Automated Machine Learning (AutoML)
  • Analyzing CloudTrail Logs using Amazon Elasticsearch
  • Tensor Processing Unit (TPU) 3.0: Google’s answer to cloud-ready Artificial Intelligence
  • Distributed TensorFlow: Working with multiple GPUs and servers
  • Implementing 3 Naive Bayes classifiers in scikit-learn

Development & programming tutorials

Web development tutorials
  • Getting started with Angular CLI and build your first Angular Component
  • How to implement Internationalization and localization in your Node.js app

Programming tutorials
  • How to install and configure TypeScript
  • NativeScript: What is it, and how to set it up
  • Building functional programs with F#
  • Applying Single Responsibility principle from SOLID in .NET Core
  • Unit Testing in .NET Core with Visual Studio 2017 for better code quality

Cloud & Networking tutorials
  • How to secure a private cloud using IAM
  • How to create your own AWS CloudTrail
  • How to secure ElasticCache in AWS

This week’s opinions, analysis, and insights

Data Insights
  • Google News’ AI revolution strikes balance between personalization and the bigger picture
  • Why Drive.ai is going to struggle to disrupt public transport
  • 6 reasons to choose MySQL 8 for designing database solutions
  • Are Recurrent Neural Networks capable of warping time?

Development & Programming Insights
  • 8 recipes to master Promises in ECMAScript 2018
  • Forget C and Java. Learn Kotlin: the next universal programming language

Packt
27 Jul 2010
8 min read

Content Rules, Syndication, and Advanced Features of Plone 3 Intranet

(For more resources on Plone, see here.)

Content rules

Plone features a usability layer around Zope's event system, allowing plain users to create rules tied to the most used event handlers. These rules are composed of tasks that get triggered when an event is raised in our site. Content rules are defined site-wide in the Content rules configlet, and they are available for use in any folderish object in our site. Once a rule is created, it can be locally assigned to any folder object in the site.

Rules play a very important role in intranets. We can use them as a mechanism for notification, and they also help in adding dynamism to our intranet. One of the most demanded features in an intranet is the ability to be aware when content is added, changed, or even deleted. The notification of this change to the users can be achieved via content rules assigned strategically, or by user demand in any folder or intranet application, such as a forum or a blog.

We can use content rules to help us model some of our corporate processes or daily tasks, for example by moving or copying objects to other folders (otherwise done by users) when one of our processes requires this kind of action. We can find other interesting uses of content rules in our intranet, such as executing an action when a state transition is triggered.

All these actions can be carried out programmatically, but the power of content rules lies in the fact that they can be executed through the Plone UI and by any experienced user.

We can access the manage rules form via the Rules tab in any folder. If we don't have any rules created, the form will direct us to create them in the content rules configlet. This control panel configlet will help us create and manage the content rules of our site:

The form is divided into two parts. The first is dedicated to global settings applied to all rules. In this version, there is only one setting in this category, which enables and disables the rules in the whole site.
If deselected, the whole rule system is disabled and no rules will be executed in the site. The other part of the form is reserved for the rule management interface. Here we can find the already created rules, manage them, and create new ones. We can display them by type using the selector on the right.

Adding a new rule

Click on the Add content rule button. It will open a new form with the following fields:

  • Title: Title of the rule.
  • Description: Summary of the rule.
  • Triggering event: The event that starts the execution of the rule.
  • Enabled: Whether or not this rule is enabled.
  • Stop executing rules: Defines whether the engine should stop executing other rules after this one. It is useful if we assign several rules to a container and the execution of a particular rule excludes any other rule execution.

By default, these are the available events:

  • Object added to this container
  • Object modified
  • Object removed from this container
  • Workflow state changed

After creating at least one rule, the configlet will let us manage the existing rules, allowing us to perform the standard edit, delete, enable, and disable actions.

But this is only the first step. We've created the rule and assigned an event to it. Now it's time to configure the task that the rule will perform. There are two items to configure: conditions and actions. We can add as many conditions as we want to, and modify the order in which they are applied.
We can add the following types of conditions:

  • Content type: Apply the rule only if an object of this type has triggered the event
  • File extension: Execute the action only if a file content type that has this extension has triggered the event
  • Workflow state: Apply only if a content type in the workflow state specified has triggered the event
  • User's group: Execute only if a user who is a member of a specific group triggers the event
  • User's role: Same as User's group, but for a user having a specific role in that context

The actions that a rule can execute are limited, but they cover the most useful use cases:

  • Logger: Output a message to the message system log
  • Notify user: Notify the user via a status message
  • Copy to folder: The object that triggers the event is copied to the specified folder
  • Move to folder: The object that triggers the event is moved to the specified folder
  • Delete object: The object that triggers the event is deleted
  • Transition workflow state: Attempt to change the workflow state of the object that triggers the event via the specified transition
  • Send e-mail: Send e-mail to a specific user

By default, only managers can define and apply new content rules, but we can allow more user roles to access their creation.

Assigning rules to folderish objects

Once a rule is created, we can assign it to any of Plone's folderish content types. Just go to any folderish object and click on the Rules tab. Use the drop-down box Assign rule here to choose from the available rules and click on Add. We can review which rules are assigned in this container and manage them as well. We can enable them, disable them, choose whether to apply them to subfolders or only to the current folder, and of course, unassign them.

Making any content type rule aware

All folderish default content types of Plone are content rule aware. However, not all third-party content types are content rule aware. This is because either they are old or simply do not enable this feature in the content type declaration.
In the case of third-party content types that are not content rule aware, we can enable their awareness by following these instructions:

1. Add an object of the desired content type anywhere in our site, if we haven't created it yet.
2. Find it in the ZMI and access the Interfaces tab.
3. Once there, find the interface plone.contentrules.engine.interfaces.IRuleAssignable in the Available marker interfaces fieldset.
4. Select it and click on the Add button.

By doing so, we are assigning an additional marker interface to that object, which will enable (mark) this instance of the content type (that is, make it aware of content rules). From this moment onwards, the selected object will have the Rules tab available, and in consequence, we can assign rules to it.

Syndication

Plone has always paid special attention to syndication, making its folderish content types syndicable. Collections export their contents automatically in a view that all collections have, the RSS view. But we can also enable syndication for single folders on our site. Using RSS feeds in our intranet is the recommended approach for keeping our users posted about the changes in syndicated folders, whether they are collections or plain folders.

Enabling folder syndication

To enable syndication for a particular folder, we need to access the view, synPropertiesForm, from the folder we want to be syndicable. For example, if we want to access this view in the ITStaff folder, we should browse the URL:

http://localhost:8080/intranet/ITStaff/synPropertiesForm

This view is hidden by default, although we can make it visible in order to allow users to enable folder syndication by themselves. We can make it visible by accessing the portal_actions tool in the ZMI. Go to the object action category and choose syndication. Then just make this action visible by enabling the Visible attribute and choose who will be able to access this view by selecting the item permissions in the Permissions selection box.
Once in the synPropertiesForm form, we should click on the Enable syndication button. Then another form is shown to allow us to configure how the publication of the feed will be performed. The following syndication details are available:

  • Update period: How often the feed will be updated
  • Update frequency: How many times the update will occur inside the period specified in the previous field
  • Update base: When the update will take place
  • Maximum items: How many items the feed will show

Accessing a secure RSS feed

Syndication was conceived to access information from public resources. Inside an intranet, it will be very common that the folder we want to enable for syndication is not published, and in consequence, the associated feed will be private. The problem is that few feed readers support feed authentication, and even when using one of them we would have to enable HTTP authentication in our site's PAS configuration, which is not recommended. So we propose two workarounds.

We can use a feed-enabled browser to browse our intranet and our feeds as well. With this approach, if we are logged in, then we will have access to authenticated feeds. Firefox and Internet Explorer already have this feature.

The second approach is to have a special workflow state for the syndicated folders inside our site that makes them accessible without authentication to anonymous users. Obviously this workaround will make the folder content visible to anonymous users, and it's not an option when privacy of the contained information is a must.
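Returning to the Update period and Update frequency settings listed under Enabling folder syndication, the two values together imply how far apart updates are. The sketch below is an illustration in plain Python, not Plone code: the update_interval function is hypothetical, the period names are assumed to follow the RSS syndication module, and only fixed-length periods are modelled.

```python
from datetime import timedelta

# Period names assumed from the RSS syndication module; months and years
# are left out of this sketch because their length varies.
PERIODS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def update_interval(period, frequency):
    """Time between updates when the feed updates `frequency` times per period."""
    if frequency < 1:
        raise ValueError("frequency must be at least 1")
    return PERIODS[period] / frequency


print(update_interval("daily", 2))
print(update_interval("hourly", 4))
```

Under this reading, a daily period with a frequency of 2 means an update roughly every 12 hours, counted from the Update base.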