Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-setting-single-width-column-system-simple
Packt
05 Sep 2013
3 min read
Save for later

Setting up a single-width column system (Simple)

Packt
05 Sep 2013
3 min read
(For more resources related to this topic, see here.) Getting ready To perform the steps listed in this article, we will need a text editor, a browser, and a copy of the Masonry plugin. Any text editor will do, but my browser of choice is Google Chrome, as the V8 JavaScript engine that ships with it generally performs better and supports CSS3 transitions, and as a result we see smoother animations when resizing the browser window. We need to make sure we have a copy of the most recent version of Masonry, which was Version 2.1.08 at the time of writing this article. This version is compatible with the most recent version of jQuery, which is Version 1.9.1. A production copy of Masonry can be found on the GitHub repository at the following address: https://github.com/desandro/masonry/blob/master/jquery.masonry.min.js For jQuery, we will be using a content delivery network (CDN) for ease of development. Open the basic single-column HTML file to follow along. You can download this file from the following location: http://www.packtpub.com/sites/default/files/downloads/1-single-column.zip How to do it... Set up the styling for the masonry-item class with the proper width, padding, and margins. We want our items to have a total width of 200 pixels, including the padding and margins. <style> .masonry-item { background: #FFA500; float: left; margin: 5px; padding: 5px; width: 180px; }</style> Set up the HTML structure on which you are going to use Masonry. At a minimum, we need a tagged Masonry container with the elements inside tagged as Masonry items. <div id='masonry-container'> <div class='masonry-item '> Maecenas faucibus mollis interdum. </div> <div class='masonry-item '> Maecenas faucibus mollis interdum. Donec sed odio dui. Nullamquis risus eget urna mollis ornare vel eu leo. Vestibulum idligula porta felis euismod semper. </div> <div class='masonry-item '> Nullam quis risus eget urna mollis ornare vel eu leo. Crasjusto odio, dapibus ac facilisis in, egestas eget quam. Aeneaneu leo quam. Pellentesque ornare sem lacinia quam venenatisvestibulum. </div></div> All Masonry options need not be included, but it is recommended (by David DeSandro, the creator of Masonry) to set itemSelector for single-column usage. We will be setting this every time we use Masonry. <script> $(function() { $('#masonry-container').masonry({ // options itemSelector : '.masonry-item', }); });</script> How it works... Using jQuery, we select our Masonry container and use the itemSelector option to select the elements that will be affected by Masonry. The size of the columns will be determined by the CSS code. Using the box model, we set our Masonry items to a width of 90 px (80-px wide, with a 5-px padding all around the item). The margin is our gutter between elements, which is also 5-px wide. With this setup, we can con firm that we have built the basic single-column grid system, with each column being 100-px wide. The end result should look like the following screenshot: Summary This article showed you how to set up the very basic Masonry single-width column system around which Masonry revolves. Resources for Article : Further resources on this subject: Designing Site Layouts in Inkscape [Article] New features in Domino Designer 8.5 [Article] Using jQuery and jQueryUI Widget Factory plugins with RequireJS [Article]
Read more
  • 0
  • 0
  • 1852

article-image-using-indexes-manipulate-pandas-objects
Packt
05 Sep 2013
4 min read
Save for later

Using indexes to manipulate pandas objects

Packt
05 Sep 2013
4 min read
(For more resources related to this topic, see here.) Getting ready A good understanding of indexes in pandas is crucial to quickly move the data around. From a business intelligence perspective, they create a distinction similar to that of metrics and dimensions in an OLAP cube. To illustrate this point, this recipe walks through getting stock data out of pandas, combining it, then reindexing it for easy chomping. How to do it... Use the DataReader object to transfer stock price information into a DataFrame and to explore the basic axis of Panel. > from pandas.i git push -u origin master o.data import DataReader > tickers = ['gs', 'ibm', 'f', 'ba', 'axp'] > dfs = {} > for ticker in tickers: dfs[ticker] = DataReader(ticker, "yahoo", '2006-01-01') # a yet undiscussed data structure, in the same way the a # DataFrame is a collection of Series, a Panel is a collection of # DataFrames > pan = pd.Panel(dfs) > pan <class 'pandas.core.panel.Panel'> Dimensions: 5 (items) x 1764 (major_axis) x 6 (minor_axis)Items axis: axp to ibm Major_axis axis: 2006-01-03 00:00:00 to 2013-01-04 00:00:00 Minor_axis axis: Open to Adj Close > pan.items Index([axp, ba, f, gs, ibm], dtype=object) > pan.minor_axis Index([Open, High, Low, Close, Volume, Adj Close], dtype=object) > pan.major_axis <class 'pandas.tseries.index.DatetimeIndex'>[2006-01-03 00:00:00, ..., 2013-01-04 00:00:00] Length: 1764, Freq: None, Timezone: None Use the axis selectors to easily compute different sets of summary statistics. > pan.minor_xs('Open').mean() axp 46.227466 ba 70.746451 f 9.135794 gs 151.655091 ibm 129.570969 # major axis is sliceable as well > day_slice = pan.major_axis[1] > pan.major_xs(day_slice)[['gs', 'ba']] ba gs Open 70.08 127.35 High 71.27 128.91 Low 69.86 126.38 Close 71.17 127.09 Volume 3165000.00 4861600.00 Adj Close 60.43 118.12 Convert the Panel to a DataFrame. > dfs = [] > for df in pan: idx = pan.major_axis idx = pd.MultiIndex.from_tuples(zip([df]*len(idx), idx)) idx.names = ['ticker', 'timestamp'] dfs.append(pd.DataFrame(pan[df].values, index=idx, columns=pan.minor_axis)) > df = pd.concat(dfs) > df Data columns: Open 8820 non-null values High 8820 non-null values Low 8820 non-null values Close 8820 non-null values Volume 8820 non-null values Adj Close 8820 non-null values dtypes: float64(6) Perform the analogous operations as in the preceding examples on the newly created DataFrame. # selecting from a MultiIndex isn't much different than the Panel # (output muted) > df.ix['gs':'ibm'] > df['Open'] How it works... The previous example was certainly contrived, but when indexing and statistical techniques are incorporated, the power of pandas begins to come through. Statistics will be covered in an upcoming recipe. pandas' indexes by themselves can be thought of as descriptors of a certain point in the DataFrame. When ticker and timestamp are the only indexes in a DataFrame, then the point is individualized by the ticker, timestamp, and column name. After the point is individualized, it's more convenient for aggregation and analysis. There's more... Indexes show up all over the place in pandas so it's worthwhile to see some other use cases as well. Advanced header indexes Hierarchical indexing isn't limited to rows. Headers can also be represented by MultiIndex, as shown in the following command line: > header_top = ['Price', 'Price', 'Price', 'Price', 'Volume', 'Price'] > df.columns = pd.MultiIndex.from_tuples(zip(header_top, df.columns) Performing aggregate operations with indexes As a prelude to the following sections, we'll do a single groupby function here since they work with indexes so well. > df.groupby(level=['tickers', 'day'])['Volume'].mean() This answers the question for each ticker and for each day (not date), that is, what was the mean volume over the life of the data. Summary This article talks about the use and importance of indexes in pandas. It also talks about different operations that can be done with indexes. Resources for Article : Further resources on this subject: Installing Panda3D [Article] Setting Up Panda3D and Configuring Development Tools [Article] Collision Detection and Physics in Panda3D Game Development [Article]
Read more
  • 0
  • 0
  • 3109

article-image-cocos2d-x-installation
Packt
05 Sep 2013
10 min read
Save for later

Cocos2d-x: Installation

Packt
05 Sep 2013
10 min read
(For more resources related to this topic, see here.) Download and installation All the examples in this article were developed on a Mac using Xcode. Although you can use Cocos2d-x to develop your games for other platforms, using different systems, the examples will focus on iOS and Mac. Xcode is free and can be downloaded from the Mac App store (https://developer.apple.com/xcode/index.php), but in order to test your code on an iOS device and publish your games, you will need a developer account with Apple, which will cost you USD 99 a year. You can find more information on their website: https://developer.apple.com/ So, assuming you have an internet connection, and that Xcode is ready to rock, let's begin! Time for action – downloading and installing Cocos2d-x We start by downloading the framework: Go to http://download.cocos2d-x.org/ and download the latest stable version of Cocos2d-x. For this article I'll be using version Cocos2d-2.0-x-2.0.4, which means the 2.0.4 C++ port of version 2.0 of Cocos2d. Uncompress the files somewhere on your machine. Open Terminal and type cd (that is cd and a space). Drag the uncompressed folder you just downloaded to the Terminal window. You should see the path to the folder added to the command line. Hit returnto go to that folder in Terminal. Now type: sudo ./install-templates-xcode.sh -u Hit return again and you're done. What just happened? You have successfully installed the Cocos2d-x templates in your machine. With these in place, you can select the type of Cocos2d-x application you wish to build inside Xcode, and the templates will take care of copying all the necessary files into your application. Next, open Xcode and select Create a new Xcode Project.You should see something like this: So let's build our first application. Hello-x World-x Let's create that old chestnut in computer programming: the hello world example. Time for action – creating an application Open Xcode and select File | New | Project... and follow these steps: In the dialogue box select cocos2d-x under the iOS menu and choose the cocos2dx template. Hit Next . Give the application a name, but not HelloWorld. I'll show you why in a second. You will be then asked to select a place to save the project and you are done. Once your application is ready, click Run to build it. After that, this is what you should see in the simulator: When you run a cocos2d-x application in Xcode it is quite common for the program to post some warnings regarding your code, or most likely the frameworks. These will mostly reference deprecated methods, or statements that do not precisely follow more recent, and stricter rules of the current SDK. But that's okay. These warnings, though certainly annoying, can be ignored. What just happened? You created your first Cocos2d-x application using the cocos2dx template, sometimes referred to as the basic template. The other template options include one with Box2D, one with Chipmunk (both related to physics simulation), one with JavaScript, and one with Lua. The last two options allow you to code some or all of your game using those script languages instead of the native C++; and they work just as you would expect a scripting language to work, meaning the commands written in either Javascript or Lua are actually replaced and interpreted as C++ commands by the compiler. Now if you look at the files created by the basic template you will see a HelloWorldScene class file. That's the reason I didn't want you to call your application HelloWorld, because I didn't want you to have the impression that the file name was based on your project name. It isn't. You will always get a HelloWorldScene file unless you change the template itself. Now let's go over the sample application and its files: The folder structure First you have the Resources folder, where you find the images used by the application. The ios folder has the necessary underlying connections between your app and iOS. For other platforms, you will have their necessary linkage files in separate folders targeting their respective platform (like an android folder the Android platform, for instance.) In the libs folder you have all the cocos2dx files, plus CocosDenshion files (for sound support) and a bunch of other extensions. Using a different template for your projects will result in a different folder structure here, based on what needs to be added to your project. So you will see a Box2D folder, for example, if you choose the Box2D template. In the Classes folder you have your application. In here, everything is written in C++ and this is the home for the part of your code that will hopefully not need to change, however many platforms you target with your application. Now let us go over the main classes of the basic application. The iOS linkage classes AppController and RootViewController are responsible for setting up OpenGL in iOS as well as telling the underlying operating system that your application is about to say Hello... To the World. These classes are written with a mix of Objective-C and C++, as all the nice brackets and the .mm extensions show. You will change very little if anything in these classes; and again that will reflect in changes to the way iOS handles your application. So other targets would require the same instructions or none at all depending on the target. In AppController for instance, I could add support for multitouch. And in RootViewController, I could limit the screen orientations supported by my application. The AppDelegate class This class marks the first time your C++ app will talk to the underlying OS. It attempts to map the main events that mobile devices wants to dispatch and listen to. From here on, all your application will be written in C++ (unless you need something else). In AppDelegate you should setup CCDirector (the cocos2d-x all powerful singleton manager object) to run your application just the way you want. You can: Get rid of the application status information Change the frame rate of your application Tell CCDirector where your high definition images are, and where your standard definition images are, as well as which to use You can change the overall scale of your application to suit different screens The AppDelegate class is also the best place to start any preloading process And, most importantly, it is here you tell the CCDirector object what CCScene to begin your application with Here too you will handle what happens to your application if the OS decides to kill it, push it aside, or hang it upside down to dry. All you need to do is place your logic inside the correct event handler: applicationDidEnterBackground or applicationWillEnterForeground. The HelloWorldScene class When you run the application you get a screen with the words Hello World and a bunch of numbers in one corner. These are the display stats you decided you wanted around in the AppDelegate class. The actual screen is created by the oddly named HelloWorldScene class. It is a Layer class that creates its own scene (don't worry if you don't know what a Layer class is, or a Scene class, you will soon enough). When it initializes, HelloWorldScene puts a button on screen that you can press to exit the application. The button is actually a Menu item, part of a Menu group consisting of one button, two image states for that button, and one callback event, triggered when the said button is pressed. The Menu group automatically handles touch events targeting its members, so you don't get to see any of that code floating about. There is also the necessary Label object to show the Hello World message and the background image. Who begets whom If you never worked with either Cocos2d or Cocos2d-x before, the way the initial scene() method is instantiated may lead to dizziness. To recap, in AppDelegate you have: CCScene *pScene = HelloWorld::scene(); pDirector->runWithScene(pScene); CCDirector needs a CCScene object to run, which you can think of as being your application, basically. CCScene needs something to show, which in this case is a CCLayer class. CCScene is then said to contain a CCLayer class. Here a CCScene object is created through a static method scene inside a CCLayer derived class. So the layer creates the scene, and the scene immediately adds the layer to itself. Huh? Relax. This incestuous-like instantiation will most likely only happen once, and you have nothing to do with it when it happens. So you can easily ignore all these funny goings-on and look the other way. I promise instantiations will be much easier after this first one. Further information Follow these steps to access one of the best sources for reference material on Cocos2d-x: its Test project. Time for action – running the test samples You open the test project just like you would do for any other Xcode project: Go inside the folder you downloaded for the framework, and navigate to samples/TestCpp/proj.ios/TestCpp.xcodeproj. Open that project in Xcode. When you run the project, you will see inside the simulator a long list of tests, all nicely organized by topic. Pick any one to review. Better yet, navigate to samples/TestCpp/Classes and if you have a program like TextWrangler or some equivalent, you can open that entire directory inside a Disk Browser window and have all that information ready for referencing right at your desktop. What just happened? With the test samples you can visualize most features in Cocos2d-x and see what they do, as well as some of the ways you can initialize and customize them. I will refer to the code found in the tests quite often. As usual with programming, there is always a different way to accomplish a given task, so sometimes after showing you one way, I'll refer to another one that you can find (and by then easily understand) inside the Test classes. The other tools Now comes the part where you may need to spend a bit more money to get some extremely helpful tools. In this articles examples I use four of them: A tool to help build sprite sheets: I'll use Texture Packer (http://www.codeandweb.com/texturepacker). There are other alternatives, like Zwoptex (http://zwopple.com/zwoptex/). And they usually offer some features for free. A tool to help build particle effects: I'll use Particle Designer (http://www.71squared.com/en/particledesigner). Depending on your operating system you may find free tools online for this. Cocos2d-x comes bundled with some common particle effects that you can customize. But to do it blindly is a process I do not recommend. A tool to help build bitmap fonts: I'll use Glyph Designer (http://www.71squared.com/en/glyphdesigner). But there are others: bmGlyph (which is not as expensive), FontBuilder (which is free). It is not extremely hard to build a Bitmap font by hand, not nearly as hard as building a particle effect from scratch, but doing it once is enough to convince you to get one of these tools fast. A tool to produce sound effects: No contest. cfxr for Mac or the original sfxr for Windows. Both are free (http://www.drpetter.se/project_sfxr.html and http://thirdcog.eu/apps/cfxr respectively). Summary You just learned how to install Cocos2d-x templates and create a basic application. You also learned enough of the structure of a basic Cocos2d-x application to get started to build your first game. Resources for Article: Further resources on this subject: Getting Started With Cocos2d [Article] Cocos2d: Working with Sprites [Article] Cocos2d for iPhone: Surfing Through Scenes [Article]
Read more
  • 0
  • 0
  • 8926

article-image-so-what-powershell-30-wmi
Packt
05 Sep 2013
5 min read
Save for later

So, what is PowerShell 3.0 WMI?

Packt
05 Sep 2013
5 min read
(For more resources related to this topic, see here.) Microsoft created Windows Management Instrumentation ( WMI ) as a management layer to the Windows operating system. This management layer allows you to retrieve information pertaining to the operating system or physical hardware on a system. It also allows you to manipulate components within the operating system. A good example for this section is a hard drive. WMI provides the ability to view the physical hard drive as well as the logical components of the hard drive. Using a call to the Win32_diskdrive class, you have the ability to view the physical aspects of the disk drive, such as the tracks, sectors, manufacturer, and even the serial number. The win32_logicaldisk class provides you with the ability to see the logical aspects of a drive, such as partitions, free space, and volume names. Not limited to just Windows operating system management, WMI also has the extensibility to allow third-party developers to create WMI providers for use within their projects. This means that you can create the management hooks within your code that will allow remote management through a common standardized framework. Many companies have adopted the use of WMI providers for items such as storage subsystems, application virtualization, hardware virtualization, and enhanced management of the hardware connected to a workstation or server system. Microsoft chose to follow the Common Information Model ( CIM ) industry standard for WMI. The preceding diagram takes a simplified look at Microsoft's implementation of WMI through the use of the CIM standard. There are three layers to their model, which are as follows: WMI consumers: The WMI consumers are exactly what their name states. They consume the available APIs to access the managed component. The WMI consumers are the real users of the C/C++ and .NET clients, and they use scripting languages, such as PowerShell 3.0, to access management data and interact with the managed components. In the hard drive example, the WMI consumer is the PowerShell code that calls information about the hard drive. This would look as follows: get-wmiobject –class win32_logicaldisk WMI infrastructure: The WMI infrastructure includes the CIM Object Manager ( CIMOM ), which stores a repository of the available WMI providers. If a third-party WMI provider doesn't register with the CIM Object Manager, the Windows operating system will not be able to manage the component through WMI. In the hard drive example, the CIMOM will import the Managed Object Format ( MOF) file of the hard drive into the WMI repository. This will register the hard drive's available WMI properties and methods into use by a WMI consumer. WMI providers: The last components, WMI providers, are made up of a driver (DLL) and a MOF file. These two components are responsible for returning the management data to the WMI consumer, through the WMI infrastructure. This allows the WMI consumer to interact with the managed components. In the hard drive example, the WMI consumer will access the hard drive through the WMI infrastructure utilizing the hard drive driver (DLL). The WMI consumer will have the ability to retrieve the information pertaining to the physical components on that hard drive. WMI integration with PowerShell 3.0 PowerShell has the ability to interact with WMI through the use of the built-in cmdlets. These cmdlets act as the WMI consumers and interact with the WMI. As WMI evolved with the release of new operating systems, PowerShell also needed to evolve in parallel to manage those systems. With the release of Windows 8 and Windows Server 2012, Microsoft created a new iteration of the Microsoft Windows Management Framework ( WMF ), Version 3.0. The new release of Windows Management Framework updates WMI to Version 3.0, PowerShell to Version 3.0, and installs the new Windows Remote Management ( WinRM ), OData IIS Extensions, and Server Manager CIM Provider. PowerShell 3.0 includes new WMI management cmdlets, displayed in the following table, which leverage the use of the new functionality within Windows Management Framework 3.0. The new CIM cmdlets provide a richer WMI experience leveraging stateful communications to the remote systems. They also provide the ability to create CIM calls through PowerShell 3.0 to non-Windows-based WMI systems that support Web Services-Management ( WSman ). This provides engineers the ability to tap into a variety of systems for management purposes. Get-CimAssociatedInstance New-CimSession Get-CimClass New-CimSessionOption Get-CimInstance Register-CimIndicationEvent Get-CimSession Remove-CimInstance Invoke-CimMethod Remove-CimSession New-CimInstance Set-CimInstance Using PowerShell in your environment PowerShell is quickly becoming the de-facto standard for managing the Windows-based systems in small and large organizations. It is being used for tasks, such as automated software deployments, dynamic system provisioning, and system maintenance. It is also being heavily used in Microsoft products, such as System Center 2012 - Service Manager, that allow for business process automation. PowerShell 3.0 has introduced a variety of new cmdlets that further simplify the administration of systems through the use of WMI. Whatever the administrative task, the PowerShell community has several examples of the ways to manage systems through the use of the WMI consumers. The examples provided in this article are just the tip of the iceberg compared to what you can accomplish utilizing PowerShell 3.0 and Windows Management Instrumentation. You may be surprised by the number of manual administration steps that can be automated by creating PowerShell scripts. The sky is the limit when it comes to scripting with PowerShell! Summary In this article, we learned what Windows Management Instrumentation (WMI) is, how PowerShell 3.0 utilizes it, and why it's applicable to systems engineers. Resources for Article : Further resources on this subject: Using the Windows Azure Platform PowerShell Cmdlets [Article] Accessing Oracle [Article] Inventorying Servers with PowerShell [Article]
Read more
  • 0
  • 0
  • 9339

article-image-chef-infrastructure
Packt
05 Sep 2013
10 min read
Save for later

Chef Infrastructure

Packt
05 Sep 2013
10 min read
(For more resources related to this topic, see here.) First, let's talk about the terminology used in the Chef universe. A cookbook is a collection of recipes – codifying the actual resources, which should be installed and configured on your node – and the files and configuration templates needed. Once you've written your cookbooks, you need a way to deploy them to the nodes you want to provision. Chef offers multiple ways for this task. The most widely used way is to use a central Chef Server. You can either run your own or sign up for Opscode's Hosted Chef. The Chef Server is the central registry where each node needs to get registered. The Chef Server distributes the cookbooks to the nodes based on their configuration settings. Knife is Chef's command-line tool called to interact with the Chef Server. You use it for uploading cookbooks and managing other aspects of Chef. On your nodes, you need to install Chef Client – the part that retrieves the cookbooks from the Chef Server and executes them on the node. In this article, we'll see the basic infrastructure components of your Chef setup at work and learn how to use the basic tools. Let's get started with having a look at how to use Git as a version control system for your cookbooks. Using version control Do you manually back up every file before you change it? And do you invent creative filename extensions like _me and _you when you try to collaborate on a file? If you answer yes to any of the preceding questions, it's time to rethink your process. A version control system (VCS) helps you stay sane when dealing with important files and collaborating on them. Using version control is a fundamental part of any infrastructure automation. There are multiple solutions (some free, some paid) for managing source version control including Git, SVN, Mercurial, and Perforce. Due to its popularity among the Chef community, we will be using Git. However, you could easily use any other version control system with Chef. Getting ready You'll need Git installed on your box. Either use your operating system's package manager (such as Apt on Ubuntu or Homebrew on OS X), or simply download the installer from www.git-scm.org. Git is a distributed version control system. This means that you don't necessarily need a central host for storing your repositories. But in practice, using GitHub as your central repository has proven to be very helpful. In this article, I'll assume that you're using GitHub. Therefore, you need to go to github.com and create a (free) account to follow the instructions given in this article. Make sure that you upload your SSH key following the instructions at https://help.github.com/articles/generating-ssh-keys, so that you're able to use the SSH protocol to interact with your GitHub account. As soon as you've created your GitHub account, you should create your repository by visiting https://github.com/new and using chef-repo as the repository name. How to do it... Before you can write any cookbooks, you need to set up your initial Git repository on your development box. Opscode provides an empty Chef repository to get you started. Let's see how you can set up your own Chef repository with Git using Opscode's skeleton. Download Opscode's skeleton Chef repository as a tarball: mma@laptop $ wget http://github.com/opscode/chef-repo/tarball/master...TRUNCATED OUTPUT...2013-07-05 20:54:24 (125 MB/s) - 'master' saved [9302/9302] Extract the downloaded tarball: mma@laptop $ tar zvf master Rename the directory. Replace 2c42c6a with whatever your downloaded tarball contained in its name: mma@laptop $ mv opscode-chef-repo-2c42c6a/ chef-repo Change into your newly created Chef repository: mma@laptop $ cd chef-repo/ Initialize a fresh Git repository: mma@laptop:~/chef-repo $ git init .Initialized empty Git repository in /Users/mma/work/chef-repo/.git/ Connect your local repository to your remote repository on github.com. Make sure to replace mmarschall with your own GitHub username: mma@laptop:~/chef-repo $ git remote add origin git@github.com:mmarschall/chef-repo.git Add and commit Opscode's default directory structure: mma@laptop:~/chef-repo $ git add .mma@laptop:~/chef-repo $ git commit -m "initial commit"[master (root-commit) 6148b20] initial commit10 files changed, 339 insertions(+), 0 deletions(-)create mode 100644 .gitignore...TRUNCATED OUTPUT...create mode 100644 roles/README.md Push your initialized repository to GitHub. This makes it available to all your co-workers to collaborate on it. mma@laptop:~/chef-repo $ git push -u origin master...TRUNCATED OUTPUT...To git@github.com:mmarschall/chef-repo.git* [new branch] master -> master How it works... You've downloaded a tarball containing Opscode's skeleton repository. Then, you've initialized your chef-repo and connected it to your own repository on GitHub. After that, you've added all the files from the tarball to your repository and committed them. This makes Git track your files and the changes you make later. As a last step, you've pushed your repository to GitHub, so that your co-workers can use your code too. There's more... Let's assume you're working on the same chef-repo repository together with your co-workers. They cloned your repository, added a new cookbook called other_cookbook, committed their changes locally, and pushed their changes back to GitHub. Now it's time for you to get the new cookbook down to your own laptop. Pull your co-workers, changes from GitHub. This will merge their changes into your local copy of the repository. mma@laptop:~/chef-repo $ git pull From github.com:mmarschall/chef-repo * branch master -> FETCH_HEAD ...TRUNCATED OUTPUT... create mode 100644 cookbooks/other_cookbook/recipes/default.rb In the case of any conflicting changes, Git will help you merge and resolve them. Installing Chef on your workstation If you want to use Chef, you'll need to install it on your local workstation first. You'll have to develop your configurations locally and use Chef to distribute them to your Chef Server. Opscode provides a fully packaged version, which does not have any external prerequisites. This fully packaged Chef is called the Omnibus Installer. We'll see how to use it in this section. Getting ready Make sure you've curl installed on your box by following the instructions available at http://curl.haxx.se/download.html. How to do it... Let's see how to install Chef on your local workstation using Opscode's Omnibus Chef installer: In your local shell, run the following command: mma@laptop:~/chef-repo $ curl -L https://www.opscode.com/chef/install.sh | sudo bashDownloading Chef......TRUNCATED OUTPUT...Thank you for installing Chef! Add the newly installed Ruby to your path: mma@laptop:~ $ echo 'export PATH="/opt/chef/embedded/bin:$PATH"'>> ~/.bash_profile && source ~/.bash_profile How it works... The Omnibus Installer will download Ruby and all the required Ruby gems into /opt/chef/embedded. By adding the /opt/chef/embedded/bin directory to your .bash_profile, the Chef command-line tools will be available in your shell. There's more... If you already have Ruby installed in your box, you can simply install the Chef Ruby gem by running mma@laptop:~ $ gem install chef. Using the Hosted Chef platform If you want to get started with Chef right away (without the need to install your own Chef Server) or want a third party to give you an Service Level Agreement (SLA) for your Chef Server, you can sign up for Hosted Chef by Opscode. Opscode operates Chef as a cloud service. It's quick to set up and gives you full control, using users and groups to control the access permissions to your Chef setup. We'll configure Knife, Chef's command-line tool to interact with Hosted Chef, so that you can start managing your nodes. Getting ready Before being able to use Hosted Chef, you need to sign up for the service. There is a free account for up to five nodes. Visit http://www.opscode.com/hosted-chef and register for a free trial or the free account. I registered as the user webops with an organization short-name of awo. After registering your account, it is time to prepare your organization to be used with your chef-repo repository. How to do it... Carry out the following steps to interact with the Hosted Chef: Navigate to http://manage.opscode.com/organizations. After logging in, you can start downloading your validation keys and configuration file. Select your organization to be able to see its contents using the web UI. Regenerate the validation key for your organization and save it as <your-organization-short-name>.pem in the .chef directory inside your chef-repo repository. Generate the Knife config and put the downloaded knife.rb into the .chef directory inside your chef-repo directory as well. Make sure you replace webops with the username you chose for Hosted Chef and awo with the short-name you chose for your organization: current_dir = File.dirname(__FILE__)log_level :infolog_location STDOUTnode_name "webops"client_key "#{current_dir}/webops.pem"validation_client_name "awo-validator"validation_key "#{current_dir}/awo-validator.pem"chef_server_url "https://api.opscode.com/organizations/awo"cache_type 'BasicFile'cache_options( :path => "#{ENV['HOME']}/.chef/checksums" )cookbook_path ["#{current_dir}/../cookbooks"] Use Knife to verify that you can connect to your hosted Chef organization. It should only have your validator client so far. Instead of awo, you'll see your organization's short-name: mma@laptop:~/chef-repo $ knife client listawo-validator How it works... Hosted Chef uses two private keys (called validators): one for the organization and the other for every user. You need to tell Knife where it can find these two keys in your knife.rb file. The following two lines of code in your knife.rb file tells Knife about which organization to use and where to find its private key: validation_client_name "awo-validator"validation_key "#{current_dir}/awo-validator.pem" The following line of code in your knife.rb file tells Knife about where to find your users' private key: client_key "#{current_dir}/webops.pem" And the following line of code in your knife.rb file tells Knife that you're using Hosted Chef. You will find your organization name as the last part of the URL: chef_server_url "https://api.opscode.com/organizations/awo" Using the knife.rb file and your two validators Knife can now connect to your organization hosted by Opscode. You do not need your own, self-hosted Chef Server, nor do you need to use Chef Solo in this setup. There's more... This setup is good for you if you do not want to worry about running, scaling, and updating your own Chef Server and if you're happy with saving all your configuration data in the cloud (under Opscode's control). If you need to have all your configuration data within your own network boundaries, you might sign up for Private Chef, which is a fully supported and enterprise-ready version of Chef Server. If you don't need any advanced enterprise features like role-based access control or multi-tenancy, then the open source version of Chef Server might be just right for you. Summary In this article, we learned about key concepts such as cookbooks, roles, and environments and how to use some basic tools such as Git, Knife, Chef Shell, Vagrant, and Berkshelf. Resources for Article: Further resources on this subject: Automating the Audio Parameters – How it Works [Article] Skype automation [Article] Cross-browser-distributed testing [Article]
Read more
  • 0
  • 0
  • 2688

article-image-using-linq-query-linqpad
Packt
05 Sep 2013
3 min read
Save for later

Using a LINQ query in LINQPad

Packt
05 Sep 2013
3 min read
(For more resources related to this topic, see here.) The standard version We are going to implement a simple scenario: given a deck of 52 cards, we want to pick a random number of cards, and then take out all of the hearts. From this stack of hearts, we will discard the first two and take the next five cards (if possible), and order them by their face value for display. You can try it in a C# program query in LINQPad: public static Random random = new Random();void Main(){ var deck = CreateDeck(); var randomCount = random.Next(52); var hearts = new Card[randomCount]; var j = 0; // take all hearts out for(var i=0;i<randomCount;i++) { if(deck[i].Suit == "Hearts") { hearts[j++] = deck[i]; } } // resize the array to avoid null references Array.Resize(ref hearts, j); // check that we have at least 2 cards. If not, stop if(hearts.Length <= 2) return; var count = 0; // check how many cards we can take count = hearts.Length - 2; // the most we need to take is 5 if(count > 5) { count = 5; } // take the cards var finalDeck = new Card[count]; Array.Copy(hearts, 2, finalDeck, 0, count); // now order the cards Array.Sort(finalDeck, new CardComparer()); // Display the result finalDeck.Dump();}public class Card{ public string Suit { get; set; } public int Value { get; set; }}// Create the cards' deckpublic Card[] CreateDeck(){ var suits = new [] { "Spades", "Clubs", "Hearts", "Diamonds" }; var deck = new Card[52]; for(var i = 0; i < 52; i++) { deck[i] = new Card { Suit = suits[i / 13], FaceValue = i-(13*(i/13))+1 }; } // randomly shuffle the deck for (var i = deck.Length - 1; i > 0; i--) { var j = random.Next(i + 1); var tmp = deck[j]; deck[j] = deck[i]; deck[i] = tmp; } return deck;}// CardComparer compare 2 cards against their face valuepublic class CardComparer : Comparer<Card>{ public override int Compare(Card x, Card y) { return x.FaceValue.CompareTo(y.FaceValue); }} Even if we didn't consider the CreateDeck() method, we had to do quite a few operations to produce the expected result (your values might be different as we are using random cards). The output is as follows: Depending on the data, LINQPad will add contextual information. For example, in this sample it will add the bottom row with the sum of all the values (here, only FaceValue). Also, if you click on the horizontal graph button, you will get a visual representation of your data, as shown in the following screenshot: This information is not always relevant but it can help you explore your data. Summary In this article we saw how LINQ queries can be used in LINQPad. The powerful query capabilitiesof LINQ has been utilized to the maximum in LINQPad. Resources for Article: Further resources on this subject: Displaying SQL Server Data using a Linq Data Source [Article] Binding MS Chart Control to LINQ Data Source Control [Article] LINQ to Objects [Article]
Read more
  • 0
  • 0
  • 2678
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-learning-musescore
Packt
04 Sep 2013
15 min read
Save for later

Learning MuseScore

Packt
04 Sep 2013
15 min read
(For more resources related to this topic, see here.) Entering notes In order to enter notes into our score, we need to enter Note Entry mode. MuseScore has various modes that we can use to accomplish special tasks. You can enter Note Entry mode by clicking on the N button in the toolbar. You can tell whether you are in Note Entry mode at any given time by checking whether the N button is depressed. You may also enter/exit Note Entry mode by pressing the N key. After you enter Note Entry mode, the quarter note should be selected by default. If you hover over the staff, you should see a light blue outline of a note appear. Clicking here will cause a quarter note of that pitch to be inserted. In the toolbar, you will see several notes of different lengths, such as half notes, eighth notes, and whole notes. This area is called the Note Entry toolbar, and indicates which note will be inserted when you click on the staff. Right now, the quarter note should be selected. Click on the half note, and then click an area of the staff on top of the rest that is immediately after the quarter note we just inserted. A half note of the pitch you chose will be added. In MuseScore, whenever we add notes, we must overwrite other notes. First, we overwrote a whole rest with a quarter note, which caused three beats of rest to be added after the quarter note. Then, we overwrote a quarter rest with a half note. Since the half note was longer than the quarter rest, it also overwrote one beat from the half rest following it, and changed the rest to a quarter rest to accommodate the size of the half note. To add an accidental, simply insert the note without the accidental, and then press the appropriate accidental button in the toolbar. For example, let's insert an F eighth note. We click on the eighth note button, then on the F line of the staff, and finally on the sharp button in the toolbar. We can insert dotted notes in a similar fashion by using the dot button on the Note Entry toolbar. In the next measure, let's add a G dotted quarter note by clicking on the quarter note in the Note Entry toolbar, then clicking on the dot button, and then clicking on a G in the staff. The dot will stay selected after you insert the note. If you would like to deselect the dot, you can click on it again. It is also automatically deselected when you change the note duration. Thus, you should always select the dot after you select the value of the note you would like to be dotted. It is possible to notate more quickly using keyboard shortcuts. The number keys 1 through 9 will select different durations, and the letters A through G will insert the designated note. The 0 key inserts a rest. Inserting notes this way will always insert the closest note with the desired pitch. If you hold Ctrl (or on Mac) while pressing the up or down arrow keys, MuseScore will move the last note you inserted up or down an octave. So, inserting a C half note and moving it up an octave can be accomplished by pressing the sequence 6 C Ctrl + ↑. Notes can be adjusted by a half step by pressing the up or down arrows without holding the Ctrl key. Hitting the up arrow will always create sharps, and the down arrow creates flats. This allows us to insert an F eighth note with the keystroke sequence 4 f ↑. While at first the keyboard shortcuts may seem complicated, as you get the hang of MuseScore, it is worthwhile to learn them. They will allow you to notate music extremely quickly and make your overall experience with MuseScore much more pleasurable. Making chords is also very straightforward. We just click on top of our previously inserted note after selecting a note of the same value. Be careful! If a different note length is selected, it will overwrite the previous note. Chords can also be inserted rapidly with keyboard shortcuts. Just start by inserting the first note of the chord normally. If you would like to insert a note of the chord above the previous note, hold Alt and press the interval above the previous note you would like to insert. To insert it below, hold Shift and do the same. Notes are always inserted in the present key signature. So to insert a C first inversion chord, press the sequence E Alt + 3 Alt + 4, or to insert a C second inversion chord, press the sequence G Alt + 4 Alt + 3. Alternatively, after inserting the first note, you can hold Shift and type the letter names of the notes to add to the chord. So pressing the sequence G Shift+C Shift+E would insert the same C second inversion chord. If you ever make a mistake, you can always undo your latest changes by going to the Edit menu and selecting Undo. You can also use the keyboard shortcut Ctrl + Z (or + Z on Mac). Let's put some notes and chords in some measures for both the trombone and piano parts so that we have something to work with. Inserting triplets To insert a triplet, first enter Note Entry mode. Then, from the Note Entry toolbar, choose the total duration that you would like all three triplets to sum to. Next, insert the first note of the triplet in the position you would like the triplet to occupy. After this, exit Note Entry mode, and from the Notes menu, under the Tuplets submenu, click on the Triplet option. A triplet will be created with the selected note as the first note. MuseScore will automatically enter Note Entry mode for you again, and select the correct duration of note needed to complete the triplet. From here, you can replace the two rests with notes by inserting the correct notes on top of them, as we did when we entered notes previously. Also, there is a keyboard shortcut to make this process easier. While in Note Entry mode, select the proper duration you would like the entire triplet to be, as before, but then hit Ctrl + 3 (or + 3 on Mac). The triplet will be inserted, and the proper note duration to fill in the triplet will be selected. You can now enter the notes of the triplet as you would enter normal notes. For instance, to insert a triplet arpeggio of an F major triad totaling one beat, we would press the sequence 5 Ctrl + 3 F A C. For a B major triad totaling two beats, we would similarly press 6 Ctrl + 3 B D ↑ F ↑. Inserting ties Ties are very easy to create in MuseScore. The simplest way to insert a tie is to insert both of the notes that you want to be tied together, exit Note Entry mode, click on the first note, and then click on the tie button in the toolbar, or press the + key. Make sure the two notes you are trying to tie together have the same pitch, or no tie will be inserted. This method works for individual notes, and also for chords. In order to have flexibility when tying chords, you must tie each note of the chord individually if you want the full chord to be tied. An easy way to do this is to ensure that you are not in Note Entry mode, hold Shift, click on the first note of the first chord so that the whole chord is selected, and press the + key. Again, for this to work, you must have two chords with identical pitches next to each other. If you are working with keyboard shortcuts, then there is also a faster way to enter ties that does not require the use of the mouse. After you enter a note in Note Entry mode, the note you just entered will be selected, and the cursor will be located on the right-hand side of this note, as shown in the following screenshot: Then, using the appropriate keyboard shortcut, select the duration of note you would like this note to be tied to. Finally, press the + key. MuseScore will insert a note of the selected duration tied to the previous note. So, pressing the sequence 5 C 4 + will insert a quarter note C tied to an eighth note. While this method is extremely convenient for single notes, it does not work for chords. Often, it is necessary to flip the tie for visual appeal, especially when tying chords. This can be accomplished by ensuring that you are not in Note Entry mode, clicking on a tie, and then pressing the X key. Even though ties look very similar to slurs in many situations, they are created differently. Slurs will be discussed later. Copying and pasting Suppose that we would like to repeat a measure in the bass line, or that the next measure in the melody is very similar to the previous measure. As in a word processor, we can copy and paste measures and fragments of music. First, let's copy and paste a measure. Exit Note Entry mode by ensuring the N button in the toolbar is not selected. Then, click on a portion of the measure where no notes are present. The measure should be selected, as indicated by the blue box around it. Now, either go to the Edit menu and click on Copy, or press Ctrl + C ( + C on Mac). The measure will be copied to the clipboard. Now, click on a portion of the target measure without any notes, and either click on Paste from the Edit menu, or press Ctrl + V ( + V on Mac). The notes will be inserted, and the target measure will be overwritten. It is also possible to copy any portion of your score, even if it spans partial measures or multiple staves. First, click on the note at the top-left of the region you want to copy. In the following example, this would be the E♭ in the right hand. Then, press and hold the Shift key, and click on the note at the bottom right corner of the region you would like to copy. Here, that would be the D in the left hand. MuseScore will select all of the notes in between. Once you have selected the region, you can copy it in the same way you copied the measure before. To paste the region, click on the first note or rest in the uppermost stave where you would like to paste it, and paste as we did with a single measure using either Ctrl + V or Paste from the Edit menu. If your selection has different measure breaks or is in a different meter than the destination, the selection will be reflowed to fit the destination, and ties will be added as necessary. Inserting and deleting measures Often, it is helpful to insert or delete a measure in your score. Luckily, MuseScore makes this extremely easy. To insert a measure, select the measure (as we did when we copied a measure) immediately after the location where you would like to insert the measure. Then, go to the Create menu, and under the Measures submenu, select Insert Measure. A measure will be inserted. To insert multiple measures, select Insert Measures. A dialog box will prompt you for how many measures to insert. If you would like to add measures to the end of the score, you can select Append Measures from under the Measures submenu within the Create menu. There is no need to select any measures to perform this operation. To delete measures, simply select the measure by clicking any blank area within the measure, and then go to the Edit menu, and click on Delete Selected Measures. Doing so will delete this measure position within all staves, not just the selected staff. You can also select multiple measures (as we did earlier when we were copying by selecting one measure, holding the Shift key, and selecting additional measures), and use the same menu button to delete all of the measures that you have selected. Chord symbols In jazz and popular music, it is very common to give musicians chord symbols to read from. To create a chord symbol, make sure you are not in Note Entry mode, and click on a note that you would like to add a chord symbol to. Then, either go to the Create menu, go to the Text submenu, and select Chord Name, or press Ctrl + K ( + K on Mac). A text box should appear that looks exactly like the ones we saw before. Now, you can type the name of the chord in the same way you would write it on paper. (For example, D minor would be Dm, and a G7 chord would just be G7.) All lowercase b characters will be converted into flat signs, and all # characters will be converted into sharps. To move to the next location in the measure, press the space bar. If you press the space bar repeatedly, you will move forward without inserting any chords. Now that our chords are inserted, we can optionally make them look stylized. To do this, go to the Style menu and click on Edit General Style. Then, click on the Chordnames option on the left-hand side. You should see a textbox appear on the right-hand side containing the text stdchords.xml. Change this to jazzchords.xml, and then press OK. The chords you entered should be appropriately stylized. Many styles of notation, especially within jazz music, use chord symbols and slashes to indicate improvisation. To create these slashes in MuseScore, insert four quarter notes on the middle line of the staff. Then, after exiting Note Entry mode, right-click on each note and select Note Properties. Check the box that says Stemless. Also, find the option labeled velocity type and choose user, and then change the value of the box velocity (0-127) to 0. Now press OK. Then, locate the section of the palette labeled Note Heads, and drag the parallelogram slash shape on top of each note. This will create the slash notation. Beaming The proper beaming of notes is a key feature of quality engraved scores that often goes unappreciated. It is extremely easy to change the beaming patterns to enhance the readability of your score. There are several utilities in the palette that allow for this. To start, go to the section of the palette labeled Beam Properties. Hovering over each icon will tell you what it does. These properties can be applied to different notes. The Start beam option is for notes in the middle of an existing beam. It breaks the existing beam at the specified note, and starts a new beam on that note. The Middle of the beam option will ensure that the selected note is beamed to the notes on both sides of it, and the No beam option will break any beams going to the selected note. Let's learn how to use these with a simple use case scenario. Suppose you enter three eighth notes followed by an eighth rest. MuseScore will automatically choose the following beaming: However, to a musician who is sight-reading, it may be easy to confuse this with a triplet. To correct this, simply drag the No beam icon on top of the third eighth note in the passage. The note should highlight red as you hover over it, before you drop it. Once you let go of the mouse button, MuseScore will automatically adjust the beam according to what you specified. Similarly, choosing the beaming wisely can make difficult passages easier to read. Let's consider the case of two sixteenth notes followed by two eighth notes and two more sixteenth notes. Especially with the sharps and flats in this example, it would not be easy to sight-read such a passage. However, dragging the Start beam option on top of the B♮ makes this passage much cleaner and easier to read. To undo any of these changes, ensure that you are not in Note Entry mode, and click on the note that you have changed. Then, in the Beam Properties section of the palette, double-click the A icon to reset it back to default. Though MuseScore uses standard conventions for whether to put the beam above or below the notes, if you would like to change this, simply ensure that you are not in Note Entry mode, click on the beam, and press the X key. The beam will flip to the other side of the staff. Summary In this article, we learned the basics of creating notes including ties and triplets, copying and pasting measures, creating chord symbols, and also changing the beaming patterns to enhance the readability of our score. Resources for Article: Further resources on this subject: Importing and Adding Background Music with Audacity 1.3 [Article] New iPad Features in iOS 6 [Article] Quick start – media files and XBMC [Article]
Read more
  • 0
  • 0
  • 4553

article-image-integrating-storm-and-hadoop
Packt
04 Sep 2013
17 min read
Save for later

Integrating Storm and Hadoop

Packt
04 Sep 2013
17 min read
(For more resources related to this topic, see here.) In this article, we will implement the Batch and Service layers to complete the architecture. There are some key concepts underlying this big data architecture: Immutable state Abstraction and composition Constrain complexity Immutable state is the key, in that it provides true fault-tolerance for the architecture. If a failure is experienced at any level, we can always rebuild the data from the original immutable data. This is in contrast to many existing data systems, where the paradigm is to act on mutable data. This approach may seem simple and logical; however, it exposes the system to a particular kind of risk in which the state is lost or corrupted. It also constrains the system, in that you can only work with the current view of the data; it isn't possible to derive new views of the data. When the architecture is based on a fundamentally immutable state, it becomes both flexible and fault-tolerant. Abstractions allow us to remove complexity in some cases, and in others they can introduce complexity. It is important to achieve an appropriate set of abstractions that increase our productivity and remove complexity, but at an appropriate cost. It must be noted that all abstractions leak, meaning that when failures occur at a lower abstraction, they will affect the higher-level abstractions. It is therefore often important to be able to make changes within the various layers and understand more than one layer of abstraction. The designs we choose to implement our abstractions must therefore not prevent us from reasoning about or working at the lower levels of abstraction when required. Open source projects are often good at this, because of the obvious access to the code of the lower level abstractions, but even with source code available, it is easy to convolute the abstraction to the extent that it becomes a risk. In a big data solution, we have to work at higher levels of abstraction in order to be productive and deal with the massive complexity, so we need to choose our abstractions carefully. In the case of Storm, Trident represents an appropriate abstraction for dealing with the data-processing complexity, but the lower level Storm API on which Trident is based isn't hidden from us. We are therefore able to easily reason about Trident based on an understanding of lower-level abstractions within Storm. Another key issue to consider when dealing with complexity and productivity is composition. Composition within a given layer of abstraction allows us to quickly build out a solution that is well tested and easy to reason about. Composition is fundamentally decoupled, while abstraction contains some inherent coupling to the lower-level abstractions—something that we need to be aware of. Finally, a big data solution needs to constrain complexity. Complexity always equates to risk and cost in the long run, both from a development perspective and from an operational perspective. Real-time solutions will always be more complex than batch-based systems; they also lack some of the qualities we require in terms of performance. Nathan Marz's Lambda architecture attempts to address this by combining the qualities of each type of system to constrain complexity and deliver a truly fault-tolerant architecture. We divided this flow into preprocessing and "at time" phases, using streams and DRPC streams respectively. We also introduced time windows that allowed us to segment the preprocessed data. In this article, we complete the entire architecture by implementing the Batch and Service layers. The Service layer is simply a store of a view of the data. In this case, we will store this view in Cassandra, as it is a convenient place to access the state alongside Trident's state. The preprocessed view is identical to the preprocessed view created by Trident, counted elements of the TF-IDF formula (D, DF, and TF), but in the batch case, the dataset is much larger, as it includes the entire history. The Batch layer is implemented in Hadoop using MapReduce to calculate the preprocessed view of the data. MapReduce is extremely powerful, but like the lower-level Storm API, is potentially too low-level for the problem at hand for the following reasons: We need to describe the problem as a data pipeline; MapReduce isn't congruent with such a way of thinking Productivity We would like to think of a data pipeline in terms of streams of data, tuples within the stream and predicates acting on those tuples. This allows us to easily describe a solution to a data processing problem, but it also promotes composability, in that predicates are fundamentally composable, but pipelines themselves can also be composed to form larger, more complex pipelines. Cascading provides such an abstraction for MapReduce in the same way as Trident does for Storm. With these tools, approaches, and considerations in place, we can now complete our real-time big data architecture. There are a number of elements, that we will update, and a number of elements that we will add. The following figure illustrates the final architecture, where the elements in light grey will be updated from the existing recipe, and the elements in dark grey will be added in this article: Implementing TF-IDF in Hadoop TF-IDF is a well-known problem in the MapReduce communities; it is well-documented and implemented, and it is interesting in that it is sufficiently complex to be useful and instructive at the same time. Cascading has a series of tutorials on TF-IDF at http://www.cascading.org/2012/07/31/cascading-for-the-impatient-part-5/, which documents this implementation well. For this recipe, we shall use a Clojure Domain Specific Language (DSL) called Cascalog that is implemented on top of Cascading. Cascalog has been chosen because it provides a set of abstractions that are very semantically similar to the Trident API and are very terse while still remaining very readable and easy to understand. Getting ready Before you begin, please ensure that you have installed Hadoop by following the instructions at http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/. How to do it… Start by creating the project using the lein command: lein new tfidf-cascalog Next, you need to edit the project.clj file to include the dependencies: (defproject tfidf-cascalog "0.1.0-SNAPSHOT" :dependencies [[org.clojure/clojure "1.4.0"] [cascalog "1.10.1"] [org.apache.cassandra/cassandra-all "1.1.5"] [clojurewerkz/cassaforte "1.0.0-beta11-SNAPSHOT"] [quintona/cascading-cassandra "0.0.7-SNAPSHOT"] [clj-time "0.5.0"] [cascading.avro/avro-scheme "2.2-SNAPSHOT"] [cascalog-more-taps "0.3.0"] [org.apache.httpcomponents/httpclient "4.2.3"]] :profiles{:dev{:dependencies[[org.apache.hadoop/hadoop-core "0.20.2-dev"] [lein-midje "3.0.1"] [cascalog/midje-cascalog "1.10.1"]]}}) It is always a good idea to validate your dependencies; to do this, execute lein deps and review any errors. In this particular case, cascading-cassandra has not been deployed to clojars, and so you will receive an error message. Simply download the source from https://github.com/quintona/cascading-cassandra and install it into your local repository using Maven. It is also good practice to understand your dependency tree. This is important to not only prevent duplicate classpath issues, but also to understand what licenses you are subject to. To do this, simply run lein pom, followed by mvn dependency:tree. You can then review the tree for conflicts. In this particular case, you will notice that there are two conflicting versions of Avro. You can fix this by adding the appropriate exclusions: [org.apache.cassandra/cassandra-all "1.1.5" :exclusions [org.apache.cassandra.deps/avro]] We then need to create the Clojure-based Cascade queries that will process the document data. We first need to create the query that will create the "D" view of the data; that is, the D portion of the TF-IDF function. This is achieved by defining a Cascalog function that will output a key and a value, which is composed of a set of predicates: (defn D [src] (let [src (select-fields src ["?doc-id"])] (<- [?key ?d-str] (src ?doc-id) (c/distinct-count ?doc-id :> ?n-docs) (str "twitter" :> ?key) (str ?n-docs :> ?d-str)))) You can define this and any of the following functions in the REPL, or add them to core.clj in your project. If you want to use the REPL, simply use lein repl from within the project folder. The required namespace (the use statement), require, and import definitions can be found in the source code bundle. We then need to add similar functions to calculate the TF and DF values: (defn DF [src] (<- [?key ?df-count-str] (src ?doc-id ?time ?df-word) (c/distinct-count ?doc-id ?df-word :> ?df-count) (str ?df-word :> ?key) (str ?df-count :> ?df-count-str))) (defn TF [src] (<- [?key ?tf-count-str] (src ?doc-id ?time ?tf-word) (c/count ?tf-count) (str ?doc-id ?tf-word :> ?key) (str ?tf-count :> ?tf-count-str))) This Batch layer is only interested in calculating views for all the data leading up to, but not including, the current hour. This is because the data for the current hour will be provided by Trident when it merges this batch view with the view it has calculated. In order to achieve this, we need to filter out all the records that are within the current hour. The following function makes that possible: (deffilterop timing-correct? [doc-time] (let [now (local-now) interval (in-minutes (interval (from-long doc-time) now))] (if (< interval 60) false true)) Each of the preceding query definitions require a clean stream of words. The text contained in the source documents isn't clean. It still contains stop words. In order to filter these and emit a clean set of words for these queries, we can compose a function that splits the text into words and filters them based on a list of stop words and the time function defined previously: (defn etl-docs-gen [rain stop] (<- [?doc-id ?time ?word] (rain ?doc-id ?time ?line) (split ?line :> ?word-dirty) ((c/comp s/trim s/lower-case) ?word-dirty :> ?word) (stop ?word :> false) (timing-correct? ?time))) We will be storing the outputs from our queries to Cassandra, which requires us to define a set of taps for these views: (defn create-tap [rowkey cassandra-ip] (let [keyspace storm_keyspace column-family "tfidfbatch" scheme (CassandraScheme. cassandra-ip "9160" keyspace column-family rowkey {"cassandra.inputPartitioner""org.apache.cassandra.dht.RandomPartitioner" "cassandra.outputPartitioner" "org.apache.cassandra.dht.RandomPartitioner"}) tap (CassandraTap. scheme)] tap)) (defn create-d-tap [cassandra-ip] (create-tap "d"cassandra-ip)) (defn create-df-tap [cassandra-ip] (create-tap "df" cassandra-ip)) (defn create-tf-tap [cassandra-ip] (create-tap "tf" cassandra-ip)) The way this schema is created means that it will use a static row key and persist name-value pairs from the tuples as column:value within that row. This is congruent with the approach used by the Trident Cassandra adaptor. This is a convenient approach, as it will make our lives easier later. We can complete the implementation by a providing a function that ties everything together and executes the queries: (defn execute [in stop cassandra-ip] (cc/connect! cassandra-ip) (sch/set-keyspace storm_keyspace) (let [input (tap/hfs-tap (AvroScheme. (load-schema)) in) stop (hfs-delimited stop :skip-header? true) src (etl-docs-gen input stop)] (?- (create-d-tap cassandra-ip) (D src)) (?- (create-df-tap cassandra-ip) (DF src)) (?- (create-tf-tap cassandra-ip) (TF src)))) Next, we need to get some data to test with. I have created some test data, which is available at https://bitbucket.org/qanderson/tfidf-cascalog. Simply download the project and copy the contents of src/data to the data folder in your project structure. We can now test this entire implementation. To do this, we need to insert the data into Hadoop: hadoop fs -copyFromLocal ./data/document.avro data/document.avro hadoop fs -copyFromLocal ./data/en.stop data/en.stop Then launch the execution from the REPL: => (execute "data/document" "data/en.stop" "127.0.0.1") How it works… There are many excellent guides on the Cascalog wiki (https://github.com/nathanmarz/cascalog/wiki), but for completeness's sake, the nature of a Cascalog query will be explained here. Before that, however, a revision of Cascading pipelines is required. The following is quoted from the Cascading documentation (http://docs.cascading.org/cascading/2.1/userguide/htmlsingle/): Pipe assemblies define what work should be done against tuple streams, which are read from tap sources and written to tap sinks. The work performed on the data stream may include actions such as filtering, transforming, organizing, and calculating. Pipe assemblies may use multiple sources and multiple sinks, and may define splits, merges, and joins to manipulate the tuple streams. This concept is embodied in Cascalog through the definition of queries. A query takes a set of inputs and applies a list of predicates across the fields in each tuple of the input stream. Queries are composed through the application of many predicates. Queries can also be composed to form larger, more complex queries. In either event, these queries are reduced down into a Cascading pipeline. Cascalog therefore provides an extremely terse and powerful abstraction on top of Cascading; moreover, it enables an excellent development workflow through the REPL. Queries can be easily composed and executed against smaller representative datasets within the REPL, providing the idiomatic API and development workflow that makes Clojure beautiful. If we unpack the query we defined for TF, we will find the following code: (defn DF [src] (<- [?key ?df-count-str] (src ?doc-id ?time ?df-word) (c/distinct-count ?doc-id ?df-word :> ?df-count) (str ?df-word :> ?key) (str ?df-count :> ?df-count-str))) The <- macro defines a query, but does not execute it. The initial vector, [?key ?df-count-str], defines the output fields, which is followed by a list of predicate functions. Each predicate can be one of the following three types: Generators: A source of data where the underlying source is either a tap or another query. Operations: Implicit relations that take in input variables defined elsewhere and either act as a function that binds new variables or a filter. Operations typically act within the scope of a single tuple. Aggregators: Functions that act across tuples to create aggregate representations of data. For example, count and sum. The :> keyword is used to separate input variables from output variables. If no :> keyword is specified, the variables are considered as input variables for operations and output variables for generators and aggregators. The (src ?doc-id ?time ?df-word) predicate function names the first three values within the input tuple, whose names are applicable within the query scope. Therefore, if the tuple ("doc1" 123324 "This") arrives in this query, the variables would effectively bind as follows: ?doc-id: "doc1" ?time: 123324 ?df-word: "This" Each predicate within the scope of the query can use any bound value or add new bound variables to the scope of the query. The final set of bound values that are emitted is defined by the output vector. We defined three queries, each calculating a portion of the value required for the TF-IDF algorithm. These are fed from two single taps, which are files stored in the Hadoop filesystem. The document file is stored using Apache Avro, which provides a high-performance and dynamic serialization layer. Avro takes a record definition and enables serialization/deserialization based on it. The record structure, in this case, is for a document and is defined as follows: {"namespace": "storm.cookbook", "type": "record", "name": "Document", "fields": [ {"name": "docid", "type": "string"}, {"name": "time", "type": "long"}, {"name": "line", "type": "string"} ] } Both the stop words and documents are fed through an ETL function that emits a clean set of words that have been filtered. The words are derived by splitting the line field using a regular expression: (defmapcatop split [line] (s/split line #"[[](),.)s]+")) The ETL function is also a query, which serves as a source for our downstream queries, and defines the [?doc-id ?time ?word] output fields. The output tap, or sink, is based on the Cassandra scheme. A query defines predicate logic, not the source and destination of data. The sink ensures that the outputs of our queries are sent to Cassandra. The ?- macro executes a query, and it is only at execution time that a query is bound to its source and destination, again allowing for extreme levels of composition. The following, therefore, executes the TF query and outputs to Cassandra: (?- (create-tf-tap cassandra-ip) (TF src)) There's more… The Avro test data was created using the test data from the Cascading tutorial at http://www.cascading.org/2012/07/31/cascading-for-the-impatient-part-5/. Within this tutorial is the rain.txt tab-separated data file. A new column was created called time that holds the Unix epoc time in milliseconds. The updated text file was then processed using some basic Java code that leverages Avro: Schema schema = Schema.parse(SandboxMain.class.getResourceAsStream("/document.avsc")); File file = new File("document.avro"); DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema); DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter); dataFileWriter.create(schema, file); BufferedReader reader = new BufferedReader(new InputStreamReader(SandboxMain.class.getResourceAsStream("/rain.txt"))); String line = null; try { while ((line = reader.readLine()) != null) { String[] tokens = line.split("t"); GenericRecord docEntry = new GenericData.Record(schema); docEntry.put("docid", tokens[0]); docEntry.put("time", Long.parseLong(tokens[1])); docEntry.put("line", tokens[2]); dataFileWriter.append(docEntry); } } catch (IOException e) { e.printStackTrace(); } dataFileWriter.close(); Persisting documents from Storm In the previous recipe, we looked at deriving precomputed views of our data taking some immutable data as the source. In that recipe, we used statically created data. In an operational system, we need Storm to store the immutable data into Hadoop so that it can be used in any preprocessing that is required. How to do it… As each tuple is processed in Storm, we must generate an Avro record based on the document record definition and append it to the data file within the Hadoop filesystem. We must create a Trident function that takes each document tuple and stores the associated Avro record. Within the tfidf-topology project created in, inside the storm.cookbook.tfidf.function package, create a new class named PersistDocumentFunction that extends BaseFunction. Within the prepare function, initialize the Avro schema and document writer: public void prepare(Map conf, TridentOperationContext context) { try { String path = (String) conf.get("DOCUMENT_PATH"); schema = Schema.parse(PersistDocumentFunction.class .getResourceAsStream("/document.avsc")); File file = new File(path); DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema); dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter); if(file.exists()) dataFileWriter.appendTo(file); else dataFileWriter.create(schema, file); } catch (IOException e) { throw new RuntimeException(e); } } As each tuple is received, coerce it into an Avro record and add it to the file: public void execute(TridentTuple tuple, TridentCollector collector) { GenericRecord docEntry = new GenericData.Record(schema); docEntry.put("docid", tuple.getStringByField("documentId")); docEntry.put("time", Time.currentTimeMillis()); docEntry.put("line", tuple.getStringByField("document")); try { dataFileWriter.append(docEntry); dataFileWriter.flush(); } catch (IOException e) { LOG.error("Error writing to document record: " + e); throw new RuntimeException(e); } } Next, edit the TermTopology.build topology and add the function to the document stream: documentStream.each(new Fields("documentId","document"), new PersistDocumentFunction(), new Fields()); Finally, include the document path into the topology configuration: conf.put("DOCUMENT_PATH", "document.avro"); How it works… There are various logical streams within the topology, and certainly the input for the topology is not in the appropriate state for the recipes in this article containing only URLs. We therefore need to select the correct stream from which to consume tuples, coerce these into Avro records, and serialize them into a file. The previous recipe will then periodically consume this file. Within the context of the topology definition, include the following code: Stream documentStream = getUrlStream(topology, spout) .each(new Fields("url"), new DocumentFetchFunction(mimeTypes), new Fields("document", "documentId", "source")); documentStream.each(new Fields("documentId","document"), new PersistDocumentFunction(), new Fields()); The function should consume tuples from the document stream whose tuples are populated with already fetched documents.
Read more
  • 0
  • 0
  • 2862

Packt
04 Sep 2013
7 min read
Save for later

Quick start – writing your first MDX query

Packt
04 Sep 2013
7 min read
(For more resources related to this topic, see here.) Step 1 – open the SQL Server Management Studio and connect to the cube The Microsoft SQL Server Management Studio (SSMS) is a client application used by the administrators to manage instances and by developers to create object and write queries. We will use SSMS to connect on the cube and write our first MDX query. Here's a screenshot of SSMS with a connection on a SSAS server: Click on the Windows button, click on All Programs, click on Microsoft SQL Server 2012, and then click on SQL Server Management Studio. In the Connect to Server window, in the Server type box, select Analysis Services. In the Server name box, type the name of your Analysis Services server. Click on Connect. In the SQL Server Management Studio window, click on the File menu, click on New, and then click on Analysis Services MDX Query. In the Connect to Analysis Services window, in the Server name box, type the name of you Analysis Services server, and then click on Connect. SELECT FROM WHERE If you have already written SQL queries, you might have already made connections with the T-SQL language. Here's my tip for you: don't, you will only hurt yourself. Some words are the same, but it is better to think MDX when writing MDX rather than to think SQL when writing MDX. Step 2 – SELECT The SELECT clause is the main part of the MDX query. You will define what are the measure and dimension members that you want to display. You also have to define on which axis of your result set you want to display the measure and dimension members. Axes Axes are the columns and rows of the result set. With SQL Server Analysis Services, upto 128 axes can be specified. The axes have a number which is zero-based. The first axe is 0, the second on is 1, and so on. So, if you want to use two axes, the first one will be 0 and the second will be 1. You cannot use axe 0 and axe 2, if you don't define axe 1. For the first five axes, you can use the axis alias instead. After the axe 4, you will have to revert to the number because no other aliases are available. Axe Number Alias 0 Columns 1 Rows 2 Pages 3 Sections 4 Chapters Even if SSAS supports 128 axes, if you try to use more than two axes in SSMS in your query, you will get this error when you execute your MDX query: Results cannot be displayed for cellsets with more than two axes. So, always write your MDX queries using only two axes in SSMS and separate them with a comma. Tuples A tuple is a specific point in the cube where dimensions meet. A tuple can contain one or more members from the cube's dimensions, but you cannot have two members from the same dimension. If you want to display only the calendar year 2008, you will have to write [Date].[CY 2008]. If you want to have more than one dimension, you have to enclose them using parenthesis () and separate them with a comma. Calendar year for United States will look like ([Date].[CY 2008], [Geography].[United States]). Even if you are writing a tuple with only a single member from a single dimension, it is good practice to enclose it in parenthesis. Sets If you want to display the year 2005 to 2008, you will write four single-dimension tuples which composes a set. When writing the set, you separate the tuples with commas and wrap it all with curly braces {} and separate the tuples with commas such as {[Date].[CY 2005], [Date].[CY 2006] , [Date].[CY 2007] , [Date].[CY 2008]} to have the calendar years from 2005 to 2008. Since all the tuples are from the same dimension, you can also write it using a colon (:), such as {[Date].[CY 2005]: [Date].[CY 2008]} which will give you the years 2005 to 2008. With SSAS 2012, you can write {[Date].[CY 2008]: [Date].[CY 2005]} and the result will still be from 2005 to 2008. What about the calendar year 2008 for both Canada and the United States? You will write two tuples. A set can be composed of one or more tuples. The tuples must have the same dimensionality; otherwise, an error will occur. Meaning that the first member is from the Date dimension and the second from the Geography dimension. You cannot have the first tuple with Date-Geography and the second being Geography-Date; you will encounter an error. So the calendar year 2008 with Canada and United States will look such as {([Date].[CY 2008], [Geography].[Canada]), ([Date].[CY 2008], [Geography].[United States])}. When writing tuples, always use the form [Dimension].[Level].[MemberName]. So, [Geography].[Canada] should be written as [Geography].[Country].[Canada]. You could also use the member key instead of the member name. In SSAS, use the ampersand (&) when using the key; [Geography].[State-Province].[Quebec] with the name becomes [Geography].[State-Province].&[QC]&[CA] using the keys. What happens when you want to write bigger sets such as for the bikes and components product category in Canada and the United States from 2005 to 2008? Enter the Crossjoin function. Crossjoin takes two or more sets for arguments and returns you a set with the cross products or the specified sets. Crossjoin ({[Product].[Category].[Bikes], [Product].[Category].[Components]}, {[Geography].[Country].[Canada], [Geography].[Country].[United States]}, {[Date].[CY 2005] : [Date].[CY 2008]}) The MDX queries can be written using line-break to add visibility to the code. So each time we write a new set and even tuples, we write it on a new line and add some indentation: Crossjoin ({[Product].[Category].[Bikes], [Product].[Category].[Components]},{[Geography].[Country].[Canada], [Geography].[Country].[United States]}, {[Date].[CY 2005] : [Date].[CY 2008]}) Step 3 – FROM The FROM clause defines where the query will get the data. It can be one of the following four things: A cube. A perspective (a subset of dimensions and measures). A subcube (a MDX query inside a MDX query). A dimension (a dimension inside your SSAS database, you must use the dollar sign ($) before the name of the dimension). Step 4 – WHERE The WHERE clause is used to filter the dimensions and members out of the MDX query. The set used in the WHERE clause won't be displayed in your result set. Step 5 – comments Comment your code. You never know when somebody else will take a look on your queries and trying to understand what has been written could be harsh. There are three ways to use delimit comments inside the query: /* and */ // -- (pair of dashes) The /* and */ symbols can be used to comment multiple lines of text in your query. Everything between the /* and the */ symbols will be ignored when the MDX query is parsed. Use // or -- to begin a comment on a single line. Step 6 – your first MDX query So if you want to display the Resellers Sales Amount and Reseller Order Quantity measures on the columns, the years from 2006 to 2008 with the bikes and components product categories for Canada. First, identify what will go where. Start with the two axes, continue with the FROM clause, and finish with the WHERE clause. SELECT{[Measures].[Reseller Sales Amount], [Measures].[Reseller Order Quantity]} on columns,Crossjoin({[Date].[CY 2006] : [Date].[CY 2008]}, {[Product].[Category].[Bikes], [Product].[Category].[Components]}) on rowsFROM [Adventure Works]WHERE {[Geography].[Country].[Canada]} This query will return the following result set:     Reseller Sales Amount Reseller Order Quantity CY 2006 Bikes $3,938,283.99 4,563 CY 2006 Components $746,576.15 2,954 CY 2007 Bikes $4,417,665.71 5,395 CY 2007 Components $997,617.89 4,412 CY 2008 Bikes $1,909,709.62 2,209 CY 2008 Components $370,698.68 1,672 Summary In this article, we saw how to write the MDX queries in various steps. We used the FROM, WHERE, and SELECT clauses in writing the queries. This article was a quick start guide for starting to query and it will help you write more complex queries. Happy querying! Resources for Article : Further resources on this subject: Connecting to Microsoft SQL Server Compact 3.5 with Visual Studio [Article] MySQL Linked Server on SQL Server 2008 [Article] Microsoft SQL Azure Tools [Article]
Read more
  • 0
  • 0
  • 9984

article-image-so-what-apache-wicket
Packt
04 Sep 2013
7 min read
Save for later

So, what is Apache Wicket?

Packt
04 Sep 2013
7 min read
(For more resources related to this topic, see here.) Wicket is a component-based Java web framework that uses just Java and HTML. Here, you will see the main advantages of using Apache Wicket in your projects. Using Wicket, you will not have mutant HTML pages. Most of the Java web frameworks require the insertion of special syntax to the HTML code, making it more difficult for Web designers. On the other hand, Wicket adopts HTML templates by using a namespace that follows the XHTML standard. It consists of an id attribute in the Wicket namespace (wicket:id). You won't need scripts to generate messy HTML code. Using Wicket, the code will be clearer, and refactoring and navigating within the code will be easier. Moreover, you can utilize any HTML editor to edit the HTML files, and web designers can work with little knowledge of Wicket in the presentation layer without worrying about business rules and other developer concerns. The advantages for developers are as follows: All code is written in Java No XML configuration files POJO-centric programming No Back-button problems (that is, unexpected and undesirable results on clicking on the browser's Back button) Ease of creating bookmarkable pages Great compile-time and runtime problem diagnosis Easy testability of components Another interesting thing is that concepts such as generics and anonymous subclasses are widely used in Wicket, leveraging the Java programming language to the max. Wicket is based on components. A component is an object that interacts with other components and encapsulates a set of functionalities. Each component should be reusable, replaceable, extensible, encapsulated, and independent, and it does not have a specific context. Wicket provides all these principles to developers because it has been designed taking into account all of them. In particular, the most remarkable principle is reusability. Developers can create custom reusable components in a straightforward way. For instance, you could create a custom component called SearchPanel (by extending the Panel class, which is also a component) and use it in all your other Wicket projects. Wicket has many other interesting features. Wicket also aims to make the interaction of the stateful server-side Java programming language with the stateless HTTP protocol more natural. Wicket's code is safe by default. For instance, it does not encode state in URLs. Wicket is also efficient (for example, it is possible to do a tuning of page-state replication) and scalable (Wicket applications can easily work on a cluster). Last, but not least, Wicket has support for frameworks like EJB and Spring. Installation In seven easy steps, you can build a Wicket "Hello World" application. Step 1 – what do I need? Before you start to use Apache Wicket 6, you will need to check if you have all of the required elements, listed as follows: Wicket is a Java framework, so you need to have Java virtual machine (at least Version 6) installed on your machine. Apache Maven is required. Maven is a tool that can be used for building and managing Java projects. Its main purpose is to make the development process easier and more structured. More information on how to install and configure Maven can be found at http://maven.apache.org. The examples of this book use the Eclipse IDE Juno version, but you can also use other versions or other IDEs, such as NetBeans. In case you are using other versions, check the link for installing the plugins to the version you have; the remaining steps will be the same. In case of other IDEs, you will need to follow some tutorial to install other equivalent plugins or not use them at all. Step 2 – installing the m2eclipse plugin The steps for installing the m2eclipse plugin are as follows: Go to Help | Install New Software. Click on Add and type in m2eclipse in the Name field; copy and paste the link https://repository.sonatype.org/content/repositories/forge-sites/m2e/1.3.0/N/LATEST onto the Location field. Check all options and click on Next. Conclude the installation of the m2eclipse plugin by accepting all agreements and clicking on Finish. Step 3 – creating a new Maven application The steps for creating a new Maven application are as follows: Go to File | New | Project. Then go to Maven | Maven Project. Click on Next and type wicket in the next form. Choose the wicket-archetype-quickstart maven Archetype and click on Next. Fill the next form according to the following screenshot and click on Finish: Step 4 – coding the "Hello World" program In this step, we will build the famous "Hello World" program. The separation of concerns will be clear between HTML and Java code. In this example, and in most cases, each HTML file has a corresponding Java class (with the same name). First, we will analyse the HTML template code. The content of the HomePage.html file must be replaced by the following code: <!DOCTYPE html> <html > <body> <span wicket_id="helloWorldMessage">Test</span> </body> </html> It is simple HTML code with the Wicket template wicket:id="helloWorldMessage". It indicates that in the Java code related to this page, a method will replace the message Test by another message. Now, let's edit the corresponding Java class; that is, HomePage. package com.packtpub.wicket.hello_world; import org.apache.wicket.markup.html.WebPage; import org.apache.wicket.markup.html.basic.Label; public class HomePage extends WebPage { public HomePage() { add(new Label("helloWorldMessage", "Hello world!!!")); } } The class HomePage extends WebPage; that is, it inherits some of the WebPage class's methods and attributes, and it becomes a WebPage subtype. One of these inherited methods is the method add(), where a Label object can be passed as a parameter. A Label object can be built by passing two parameters: an identifier and a string. The method add() is called in the HomePage class's constructor and will change the message in wicket:id="helloWorldMessage" with Hello world!!!. The resulting HTML code will be as shown in the following code snippet: <!DOCTYPE html> <html > <body> <span>Hello world!!!</span> </body> </html> Step 5 – compile and run! The steps to compile and run the project are as follows: To compile, right-click on the project and go to Run As | Maven install. Verify if the compilation was successful. If not, Wicket provides good error messages, so you can try to fix what is wrong. To run the project, right-click on the class Start and go to Run As | Java application. The class Start will run an embedded Jetty instance that will run the application. Verify if the server has started without any problems. Open a web browser and enter this in the address field: http://localhost:8080. In case you have changed the port, enter http://localhost:<port>. The browser should show Hello world!!!. The most common problem that can occur is that port 8080 is already in use. In this case, you can go into the Java Start class (found at src/test/java) and set another port by replacing 8080 in connector. setPort(8080) (line 21) by another number (for example, 9999). To stop the server, you can either click on Console and press any key or click on the red square on the console, which indicates termination. And that's it! By this point, you should have a working Wicket "Hello World" application and are free to play around and discover more about it. Summary This article describes how to create a simple "Hello World" application using Apache Wicket 6. Resources for Article : Further resources on this subject: Tips for Deploying Sakai [Article] OSGi life cycle [Article] Apache Wicket: Displaying Data Using DataTable [Article]
Read more
  • 0
  • 0
  • 5478
article-image-setting-scans
Packt
04 Sep 2013
10 min read
Save for later

Setting up scans

Packt
04 Sep 2013
10 min read
(For more resources related to this topic, see here.) Setting up a scan in Spiceworks The first thing Spiceworks tries to do to scan a network is contact Active Directory(AD); it also uses AD to populate the People portion of your Inventory. Let's set up AD first, as everything else we will be configuring is on the same page. We are all about saving your time and not going back and forth between pages. If you do not have AD in your environment, you can just skip to the Configuring IP range scans section. Scanning and Active Directory There is a wealth of information within AD that Spiceworks uses. We are going to need to configure Spiceworks to log into AD and get that information. OK, we need to get to the Active Directory Configuration screen in Spiceworks in order to do that. As with most things within the app, it is just a couple of clicks. From anywhere in the app, mouse over the Inventory link at the top of the page; a menu will open up. Click on Settings. This will take us to the Settings screen. You will be spending a lot of time here so you can either get very used to these clicks or just have a separate tab open with these settings already set up. The top section is called Getting Started and the first link is Active Directory Configuration. That is our destination for this section so click away. It will take you to the Active Directory Configuration page: There are three sections that are highlighted. Let's go over each and what they do: The area highlighted as 1 is where you are going to enter the credentials that allow Spiceworks to log into your AD and get information. You specify the Active Directory Server (Domain Controller), username and password. Usernames must be in either domain/username or username@domain.com. If you have SSL enabled for AD inquiries, check the Use SSL box. The area highlighted as 2 shows the frequency at which Spiceworks retrieves information from your AD environment. When Spiceworks queries AD, it does not cause a huge amount of traffic or load. Shortening these times should not cause undue stress on your AD servers. This is useful because when you add a user in AD, it will automatically get loaded into Spiceworks at the next scan. If you want any changes you make to users in Spiceworks to be uploaded into your AD environment, the section highlighted as 3 is for you. Just click on the box and any modifications you make in Spiceworks will automatically be synchronized with your AD. There is one more section that is not in the screenshot. This deals with your user portal and help desk. Setting up AD in your Spiceworks really makes a lot of difference with scans and filling in information. It is recommended that if you are running AD, hook this up. If you are wary about Spiceworks writing data into your AD environment, just set up the user that Spiceworks uses to connect as read-only and don't check the box that writes changes back to AD. Easy enough. Since you are convinced that you should connect your AD to Spiceworks, just fill in the ActiveDirectory server, User, and Password fields and click on Save. Spiceworks will automatically test the credentials and let you know immediately if it can connect. If you have some challenges with Spiceworks connecting to your Domain Controller with just the server name, another method is to put the IP address directly into that field. Let's move on to setting up an IP range scan and get some devices into your Spiceworks install. Configuring IP range scans Remember the Settings page that we have been to a couple of times? We are going back! In case you have forgotten, just mouse over to either Inventory or Help Desk and click on the Settings link at the bottom of the left column. Once on the Settings page, we are going to click on the Network Scan link. It is in the first section of links titled Getting Started. This takes us to the main Network Scan page. The first section is where we are going to set up our IP ranges. Since you will not have any ranges in here as you just installed Spiceworks, let's get one configured so you can get some information into the app. To do this, just click on the Add IP Range button and this window will pop up. There is a lot of flexibility that Spiceworks gives you regarding how it scans IP ranges. You can put a fill range (192.168.1.1-254) with or without exclusions, or just a single IP if you so wish. The next box is for exclusions, if you so choose. If you decide you want to scan a range that has both servers and desktops, you can exclude server IP addresses. This is handy. The last options are for scheduling this IP range scan. If you choose the Daily at… option as we have seen in the screenshot, you can also select the time of the day to run this scan. Other options in this drop-down list are every 4, 6, 8, or 12 hours. If you do decide that you want to scan on an hourly basis, the time of the day magically disappears. The bottom of the window lets you select what days of the week you want to run the scan. When Spiceworks runs an initial scan, it can take a bit of time as there is a ton of data that it is collecting. Spiceworks tries a multitude of credentials and reads all information from devices, which it then writes to the database. Once Spiceworks has scanned and written the data to the database, any subsequent scans just write delta data into it. Enter what range you want to scan, any exclusions you choose, and the scan frequency, and click on the Add button. Congratulations! You have just added an IP range scan! Scanning credentials As we have covered, Spiceworks uses a multitude of credentials to try and figure out what is on your network and put those devices into the inventory. This has been completely overhauled in Spiceworks. In this easy-to-use interface, you can enter all the credentials that you are going to need to have a successful scan. Here you can configure multiple usernames/passwords for the following protocols: WMI SSH SNMP Enable ESX/vSphere HTTP iLo SNMP v2c/v3 Telnet Intel vPro As you can see, if you need to put device-specific usernames and passwords into Spiceworks, you can do so using the format, Domainusername. So if you have a server that uses a unique username/password combination, it is easy to set all that up through this interface. The preceding screenshot shows an example of this. Something new in Spiceworks is the section where it shows devices that the credentials were successfully used on. This is really helpful for troubleshooting any scan errors! To add your own username/password combinations, just use these easy-to-follow directions: Click on the protocol you want to add credentials to on the left column (WMI, SNMP, and so on). Click on +Add Account in the middle column labeled Existing Accounts. Enter all the pertinent information on the left pane labeled Edit Account.For usernames that have passwords, there is a Show Password button as well, so you can make sure that you didn't fat finger it! That's it. Just fill in any credentials that will let Spiceworks access your devices on your network, and as far as permissions are concerned you should be good to go! Best practices and kicking off your first Spiceworks scan You have everything you need to start your first Spiceworks scan. It might be best to read the following best practices before you kick it off, though. They will guide you through some potential pitfalls. Scanning best practices For initial scans, be aware of the number of IP addresses you are scanning and the amount of information that Spiceworks is going to pull out of those devices. If you put in a full IP range on your first scan, do not expect Spiceworks to be completed in 10-15 minutes. The initial scan is the most network traffic intensive and will take the longest duration of time. Do full initial scans during nonbusiness hours. Though running an initial full scan shouldn't flood your network, depending on your network configuration, it is always best to run full initial scans during nonbusiness hours just in case. If you are running a 24 x 7 business, break up your IP ranges into smaller chunks and scan that way. Expect some unknown devices. Unless you are a super administrator with a team of hundreds behind you to make sure that every aspect of your network is 100 percent buttoned down, there will most likely be a few devices that Spiceworks cannot connect to. One of the biggest culprits is that WMI has been disabled, or that there is a firewall of some sort blocking Spiceworks from connecting to the machine. Don't get down on yourself if the scan doesn't work 100 percent the first time. If you are really worried about traffic that Spiceworks might cause, what information it collects, or how it will affect workstation performance, just set up a test environment and run a scan there. Whether it be 5 machines or 500, Spiceworks does the same to each one; so test away. Spiceworks is not designed to scan 10,000 devices at one time without a performance hit. If you have a very large network, break it up into smaller chunks for best performance. Spiceworks could get through a 10,000 device scan, but it would hurt performance until the scan is complete. If you have multiple sites linked either by WAN or VPN connections, drop a remote collector at these to run local scans and then send the data back to your main Spiceworks installation. You can find more information at http://community.spiceworks.com/help/Remote_Collectors OK, now that you have read the required best practices, you can set up your IP range on the Network Scan settings page, check the box associated with that range and click on Start Scan. Away you go! Depending on the IP range you set and the time of the day, your scan could take just a few minutes or several hours. If you are having some serious issues trying to get a successful scan, open a browser and hit this site: http://community.spiceworks.com/support. There are in-depth articles and even real-live support folks that can dive into the specifics of your environment, and they won't give up until you are successful. Let's assume that even if you did have an issue, it is resolved and you have got your first scan under your belt. Summary We were provided with details on how to set up a scan in Spiceworks. Also, we got to know how to run the scan we set up and the best practices. Resources for Article : Further resources on this subject: Using SpriteFonts in a Board-based Game with XNA 4.0 [Article] Why CoffeeScript?HTML5 Games Development: Using Local Storage to Store Game Data [Article] Making Money with Your Game [Article]
Read more
  • 0
  • 0
  • 11583

article-image-understanding-point-time-recovery
Packt
04 Sep 2013
28 min read
Save for later

Understanding Point-In-Time-Recovery

Packt
04 Sep 2013
28 min read
(For more resources related to this topic, see here.) Understanding the purpose of PITR PostgreSQL offers a tool called pg_dump to backup a database. Basically, pg_dump will connect to the database, read all the data in transaction isolation level "serializable" and return the data as text. As we are using "serializable", the dump is always consistent. So, if your pg_dump starts at midnight and finishes at 6 A.M, you will have created a backup, which contains all the data as of midnight but no further data. This kind of snapshot creation is highly convenient and perfectly feasible for small to medium amounts of data. A dump is always consistent. This means that all foreign keys are intact; new data added after starting the dump will be missing. It is most likely the most common way to perform standard backups. But, what if your data is so valuable and maybe so large in size that you want to backup it incrementally? Taking a snapshot from time to time might be enough for some applications; for highly critical data, it is clearly not. In addition to that, replaying 20 TB of data in textual form is not efficient either. Point-In-Time-Recovery has been designed to address this problem. How does it work? Based on a snapshot of the database, the XLOG will be replayed later on. This can happen indefinitely or up to a point chosen by you. This way, you can reach any point in time. This method opens the door to many different approaches and features: Restoring a database instance up to a given point in time Creating a standby database, which holds a copy of the original data Creating a history of all changes In this article, we will specifically feature on the incremental backup functionality and describe how you can make your data more secure by incrementally archiving changes to a medium of choice. Moving to the bigger picture The following picture provides an overview of the general architecture in use for Point-In-Time-Recovery: PostgreSQL produces 16 MB segments of transaction log. Every time one of those segments is filled up and ready, PostgreSQL will call the so called archive_command. The goal of archive_command is to transport the XLOG file from the database instance to an archive. In our image, the archive is represented as the pot on the bottom-right side of the image. The beauty of the design is that you can basically use an arbitrary shell script to archive the transaction log. Here are some ideas: Use some simple copy to transport data to an NFS share Run rsync to move a file Use a custom made script to checksum the XLOG file and move it to an FTP server Copy the XLOG file to a tape The possible options to manage XLOG are only limited by imagination. The restore_command is the exact counterpart of the archive_command. Its purpose is to fetch data from the archive and provide it to the instance, which is supposed to replay it (in our image, this is labeled as Restored Backup). As you have seen, replay might be used for replication or simply to restore a database to a given point in time as outlined in this article. Again, the restore_command is simply a shell script doing whatever you wish, file by file. It is important to mention that you, the almighty administrator, are in charge of the archive. You have to decide how much XLOG to keep and when to delete it. The importance of this task cannot be underestimated. Keep in mind, when then archive_command fails for some reason, PostgreSQL will keep the XLOG file and retry after a couple of seconds. If archiving fails constantly from a certain point on, it might happen that the master fills up. The sequence of XLOG files must not be interrupted; if a single file is missing, you cannot continue to replay XLOG. All XLOG files must be present because PostgreSQL needs an uninterrupted sequence of XLOG files; if a single file is missing, the recovery process will stop there at the very latest. Archiving the transaction log After taking a look at the big picture, we can take a look and see how things can be put to work. The first thing you have to do when it comes to Point-In-Time-Recovery is to archive the XLOG. PostgreSQL offers all the configuration options related to archiving through postgresql.conf. Let us see step by step what has to be done in postgresql.conf to start archiving: First of all, you should turn archive_mode on. In the second step, you should configure your archive command. The archive command is a simple shell command taking two parameters: %p: This is a placeholder representing the XLOG file that should be archived, including its full path (source). %f: This variable holds the name of the XLOG without the path pointing to it. Let us set up archiving now. To do so, we should create a place to put the XLOG. Ideally, the XLOG is not stored on the same hardware as the database instance you want to archive. For the sake of this example, we assume that we want to apply an archive to /archive. The following changes have to be made to postgresql.conf: wal_level = archive # minimal, archive, or hot_standby # (change requires restart) archive_mode = on # allows archiving to be done # (change requires restart) archive_command = 'cp %p /archive/%f' # command to use to archive a logfile segment # placeholders: %p = path of file to archive # %f = file name only Once those changes have been made, archiving is ready for action and you can simply restart the database to activate things. Before we restart the database instance, we want to focus your attention on wal_level. Currently three different wal_level settings are available: minimal archive hot_standby The amount of transaction log produced in the case of just a single node is by far not enough to synchronize an entire second instance. There are some optimizations in PostgreSQL, which allow XLOG-writing to be skipped in the case of single-node mode. The following instructions can benefit from wal_level being set to minimal: CREATE TABLE AS, CREATE INDEX, CLUSTER, and COPY (if the table was created or truncated within the same transaction). To replay the transaction log, at least archive is needed. The difference between archive and hot_standby is that archive does not have to know about currently running transactions. For streaming replication, however, this information is vital. Restarting can either be done through pg_ctl –D /data_directory –m fast restart directly or through a standard init script. The easiest way to check if our archiving works is to create some useless data inside the database. The following code snippets shows a million rows can be made easily: test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1,1000000);SELECT 1000000test=# SELECT * FROM t_test LIMIT 3;generate_series----------------- 1 2 3(3 rows) We have simply created a list of numbers. The important thing is that 1 million rows will trigger a fair amount of XLOG traffic. You will see that a handful of files have made it to the archive: iMac:archivehs$ ls -ltotal 131072-rw------- 1 hs wheel 16777216 Mar 5 22:31000000010000000000000001-rw------- 1 hs wheel 16777216 Mar 5 22:31000000010000000000000002-rw------- 1 hs wheel 16777216 Mar 5 22:31000000010000000000000003-rw------- 1 hs wheel 16777216 Mar 5 22:31000000010000000000000004 Those files can be easily used for future replay operations. If you want to save storage, you can also compress those XLOG files. Just add gzip to your archive_command. Taking base backups In the previous section, you have seen that enabling archiving takes just a handful of lines and offers a great deal of flexibility. In this section, we will see how to create a so called base backup, which can be used to apply XLOG later on. A base backup is an initial copy of the data. Keep in mind that the XLOG itself is more or less worthless. It is only useful in combination with the initial base backup. In PostgreSQL, there are two main options to create an initial base backup: Using pg_basebackup Traditional copy/rsync based methods The following two sections will explain in detail how a base backup can be created: Using pg_basebackup The first and most common method to create a backup of an existing server is to run a command called pg_basebackup, which was introduced in PostgreSQL 9.1.0. Basically pg_basebackup is able to fetch a database base backup directly over a database connection. When executed on the slave, pg_basebackup will connect to the database server of your choice and copy all the data files in the data directory over to your machine. There is no need to log into the box anymore, and all it takes is one line of code to run it; pg_basebackup will do all the rest for you. In this example, we will assume that we want to take a base backup of a host called sample.postgresql-support.de. The following steps must be performed: Modify pg_hba.conf to allow replication Signal the master to take pg_hba.conf changes into account Call pg_basebackup Modifying pg_hba.conf To allow remote boxes to log into a PostgreSQL server and to stream XLOG, you have to explicitly allow replication. In PostgreSQL, there is a file called pg_hba.conf, which tells the server which boxes are allowed to connect using which type of credentials. Entire IP ranges can be allowed or simply discarded through pg_hba.conf. To enable replication, we have to add one line for each IP range we want to allow. The following listing contains an example of a valid configuration: # TYPE DATABASE USER ADDRESS METHODhost replication all 192.168.0.34/32 md5 In this case we allow replication connections from 192.168.0.34. The IP range is identified by 32 (which simply represents a single server in our case). We have decided to use MD5 as our authentication method. It means that the pg_basebackup has to supply a password to the server. If you are doing this in a non-security critical environment, using trust as authentication method might also be an option. What happens if you actually have a database called replication in your system? Basically, setting the database to replication will just configure your streaming behavior, if you want to put in rules dealing with the database called replication, you have to quote the database name as follows: "replication". However, we strongly advise not to do this kind of trickery to avoid confusion. Signaling the master server Once pg_hba.conf has been changed, we can tell PostgreSQL to reload the configuration. There is no need to restart the database completely. We have three options to make PostgreSQL reload pg_hba.conf: By running an SQL command: SELECT pg_reload_conf(); By sending a signal to the master: kill –HUP 4711 (with 4711 being the process ID of the master) By calling pg_ctl: pg_ctl –D $PGDATA reload (with $PGDATA being the home directory of your database instance) Once we have told the server acting as data source to accept streaming connections, we can move forward and run pg_basebackup. pg_basebackup – basic features pg_basebackup is a very simple-to-use command-line tool for PostgreSQL. It has to be called on the target system and will provide you with a ready-to-use base backup, which is ready to consume the transaction log for Point-In-Time-Recovery. The syntax of pg_basebackup is as follows: iMac:dbhs$ pg_basebackup --help pg_basebackup takes a base backup of a running PostgreSQL server. Usage: pg_basebackup [OPTION]... Options controlling the output: -D, --pgdata=DIRECTORY receive base backup into directory -F, --format=p|t output format (plain (default), tar) -x, --xlog include required WAL files in backup (fetch mode) -X, --xlog-method=fetch|stream include required WAL files with specified method -z, --gzip compress tar output -Z, --compress=0-9 compress tar output with given compression level General options: -c, --checkpoint=fast|spread set fast or spread checkpointing -l, --label=LABEL set backup label -P, --progress show progress information -v, --verbose output verbose messages -V, --version output version information, then exit -?, --help show this help, then exit Connection options: -h, --host=HOSTNAME database server host or socket directory -p, --port=PORT database server port number -s, --status-interval=INTERVAL time between status packets sent to server (in seconds) -U, --username=NAME connect as specified database user -w, --no-password never prompt for password -W, --password force password prompt (should happen automatically) A basic call to pg_basebackup would look like that: iMac:dbhs$ pg_basebackup -D /target_directory -h sample.postgresql-support.de In this example, we will fetch the base backup from sample.postgresql-support.de and put it into our local directory called /target_directory. It just takes this simple line to copy an entire database instance to the target system. When we create a base backup as shown in this section, pg_basebackup will connect to the server and wait for a checkpoint to happen before the actual copy process is started. In this mode, this is necessary because the replay process will start exactly at this point in the XLOG. The problem is that it might take a while until a checkpoint kicks in; pg_basebackup does not enforce a checkpoint on the source server straight away to make sure that normal operations are not disturbed. If you don't want to wait on a checkpoint, consider using --checkpoint=fast. It will enforce an instant checkpoint and pg_basebackup will start copying instantly. By default, a plain base backup will be created. It will consist of all the files in directories found on the source server. If the base backup should be stored on tape, we suggest to give –-format=t a try. It will automatically create a TAR archive (maybe on a tape). If you want to move data to a tape, you can save an intermediate step easily this way. When using TAR, it is usually quite beneficial to use it in combination with --gzip to reduce the size of the base backup on disk. There is also a way to see a progress bar while doing the base backup but we don't recommend to use this option (--progress) because it requires pg_basebackup to determine the size of the source instance first, which can be costly. pg_basebackup – self-sufficient backups Usually a base backup without XLOG is pretty worthless. This is because the base backup is taken while the master is fully operational. While the backup is taken, those storage files in the database instance might have been modified heavily. The purpose of the XLOG is to fix those potential issues in the data files reliably. But, what if we want to create a base backup, which can live without (explicitly archived) XLOG? In this case, we can use the --xlog-method=stream option. If this option has been chosen, pg_basebackup will not just copy the data as it is but it will also stream the XLOG being created during the base backup to our destination server. This will provide us with just enough XLOG to allow us to start a base backup made that way directly. It is self-sufficient and does not need additional XLOG files. This is not Point-In-Time-Recovery but it can come in handy in case of trouble. Having a base backup, which can be started right away, is usually a good thing and it comes at fairly low cost. Please note that --xlog-method=stream will require two database connections to the source server, not just one. You have to keep that in mind when adjusting max_wal_senders on the source server. If you are planning to use Point-In-Time-Recovery and if there is absolutely no need to start the backup as it is, you can safely skip the XLOG and save some space this way (default mode). Making use of traditional methods to create base backups These days pg_basebackup is the most common way to get an initial copy of a database server. This has not always been the case. Traditionally, a different method has been used which works as follows: Call SELECT pg_start_backup('some label'); Copy all data files to the remote box through rsync or any other means. Run SELECT pg_stop_backup(); The main advantage of this old method is that there is no need to open a database connection and no need to configure XLOG-streaming infrastructure on the source server. A main advantage is also that you can make use of features such as ZFS-snapshots or similar means, which can help to dramatically reduce the amount of I/O needed to create an initial backup. Once you have started pg_start_backup, there is no need to hurry. It is not necessary and not even especially desirable to leave the backup mode soon. Nothing will happen if you are in backup mode for days. PostgreSQL will archive the transaction log as usual and the user won't face any kind of downside. Of course, it is bad habit not to close backups soon and properly. However, the way PostgreSQL works internally does not change when a base backup is running. There is nothing filling up, no disk I/O delayed, or anything of this sort. Tablespace issues If you happen to use more than one tablespace, pg_basebackup will handle this just fine if the filesystem layout on the target box is identical to the filesystem layout on the master. However, if your target system does not use the same filesystem layout there is a bit more work to do. Using the traditional way of doing the base backup might be beneficial in this case. In case you are using --format=t (for TAR), you will be provided with one TAR file per tablespace. Keeping an eye on network bandwidth Let us imagine a simple scenario involving two servers. Each server might have just one disk (no SSDs). Our two boxes might be interconnected through a 1 Gbit link. What will happen to your applications if the second server starts to run a pg_basebackup? The second box will connect, start to stream data at full speed and easily kill your hard drive by using the full bandwidth of your network. An application running on the master might instantly face disk wait and offer bad response times. Therefore it is highly recommended to control the bandwidth used up by rsync to make sure that your business applications have enough spare capacity (mainly disk, CPU is usually not an issue). If you want to limit rsync to, say, 20 MB/sec, you can simply use rsync --bwlimit=20000. This will definitely make the creation of the base backup take longer but it will make sure that your client apps will not face problems. In general we recommend a dedicated network interconnect between master and slave to make sure that a base backup does not affect normal operations. Limiting bandwidth cannot be done with pg_basebackup onboard functionality.Of course, you can use any other tool to copy data and achieve similar results. If you are using gzip compression with –-gzip, it can work as an implicit speed brake. However, this is mainly a workaround. Replaying the transaction log Once we have created ourselves a shiny initial base backup, we can collect the XLOG files created by the database. When the time has come, we can take all those XLOG files and perform our desired recovery process. This works as described in this section. Performing a basic recovery In PostgreSQL, the entire recovery process is governed by a file named recovery.conf, which has to reside in the main directory of the base backup. It is read during startup and tells the database server where to find the XLOG archive, when to end replay, and so forth. To get you started, we have decided to include a simple recovery.conf sample file for performing a basic recovery process: restore_command = 'cp /archive/%f %p'recovery_target_time = '2013-10-10 13:43:12' The restore_command is essentially the exact counterpart of the archive_command you have seen before. While the archive_command is supposed to put data into the archive, the restore_command is supposed to provide the recovering instance with the data file by file. Again, it is a simple shell command or a simple shell script providing one chunk of XLOG after the other. The options you have here are only limited by imagination; all PostgreSQL does is to check for the return code of the code you have written, and replay the data provided by your script. Just like in postgresql.conf, we have used %p and %f as placeholders; the meaning of those two placeholders is exactly the same. To tell the system when to stop recovery, we can set the recovery_target_time. The variable is actually optional. If it has not been specified, PostgreSQL will recover until it runs out of XLOG. In many cases, simply consuming the entire XLOG is a highly desirable process; if something crashes, you want to restore as much data as possible. But, it is not always so. If you want to make PostgreSQL stop recovery at a specific point in time, you simply have to put the proper date in. The crucial part here is actually to know how far you want to replay XLOG; in a real work scenario this has proven to be the trickiest question to answer. If you happen to a recovery_target_time, which is in the future, don't worry, PostgreSQL will start at the very last transaction available in your XLOG and simply stop recovery. The database instance will still be consistent and ready for action. You cannot break PostgreSQL, but, you might break your applications in case data is lost because of missing XLOG. Before starting PostgreSQL, you have to run chmod 700 on the directory containing the base backup, otherwise, PostgreSQL will error out: iMac:target_directoryhs$ pg_ctl -D /target_directorystartserver startingFATAL: data directory "/target_directory" has group or world accessDETAIL: Permissions should be u=rwx (0700). This additional security check is supposed to make sure that your data directory cannot be read by some user accidentally. Therefore an explicit permission change is definitely an advantage from a security point of view (better safe than sorry). Now that we have all the pieces in place, we can start the replay process by starting PostgreSQL: iMac:target_directoryhs$ pg_ctl –D /target_directory startserver startingLOG: database system was interrupted; last known up at 2013-03-1018:04:29 CETLOG: creating missing WAL directory "pg_xlog/archive_status"LOG: starting point-in-time recovery to 2013-10-10 13:43:12+02LOG: restored log file "000000010000000000000006" from archiveLOG: redo starts at 0/6000020LOG: consistent recovery state reached at 0/60000B8LOG: restored log file "000000010000000000000007" from archiveLOG: restored log file "000000010000000000000008" from archiveLOG: restored log file "000000010000000000000009" from archiveLOG: restored log file "00000001000000000000000A" from archivecp: /tmp/archive/00000001000000000000000B: No such file ordirectoryLOG: could not open file "pg_xlog/00000001000000000000000B" (logfile 0, segment 11): No such file or directoryLOG: redo done at 0/AD5CE40LOG: last completed transaction was at log time 2013-03-1018:05:33.852992+01LOG: restored log file "00000001000000000000000A" from archivecp: /tmp/archive/00000002.history: No such file or directoryLOG: selected new timeline ID: 2cp: /tmp/archive/00000001.history: No such file or directoryLOG: archive recovery completeLOG: database system is ready to accept connectionsLOG: autovacuum launcher started The amount of log produced by the database tells us everything we need to know about the restore process and it is definitely worth investigating this information in detail. The first line indicates that PostgreSQL has found out that it has been interrupted and that it has to start recovery. From the database instance point of view, a base backup looks more or less like a crash needing some instant care by replaying XLOG; this is precisely what we want. The next couple of lines (restored log file ...) indicate that we are replaying one XLOG file after the other that have been created since the base backup. It is worth mentioning that the replay process starts at the sixth file. The base backup knows where to start, so PostgreSQL will automatically look for the right XLOG file. The message displayed after PostgreSQL reaches the sixth file (consistent recovery state reached at 0/60000B8) is of importance. PostgreSQL states that it has reached a consistent state. This is important. The reason is that the data files inside a base backup are actually broken by definition, but, the data files are not broken beyond repair. As long as we have enough XLOG to recover, we are very well off. If you cannot reach a consistent state, your database instance will not be usable and your recovery cannot work without providing additional XLOG. Practically speaking, not being able to reach a consistent state usually indicates a problem somewhere in your archiving process and your system setup. If everything up to now has been working properly, there is no reason not to reach a consistent state. Once we have reached a consistent state, one file after the other will be replayed successfully until the system finally looks for the 00000001000000000000000B file. The problem is that this file has not been created by the source database instance. Logically, an error pops up. Not finding the last file is absolutely normal; this type of error is expected if the recovery_target_time does not ask PostgreSQL to stop recovery before it reaches the end of the XLOG stream. Don't worry, your system is actually fine. You have successfully replayed everything to the file showing up exactly before the error message. As soon as all the XLOG has been consumed and the error message discussed earlier has been issued, PostgreSQL reports the last transaction it was able or supposed to replay, and starts up. You have a fully recovered database instance now and you can connect to the database instantly. As soon as the recovery has ended, recovery.conf will be renamed by PostgreSQL to recovery.done to make sure that it does not do any harm when the new instance is restarted later on at some point. More sophisticated positioning in the XLOG Up to now, we have recovered a database up to the very latest moment available in our 16 MB chunks of transaction log. We have also seen that you can define the desired recovery timestamp. But the question now is: How do you know which point in time to perform the recovery to? Just imagine somebody has deleted a table during the day. What if you cannot easily determine the recovery timestamp right away? What if you want to recover to a certain transaction? recovery.conf has all you need. If you want to replay until a certain transaction, you can refer to recovery_target_xid. Just specify the transaction you need and configure recovery_target_inclusive to include this very specific transaction or not. Using this setting is technically easy but as mentioned before, it is not easy by far to find the right position to replay to. In a typical setup, the best way to find a reasonable point to stop recovery is to use pause_at_recovery_target. If this is set to true, PostgreSQL will not automatically turn into a productive instance if the recovery point has been reached. Instead, it will wait for further instructions from the database administrator. This is especially useful if you don't know exactly how far to replay. You can replay, log in, see how far the database is, change to the next target time, and continue replaying in small steps. You have to set hot_standby = on in postgresql.conf to allow reading during recovery. Resuming recovery after PostgreSQL has paused can be done by calling a simple SQL statement: SELECT pg_xlog_replay_resume(). It will make the instance move to the next position you have set in recovery.conf. Once you have found the right place, you can set the pause_at_recovery_target back to false and call pg_xlog_replay_resume. Alternatively, you can simply utilize pg_ctl –D ... promote to stop recovery and make the instance operational. Was this explanation too complicated? Let us boil it down to a simple list: Add a restore_command to the recovery.conf file. Add recovery_target_time to the recovery.conf file. Set pause_at_recovery_target to true in the recovery.conf file. Set hot_standby to on in postgresql.conf file. Start the instance to be recovered. Connect to the instance once it has reached a consistent state and as soon as it stops recovering. Check if you are already where you want to be. If you are not: Change recovery_target_time. Run SELECT pg_xlog_replay_resume(). Check again and repeat this section if it is necessary. Keep in mind that once recovery has finished and once PostgreSQL has started up as a normal database instance, there is (as of 9.2) no way to replay XLOG later on. Instead of going through this process, you can of course always use filesystem snapshots. A filesystem snapshot will always work with PostgreSQL because when you restart a snapshotted database instance, it will simply believe that it had crashed before and recover normally. Cleaning up the XLOG on the way Once you have configured archiving, you have to store the XLOG being created by the source server. Logically, this cannot happen forever. At some point, you really have to get rid of this XLOG; it is essential to have a sane and sustainable cleanup policy for your files. Keep in mind, however, that you must keep enough XLOG so that you can always perform recovery from the latest base backup. But if you are certain that a specific base backup is not needed anymore, you can safely clean out all the XLOG that is older than the base backup you want to keep. How can an administrator figure out what to delete? The best method is to simply take a look at your archive directory: 000000010000000000000005000000010000000000000006000000010000000000000006.00000020.backup000000010000000000000007000000010000000000000008 Check out the filename in the middle of the listing. The .backup file has been created by the base backup. It contains some information about the way the base backup has been made and tells the system where to continue replaying the XLOG. If the backup file belongs to the oldest base backup you need to keep around, you can safely erase all the XLOG lower than file number 6; in this case, file number 5 could be safely deleted. In our case, 000000010000000000000006.00000020.backup contains the following information: START WAL LOCATION: 0/6000020 (file 000000010000000000000006)STOP WAL LOCATION: 0/60000E0 (file 000000010000000000000006)CHECKPOINT LOCATION: 0/6000058BACKUP METHOD: streamedBACKUP FROM: masterSTART TIME: 2013-03-10 18:04:29 CETLABEL: pg_basebackup base backupSTOP TIME: 2013-03-10 18:04:30 CET The .backup file will also provide you with relevant information such as the time the base backup has been made. It is plain there and so it should be easy for ordinary users to read this information. As an alternative to deleting all the XLOG files at one point, it is also possible to clean them up during replay. One way is to hide an rm command inside your restore_command. While this is technically possible, it is not necessarily wise to do so (what if you want to recover again?). Also, you can add the recovery_end_command command to your recovery.conf file. The goal of recovery_end_command is to allow you to automatically trigger some action as soon as the recovery ends. Again, PostgreSQL will call a script doing precisely what you want. You can easily abuse this setting to clean up the old XLOG when the database declares itself active. Switching the XLOG files If you are going for an XLOG file-based recovery, you have seen that one XLOG will be archived every 16 MB. What would happen if you never manage to create 16 MB of changes? What if you are a small supermarket, which just makes 50 sales a day? Your system will never manage to fill up 16 MB in time. However, if your system crashes, the potential data loss can be seen as the amount of data in your last unfinished XLOG file. Maybe this is not good enough for you. A postgresql.conf setting on the source database might help. The archive_timeout tells PostgreSQL to create a new XLOG file at least every x seconds. So, if you are this little supermarket, you can ask the database to create a new XLOG file every day shortly before you are heading for home. In this case, you can be sure that the data of the day will safely be on your backup device already. It is also possible to make PostgreSQL switch to the next XLOG file by hand. A procedure named pg_switch_xlog() is provided by the server to do the job: test=# SELECT pg_switch_xlog();pg_switch_xlog----------------0/17C0EF8(1 row) You might want to call this procedure when some important patch job has finished or if you want to make sure that a certain chunk of data is safely in your XLOG archive. Summary In this article, you have learned about Point-In-Time-Recovery, which is a safe and easy way to restore your PostgreSQL database to any desired point in time. PITR will help you to implement better backup policies and make your setups more robust. Resources for Article: Further resources on this subject: Introduction to PostgreSQL 9 [Article] PostgreSQL: Tips and Tricks [Article] PostgreSQL 9: Reliable Controller and Disk Setup [Article]
Read more
  • 0
  • 0
  • 5261

article-image-using-gerrit-github
Packt
04 Sep 2013
14 min read
Save for later

Using Gerrit with GitHub

Packt
04 Sep 2013
14 min read
In this article by Luca Milanesio, author of the book Learning Gerrit Code review, we will learn about Gerrit Code revew. GitHub is the world's largest platform for the free hosting of Git Projects, with over 4.5 million registered developers. We will now provide a step-by-step example of how to connect Gerrit to an external GitHub server so as to share the same set of repositories. Additionally, we will provide guidance on how to use the Gerrit Code Review workflow and GitHub concurrently. By the end of this article we will have our Gerrit installation fully integrated and ready to be used for both open source public projects and private projects on GitHub. (For more resources related to this topic, see here.) GitHub workflow GitHub has become the most popular website for open source projects, thanks to the migration of some major projects to Git (for example, Eclipse) and new projects adopting it, along with the introduction of the social aspect of software projects that piggybacks on the Facebook hype. The following diagram shows the GitHub collaboration model: The key aspects of the GitHub workflow are as follows: Each developer pushes to their own repository and pulls from others Developers who want to make a change to another repository, create a fork on GitHub and work on their own clone When forked repositories are ready to be merged, pull requests are sent to the original repository maintainer The pull requests include all of the proposed changes and their associated discussion threads Whenever a pull request is accepted, the change is merged by the maintainer and pushed to their repository on GitHub   GitHub controversy The preceding workflow works very effectively for most open source projects; however, when the projects gets bigger and more complex, the tools provided by GitHub are too unstructured, and a more defined review process with proper tools, additional security, and governance is needed. In May 2012 Linus Torvalds , the inventor of Git version control, openly criticized GitHub as a commit editing tool directly on the pull request discussion thread: " I consider GitHub useless for these kinds of things. It's fine for hosting, but the pull requests and the online commit editing, are just pure garbage " and additionally, " the way you can clone a (code repository), make changes on the web, and write total crap commit messages, without GitHub in any way making sure that the end result looks good. " See https://github.com/torvalds/linux/pull/17#issuecomment-5654674. Gerrit provides the additional value that Linus Torvalds claimed was missing in the GitHub workflow: Gerrit and GitHub together allows the open source development community to reuse the extended hosting reach and social integration of GitHub with the power of governance of the Gerrit review engine. GitHub authentication The list of authentication backends supported by Gerrit does not include GitHub and it cannot be used out of the box, as it does not support OpenID authentication. However, a GitHub plugin for Gerrit has been recently released in order to fill the gaps and allow a seamless integration. GitHub implements OAuth 2.0 for allowing external applications, such as Gerrit, to integrate using a three-step browser-based authentication. Using this scheme, a user can leverage their existing GitHub account without the need to provision and manage a separate one in Gerrit. Additionally, the Gerrit instance will be able to self-provision the SSH public keys needed for pushing changes for review. In order for us to use GitHub OAuth authentication with Gerrit, we need to do the following: Build the Gerrit GitHub plugin Install the GitHub OAuth filter into the Gerrit libraries (/lib under the Gerrit site directory) Reconfigure Gerrit to use the HTTP authentication type   Building the GitHub plugin The Gerrit GitHub plugin can be found under the Gerrit plugins/github repository on https://gerrit-review.googlesource.com/#/admin/projects/plugins/github. It is open source under the Apache 2.0 license and can be cloned and built using the Java 6 JDK and Maven. Refer to the following example: $ git clone https://gerrit.googlesource.com/plugins/github $ cd github $ mvn install […] [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------- [INFO] Total time: 9.591s [INFO] Finished at: Wed Jun 19 18:38:44 BST 2013 [INFO] Final Memory: 12M/145M [INFO] ------------------------------------------------------- The Maven build should generate the following artifacts: github-oauth/target/github-oauth*.jar, the GitHub OAuth library for authenticating Gerrit users github-plugin/target/github-plugin*.jar, the Gerrit plugin for integrating with GitHub repositories and pull requests Installing GitHub OAuth library The GitHub OAuth JAR file needs to copied to the Gerrit /lib directory; this is required to allow Gerrit to use it for filtering all HTTP requests and enforcing the GitHub three-step authentication process: $ cp github-oauth/target/github-oauth-*.jar /opt/gerrit/lib/ Installing GitHub plugin The GitHub plugin includes the additional support for the overall configuration, the advanced GitHub repositories replication, and the integration of pull requests into the Code Review process. We now need to install the plugin before running the Gerrit init again so that we can benefit from the simplified automatic configuration steps: $ cp github-plugin/target/github-plugin-*.jar /opt/gerrit/plugins/github.jar Register Gerrit as a GitHub OAuth application Before going through the Gerrit init, we need to tell GitHub to trust Gerrit as a partner application. This is done through the generation of a ClientId/ClientSecret pair associated to the exact Gerrit URLs that will be used for initiating the 3-step OAuth authentication. We can register a new application in GitHub through the URL https://github.com/settings/applications/new, where the following three fields are requested: Application name : It is the logical name of the application authorized to access GitHub, for example, Gerrit. Main URL : The Gerrit canonical web URL used for redirecting to GitHub OAuth authentication, for example, https://myhost.mydomain:8443. Callback URL : The URL that GitHub should redirect to when the OAuth authentication is successfully completed, for example, https://myhost.mydomain:8443/oauth. GitHub will automatically generate a unique pair ClientId/ClientSecret that has to be provided to Gerrit identifying them as a trusted authentication partner. ClientId/ClientSecret are not GitHub credentials and cannot be used by an interactive user to access any GitHub data or information. They are only used for authorizing the integration between a Gerrit instance and GitHub. Running Gerrit init to configure GitHub OAuth We now need to stop Gerrit and go through the init steps again in order to reconfigure the Gerrit authentication. We need to enable HTTP authentication by choosing an HTTP header to be used to verify the user's credentials, and to go through the GitHub settings wizard to configure the OAuth authentication. $ /opt/gerrit/bin/gerrit.sh stop Stopping Gerrit Code Review: OK $ cd /opt/gerrit $ java -jar gerrit.war init [...] *** User Authentication *** Authentication method []: HTTP RETURN Get username from custom HTTP header [Y/n]? Y RETURN Username HTTP header []: GITHUB_USER RETURN SSO logout URL : /oauth/reset RETURN *** GitHub Integration *** GitHub URL [https://github.com]: RETURN Use GitHub for Gerrit login ? [Y/n]? Y RETURN ClientId []: 384cbe2e8d98192f9799 RETURN ClientSecret []: f82c3f9b3802666f2adcc4 RETURN Initialized /opt/gerrit $ /opt/gerrit/bin/gerrit.sh start Starting Gerrit Code Review: OK   Using GitHub login for Gerrit Gerrit is now fully configured to register and authenticate users through GitHub OAuth. When opening the browser to access any Gerrit web pages, we are automatically redirected to the GitHub for login. If we have already visited and authenticated with GitHub previously, the browser cookie will be automatically recognized and used for the authentication, instead of presenting the GitHub login page. Alternatively, if we do not yet have a GitHub account, we create a new GitHub profile by clicking on the SignUp button. Once the authentication process is successfully completed, GitHub requests the user's authorization to grant access to their public profile information. The following screenshot shows GitHub OAuth authorization for Gerrit: The authorization status is then stored under the user's GitHub applications preferences on https://github.com/settings/applications. Finally, GitHub redirects back to Gerrit propagating the user's profile securely using a one-time code which is used to retrieve the full data profile including username, full name, e-mail, and associated SSH public keys. Replication to GitHub The next steps in the Gerrit to GitHub integration is to share the same Git repositories and then keep them up-to-date; this can easily be achieved by using the Gerrit replication plugin. The standard Gerrit replication is a master-slave, where Gerrit always plays the role of the master node and pushes to remote slaves. We will refer to this scheme as push replication because the actual control of the action is given to Gerrit through a git push operation of new commits and branches. Configure Gerrit replication plugin In order to configure push replication we need to enable the Gerrit replication plugin through Gerrit init: $ /opt/gerrit/bin/gerrit.sh stop Stopping Gerrit Code Review: OK $ cd /opt/gerrit $ java -jar gerrit.war init [...] *** Plugins *** Prompt to install core plugins [y/N]? y RETURN Install plugin reviewnotes version 2.7-rc4 [y/N]? RETURN Install plugin commit-message-length-validator version 2.7-rc4 [y/N]? RETURN Install plugin replication version 2.6-rc3 [y/N]? y RETURN Initialized /opt/gerrit $ /opt/gerrit/bin/gerrit.sh start Starting Gerrit Code Review: OK The Gerrit replication plugin relies on the replication.config file under the /opt/gerrit/etc directory to identify the list of target Git repositories to push to. The configuration syntax is a standard .ini format where each group section represents a target replica slave. See the following simplest replication.config script for replicating to GitHub: [remote "github"] url = git@github.com:myorganisation/${name}.git The preceding configuration enables all of the repositories in Gerrit to be replicated to GitHub under the myorganisa tion GitHub Team account. Authorizing Gerrit to push to GitHub Now, that Gerrit knows where to push, we need GitHub to authorize the write operations to its repositories. To do so, we need to upload the SSH public key of the underlying OS user where Gerrit is running to one of the accounts in the GitHub myorganisation team, with the permissions to push to any of the GitHub repositories. Assuming that Gerrit runs under the OS user gerrit, we can copy and paste the SSH public key values from the ~gerrit/.ssh/id_rsa.pub (or ~gerrit/.ssh/id_dsa.pub) to the Add an SSH Key section of the GitHub account under target URL to be set to: https://github.com/settings/ssh Start working with Gerrit replication Everything is now ready to start playing with Gerrit to GitHub replication. Whenever a change to a repository is made on Gerrit, it will be automatically replicated to the corresponding GitHub repository. In reality there is one additional operation that is needed on the GitHub side: the actual creation of the empty repositories using https://github.com/new associated to the ones created in Gerrit. We need to make sure that we select the organization name and repository name, consistent with the ones defined in Gerrit and in the replication.config file. Never initialize the repository from GitHub with an empty commit or readme file; otherwise the first replication attempt from Gerrit will result in a conflict and will then fail. Now GitHub and Gerrit are fully connected and whenever a repository in GitHub matches one of the repositories in Gerrit, it will be linked and synchronized with the latest set of commits pushed in Gerrit. Thanks to the Gerrit-GitHub authentication previously configured, Gerrit and GitHub share the same set of users and the commits authors will be automatically recognized and formatted by GitHub. The following screenshot shows Gerrit commits replicated to GitHub: Reviewing and merging to GitHub branches The final goal of the Code Review process is to agree and merge changes to their branches. The merging strategies need to be aligned with real-life scenarios that may arise when using Gerrit and GitHub concurrently. During the Code Review process the alignment between Gerrit and GitHub was at the change level, not influenced by the evolution of their target branches. Gerrit changes and GitHub pull requests are isolated branches managed by their review lifecycle. When a change is merged, it needs to align with the latest status of its target branch using a fast-forward, merge, rebase, or cherry-pick strategy. Using the standard Gerrit merge functionality, we can apply the configured project merge strategy to the current status of the target branch on Gerrit. The situation on GitHub may have changed as well, so even if the Gerrit merge has succeeded there is no guarantee that the actual subsequent synchronization to GitHub will do the same! The GitHub plugin mitigates this risk by implementing a two-phase submit + merge operation for merging opened changes as follows: Phase-1 : The change target branch is checked against its remote peer on GitHub and fast forwarded if needed. If two branches diverge, the submit + merge is aborted and manual merge intervention is requested. Phase-2 : The change is merged on its target branch in Gerrit and an additional ad hoc replication is triggered. If the merge succeeds then the GitHub pull request is marked as completed. At the end of Phase-2 the Gerrit and GitHub statuses will be completely aligned. The pull request author will then receive the notification that his/her commit has been merged. Using Gerrit and GitHub on http://gerrithub.io When using Gerrit and GitHub on the web with public or private repositories, all of the commits are replicated from Gerrit to GitHub, and each one of them has a complete copy of the data. If we are using a Git and collaboration server on GitHub over the Internet, why can't we do the same for its Gerrit counterpart? Can we avoid installing a standalone instance of Gerrit just for the purpose of going through a formal Code Review? One hassle-free solution is to use the GerritHub service (http://gerrithub.io), which offers a free Gerrit instance on the cloud already configured and connected with GitHub through the github-plugin and github-oauth authentication library. All of the flows that we have covered in this article are completely automated, including the replication and automatic pull request to change automation. As accounts are shared with GitHub, we do not need to register or create another account to use GerritHub; we can just visit http://gerrithub.io and start using Gerrit Code Review with our existing GitHub projects without having to teach our existing community about a new tool. GerritHub also includes an initial setup Wizard for the configuration and automation of the Gerrit projects and the option to configure the Gerrit groups using the existing GitHub. Once Gerrit is configured, the Code Review and GitHub can be used seamlessly for achieving maximum control and social reach within your developer community. Summary We have now integrated our Gerrit installation with GitHub authentication for a seamless Single-Sign-On experience. Using an existing GitHub account we started using Gerrit replication to automatically mirror all the commits to GitHub repositories, allowing our projects to have an extended reach to external users, free to fork our repositories, and to contribute changes as pull requests. Finally, we have completed our Code Review in Gerrit and managed the merge to GitHub with a two-phase change submit + merge process to ensure that the target branches on both Gerrit and GitHub have been merged and aligned accordingly. Similarly to GitHub, this Gerrit setup can be leveraged for free on the web without having to manage a separate private instance, thanks to the free set target URL to http://gerrithub.io service available on the cloud. Resources for Article : Further resources on this subject: Getting Dynamics NAV 2013 on Your Computer – For (Almost) Free [Article] Building Your First Zend Framework Application [Article] Quick start - your first Sinatra application [Article]
Read more
  • 0
  • 1
  • 51232
article-image-article-rapid-development
Packt
04 Sep 2013
7 min read
Save for later

Rapid Development

Packt
04 Sep 2013
7 min read
(For more resources related to this topic, see here.) Concept of reusability The concept of reusability has its roots in the production process. Typically, most of us go about creating e-learning using a process similar to what is shown in the following screenshot. It works well for large teams and the one man band, except in the latter case, you become a specialist for all the stages of production. That's a heavy load. It's hard to be good at all things and it demands that you constantly stretch and improve your skills, and find ways to increase the efficiency of what you do. Reusability in Storyline is about leveraging the formatting, look and feel and interactions you create so that you can re-purpose your work and speed-up production. Not every project will be an original one-off, in fact most won't, so the concept is to approach development with a plan to repurpose 80 percent of the media, quizzes, interactions, and designs you create. As you do this, you begin to establish processes, templates, and libraries that can be used to rapidly assemble base courses. With a little tweaking and some minor customization, you'll have a new, original course in no time. Your client doesn't need to know that 80 percent was made from reusable elements with just 20 percent created as original, unique components, but you'll know the difference in terms of time and effort. Leveraging existing assets So how can you leverage existing assets with Storyline? The first things you'll want to look at are the courses you've built with other authoring programs, such as PowerPoint, QuizMaker Engage, Captivate, Flash, and Camtasia. If there are design themes, elements, or interactions within these courses that you might want to use for future Storyline courses, you should focus your efforts on importing what you can, and further adjusting within Storyline to create a new version of the asset that can be reused for future Storyline courses. If re-working the asset is too complex or if you don't expect to reuse it in multiple courses, then using Storyline's web object feature to embed the interaction without re-working it in any way may be the better approach. In both cases, you'll save time by reusing content you've already put a lot of time in developing. Importing external content Here are the steps to bring external content into Storyline: From the Articulate Startup screen or by choosing the Insert tab, and then New Slide within a project, select the Import option. There are options to import PowerPoint, Quizmaker, and Storyline. All of these will display the slides within the file to be imported. You can pick and choose which slides to import into a new or the current scene in Storyline. The Engage option displays the entire interaction that can be imported into a single slide in the current or a new scene. Click on Import to complete the process. Considerations when importing Keep the following points in mind when importing: PowerPoint and Quizmaker files can be imported directly into Storyline. Once imported, you can edit the content like you would any other Storyline slide. Master slides come along with the import making it simple to reuse previous designs. Note that 64-bit PowerPoint is not supported and you must have an installed, activated version of Quizmaker for the import to work. The PowerPoint to Storyline conversion is not one-to-one. You can expect some alignment issues with slide objects due to the fact that PowerPoint uses points and Storyline uses pixels. There are 2.66 pixels for each point which is why you'll need to tweak the imported slides just a bit. Same with Quizmaker though the reason why is slightly different; Quizmaker is 686 x 424 in size, whereas Storyline is 720 x 540 by default. Engage files can be imported into Storyline and they are completely functional, but cannot be edited within Storyline. Though the option to import Engage appears on the Import screen, what Storyline is really doing is creating a web object to contain the Engage interaction. Once imported into a new scene, clicking on the Engage interaction will display an Options menu where you can make minor adjustments to the behavior of the interaction as well as Preview and Edit in it Engage. You can also resize and position the interaction just as you would any web object. Remember that though web objects work in iPad and HTML5 outputs, Engage content is Flash, so it will not playback on an iPad or in an HTML5 browser. Like Quizmaker, you'll need an installed, activated version of Engage for the import to work. Flash, Captivate, and Camtasia files cannot be imported in Storyline and cannot be edited within Storyline. You can however, use web objects to embed these projects into Storyline or the Insert Flash option. In both cases, the imported elements appear seamless to the learner while retaining full functionality.   Build once, and reuse many times Quizzing is at the heart of many e-learning courses where often the quiz questions need to be randomized or even reused in different sections of a single course (that is, the same questions for a pre and post-test). The concept of building once and reusing many times works well with several aspects of Storyline. We'll start with quizzing and a feature called Question Banks as follows: Question Banks Question Bank offers a way to pool, reuse, and randomize questions within a project. Slides in a question bank are housed within the project file but are not visible until placed into the story. Question Banks can include groups of quiz slides and regular slides (that is, you might include a regular slide if you need to provide instructions for the quiz or would like to include a post-quiz summary). When you want to include questions from a Question Bank, you just need to insert a new Quizzing slide, and then choose Draw from Bank . You can then select one or more questions to include and randomize them if desired. Follow along… In this exercise we will be removing three questions from a scene and moving them into a question bank. This will allow you to draw one or more of those questions at any point in the project where the quiz questions are needed, as follows: From the Home tab, choose Question Banks , and then Create Question bank . Title this Identity Theft Questions . Notice that a new tab has opened in Normal View . The Question Bank appears in this tab. Click on the Import link and navigate to question slides 2, 3, and 4. From the Import drop-down menu at the top, select move questions into question bank . Click on the Story View tab and notice the three slides containing the quiz questions are no longer in the story. Click back on the Identity Theft tab and notice that they are located here. The questions will not become a part of the story until the next step, when you draw them from the bank. In Story View, click once on slide 1 to select it, and then from the Home tab, choose Question Banks and New Draw from Question Bank . From the Question Bank drop-down menu, select Identity Theft Questions . All questions will be selected by default and will be randomized after being placed into the story. This means that the learner will need to answer three questions before continuing onto the next slide in the story. Click on Insert . The Question Bank draw has been inserted as slide 2. To see how this works, Preview the scene. Save as Exercise 11 – Identity Theft Quiz.   There are multiple ways to get back to the questions that are in a question bank. You can do this by selecting the tab the questions are located in (in this case, Identity Theft ), you can view the question bank slide in Normal View or choose Question Banks from the Home tab and navigate to the name of the question bank you'd like to edit.
Read more
  • 0
  • 0
  • 1452

article-image-using-virtual-destinations-advanced
Packt
04 Sep 2013
5 min read
Save for later

Using Virtual Destinations (Advanced)

Packt
04 Sep 2013
5 min read
(For more resources related to this topic, see here.) Getting ready For this article we will use the sample application virtual-destinations to demonstrate the use of Virtual Destinations. How to do it... To run the sample for this article, open a terminal, change the path to point to the directory where virtual-destinations is located, and run it by typing mvn compile exec:java. In the terminal where you started the example, you will see output similar to the following snippet, indicating that the application is running: Starting Virtual Destination example now... Queue A Consumer 1 processed 500 Messages Queue A Consumer 2 processed 500 Messages Queue B Consumer processed 1000 Messages Finished running the Virtual Destination example. How it works... Our example takes advantage of ActiveMQ's Virtual Destination feature to allow a JMS Producer to send messages to a topic and have those messages be received by a number of queue consumers. Let's take a look at the sent code first: private void sendMessages() throws Exception { Connection connection = connectionFactory.createConnection(); Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE); Destination destination = session.createTopic("VirtualTopic.Foo"); MessageProducer producer = session.createProducer(destination); for (int i = 0; i < 1000; ++i) { producer.send(session.createMessage()); } connection.close(); } As we can see, there's not much new here; we are just using standard JMS API code to send our messages. The important part is in the name of the topic destination VirtualTopic.Foo, which informs ActiveMQ Broker we want this topic to be one of those special Virtual Destinations. Now let's take a look at the consumer code, and then we'll try and make sense of how this works: Queue queueA = receiverSession.createQueue("Consumer.A. VirtualTopic.Foo"); VirtualMessageListener listenerA1 = new VirtualMessageListener(done); MessageConsumer consumerA1 = receiverSession.createConsumer(queueA); consumerA1.setMessageListener(listenerA1); VirtualMessageListener listenerA2 = new VirtualMessageListener(done); MessageConsumer consumerA2 = receiverSession.createConsumer(queueA); consumerA2.setMessageListener(listenerA2);  Queue queueB = receiverSession.createQueue("Consumer.B. VirtualTopic.Foo"); VirtualMessageListener listenerB = new VirtualMessageListener(done); MessageConsumer consumerB = receiverSession.createConsumer(queueB); consumerB.setMessageListener(listenerB); As we can see, the consumer code is also not very mysterious; we create two consumers on the Consumer.A.VirtualTopic.Foo queue and one on the Consumer.B.VirtualTopic.Foo queue, and yet when we run the example, the two consumers on queue A split 500 of the topic message and the consumer on queue B gets its own copy of the 1,000 messages. So what's happening on the broker then to make this happen? By default, whenever a topic is created with a name matching the pattern VirtualTopic.<TopicName>, we gain access to the feature known as Virtual Destinations. These Virtual Destinations allow us to send a message to a topic but create consumers that have all the benefits of queue consumers, namely, those of load balancing and message persistence. Our queue consumers just need to use their own naming convention to access the Virtual Destination functionality. Each queue that we want to create must be named using the pattern Consumer.<Name>.VirtualTopic.<TopicName>. In the following figure we can see a visual representation of the message flow in our example: As messages are sent to our example's topic, they are distributed to both the queues we created. Two of our consumers split the work of processing messages from queue A, while the other consumer gets all the messages from queue B. In our example we didn't attach a listener to the topic itself, but we could have done so and it would also have received all the messages we sent. It's easy to imagine using the topic as a way to monitor the messages that are being distributed to the queue consumers in this scenario; this pattern is often referred to as a wiretap. If it isn't already apparent, Virtual Destinations are a very powerful feature for your messaging applications. Topics are great for broadcasting events but when we need to be able to consume messages that were sent while a client was offline, the only recourse is to use a durable topic subscription. Unfortunately, durable subscriptions have a number of limitations, the least of which is that only one JMS connection can be actively subscribed to a logical topic subscription. This means that we can't load balance messages and we can't have fast failover if a subscriber goes down. Virtual Destinations solve these problems since all of our consumers subscribe to a queue so their messages are persistent, and the work of processing the messages on the queue can be shared by more than one active subscription. There's more... The naming pattern for Virtual Destinations is not set in stone. By default, ActiveMQ is configured to use the pattern we used in our sample application, but you can easily change this. In the following XML snippet, we configured our broker to all topics on our broker into virtual topics. We use the wildcard syntax here, which matches every topic name a client sends messages to: <broker> <destinationInterceptors> <virtualDestinationInterceptor> <virtualDestinations><virtualTopic name=">" prefix="VirtualTopic Consumers.*."/> </virtualDestinations></virtualDestinationInterceptor> </destinationInterceptors> </broker> More information on Virtual Destinations can be found on the ActiveMQ website, http://activemq.apache.org/virtual-destinations.html. Summary This article thus explained what Virtual Destinations are and how to use them to avoid the limitations of JMS's durable topic subscriptions. Resources for Article: Further resources on this subject: Routing to an external ActiveMQ broker [Article] So, what is Apache Wicket? [Article] Apache Geronimo Logging [Article]
Read more
  • 0
  • 0
  • 5999
Modal Close icon
Modal Close icon