How-To Tutorials


05 Sep 2013

3 min read

Setting up a single-width column system (Simple)


Getting ready

To perform the steps listed in this article, we will need a text editor, a browser, and a copy of the Masonry plugin. Any text editor will do. My browser of choice is Google Chrome: its V8 JavaScript engine generally performs well and the browser supports CSS3 transitions, so we see smoother animations when resizing the browser window.

We need to make sure we have a copy of the most recent version of Masonry, which was Version 2.1.08 at the time of writing this article. This version is compatible with jQuery Version 1.9.1, the most recent version at the time of writing. A production copy of Masonry can be found on the GitHub repository at the following address:

https://github.com/desandro/masonry/blob/master/jquery.masonry.min.js

For jQuery, we will be using a content delivery network (CDN) for ease of development. Open the basic single-column HTML file to follow along. You can download this file from the following location:

http://www.packtpub.com/sites/default/files/downloads/1-single-column.zip

How to do it...

1. Set up the styling for the masonry-item class with the proper width, padding, and margins. We want our items to have a total width of 200 pixels, including the padding and margins.

    <style>
      .masonry-item {
        background: #FFA500;
        float: left;
        margin: 5px;
        padding: 5px;
        width: 180px;
      }
    </style>

2. Set up the HTML structure on which you are going to use Masonry. At a minimum, we need a tagged Masonry container with the elements inside tagged as Masonry items.

    <div id='masonry-container'>
      <div class='masonry-item'>
        Maecenas faucibus mollis interdum.
      </div>
      <div class='masonry-item'>
        Maecenas faucibus mollis interdum. Donec sed odio dui. Nullam
        quis risus eget urna mollis ornare vel eu leo. Vestibulum id
        ligula porta felis euismod semper.
      </div>
      <div class='masonry-item'>
        Nullam quis risus eget urna mollis ornare vel eu leo. Cras
        justo odio, dapibus ac facilisis in, egestas eget quam. Aenean
        eu leo quam. Pellentesque ornare sem lacinia quam venenatis
        vestibulum.
      </div>
    </div>

3. Not every Masonry option needs to be included, but it is recommended (by David DeSandro, the creator of Masonry) to set itemSelector for single-column usage. We will be setting this every time we use Masonry.

    <script>
      $(function() {
        $('#masonry-container').masonry({
          // options
          itemSelector: '.masonry-item'
        });
      });
    </script>

How it works...

Using jQuery, we select our Masonry container and use the itemSelector option to select the elements that will be affected by Masonry. The size of the columns is determined by the CSS code. Using the box model, we set our Masonry items to a width of 190 px (180-px wide, with a 5-px padding all around the item). The margin is our gutter between elements, which is also 5-px wide on each side. With this setup, we can confirm that we have built the basic single-width column grid system, with each column being 200-px wide. The end result should look like the following screenshot:

Summary

This article showed you how to set up the very basic single-width column system around which Masonry revolves.
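The box-model arithmetic from the Masonry recipe can be checked with a few lines of Python. This is only a sketch of the arithmetic, not part of Masonry: the function names and the 960-px container width are assumptions made for illustration, and it assumes the default CSS content-box sizing, where padding and margin are added on both sides of the declared width.

```python
def outer_width(width, padding, margin, border=0):
    """Horizontal space one item occupies under content-box sizing.

    Padding, border, and margin each apply on the left and the right.
    """
    return width + 2 * (padding + border + margin)


def columns_that_fit(container_width, column_width):
    """Number of whole columns that fit side by side in the container."""
    return container_width // column_width


# Values from the .masonry-item rule: width 180px, padding 5px, margin 5px.
column = outer_width(width=180, padding=5, margin=5)
print(column)  # 200

# Hypothetical 960-px wide container, chosen only for illustration.
print(columns_that_fit(960, column))  # 4
```

Changing any one of the three CSS values changes the column width, which is why the recipe sizes the item so that the three values add up to a round 200 px.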



Packt

05 Sep 2013

4 min read

Using indexes to manipulate pandas objects


Getting ready

A good understanding of indexes in pandas is crucial to quickly move the data around. From a business intelligence perspective, they create a distinction similar to that of metrics and dimensions in an OLAP cube. To illustrate this point, this recipe walks through getting stock data out of pandas, combining it, then reindexing it for easy chomping.

How to do it...

1. Use the DataReader object to transfer stock price information into a DataFrame and to explore the basic axes of Panel.

    > import pandas as pd
    > from pandas.io.data import DataReader
    > tickers = ['gs', 'ibm', 'f', 'ba', 'axp']
    > dfs = {}
    > for ticker in tickers:
          dfs[ticker] = DataReader(ticker, "yahoo", '2006-01-01')
    # a yet undiscussed data structure: in the same way that a
    # DataFrame is a collection of Series, a Panel is a collection of
    # DataFrames
    > pan = pd.Panel(dfs)
    > pan
    <class 'pandas.core.panel.Panel'>
    Dimensions: 5 (items) x 1764 (major_axis) x 6 (minor_axis)
    Items axis: axp to ibm
    Major_axis axis: 2006-01-03 00:00:00 to 2013-01-04 00:00:00
    Minor_axis axis: Open to Adj Close
    > pan.items
    Index([axp, ba, f, gs, ibm], dtype=object)
    > pan.minor_axis
    Index([Open, High, Low, Close, Volume, Adj Close], dtype=object)
    > pan.major_axis
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2006-01-03 00:00:00, ..., 2013-01-04 00:00:00]
    Length: 1764, Freq: None, Timezone: None

2. Use the axis selectors to easily compute different sets of summary statistics.

    > pan.minor_xs('Open').mean()
    axp     46.227466
    ba      70.746451
    f        9.135794
    gs     151.655091
    ibm    129.570969
    # major axis is sliceable as well
    > day_slice = pan.major_axis[1]
    > pan.major_xs(day_slice)[['gs', 'ba']]
                       ba          gs
    Open            70.08      127.35
    High            71.27      128.91
    Low             69.86      126.38
    Close           71.17      127.09
    Volume     3165000.00  4861600.00
    Adj Close       60.43      118.12

3. Convert the Panel to a DataFrame.

    > dfs = []
    > for df in pan:
          idx = pan.major_axis
          idx = pd.MultiIndex.from_tuples(zip([df]*len(idx), idx))
          idx.names = ['ticker', 'timestamp']
          dfs.append(pd.DataFrame(pan[df].values, index=idx,
                                  columns=pan.minor_axis))
    > df = pd.concat(dfs)
    > df
    Data columns:
    Open         8820 non-null values
    High         8820 non-null values
    Low          8820 non-null values
    Close        8820 non-null values
    Volume       8820 non-null values
    Adj Close    8820 non-null values
    dtypes: float64(6)

4. Perform the analogous operations as in the preceding examples on the newly created DataFrame.

    # selecting from a MultiIndex isn't much different than the Panel
    # (output muted)
    > df.ix['gs':'ibm']
    > df['Open']

How it works...

The previous example was certainly contrived, but when indexing and statistical techniques are incorporated, the power of pandas begins to come through. Statistics will be covered in an upcoming recipe.

pandas' indexes by themselves can be thought of as descriptors of a certain point in the DataFrame. When ticker and timestamp are the only indexes in a DataFrame, the point is individualized by the ticker, timestamp, and column name. After the point is individualized, it's more convenient for aggregation and analysis.

There's more...

Indexes show up all over the place in pandas, so it's worthwhile to see some other use cases as well.

Advanced header indexes

Hierarchical indexing isn't limited to rows. Headers can also be represented by MultiIndex, as shown in the following commands:

    > header_top = ['Price', 'Price', 'Price', 'Price', 'Volume', 'Price']
    > df.columns = pd.MultiIndex.from_tuples(zip(header_top, df.columns))

Performing aggregate operations with indexes

As a prelude to the following sections, we'll do a single groupby operation here, since groupby operations work with indexes so well.

    > df.groupby(level=['tickers', 'day'])['Volume'].mean()

This answers the question: for each ticker and for each day (not date), what was the mean volume over the life of the data? Note that the names passed to level must match the names of the index levels. The DataFrame built earlier names its levels ticker and timestamp, so this call assumes an index whose levels are named tickers and day.

Summary

This article talks about the use and importance of indexes in pandas. It also talks about different operations that can be done with indexes.
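The idea behind the groupby call in the pandas recipe, averaging volume for each ticker and each day of the week rather than each date, can be sketched with the Python standard library alone. This is not pandas code: the tuple-keyed dictionary merely stands in for a (ticker, timestamp) MultiIndex, the function name is hypothetical, and the volumes are made-up toy values, not real market data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Toy data keyed by (ticker, timestamp), mimicking a two-level row index.
# The volumes are invented for illustration only.
volumes = {
    ("gs", date(2013, 1, 2)): 100.0,   # a Wednesday
    ("gs", date(2013, 1, 9)): 300.0,   # the following Wednesday
    ("gs", date(2013, 1, 3)): 500.0,   # a Thursday
    ("ibm", date(2013, 1, 2)): 40.0,
    ("ibm", date(2013, 1, 3)): 60.0,
}


def mean_volume_by_weekday(data):
    """Group by (ticker, day of week) and average each group.

    date.weekday() numbers the days from Monday = 0 to Sunday = 6.
    """
    groups = defaultdict(list)
    for (ticker, timestamp), volume in data.items():
        groups[(ticker, timestamp.weekday())].append(volume)
    return {key: mean(values) for key, values in groups.items()}


for key, value in sorted(mean_volume_by_weekday(volumes).items()):
    print(key, value)
```

The two Wednesday rows for gs collapse into a single (ticker, weekday) group, which is the "day, not date" distinction the recipe draws.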


Packt

05 Sep 2013

10 min read

Cocos2d-x: Installation


Download and installation

All the examples in this article were developed on a Mac using Xcode. Although you can use Cocos2d-x to develop your games for other platforms, using different systems, the examples will focus on iOS and Mac.

Xcode is free and can be downloaded from the Mac App store (https://developer.apple.com/xcode/index.php), but in order to test your code on an iOS device and publish your games, you will need a developer account with Apple, which will cost you USD 99 a year. You can find more information on their website: https://developer.apple.com/

So, assuming you have an internet connection, and that Xcode is ready to rock, let's begin!

Time for action – downloading and installing Cocos2d-x

We start by downloading the framework:

1. Go to http://download.cocos2d-x.org/ and download the latest stable version of Cocos2d-x. For this article I'll be using version Cocos2d-2.0-x-2.0.4, which means the 2.0.4 C++ port of version 2.0 of Cocos2d.
2. Uncompress the files somewhere on your machine.
3. Open Terminal and type cd (that is, cd and a space).
4. Drag the uncompressed folder you just downloaded to the Terminal window. You should see the path to the folder added to the command line. Hit return to go to that folder in Terminal.
5. Now type:

    sudo ./install-templates-xcode.sh -u

6. Hit return again and you're done.

What just happened?

You have successfully installed the Cocos2d-x templates on your machine. With these in place, you can select the type of Cocos2d-x application you wish to build inside Xcode, and the templates will take care of copying all the necessary files into your application.

Next, open Xcode and select Create a new Xcode Project. You should see something like this:

So let's build our first application.

Hello-x World-x

Let's create that old chestnut in computer programming: the hello world example.

Time for action – creating an application

Open Xcode and select File | New | Project... and follow these steps:

1. In the dialogue box select cocos2d-x under the iOS menu and choose the cocos2dx template. Hit Next.
2. Give the application a name, but not HelloWorld. I'll show you why in a second.
3. You will then be asked to select a place to save the project and you are done.
4. Once your application is ready, click Run to build it. After that, this is what you should see in the simulator:

When you run a Cocos2d-x application in Xcode it is quite common for the program to post some warnings regarding your code, or most likely the frameworks. These will mostly reference deprecated methods, or statements that do not precisely follow the more recent and stricter rules of the current SDK. But that's okay. These warnings, though certainly annoying, can be ignored.

What just happened?

You created your first Cocos2d-x application using the cocos2dx template, sometimes referred to as the basic template.

The other template options include one with Box2D, one with Chipmunk (both related to physics simulation), one with JavaScript, and one with Lua. The last two options allow you to code some or all of your game using those script languages instead of the native C++; and they work just as you would expect a scripting language to work, meaning the commands written in either JavaScript or Lua are translated into calls to the underlying C++ code.

Now if you look at the files created by the basic template you will see a HelloWorldScene class file. That's the reason I didn't want you to call your application HelloWorld, because I didn't want you to have the impression that the file name was based on your project name. It isn't. You will always get a HelloWorldScene file unless you change the template itself.

Now let's go over the sample application and its files:

The folder structure

First you have the Resources folder, where you find the images used by the application.

The ios folder has the necessary underlying connections between your app and iOS. For other platforms, you will have their necessary linkage files in separate folders targeting their respective platform (like an android folder for the Android platform, for instance).

In the libs folder you have all the cocos2dx files, plus CocosDenshion files (for sound support) and a bunch of other extensions. Using a different template for your projects will result in a different folder structure here, based on what needs to be added to your project. So you will see a Box2D folder, for example, if you choose the Box2D template.

In the Classes folder you have your application. In here, everything is written in C++ and this is the home for the part of your code that will hopefully not need to change, however many platforms you target with your application.

Now let us go over the main classes of the basic application.

The iOS linkage classes

AppController and RootViewController are responsible for setting up OpenGL in iOS as well as telling the underlying operating system that your application is about to say Hello... To the World.

These classes are written with a mix of Objective-C and C++, as all the nice brackets and the .mm extensions show. You will change very little, if anything, in these classes; and any change you do make will affect the way iOS handles your application. So other targets would require the same instructions or none at all, depending on the target.

In AppController, for instance, I could add support for multitouch. And in RootViewController, I could limit the screen orientations supported by my application.

The AppDelegate class

This class marks the first time your C++ app will talk to the underlying OS. It attempts to map the main events that mobile devices want to dispatch and listen to. From here on, your entire application will be written in C++ (unless you need something else).

In AppDelegate you should set up CCDirector (the Cocos2d-x all-powerful singleton manager object) to run your application just the way you want. You can:

- Get rid of the application status information
- Change the frame rate of your application
- Tell CCDirector where your high definition images are, and where your standard definition images are, as well as which to use
- Change the overall scale of your application to suit different screens
- Start any preloading process (the AppDelegate class is the best place for this)
- Most importantly, tell the CCDirector object what CCScene to begin your application with

Here too you will handle what happens to your application if the OS decides to kill it, push it aside, or hang it upside down to dry. All you need to do is place your logic inside the correct event handler: applicationDidEnterBackground or applicationWillEnterForeground.

The HelloWorldScene class

When you run the application you get a screen with the words Hello World and a bunch of numbers in one corner. These are the display stats you decided you wanted around in the AppDelegate class.

The actual screen is created by the oddly named HelloWorldScene class. It is a Layer class that creates its own scene (don't worry if you don't know what a Layer class is, or a Scene class, you will soon enough).

When it initializes, HelloWorldScene puts a button on screen that you can press to exit the application. The button is actually a Menu item, part of a Menu group consisting of one button, two image states for that button, and one callback event, triggered when the said button is pressed.

The Menu group automatically handles touch events targeting its members, so you don't get to see any of that code floating about. There is also the necessary Label object to show the Hello World message and the background image.

Who begets whom

If you never worked with either Cocos2d or Cocos2d-x before, the way the initial scene() method is instantiated may lead to dizziness. To recap, in AppDelegate you have:

    CCScene *pScene = HelloWorld::scene();
    pDirector->runWithScene(pScene);

CCDirector needs a CCScene object to run, which you can think of as being your application, basically. CCScene needs something to show, which in this case is a CCLayer class. CCScene is then said to contain a CCLayer class.

Here a CCScene object is created through a static method scene inside a CCLayer derived class. So the layer creates the scene, and the scene immediately adds the layer to itself. Huh? Relax. This incestuous-like instantiation will most likely only happen once, and you have nothing to do with it when it happens. So you can easily ignore all these funny goings-on and look the other way. I promise instantiations will be much easier after this first one.

Further information

Follow these steps to access one of the best sources for reference material on Cocos2d-x: its Test project.

Time for action – running the test samples

You open the test project just like you would do for any other Xcode project:

1. Go inside the folder you downloaded for the framework, and navigate to samples/TestCpp/proj.ios/TestCpp.xcodeproj.
2. Open that project in Xcode.
3. When you run the project, you will see inside the simulator a long list of tests, all nicely organized by topic. Pick any one to review.
4. Better yet, navigate to samples/TestCpp/Classes and if you have a program like TextWrangler or some equivalent, you can open that entire directory inside a Disk Browser window and have all that information ready for referencing right at your desktop.

What just happened?

With the test samples you can visualize most features in Cocos2d-x and see what they do, as well as some of the ways you can initialize and customize them.

I will refer to the code found in the tests quite often. As usual with programming, there is always a different way to accomplish a given task, so sometimes after showing you one way, I'll refer to another one that you can find (and by then easily understand) inside the Test classes.

The other tools

Now comes the part where you may need to spend a bit more money to get some extremely helpful tools. In this article's examples I use four of them:

- A tool to help build sprite sheets: I'll use Texture Packer (http://www.codeandweb.com/texturepacker). There are other alternatives, like Zwoptex (http://zwopple.com/zwoptex/). And they usually offer some features for free.
- A tool to help build particle effects: I'll use Particle Designer (http://www.71squared.com/en/particledesigner). Depending on your operating system you may find free tools online for this. Cocos2d-x comes bundled with some common particle effects that you can customize. But to do it blindly is a process I do not recommend.
- A tool to help build bitmap fonts: I'll use Glyph Designer (http://www.71squared.com/en/glyphdesigner). But there are others: bmGlyph (which is not as expensive) and FontBuilder (which is free). It is not extremely hard to build a bitmap font by hand, not nearly as hard as building a particle effect from scratch, but doing it once is enough to convince you to get one of these tools fast.
- A tool to produce sound effects: No contest. cfxr for Mac or the original sfxr for Windows. Both are free (http://thirdcog.eu/apps/cfxr and http://www.drpetter.se/project_sfxr.html respectively).

Summary

You just learned how to install Cocos2d-x templates and create a basic application. You also learned enough of the structure of a basic Cocos2d-x application to get started building your first game.
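The circular-looking relationship described under "Who begets whom" may be easier to see in a language-neutral model. The following Python sketch is not Cocos2d-x code: the Scene, HelloWorldLayer, and Director classes and their methods are hypothetical stand-ins for CCScene, the CCLayer-derived HelloWorld class, and CCDirector, and they only mirror the order in which the objects are created.

```python
class Scene:
    """Stand-in for CCScene: a container of child nodes."""

    def __init__(self):
        self.children = []

    def add_child(self, node):
        self.children.append(node)


class HelloWorldLayer:
    """Stand-in for the CCLayer-derived HelloWorld class."""

    @staticmethod
    def scene():
        # The layer's static method creates the scene...
        scene = Scene()
        # ...then creates the layer itself...
        layer = HelloWorldLayer()
        # ...and the scene immediately adds the layer to itself.
        scene.add_child(layer)
        return scene


class Director:
    """Stand-in for CCDirector: runs one scene at a time."""

    def __init__(self):
        self.running_scene = None

    def run_with_scene(self, scene):
        self.running_scene = scene


# Mirrors the two lines from AppDelegate.
director = Director()
director.run_with_scene(HelloWorldLayer.scene())
print(type(director.running_scene.children[0]).__name__)  # HelloWorldLayer
```

Putting the factory on the layer keeps the caller down to one line: AppDelegate asks for a ready-made scene and never has to build the layer or attach it.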


Packt

05 Sep 2013

5 min read

So, what is PowerShell 3.0 WMI?


Microsoft created Windows Management Instrumentation (WMI) as a management layer to the Windows operating system. This management layer allows you to retrieve information pertaining to the operating system or physical hardware on a system. It also allows you to manipulate components within the operating system.

A good example for this section is a hard drive. WMI provides the ability to view the physical hard drive as well as the logical components of the hard drive. Using a call to the Win32_DiskDrive class, you have the ability to view the physical aspects of the disk drive, such as the tracks, sectors, manufacturer, and even the serial number. The Win32_LogicalDisk class provides you with the ability to see the logical aspects of a drive, such as partitions, free space, and volume names.

Not limited to just Windows operating system management, WMI is also extensible, allowing third-party developers to create WMI providers for use within their projects. This means that you can create management hooks within your code that allow remote management through a common standardized framework. Many companies have adopted the use of WMI providers for items such as storage subsystems, application virtualization, hardware virtualization, and enhanced management of the hardware connected to a workstation or server system.

Microsoft chose to follow the Common Information Model (CIM) industry standard for WMI. The preceding diagram takes a simplified look at Microsoft's implementation of WMI through the use of the CIM standard. There are three layers to their model, which are as follows:

- WMI consumers: The WMI consumers are exactly what their name states. They consume the available APIs to access the managed component. WMI consumers include C/C++ and .NET clients as well as scripts written in languages such as PowerShell 3.0, all of which access management data and interact with the managed components. In the hard drive example, the WMI consumer is the PowerShell code that requests information about the hard drive. This would look as follows:

    Get-WmiObject -Class Win32_LogicalDisk

- WMI infrastructure: The WMI infrastructure includes the CIM Object Manager (CIMOM), which stores a repository of the available WMI providers. If a third-party WMI provider doesn't register with the CIM Object Manager, the Windows operating system will not be able to manage the component through WMI. In the hard drive example, the CIMOM will import the Managed Object Format (MOF) file of the hard drive into the WMI repository. This registers the hard drive's available WMI properties and methods for use by a WMI consumer.

- WMI providers: The last components, WMI providers, are made up of a driver (DLL) and a MOF file. These two components are responsible for returning the management data to the WMI consumer, through the WMI infrastructure. This allows the WMI consumer to interact with the managed components. In the hard drive example, the WMI consumer will access the hard drive through the WMI infrastructure utilizing the hard drive driver (DLL). The WMI consumer will have the ability to retrieve the information pertaining to the physical components on that hard drive.

WMI integration with PowerShell 3.0

PowerShell has the ability to interact with WMI through the use of the built-in cmdlets. These cmdlets act as the WMI consumers and interact with WMI. As WMI evolved with the release of new operating systems, PowerShell also needed to evolve in parallel to manage those systems. With the release of Windows 8 and Windows Server 2012, Microsoft created a new iteration of the Microsoft Windows Management Framework (WMF), Version 3.0. The new release of Windows Management Framework updates WMI to Version 3.0 and PowerShell to Version 3.0, and installs the new Windows Remote Management (WinRM), OData IIS Extensions, and Server Manager CIM Provider.

PowerShell 3.0 includes new WMI management cmdlets, displayed in the following table, which leverage the new functionality within Windows Management Framework 3.0. The new CIM cmdlets provide a richer WMI experience by leveraging stateful communications to the remote systems. They also provide the ability to make CIM calls through PowerShell 3.0 to non-Windows-based systems that support Web Services-Management (WSMan). This gives engineers the ability to tap into a variety of systems for management purposes.

    Get-CimAssociatedInstance    New-CimSession
    Get-CimClass                 New-CimSessionOption
    Get-CimInstance              Register-CimIndicationEvent
    Get-CimSession               Remove-CimInstance
    Invoke-CimMethod             Remove-CimSession
    New-CimInstance              Set-CimInstance

Using PowerShell in your environment

PowerShell is quickly becoming the de facto standard for managing Windows-based systems in small and large organizations. It is being used for tasks such as automated software deployments, dynamic system provisioning, and system maintenance. It is also being heavily used in Microsoft products that allow for business process automation, such as System Center 2012 - Service Manager.

PowerShell 3.0 has introduced a variety of new cmdlets that further simplify the administration of systems through the use of WMI. Whatever the administrative task, the PowerShell community has several examples of the ways to manage systems through the use of the WMI consumers. The examples provided in this article are just the tip of the iceberg compared to what you can accomplish utilizing PowerShell 3.0 and Windows Management Instrumentation. You may be surprised by the number of manual administration steps that can be automated by creating PowerShell scripts. The sky is the limit when it comes to scripting with PowerShell!

Summary

In this article, we learned what Windows Management Instrumentation (WMI) is, how PowerShell 3.0 utilizes it, and why it's applicable to systems engineers.


Packt

05 Sep 2013

10 min read

Chef Infrastructure



Packt

05 Sep 2013

3 min read

Using a LINQ query in LINQPad


The standard version

We are going to implement a simple scenario: given a deck of 52 cards, we want to pick a random number of cards, and then take out all of the hearts. From this stack of hearts, we will discard the first two and take the next five cards (if possible), and order them by their face value for display.

You can try it in a C# program query in LINQPad:

    public static Random random = new Random();

    void Main()
    {
        var deck = CreateDeck();
        var randomCount = random.Next(52);
        var hearts = new Card[randomCount];
        var j = 0;
        // take all hearts out
        for (var i = 0; i < randomCount; i++)
        {
            if (deck[i].Suit == "Hearts")
            {
                hearts[j++] = deck[i];
            }
        }
        // resize the array to avoid null references
        Array.Resize(ref hearts, j);
        // the first 2 hearts are discarded, so we need more than 2. If not, stop
        if (hearts.Length <= 2)
            return;
        // check how many cards we can take
        var count = hearts.Length - 2;
        // the most we need to take is 5
        if (count > 5)
        {
            count = 5;
        }
        // take the cards
        var finalDeck = new Card[count];
        Array.Copy(hearts, 2, finalDeck, 0, count);
        // now order the cards
        Array.Sort(finalDeck, new CardComparer());
        // display the result
        finalDeck.Dump();
    }

    public class Card
    {
        public string Suit { get; set; }
        public int FaceValue { get; set; }
    }

    // Create the deck of cards
    public Card[] CreateDeck()
    {
        var suits = new [] { "Spades", "Clubs", "Hearts", "Diamonds" };
        var deck = new Card[52];
        for (var i = 0; i < 52; i++)
        {
            deck[i] = new Card
            {
                Suit = suits[i / 13],
                FaceValue = i - (13 * (i / 13)) + 1
            };
        }
        // randomly shuffle the deck
        for (var i = deck.Length - 1; i > 0; i--)
        {
            var j = random.Next(i + 1);
            var tmp = deck[j];
            deck[j] = deck[i];
            deck[i] = tmp;
        }
        return deck;
    }

    // CardComparer compares 2 cards by their face value
    public class CardComparer : Comparer<Card>
    {
        public override int Compare(Card x, Card y)
        {
            return x.FaceValue.CompareTo(y.FaceValue);
        }
    }

Even if we don't count the CreateDeck() method, we had to do quite a few operations to produce the expected result (your values might be different as we are using random cards). The output is as follows:

Depending on the data, LINQPad will add contextual information. For example, in this sample it will add the bottom row with the sum of all the values (here, only FaceValue). Also, if you click on the horizontal graph button, you will get a visual representation of your data, as shown in the following screenshot:

This information is not always relevant but it can help you explore your data.

Summary

In this article we saw how LINQ queries can be used in LINQPad. The powerful query capabilities of LINQ are put to full use in LINQPad.
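The card scenario from the LINQPad article (keep the hearts, discard the first two, take up to five, order by face value) is a small filter, skip, take, and sort pipeline. The following Python sketch expresses it that way, as a comparison with the loop-and-array version. It is not a translation of any LINQ query from the article: the names create_deck and pick_hearts, and the representation of a card as a (suit, face_value) tuple, are assumptions made for illustration.

```python
import random
from itertools import islice

SUITS = ["Spades", "Clubs", "Hearts", "Diamonds"]


def create_deck(rng):
    """Build a shuffled 52-card deck of (suit, face_value) tuples."""
    deck = [(suit, value) for suit in SUITS for value in range(1, 14)]
    rng.shuffle(deck)
    return deck


def pick_hearts(cards, skip=2, take=5):
    """Keep the hearts, drop the first `skip`, take up to `take`, then sort."""
    hearts = (card for card in cards if card[0] == "Hearts")
    window = islice(hearts, skip, skip + take)
    return sorted(window, key=lambda card: card[1])


rng = random.Random()
deck = create_deck(rng)
drawn = deck[:rng.randrange(52)]  # a random number of cards, 0 to 51
print(pick_hearts(drawn))
```

Because islice simply yields nothing when fewer than three hearts are available, the explicit length check and the manual count arithmetic from the array version are not needed here.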


Packt

04 Sep 2013

15 min read

Learning MuseScore


(For more resources related to this topic, see here.) Entering notes In order to enter notes into our score, we need to enter Note Entry mode. MuseScore has various modes that we can use to accomplish special tasks. You can enter Note Entry mode by clicking on the N button in the toolbar. You can tell whether you are in Note Entry mode at any given time by checking whether the N button is depressed. You may also enter/exit Note Entry mode by pressing the N key. After you enter Note Entry mode, the quarter note should be selected by default. If you hover over the staff, you should see a light blue outline of a note appear. Clicking here will cause a quarter note of that pitch to be inserted. In the toolbar, you will see several notes of different lengths, such as half notes, eighth notes, and whole notes. This area is called the Note Entry toolbar, and indicates which note will be inserted when you click on the staff. Right now, the quarter note should be selected. Click on the half note, and then click an area of the staff on top of the rest that is immediately after the quarter note we just inserted. A half note of the pitch you chose will be added. In MuseScore, whenever we add notes, we must overwrite other notes. First, we overwrote a whole rest with a quarter note, which caused three beats of rest to be added after the quarter note. Then, we overwrote a quarter rest with a half note. Since the half note was longer than the quarter rest, it also overwrote one beat from the half rest following it, and changed the rest to a quarter rest to accommodate the size of the half note. To add an accidental, simply insert the note without the accidental, and then press the appropriate accidental button in the toolbar. For example, let's insert an F eighth note. We click on the eighth note button, then on the F line of the staff, and finally on the sharp button in the toolbar. 
We can insert dotted notes in a similar fashion by using the dot button on the Note Entry toolbar. In the next measure, let's add a G dotted quarter note by clicking on the quarter note in the Note Entry toolbar, then clicking on the dot button, and then clicking on a G in the staff. The dot will stay selected after you insert the note. If you would like to deselect the dot, you can click on it again. It is also automatically deselected when you change the note duration. Thus, you should always select the dot after you select the value of the note you would like to be dotted. It is possible to notate more quickly using keyboard shortcuts. The number keys 1 through 9 will select different durations, and the letters A through G will insert the designated note. The 0 key inserts a rest. Inserting notes this way will always insert the closest note with the desired pitch. If you hold Ctrl (or on Mac) while pressing the up or down arrow keys, MuseScore will move the last note you inserted up or down an octave. So, inserting a C half note and moving it up an octave can be accomplished by pressing the sequence 6 C Ctrl + ↑. Notes can be adjusted by a half step by pressing the up or down arrows without holding the Ctrl key. Hitting the up arrow will always create sharps, and the down arrow creates flats. This allows us to insert an F eighth note with the keystroke sequence 4 f ↑. While at first the keyboard shortcuts may seem complicated, as you get the hang of MuseScore, it is worthwhile to learn them. They will allow you to notate music extremely quickly and make your overall experience with MuseScore much more pleasurable. Making chords is also very straightforward. We just click on top of our previously inserted note after selecting a note of the same value. Be careful! If a different note length is selected, it will overwrite the previous note. Chords can also be inserted rapidly with keyboard shortcuts. Just start by inserting the first note of the chord normally. 
If you would like to insert a note of the chord above the previous note, hold Alt and press the number of the interval above the previous note you would like to insert. To insert it below, hold Shift and do the same. Notes are always inserted in the present key signature. So to insert a C first inversion chord, press the sequence E Alt + 3 Alt + 4, or to insert a C second inversion chord, press the sequence G Alt + 4 Alt + 3. Alternatively, after inserting the first note, you can hold Shift and type the letter names of the notes to add to the chord. So pressing the sequence G Shift + C Shift + E would insert the same C second inversion chord.

If you ever make a mistake, you can always undo your latest changes by going to the Edit menu and selecting Undo. You can also use the keyboard shortcut Ctrl + Z (or ⌘ + Z on Mac).

Let's put some notes and chords in some measures for both the trombone and piano parts so that we have something to work with.

Inserting triplets

To insert a triplet, first enter Note Entry mode. Then, from the Note Entry toolbar, choose the total duration that you would like all three notes of the triplet to sum to. Next, insert the first note of the triplet in the position you would like the triplet to occupy. After this, exit Note Entry mode, and from the Notes menu, under the Tuplets submenu, click on the Triplet option. A triplet will be created with the selected note as the first note. MuseScore will automatically enter Note Entry mode for you again, and select the correct duration of note needed to complete the triplet. From here, you can replace the two rests with notes by inserting the correct notes on top of them, as we did when we entered notes previously.

Also, there is a keyboard shortcut to make this process easier. While in Note Entry mode, select the proper duration you would like the entire triplet to be, as before, but then hit Ctrl + 3 (or ⌘ + 3 on Mac). The triplet will be inserted, and the proper note duration to fill in the triplet will be selected.
You can now enter the notes of the triplet as you would enter normal notes. For instance, to insert a triplet arpeggio of an F major triad totaling one beat, we would press the sequence 5 Ctrl + 3 F A C. For a B major triad totaling two beats, we would similarly press 6 Ctrl + 3 B D ↑ F ↑.

Inserting ties

Ties are very easy to create in MuseScore. The simplest way to insert a tie is to insert both of the notes that you want to be tied together, exit Note Entry mode, click on the first note, and then click on the tie button in the toolbar, or press the + key. Make sure the two notes you are trying to tie together have the same pitch, or no tie will be inserted.

This method works for individual notes, and also for chords. In order to have flexibility when tying chords, you must tie each note of the chord individually if you want the full chord to be tied. An easy way to do this is to ensure that you are not in Note Entry mode, hold Shift, click on the first note of the first chord so that the whole chord is selected, and press the + key. Again, for this to work, you must have two chords with identical pitches next to each other.

If you are working with keyboard shortcuts, then there is also a faster way to enter ties that does not require the use of the mouse. After you enter a note in Note Entry mode, the note you just entered will be selected, and the cursor will be located on the right-hand side of this note, as shown in the following screenshot:

Then, using the appropriate keyboard shortcut, select the duration of note you would like this note to be tied to. Finally, press the + key. MuseScore will insert a note of the selected duration tied to the previous note. So, pressing the sequence 5 C 4 + will insert a quarter note C tied to an eighth note. While this method is extremely convenient for single notes, it does not work for chords.

Often, it is necessary to flip the tie for visual appeal, especially when tying chords.
This can be accomplished by ensuring that you are not in Note Entry mode, clicking on a tie, and then pressing the X key. Even though ties look very similar to slurs in many situations, they are created differently. Slurs will be discussed later.

Copying and pasting

Suppose that we would like to repeat a measure in the bass line, or that the next measure in the melody is very similar to the previous measure. As in a word processor, we can copy and paste measures and fragments of music.

First, let's copy and paste a measure. Exit Note Entry mode by ensuring the N button in the toolbar is not selected. Then, click on a portion of the measure where no notes are present. The measure should be selected, as indicated by the blue box around it. Now, either go to the Edit menu and click on Copy, or press Ctrl + C (⌘ + C on Mac). The measure will be copied to the clipboard. Now, click on a portion of the target measure without any notes, and either click on Paste from the Edit menu, or press Ctrl + V (⌘ + V on Mac). The notes will be inserted, and the target measure will be overwritten.

It is also possible to copy any portion of your score, even if it spans partial measures or multiple staves. First, click on the note at the top-left of the region you want to copy. In the following example, this would be the E♭ in the right hand. Then, press and hold the Shift key, and click on the note at the bottom-right corner of the region you would like to copy. Here, that would be the D in the left hand. MuseScore will select all of the notes in between. Once you have selected the region, you can copy it in the same way you copied the measure before. To paste the region, click on the first note or rest in the uppermost stave where you would like to paste it, and paste as we did with a single measure using either Ctrl + V or Paste from the Edit menu.
If your selection has different measure breaks or is in a different meter than the destination, the selection will be reflowed to fit the destination, and ties will be added as necessary.

Inserting and deleting measures

Often, it is helpful to insert or delete a measure in your score. Luckily, MuseScore makes this extremely easy. To insert a measure, select the measure (as we did when we copied a measure) immediately after the location where you would like to insert the measure. Then, go to the Create menu, and under the Measures submenu, select Insert Measure. A measure will be inserted. To insert multiple measures, select Insert Measures. A dialog box will prompt you for how many measures to insert.

If you would like to add measures to the end of the score, you can select Append Measures from under the Measures submenu within the Create menu. There is no need to select any measures to perform this operation.

To delete measures, simply select the measure by clicking any blank area within the measure, and then go to the Edit menu, and click on Delete Selected Measures. Doing so will delete this measure position within all staves, not just the selected staff. You can also select multiple measures (as we did earlier when we were copying, by selecting one measure, holding the Shift key, and selecting additional measures), and use the same menu button to delete all of the measures that you have selected.

Chord symbols

In jazz and popular music, it is very common to give musicians chord symbols to read from. To create a chord symbol, make sure you are not in Note Entry mode, and click on a note that you would like to add a chord symbol to. Then, either go to the Create menu, go to the Text submenu, and select Chord Name, or press Ctrl + K (⌘ + K on Mac). A text box should appear that looks exactly like the ones we saw before. Now, you can type the name of the chord in the same way you would write it on paper.
(For example, D minor would be Dm, and a G7 chord would just be G7.) All lowercase b characters will be converted into flat signs, and all # characters will be converted into sharps. To move to the next location in the measure, press the space bar. If you press the space bar repeatedly, you will move forward without inserting any chords.

Now that our chords are inserted, we can optionally make them look stylized. To do this, go to the Style menu and click on Edit General Style. Then, click on the Chordnames option on the left-hand side. You should see a text box appear on the right-hand side containing the text stdchords.xml. Change this to jazzchords.xml, and then press OK. The chords you entered should be appropriately stylized.

Many styles of notation, especially within jazz music, use chord symbols and slashes to indicate improvisation. To create these slashes in MuseScore, insert four quarter notes on the middle line of the staff. Then, after exiting Note Entry mode, right-click on each note and select Note Properties. Check the box that says Stemless. Also, find the option labeled velocity type and choose user, and then change the value of the box velocity (0-127) to 0. Now press OK. Then, locate the section of the palette labeled Note Heads, and drag the parallelogram slash shape on top of each note. This will create the slash notation.

Beaming

The proper beaming of notes is a key feature of quality engraved scores that often goes unappreciated. It is extremely easy to change the beaming patterns to enhance the readability of your score. There are several utilities in the palette that allow for this.

To start, go to the section of the palette labeled Beam Properties. Hovering over each icon will tell you what it does. These properties can be applied to different notes. The Start beam option is for notes in the middle of an existing beam. It breaks the existing beam at the specified note, and starts a new beam on that note.
The Middle of the beam option will ensure that the selected note is beamed to the notes on both sides of it, and the No beam option will break any beams going to the selected note.

Let's learn how to use these with a simple scenario. Suppose you enter three eighth notes followed by an eighth rest. MuseScore will automatically choose the following beaming:

However, to a musician who is sight-reading, it may be easy to confuse this with a triplet. To correct this, simply drag the No beam icon on top of the third eighth note in the passage. The note should highlight red as you hover over it, before you drop it. Once you let go of the mouse button, MuseScore will automatically adjust the beam according to what you specified.

Similarly, choosing the beaming wisely can make difficult passages easier to read. Let's consider the case of two sixteenth notes followed by two eighth notes and two more sixteenth notes. Especially with the sharps and flats in this example, it would not be easy to sight-read such a passage. However, dragging the Start beam option on top of the B♮ makes this passage much cleaner and easier to read.

To undo any of these changes, ensure that you are not in Note Entry mode, and click on the note that you have changed. Then, in the Beam Properties section of the palette, double-click the A icon to reset it back to default.

Though MuseScore uses standard conventions for whether to put the beam above or below the notes, if you would like to change this, simply ensure that you are not in Note Entry mode, click on the beam, and press the X key. The beam will flip to the other side of the staff.

Summary

In this article, we learned the basics of creating notes including ties and triplets, copying and pasting measures, creating chord symbols, and also changing the beaming patterns to enhance the readability of our score.

Packt

04 Sep 2013

17 min read

Integrating Storm and Hadoop

In this article, we will implement the Batch and Service layers to complete the architecture.

There are some key concepts underlying this big data architecture: immutable state, abstraction and composition, and constraining complexity.

Immutable state is the key, in that it provides true fault-tolerance for the architecture. If a failure is experienced at any level, we can always rebuild the data from the original immutable data. This is in contrast to many existing data systems, where the paradigm is to act on mutable data. This approach may seem simple and logical; however, it exposes the system to a particular kind of risk in which the state is lost or corrupted. It also constrains the system, in that you can only work with the current view of the data; it isn't possible to derive new views of the data. When the architecture is based on a fundamentally immutable state, it becomes both flexible and fault-tolerant.

Abstractions allow us to remove complexity in some cases, and in others they can introduce complexity. It is important to achieve an appropriate set of abstractions that increase our productivity and remove complexity, but at an appropriate cost. It must be noted that all abstractions leak, meaning that when failures occur at a lower abstraction, they will affect the higher-level abstractions. It is therefore often important to be able to make changes within the various layers and understand more than one layer of abstraction. The designs we choose to implement our abstractions must therefore not prevent us from reasoning about or working at the lower levels of abstraction when required. Open source projects are often good at this, because of the obvious access to the code of the lower-level abstractions, but even with source code available, it is easy to convolute the abstraction to the extent that it becomes a risk.
In a big data solution, we have to work at higher levels of abstraction in order to be productive and deal with the massive complexity, so we need to choose our abstractions carefully. In the case of Storm, Trident represents an appropriate abstraction for dealing with the data-processing complexity, but the lower-level Storm API on which Trident is based isn't hidden from us. We are therefore able to easily reason about Trident based on an understanding of lower-level abstractions within Storm.

Another key issue to consider when dealing with complexity and productivity is composition. Composition within a given layer of abstraction allows us to quickly build out a solution that is well tested and easy to reason about. Composition is fundamentally decoupled, while abstraction contains some inherent coupling to the lower-level abstractions, which is something that we need to be aware of.

Finally, a big data solution needs to constrain complexity. Complexity always equates to risk and cost in the long run, both from a development perspective and from an operational perspective. Real-time solutions will always be more complex than batch-based systems; they also lack some of the qualities we require in terms of performance. Nathan Marz's Lambda architecture attempts to address this by combining the qualities of each type of system to constrain complexity and deliver a truly fault-tolerant architecture.

We divided this flow into preprocessing and "at time" phases, using streams and DRPC streams respectively. We also introduced time windows that allowed us to segment the preprocessed data. In this article, we complete the entire architecture by implementing the Batch and Service layers.

The Service layer is simply a store of a view of the data. In this case, we will store this view in Cassandra, as it is a convenient place to access the state alongside Trident's state.
The preprocessed view is identical to the one created by Trident, namely the counted elements of the TF-IDF formula (D, DF, and TF), but in the batch case, the dataset is much larger, as it includes the entire history.

The Batch layer is implemented in Hadoop using MapReduce to calculate the preprocessed view of the data. MapReduce is extremely powerful, but like the lower-level Storm API, it is potentially too low-level for the problem at hand, for two reasons: we need to describe the problem as a data pipeline, and MapReduce isn't congruent with such a way of thinking; and productivity.

We would like to think of a data pipeline in terms of streams of data, tuples within the stream, and predicates acting on those tuples. This allows us to easily describe a solution to a data processing problem, and it also promotes composability: predicates are fundamentally composable, and pipelines themselves can also be composed to form larger, more complex pipelines. Cascading provides such an abstraction for MapReduce in the same way as Trident does for Storm.

With these tools, approaches, and considerations in place, we can now complete our real-time big data architecture. There are a number of elements that we will update, and a number of elements that we will add. The following figure illustrates the final architecture, where the elements in light grey will be updated from the existing recipe, and the elements in dark grey will be added in this article:

Implementing TF-IDF in Hadoop

TF-IDF is a well-known problem in the MapReduce communities; it is well-documented and implemented, and it is interesting in that it is sufficiently complex to be useful and instructive at the same time. Cascading has a series of tutorials on TF-IDF at http://www.cascading.org/2012/07/31/cascading-for-the-impatient-part-5/, which documents this implementation well.
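As a point of reference for the counts used throughout this recipe, the following minimal Python sketch shows how the three counted elements (D, DF, and TF) might be combined into a TF-IDF score. The function name tf_idf, the sample numbers, and the exact smoothing (dividing by 1 + DF) are assumptions made for illustration; the article does not spell out which variant of the formula is applied.

```python
import math


def tf_idf(tf: int, df: int, d: int) -> float:
    """Combine the three counted elements into one score.

    tf: occurrences of the term in one document
    df: number of documents that contain the term
    d:  total number of documents
    The 1 + df smoothing is an assumed variant, not taken from the article.
    """
    return tf * math.log(d / (1.0 + df))


# Hypothetical counts: the same term frequency, once for a rare term
# (9 of 1000 documents) and once for a common term (499 of 1000 documents).
rare_score = tf_idf(tf=3, df=9, d=1000)
common_score = tf_idf(tf=3, df=499, d=1000)
```

With the same term frequency, the rare term receives the higher score, which is why the document count and document frequency have to be tracked alongside the raw term counts.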
For this recipe, we shall use a Clojure Domain Specific Language (DSL) called Cascalog that is implemented on top of Cascading. Cascalog has been chosen because it provides a set of abstractions that are semantically very similar to the Trident API and are very terse while still remaining readable and easy to understand.

Getting ready

Before you begin, please ensure that you have installed Hadoop by following the instructions at http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/.

How to do it…

Start by creating the project using the lein command:

    lein new tfidf-cascalog

Next, you need to edit the project.clj file to include the dependencies:

    (defproject tfidf-cascalog "0.1.0-SNAPSHOT"
      :dependencies [[org.clojure/clojure "1.4.0"]
                     [cascalog "1.10.1"]
                     [org.apache.cassandra/cassandra-all "1.1.5"]
                     [clojurewerkz/cassaforte "1.0.0-beta11-SNAPSHOT"]
                     [quintona/cascading-cassandra "0.0.7-SNAPSHOT"]
                     [clj-time "0.5.0"]
                     [cascading.avro/avro-scheme "2.2-SNAPSHOT"]
                     [cascalog-more-taps "0.3.0"]
                     [org.apache.httpcomponents/httpclient "4.2.3"]]
      :profiles {:dev {:dependencies [[org.apache.hadoop/hadoop-core "0.20.2-dev"]
                                      [lein-midje "3.0.1"]
                                      [cascalog/midje-cascalog "1.10.1"]]}})

It is always a good idea to validate your dependencies; to do this, execute lein deps and review any errors. In this particular case, cascading-cassandra has not been deployed to Clojars, and so you will receive an error message. Simply download the source from https://github.com/quintona/cascading-cassandra and install it into your local repository using Maven.

It is also good practice to understand your dependency tree. This is important not only to prevent duplicate classpath issues, but also to understand what licenses you are subject to. To do this, simply run lein pom, followed by mvn dependency:tree. You can then review the tree for conflicts. In this particular case, you will notice that there are two conflicting versions of Avro.
You can fix this by adding the appropriate exclusions:

    [org.apache.cassandra/cassandra-all "1.1.5"
     :exclusions [org.apache.cassandra.deps/avro]]

We then need to create the Clojure-based Cascalog queries that will process the document data. We first need to create the query that will create the "D" view of the data; that is, the D portion of the TF-IDF function. This is achieved by defining a Cascalog function that will output a key and a value, which is composed of a set of predicates:

    (defn D [src]
      (let [src (select-fields src ["?doc-id"])]
        (<- [?key ?d-str]
            (src ?doc-id)
            (c/distinct-count ?doc-id :> ?n-docs)
            (str "twitter" :> ?key)
            (str ?n-docs :> ?d-str))))

You can define this and any of the following functions in the REPL, or add them to core.clj in your project. If you want to use the REPL, simply use lein repl from within the project folder. The required namespace (the use statement), require, and import definitions can be found in the source code bundle.

We then need to add similar functions to calculate the TF and DF values:

    (defn DF [src]
      (<- [?key ?df-count-str]
          (src ?doc-id ?time ?df-word)
          (c/distinct-count ?doc-id ?df-word :> ?df-count)
          (str ?df-word :> ?key)
          (str ?df-count :> ?df-count-str)))

    (defn TF [src]
      (<- [?key ?tf-count-str]
          (src ?doc-id ?time ?tf-word)
          (c/count ?tf-count)
          (str ?doc-id ?tf-word :> ?key)
          (str ?tf-count :> ?tf-count-str)))

This Batch layer is only interested in calculating views for all the data leading up to, but not including, the current hour. This is because the data for the current hour will be provided by Trident when it merges this batch view with the view it has calculated. In order to achieve this, we need to filter out all the records that are within the current hour. The following function makes that possible:

    ;; Keeps only documents that are at least 60 minutes old.
    (deffilterop timing-correct? [doc-time]
      (let [now (local-now)
            interval (in-minutes (interval (from-long doc-time) now))]
        (if (< interval 60) false true)))

Each of the preceding query definitions requires a clean stream of words. The text contained in the source documents isn't clean; it still contains stop words. In order to filter these and emit a clean set of words for these queries, we can compose a function that splits the text into words and filters them based on a list of stop words and the time function defined previously:

    (defn etl-docs-gen [rain stop]
      (<- [?doc-id ?time ?word]
          (rain ?doc-id ?time ?line)
          (split ?line :> ?word-dirty)
          ((c/comp s/trim s/lower-case) ?word-dirty :> ?word)
          (stop ?word :> false)
          (timing-correct? ?time)))

We will be storing the outputs from our queries to Cassandra, which requires us to define a set of taps for these views:

    (defn create-tap [rowkey cassandra-ip]
      (let [keyspace storm_keyspace
            column-family "tfidfbatch"
            scheme (CassandraScheme. cassandra-ip "9160" keyspace column-family rowkey
                     {"cassandra.inputPartitioner" "org.apache.cassandra.dht.RandomPartitioner"
                      "cassandra.outputPartitioner" "org.apache.cassandra.dht.RandomPartitioner"})
            tap (CassandraTap. scheme)]
        tap))

    (defn create-d-tap [cassandra-ip]
      (create-tap "d" cassandra-ip))

    (defn create-df-tap [cassandra-ip]
      (create-tap "df" cassandra-ip))

    (defn create-tf-tap [cassandra-ip]
      (create-tap "tf" cassandra-ip))

The way this scheme is created means that it will use a static row key and persist name-value pairs from the tuples as column:value within that row. This is congruent with the approach used by the Trident Cassandra adaptor. This is a convenient approach, as it will make our lives easier later.

We can complete the implementation by providing a function that ties everything together and executes the queries:

    (defn execute [in stop cassandra-ip]
      (cc/connect! cassandra-ip)
      (sch/set-keyspace storm_keyspace)
      (let [input (tap/hfs-tap (AvroScheme. (load-schema)) in)
            stop (hfs-delimited stop :skip-header? true)
            src (etl-docs-gen input stop)]
        (?- (create-d-tap cassandra-ip) (D src))
        (?- (create-df-tap cassandra-ip) (DF src))
        (?- (create-tf-tap cassandra-ip) (TF src))))

Next, we need to get some data to test with. I have created some test data, which is available at https://bitbucket.org/qanderson/tfidf-cascalog. Simply download the project and copy the contents of src/data to the data folder in your project structure.

We can now test this entire implementation. To do this, we need to insert the data into Hadoop:

    hadoop fs -copyFromLocal ./data/document.avro data/document.avro
    hadoop fs -copyFromLocal ./data/en.stop data/en.stop

Then launch the execution from the REPL:

    => (execute "data/document" "data/en.stop" "127.0.0.1")

How it works…

There are many excellent guides on the Cascalog wiki (https://github.com/nathanmarz/cascalog/wiki), but for completeness's sake, the nature of a Cascalog query will be explained here. Before that, however, a review of Cascading pipelines is required.

The following is quoted from the Cascading documentation (http://docs.cascading.org/cascading/2.1/userguide/htmlsingle/):

Pipe assemblies define what work should be done against tuple streams, which are read from tap sources and written to tap sinks. The work performed on the data stream may include actions such as filtering, transforming, organizing, and calculating. Pipe assemblies may use multiple sources and multiple sinks, and may define splits, merges, and joins to manipulate the tuple streams.

This concept is embodied in Cascalog through the definition of queries. A query takes a set of inputs and applies a list of predicates across the fields in each tuple of the input stream. Queries are composed through the application of many predicates. Queries can also be composed to form larger, more complex queries. In either event, these queries are reduced down into a Cascading pipeline.
Cascalog therefore provides an extremely terse and powerful abstraction on top of Cascading; moreover, it enables an excellent development workflow through the REPL. Queries can be easily composed and executed against smaller representative datasets within the REPL, providing the idiomatic API and development workflow that makes Clojure beautiful.

If we unpack the query we defined for DF, we will find the following code:

    (defn DF [src]
      (<- [?key ?df-count-str]
          (src ?doc-id ?time ?df-word)
          (c/distinct-count ?doc-id ?df-word :> ?df-count)
          (str ?df-word :> ?key)
          (str ?df-count :> ?df-count-str)))

The <- macro defines a query, but does not execute it. The initial vector, [?key ?df-count-str], defines the output fields, and it is followed by a list of predicate functions. Each predicate can be one of the following three types:

Generators: A source of data where the underlying source is either a tap or another query.

Operations: Implicit relations that take in input variables defined elsewhere and either act as a function that binds new variables or a filter. Operations typically act within the scope of a single tuple.

Aggregators: Functions that act across tuples to create aggregate representations of data, for example, count and sum.

The :> keyword is used to separate input variables from output variables. If no :> keyword is specified, the variables are considered as input variables for operations and output variables for generators and aggregators.

The (src ?doc-id ?time ?df-word) predicate function names the first three values within the input tuple, whose names are applicable within the query scope. Therefore, if the tuple ("doc1" 123324 "This") arrives in this query, the variables would effectively bind as follows:

    ?doc-id: "doc1"
    ?time: 123324
    ?df-word: "This"

Each predicate within the scope of the query can use any bound value or add new bound variables to the scope of the query. The final set of bound values that are emitted is defined by the output vector.
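To make the roles of the generator, operation, and aggregator concrete, here is a rough plain-Python equivalent of what the DF query computes: the number of distinct documents in which each word appears. The function name document_frequency and the sample tuples are hypothetical, and the sketch ignores the distributed execution that Cascalog and Hadoop provide.

```python
from collections import defaultdict


def document_frequency(tuples):
    """Return {word: count-as-string}, mirroring the [?key ?df-count-str] output."""
    docs_per_word = defaultdict(set)
    # Generator step: each tuple binds ?doc-id, ?time and ?df-word.
    for doc_id, _time, word in tuples:
        # Aggregator step: collect the distinct document ids seen for each word.
        docs_per_word[word].add(doc_id)
    # Operation step: (str ...) turns the key and the count into strings.
    return {str(word): str(len(ids)) for word, ids in docs_per_word.items()}


# Hypothetical input tuples in the (doc-id, time, word) shape described above.
sample = [
    ("doc1", 123324, "rain"),
    ("doc1", 123324, "rain"),
    ("doc2", 123400, "rain"),
    ("doc2", 123400, "shadow"),
]
result = document_frequency(sample)
```

The repeated ("doc1", "rain") tuple is counted once, which is the effect of using a distinct count rather than a plain count.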
We defined three queries, each calculating a portion of the value required for the TF-IDF algorithm. These are fed from two taps, both of which are files stored in the Hadoop filesystem. The document file is stored using Apache Avro, which provides a high-performance and dynamic serialization layer. Avro takes a record definition and enables serialization/deserialization based on it. The record structure, in this case, is for a document and is defined as follows:

    {"namespace": "storm.cookbook",
     "type": "record",
     "name": "Document",
     "fields": [
         {"name": "docid", "type": "string"},
         {"name": "time", "type": "long"},
         {"name": "line", "type": "string"}
     ]
    }

Both the stop words and documents are fed through an ETL function that emits a clean set of words that have been filtered. The words are derived by splitting the line field using a regular expression:

    (defmapcatop split [line]
      (s/split line #"[\[\]\\\(\),.)\s]+"))

The ETL function is also a query, which serves as a source for our downstream queries, and defines the [?doc-id ?time ?word] output fields.

The output tap, or sink, is based on the Cassandra scheme. A query defines predicate logic, not the source and destination of data. The sink ensures that the outputs of our queries are sent to Cassandra. The ?- macro executes a query, and it is only at execution time that a query is bound to its source and destination, again allowing for extreme levels of composition. The following, therefore, executes the TF query and outputs to Cassandra:

    (?- (create-tf-tap cassandra-ip) (TF src))

There's more…

The Avro test data was created using the test data from the Cascading tutorial at http://www.cascading.org/2012/07/31/cascading-for-the-impatient-part-5/. Within this tutorial is the rain.txt tab-separated data file. A new column called time was added, which holds the Unix epoch time in milliseconds.
The updated text file was then processed using some basic Java code that leverages Avro:

    Schema schema = Schema.parse(
        SandboxMain.class.getResourceAsStream("/document.avsc"));
    File file = new File("document.avro");
    DatumWriter<GenericRecord> datumWriter =
        new GenericDatumWriter<GenericRecord>(schema);
    DataFileWriter<GenericRecord> dataFileWriter =
        new DataFileWriter<GenericRecord>(datumWriter);
    dataFileWriter.create(schema, file);
    BufferedReader reader = new BufferedReader(new InputStreamReader(
        SandboxMain.class.getResourceAsStream("/rain.txt")));
    String line = null;
    try {
        while ((line = reader.readLine()) != null) {
            // rain.txt is tab-separated: docid, time, line
            String[] tokens = line.split("\t");
            GenericRecord docEntry = new GenericData.Record(schema);
            docEntry.put("docid", tokens[0]);
            docEntry.put("time", Long.parseLong(tokens[1]));
            docEntry.put("line", tokens[2]);
            dataFileWriter.append(docEntry);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    dataFileWriter.close();

Persisting documents from Storm

In the previous recipe, we looked at deriving precomputed views of our data, taking some immutable data as the source. In that recipe, we used statically created data. In an operational system, we need Storm to store the immutable data into Hadoop so that it can be used in any preprocessing that is required.

How to do it…

As each tuple is processed in Storm, we must generate an Avro record based on the document record definition and append it to the data file within the Hadoop filesystem.

We must create a Trident function that takes each document tuple and stores the associated Avro record. Within the previously created tfidf-topology project, inside the storm.cookbook.tfidf.function package, create a new class named PersistDocumentFunction that extends BaseFunction.
Within the prepare function, initialize the Avro schema and document writer:

    public void prepare(Map conf, TridentOperationContext context) {
        try {
            String path = (String) conf.get("DOCUMENT_PATH");
            schema = Schema.parse(PersistDocumentFunction.class
                .getResourceAsStream("/document.avsc"));
            File file = new File(path);
            DatumWriter<GenericRecord> datumWriter =
                new GenericDatumWriter<GenericRecord>(schema);
            dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
            // Append to an existing data file, otherwise create a new one
            if (file.exists())
                dataFileWriter.appendTo(file);
            else
                dataFileWriter.create(schema, file);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

As each tuple is received, coerce it into an Avro record and add it to the file:

    public void execute(TridentTuple tuple, TridentCollector collector) {
        GenericRecord docEntry = new GenericData.Record(schema);
        docEntry.put("docid", tuple.getStringByField("documentId"));
        docEntry.put("time", Time.currentTimeMillis());
        docEntry.put("line", tuple.getStringByField("document"));
        try {
            dataFileWriter.append(docEntry);
            dataFileWriter.flush();
        } catch (IOException e) {
            LOG.error("Error writing to document record: " + e);
            throw new RuntimeException(e);
        }
    }

Next, edit the TermTopology.build topology and add the function to the document stream:

    documentStream.each(new Fields("documentId", "document"),
        new PersistDocumentFunction(), new Fields());

Finally, include the document path in the topology configuration:

    conf.put("DOCUMENT_PATH", "document.avro");

How it works…

There are various logical streams within the topology, and the input for the topology, which contains only URLs, is certainly not in the appropriate state for the recipes in this article. We therefore need to select the correct stream from which to consume tuples, coerce these into Avro records, and serialize them into a file. The previous recipe will then periodically consume this file.
Within the context of the topology definition, include the following code: Stream documentStream = getUrlStream(topology, spout) .each(new Fields("url"), new DocumentFetchFunction(mimeTypes), new Fields("document", "documentId", "source")); documentStream.each(new Fields("documentId","document"), new PersistDocumentFunction(), new Fields()); The function should consume tuples from the document stream whose tuples are populated with already fetched documents.
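Both the static file loader and the Trident function end up writing the same three fields (docid, time, and line). The following is a minimal stdlib-only sketch of that mapping, using a plain Map as a stand-in for the Avro record so that it runs without Avro or Storm on the classpath. The class and method names (DocumentRecordSketch, fromTuple, fromTabLine) are hypothetical and are not part of either library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-in for the Avro document record; a hypothetical helper, not Avro or Storm API. */
final class DocumentRecordSketch {

    /** Builds the three-field record from the values a document tuple would carry. */
    static Map<String, Object> fromTuple(String documentId, String document, long nowMillis) {
        Map<String, Object> record = new LinkedHashMap<String, Object>();
        record.put("docid", documentId);
        record.put("time", nowMillis);
        record.put("line", document);
        return record;
    }

    /** Parses one "docid<TAB>time<TAB>line" row, the layout assumed for the static input file. */
    static Map<String, Object> fromTabLine(String row) {
        // A limit of 3 keeps any further tab characters inside the document text
        String[] tokens = row.split("\t", 3);
        if (tokens.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + row);
        }
        return fromTuple(tokens[0], tokens[2], Long.parseLong(tokens[1]));
    }

    public static void main(String[] args) {
        System.out.println(fromTabLine("doc-1\t1378252800000\tRain is expected tomorrow"));
    }
}
```

Note the escaped "\t" in the split call: splitting on a plain "t" would break every line at each letter t rather than at the tab characters.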

Packt

04 Sep 2013

7 min read

Quick start – writing your first MDX query

Step 1 – open the SQL Server Management Studio and connect to the cube

The Microsoft SQL Server Management Studio (SSMS) is a client application used by administrators to manage instances and by developers to create objects and write queries. We will use SSMS to connect to the cube and write our first MDX query. Here's a screenshot of SSMS with a connection to an SSAS server:

1. Click on the Windows button, click on All Programs, click on Microsoft SQL Server 2012, and then click on SQL Server Management Studio.
2. In the Connect to Server window, in the Server type box, select Analysis Services.
3. In the Server name box, type the name of your Analysis Services server.
4. Click on Connect.
5. In the SQL Server Management Studio window, click on the File menu, click on New, and then click on Analysis Services MDX Query.
6. In the Connect to Analysis Services window, in the Server name box, type the name of your Analysis Services server, and then click on Connect.

An MDX query is built from three clauses: SELECT, FROM, and WHERE.

If you have already written SQL queries, you might have already made connections with the T-SQL language. Here's my tip for you: don't, you will only hurt yourself. Some words are the same, but it is better to think MDX when writing MDX rather than to think SQL when writing MDX.

Step 2 – SELECT

The SELECT clause is the main part of the MDX query. In it, you define which measures and dimension members you want to display. You also have to define on which axis of your result set you want to display them.

Axes

Axes are the columns and rows of the result set. With SQL Server Analysis Services, up to 128 axes can be specified. Axes are numbered from zero: the first axis is 0, the second one is 1, and so on. So, if you want to use two axes, the first one will be 0 and the second will be 1. You cannot use axis 0 and axis 2 if you don't define axis 1. For the first five axes, you can use the axis alias instead. After axis 4, you will have to revert to the number because no other aliases are available.

    Axis number    Alias
    0              Columns
    1              Rows
    2              Pages
    3              Sections
    4              Chapters

Even though SSAS supports 128 axes, if you try to use more than two axes in a query in SSMS, you will get this error when you execute your MDX query: Results cannot be displayed for cellsets with more than two axes. So, always write your MDX queries using only two axes in SSMS and separate them with a comma.

Tuples

A tuple is a specific point in the cube where dimensions meet. A tuple can contain one or more members from the cube's dimensions, but you cannot have two members from the same dimension. If you want to display only the calendar year 2008, you will have to write [Date].[CY 2008]. If you want to use members from more than one dimension, you have to enclose them in parentheses () and separate them with a comma. Calendar year 2008 for the United States will look like ([Date].[CY 2008], [Geography].[United States]). Even if you are writing a tuple with only a single member from a single dimension, it is good practice to enclose it in parentheses.

Sets

If you want to display the years 2005 to 2008, you will write four single-dimension tuples, which together compose a set. When writing the set, you separate the tuples with commas and wrap them all in curly braces {}, such as {[Date].[CY 2005], [Date].[CY 2006], [Date].[CY 2007], [Date].[CY 2008]}, to have the calendar years from 2005 to 2008. Since all the tuples are from the same dimension, you can also write the set using a colon (:), such as {[Date].[CY 2005] : [Date].[CY 2008]}, which will give you the years 2005 to 2008. With SSAS 2012, you can write {[Date].[CY 2008] : [Date].[CY 2005]} and the result will still be from 2005 to 2008.

What about the calendar year 2008 for both Canada and the United States? You will write two tuples. A set can be composed of one or more tuples.
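The colon range syntax described above can be pictured as slicing an ordered list of members. The following is a minimal Java sketch of that idea, not SSAS code: the member list and the names MdxRangeSketch and range are assumptions made for illustration. It also mirrors the behavior noted for SSAS 2012, where reversed endpoints still yield the members in their natural order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrates the {a : b} range idea on an ordered member list (hypothetical helper). */
final class MdxRangeSketch {

    /** Assumed members of a calendar-year level, in hierarchy order. */
    static final List<String> YEARS = Arrays.asList(
            "[Date].[CY 2005]", "[Date].[CY 2006]", "[Date].[CY 2007]", "[Date].[CY 2008]");

    /** Returns every member between the two endpoints, inclusive, in level order. */
    static List<String> range(List<String> level, String from, String to) {
        int a = level.indexOf(from);
        int b = level.indexOf(to);
        if (a < 0 || b < 0) {
            throw new IllegalArgumentException("both endpoints must belong to the level");
        }
        // Swapping reversed endpoints keeps the result in natural order
        int low = Math.min(a, b);
        int high = Math.max(a, b);
        return new ArrayList<String>(level.subList(low, high + 1));
    }

    public static void main(String[] args) {
        System.out.println(range(YEARS, "[Date].[CY 2008]", "[Date].[CY 2005]"));
    }
}
```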
The tuples must have the same dimensionality; otherwise, an error will occur. This means that in every tuple, the first member comes from the same dimension (for example, Date) and the second from the same dimension (for example, Geography). You cannot have the first tuple as Date-Geography and the second as Geography-Date; you will encounter an error. So the calendar year 2008 with Canada and the United States will look like {([Date].[CY 2008], [Geography].[Canada]), ([Date].[CY 2008], [Geography].[United States])}.

When writing tuples, always use the form [Dimension].[Level].[MemberName]. So, [Geography].[Canada] should be written as [Geography].[Country].[Canada]. You could also use the member key instead of the member name. In SSAS, use the ampersand (&) when using the key; [Geography].[State-Province].[Quebec] with the name becomes [Geography].[State-Province].&[QC]&[CA] using the keys.

What happens when you want to write bigger sets, such as the bikes and components product categories in Canada and the United States from 2005 to 2008? Enter the Crossjoin function. Crossjoin takes two or more sets as arguments and returns a set with the cross product of the specified sets.

    Crossjoin ({[Product].[Category].[Bikes], [Product].[Category].[Components]}, {[Geography].[Country].[Canada], [Geography].[Country].[United States]}, {[Date].[CY 2005] : [Date].[CY 2008]})

MDX queries can be written with line breaks to make the code more readable. So each time we write a new set, and even tuples, we write it on a new line and add some indentation:

    Crossjoin (
        {[Product].[Category].[Bikes], [Product].[Category].[Components]},
        {[Geography].[Country].[Canada], [Geography].[Country].[United States]},
        {[Date].[CY 2005] : [Date].[CY 2008]}
    )

Step 3 – FROM

The FROM clause defines where the query will get the data. It can be one of the following four things:

- A cube.
- A perspective (a subset of dimensions and measures).
- A subcube (an MDX query inside an MDX query).
- A dimension (a dimension inside your SSAS database; you must use the dollar sign ($) before the name of the dimension).

Step 4 – WHERE

The WHERE clause filters the MDX query by the dimension members you specify. The set used in the WHERE clause won't be displayed in your result set.

Step 5 – comments

Comment your code. You never know when somebody else will take a look at your queries, and trying to understand what has been written can be hard. There are three ways to delimit comments inside the query:

- /* and */
- //
- -- (pair of dashes)

The /* and */ symbols can be used to comment multiple lines of text in your query. Everything between the /* and the */ symbols will be ignored when the MDX query is parsed. Use // or -- to begin a comment on a single line.

Step 6 – your first MDX query

Suppose you want to display the Reseller Sales Amount and Reseller Order Quantity measures on the columns, and the years from 2006 to 2008 with the bikes and components product categories on the rows, for Canada. First, identify what will go where. Start with the two axes, continue with the FROM clause, and finish with the WHERE clause.

    SELECT
        {[Measures].[Reseller Sales Amount], [Measures].[Reseller Order Quantity]} on columns,
        Crossjoin(
            {[Date].[CY 2006] : [Date].[CY 2008]},
            {[Product].[Category].[Bikes], [Product].[Category].[Components]}
        ) on rows
    FROM [Adventure Works]
    WHERE {[Geography].[Country].[Canada]}

This query will return the following result set:

                           Reseller Sales Amount    Reseller Order Quantity
    CY 2006  Bikes         $3,938,283.99            4,563
    CY 2006  Components    $746,576.15              2,954
    CY 2007  Bikes         $4,417,665.71            5,395
    CY 2007  Components    $997,617.89              4,412
    CY 2008  Bikes         $1,909,709.62            2,209
    CY 2008  Components    $370,698.68              1,672

Summary

In this article, we saw how to write MDX queries step by step. We used the SELECT, FROM, and WHERE clauses in writing the queries. This article was a quick start guide to querying, and it will help you write more complex queries. Happy querying!
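To see why the rows of the result set come out as CY 2006 Bikes, CY 2006 Components, CY 2007 Bikes, and so on, it can help to model Crossjoin as a plain Cartesian product in which the first set varies slowest. The following Java sketch illustrates only that ordering; it is not how SSAS evaluates the function, and the class name CrossjoinSketch is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Cartesian product of member sets, first set varying slowest (illustrative only). */
final class CrossjoinSketch {

    static List<List<String>> crossjoin(List<List<String>> sets) {
        List<List<String>> result = new ArrayList<List<String>>();
        result.add(new ArrayList<String>()); // start from a single empty tuple
        for (List<String> set : sets) {
            List<List<String>> next = new ArrayList<List<String>>();
            for (List<String> prefix : result) {
                for (String member : set) {
                    // Extend each tuple built so far with every member of this set
                    List<String> tuple = new ArrayList<String>(prefix);
                    tuple.add(member);
                    next.add(tuple);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> rows = crossjoin(Arrays.asList(
                Arrays.asList("CY 2006", "CY 2007", "CY 2008"),
                Arrays.asList("Bikes", "Components")));
        for (List<String> row : rows) {
            System.out.println(row);
        }
    }
}
```

Three years crossed with two categories give six tuples, which matches the six rows of the result set shown in the article.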

Packt

04 Sep 2013

7 min read

So, what is Apache Wicket?

Wicket is a component-based Java web framework that uses just Java and HTML. Here, you will see the main advantages of using Apache Wicket in your projects.

Using Wicket, you will not have mutant HTML pages. Most Java web frameworks require special syntax to be inserted into the HTML code, which makes life more difficult for web designers. Wicket, on the other hand, works with plain HTML templates by using a namespace that follows the XHTML standard. It consists of an id attribute in the Wicket namespace (wicket:id). You won't need scripts to generate messy HTML code. Using Wicket, the code will be clearer, and refactoring and navigating within the code will be easier. Moreover, you can use any HTML editor to edit the HTML files, and web designers can work in the presentation layer with little knowledge of Wicket, without worrying about business rules and other developer concerns.

The advantages for developers are as follows:

- All code is written in Java
- No XML configuration files
- POJO-centric programming
- No Back-button problems (that is, unexpected and undesirable results on clicking on the browser's Back button)
- Ease of creating bookmarkable pages
- Great compile-time and runtime problem diagnosis
- Easy testability of components

Another interesting thing is that concepts such as generics and anonymous subclasses are widely used in Wicket, making full use of the Java programming language.

Wicket is based on components. A component is an object that interacts with other components and encapsulates a set of functionalities. Each component should be reusable, replaceable, extensible, encapsulated, and independent, and it should not depend on a specific context. Wicket supports all of these principles because it was designed with them in mind. In particular, the most remarkable principle is reusability. Developers can create custom reusable components in a straightforward way. For instance, you could create a custom component called SearchPanel (by extending the Panel class, which is also a component) and use it in all your other Wicket projects.

Wicket has many other interesting features. It aims to make the interaction of the stateful server-side Java programming language with the stateless HTTP protocol more natural. Wicket's code is safe by default; for instance, it does not encode state in URLs. Wicket is also efficient (for example, page-state replication can be tuned) and scalable (Wicket applications can easily work on a cluster). Last, but not least, Wicket has support for frameworks like EJB and Spring.

Installation

In seven easy steps, you can build a Wicket "Hello World" application.

Step 1 – what do I need?

Before you start to use Apache Wicket 6, you will need to check that you have all of the required elements, listed as follows:

- Wicket is a Java framework, so you need to have a Java virtual machine (at least Version 6) installed on your machine.
- Apache Maven is required. Maven is a tool that can be used for building and managing Java projects. Its main purpose is to make the development process easier and more structured. More information on how to install and configure Maven can be found at http://maven.apache.org.
- The examples of this book use the Eclipse IDE Juno version, but you can also use other versions or other IDEs, such as NetBeans. If you are using another version, check the link for installing the plugins for the version you have; the remaining steps will be the same. If you are using another IDE, you will need to follow a tutorial to install equivalent plugins, or not use them at all.

Step 2 – installing the m2eclipse plugin

The steps for installing the m2eclipse plugin are as follows:

1. Go to Help | Install New Software.
2. Click on Add and type in m2eclipse in the Name field; copy and paste the link https://repository.sonatype.org/content/repositories/forge-sites/m2e/1.3.0/N/LATEST into the Location field.
3. Check all options and click on Next.
4. Conclude the installation of the m2eclipse plugin by accepting all agreements and clicking on Finish.

Step 3 – creating a new Maven application

The steps for creating a new Maven application are as follows:

1. Go to File | New | Project.
2. Then go to Maven | Maven Project.
3. Click on Next and type wicket in the next form.
4. Choose the wicket-archetype-quickstart Maven archetype and click on Next.
5. Fill the next form according to the following screenshot and click on Finish:

Step 4 – coding the "Hello World" program

In this step, we will build the famous "Hello World" program. The separation of concerns between HTML and Java code will be clear. In this example, and in most cases, each HTML file has a corresponding Java class (with the same name).

First, we will analyze the HTML template code. The content of the HomePage.html file must be replaced by the following code:

    <!DOCTYPE html>
    <html>
    <body>
        <span wicket:id="helloWorldMessage">Test</span>
    </body>
    </html>

It is simple HTML code with the Wicket attribute wicket:id="helloWorldMessage". It indicates that, in the Java code related to this page, a component will replace the message Test with another message.

Now, let's edit the corresponding Java class; that is, HomePage.

    package com.packtpub.wicket.hello_world;

    import org.apache.wicket.markup.html.WebPage;
    import org.apache.wicket.markup.html.basic.Label;

    public class HomePage extends WebPage {
        public HomePage() {
            // Bind a Label to the element marked wicket:id="helloWorldMessage"
            add(new Label("helloWorldMessage", "Hello world!!!"));
        }
    }

The class HomePage extends WebPage; that is, it inherits the WebPage class's methods and attributes, and it becomes a WebPage subtype. One of these inherited methods is add(), to which a Label object can be passed as a parameter. A Label object can be built by passing two parameters: an identifier and a string. The method add() is called in the HomePage class's constructor and will replace the message in the element marked wicket:id="helloWorldMessage" with Hello world!!!. The resulting HTML code will be as shown in the following code snippet:

    <!DOCTYPE html>
    <html>
    <body>
        <span>Hello world!!!</span>
    </body>
    </html>

Step 5 – compile and run!

The steps to compile and run the project are as follows:

1. To compile, right-click on the project and go to Run As | Maven install. Verify that the compilation was successful. If not, Wicket provides good error messages, so you can try to fix what is wrong.
2. To run the project, right-click on the class Start and go to Run As | Java application. The class Start will run an embedded Jetty instance that will run the application. Verify that the server has started without any problems.
3. Open a web browser and enter this in the address field: http://localhost:8080. If you have changed the port, enter http://localhost:<port>. The browser should show Hello world!!!.

The most common problem that can occur is that port 8080 is already in use. In this case, you can go into the Java Start class (found at src/test/java) and set another port by replacing 8080 in connector.setPort(8080) (line 21) with another number (for example, 9999).

To stop the server, you can either click on Console and press any key or click on the red square on the console, which indicates termination.

And that's it! By this point, you should have a working Wicket "Hello World" application and are free to play around and discover more about it.

Summary

This article described how to create a simple "Hello World" application using Apache Wicket 6.
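To make the wicket:id binding from Step 4 more concrete, here is a toy Java sketch that mimics the effect shown in the resulting HTML: the element marked with a given wicket:id has its body replaced by the label text. This is only an illustration under stated assumptions; Wicket parses markup properly rather than using a regular expression, and the names WicketIdSketch, render, and escape are hypothetical. Escaping the text is a choice made in this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of replacing the body of the element marked with a wicket:id (not Wicket code). */
final class WicketIdSketch {

    static String render(String template, String id, String text) {
        // Groups: 1 = tag start, 2 = tag name, 3 = rest of the opening tag, 4 = old body, 5 = closing tag
        Pattern p = Pattern.compile(
                "(<(\\w+)[^>]*?)\\s+wicket:id=\"" + Pattern.quote(id) + "\"([^>]*>)(.*?)(</\\2>)",
                Pattern.DOTALL);
        Matcher m = p.matcher(template);
        if (!m.find()) {
            return template; // no element carries this id; leave the markup untouched
        }
        return template.substring(0, m.start())
                + m.group(1) + m.group(3) + escape(text) + m.group(5)
                + template.substring(m.end());
    }

    /** Minimal HTML escaping so that the text cannot inject markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String page = "<html><body><span wicket:id=\"helloWorldMessage\">Test</span></body></html>";
        System.out.println(render(page, "helloWorldMessage", "Hello world!!!"));
    }
}
```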

How-To Tutorials

Packt

04 Sep 2013

10 min read

Setting up scans

Setting up a scan in Spiceworks

The first thing Spiceworks tries to do when scanning a network is contact Active Directory (AD); it also uses AD to populate the People portion of your Inventory. Let's set up AD first, as everything else we will be configuring is on the same page. We are all about saving your time and not going back and forth between pages. If you do not have AD in your environment, you can just skip to the Configuring IP range scans section.

Scanning and Active Directory

There is a wealth of information within AD that Spiceworks uses. We need to configure Spiceworks to log into AD and get that information, and to do that we need to get to the Active Directory Configuration screen in Spiceworks. As with most things within the app, it is just a couple of clicks. From anywhere in the app, mouse over the Inventory link at the top of the page; a menu will open up. Click on Settings. This will take us to the Settings screen. You will be spending a lot of time here, so you can either get very used to these clicks or just keep a separate tab open on the Settings page.

The top section is called Getting Started and the first link is Active Directory Configuration. That is our destination for this section, so click away. It will take you to the Active Directory Configuration page:

There are three sections that are highlighted. Let's go over each and what they do:

- The area highlighted as 1 is where you enter the credentials that allow Spiceworks to log into your AD and get information. You specify the Active Directory server (Domain Controller), username, and password. Usernames must be in either the domain/username or the username@domain.com format. If you have SSL enabled for AD inquiries, check the Use SSL box.
- The area highlighted as 2 shows the frequency at which Spiceworks retrieves information from your AD environment. When Spiceworks queries AD, it does not cause a huge amount of traffic or load. Shortening these times should not cause undue stress on your AD servers. This is useful because when you add a user in AD, it will automatically get loaded into Spiceworks at the next scan.
- If you want any changes you make to users in Spiceworks to be uploaded into your AD environment, the section highlighted as 3 is for you. Just click on the box and any modifications you make in Spiceworks will automatically be synchronized with your AD.

There is one more section that is not in the screenshot. This deals with your user portal and help desk.

Setting up AD in Spiceworks really makes a lot of difference with scans and filling in information. If you are running AD, it is recommended that you hook it up. If you are wary about Spiceworks writing data into your AD environment, just set up the user that Spiceworks uses to connect as read-only and don't check the box that writes changes back to AD. Easy enough.

Once you are convinced that you should connect your AD to Spiceworks, just fill in the Active Directory server, User, and Password fields and click on Save. Spiceworks will automatically test the credentials and let you know immediately if it can connect. If you have some challenges with Spiceworks connecting to your Domain Controller with just the server name, another method is to put the IP address directly into that field.

Let's move on to setting up an IP range scan and get some devices into your Spiceworks install.

Configuring IP range scans

Remember the Settings page that we have been to a couple of times? We are going back! In case you have forgotten, just mouse over either Inventory or Help Desk and click on the Settings link at the bottom of the left column. Once on the Settings page, click on the Network Scan link. It is in the first section of links, titled Getting Started. This takes us to the main Network Scan page.

The first section is where we are going to set up our IP ranges. Since you have just installed Spiceworks, you will not have any ranges in here yet, so let's get one configured so you can get some information into the app. To do this, just click on the Add IP Range button and this window will pop up.

Spiceworks gives you a lot of flexibility regarding how it scans IP ranges. You can put in a full range (192.168.1.1-254), with or without exclusions, or just a single IP if you so wish. The next box is for exclusions, if you so choose. If you decide you want to scan a range that has both servers and desktops, you can exclude the server IP addresses. This is handy.

The last options are for scheduling this IP range scan. If you choose the Daily at… option, as we have seen in the screenshot, you can also select the time of the day to run this scan. Other options in this drop-down list are every 4, 6, 8, or 12 hours. If you decide to scan on one of these hourly intervals, the time-of-day option disappears. The bottom of the window lets you select which days of the week you want to run the scan.

When Spiceworks runs an initial scan, it can take a bit of time as there is a ton of data that it is collecting. Spiceworks tries a multitude of credentials and reads all information from devices, which it then writes to the database. Once Spiceworks has scanned and written the data to the database, any subsequent scans just write delta data into it.

Enter the range you want to scan, any exclusions you choose, and the scan frequency, and click on the Add button. Congratulations! You have just added an IP range scan!

Scanning credentials

As we have covered, Spiceworks uses a multitude of credentials to try to figure out what is on your network and put those devices into the inventory. This has been completely overhauled in Spiceworks. In this easy-to-use interface, you can enter all the credentials that you are going to need to have a successful scan. Here you can configure multiple usernames/passwords for the following protocols: WMI, SSH, SNMP, Enable, ESX/vSphere, HTTP, iLo, SNMP v2c/v3, Telnet, and Intel vPro.

As you can see, if you need to put device-specific usernames and passwords into Spiceworks, you can do so using the format Domain\username. So if you have a server that uses a unique username/password combination, it is easy to set all that up through this interface. The preceding screenshot shows an example of this.

Something new in Spiceworks is the section that shows the devices on which the credentials were successfully used. This is really helpful for troubleshooting any scan errors!

To add your own username/password combinations, just use these easy-to-follow directions:

1. Click on the protocol you want to add credentials to in the left column (WMI, SNMP, and so on).
2. Click on +Add Account in the middle column labeled Existing Accounts.
3. Enter all the pertinent information in the pane labeled Edit Account. For usernames that have passwords, there is a Show Password button as well, so you can make sure that you didn't mistype it!

That's it. Just fill in any credentials that will let Spiceworks access the devices on your network, and as far as permissions are concerned you should be good to go!

Best practices and kicking off your first Spiceworks scan

You have everything you need to start your first Spiceworks scan. It might be best to read the following best practices before you kick it off, though. They will guide you through some potential pitfalls.

Scanning best practices

- For initial scans, be aware of the number of IP addresses you are scanning and the amount of information that Spiceworks is going to pull out of those devices. If you put in a full IP range on your first scan, do not expect Spiceworks to be finished in 10-15 minutes. The initial scan is the most network-traffic intensive and will take the longest.
- Do full initial scans during nonbusiness hours. Though running an initial full scan shouldn't flood your network, depending on your network configuration, it is always best to run full initial scans during nonbusiness hours just in case. If you are running a 24 x 7 business, break up your IP ranges into smaller chunks and scan that way.
- Expect some unknown devices. Unless you are a super administrator with a team of hundreds behind you to make sure that every aspect of your network is 100 percent buttoned down, there will most likely be a few devices that Spiceworks cannot connect to. One of the biggest culprits is that WMI has been disabled, or that there is a firewall of some sort blocking Spiceworks from connecting to the machine. Don't get down on yourself if the scan doesn't work 100 percent the first time.
- If you are really worried about the traffic that Spiceworks might cause, what information it collects, or how it will affect workstation performance, just set up a test environment and run a scan there. Whether it be 5 machines or 500, Spiceworks does the same to each one; so test away.
- Spiceworks is not designed to scan 10,000 devices at one time without a performance hit. If you have a very large network, break it up into smaller chunks for best performance. Spiceworks could get through a 10,000 device scan, but it would hurt performance until the scan is complete.
- If you have multiple sites linked by either WAN or VPN connections, drop a remote collector at each of them to run local scans and then send the data back to your main Spiceworks installation. You can find more information at http://community.spiceworks.com/help/Remote_Collectors

OK, now that you have read the best practices, you can set up your IP range on the Network Scan settings page, check the box associated with that range, and click on Start Scan. Away you go! Depending on the IP range you set and the time of the day, your scan could take just a few minutes or several hours.
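The advice above about breaking large ranges into smaller chunks can be sketched in a few lines of Java. The following example is only a planning aid under stated assumptions: it expands a range written in the 192.168.1.1-254 style shown earlier, drops excluded addresses, and splits what is left into fixed-size groups. It does not talk to Spiceworks, and the names IpRangeSketch, expand, and chunk are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Expands "a.b.c.START-END" ranges and splits them into chunks (hypothetical planning helper). */
final class IpRangeSketch {

    /** Lists every address in the range, skipping the excluded ones. */
    static List<String> expand(String range, Collection<String> excluded) {
        int lastDot = range.lastIndexOf('.');
        int dash = range.indexOf('-', lastDot);
        if (lastDot < 0 || dash < 0) {
            throw new IllegalArgumentException("expected a.b.c.START-END: " + range);
        }
        String prefix = range.substring(0, lastDot + 1);
        int start = Integer.parseInt(range.substring(lastDot + 1, dash));
        int end = Integer.parseInt(range.substring(dash + 1));
        if (start < 0 || end > 255 || start > end) {
            throw new IllegalArgumentException("invalid host bounds: " + range);
        }
        List<String> addresses = new ArrayList<String>();
        for (int host = start; host <= end; host++) {
            String ip = prefix + host;
            if (!excluded.contains(ip)) {
                addresses.add(ip);
            }
        }
        return addresses;
    }

    /** Splits the addresses into consecutive groups of at most the given size. */
    static List<List<String>> chunk(List<String> addresses, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<String>> chunks = new ArrayList<List<String>>();
        for (int i = 0; i < addresses.size(); i += size) {
            int to = Math.min(i + size, addresses.size());
            chunks.add(new ArrayList<String>(addresses.subList(i, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> hosts = expand("192.168.1.1-254", Arrays.asList("192.168.1.10"));
        for (List<String> group : chunk(hosts, 100)) {
            System.out.println(group.get(0) + " .. " + group.get(group.size() - 1));
        }
    }
}
```

Each printed group could then be entered as its own IP range and scheduled at a different time.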
If you are having serious trouble getting a successful scan, open a browser and go to http://community.spiceworks.com/support. There are in-depth articles and even live support folks who can dive into the specifics of your environment, and they won't give up until you are successful. Let's assume that even if you did have an issue, it is resolved and you have got your first scan under your belt.

Summary

In this article, we looked at how to set up a scan in Spiceworks. We also learned how to run the scan we set up, along with the best practices to follow.

Packt

04 Sep 2013

28 min read

Understanding Point-In-Time-Recovery

Packt

04 Sep 2013

14 min read

Using Gerrit with GitHub

In this article by Luca Milanesio, author of the book Learning Gerrit Code Review, we will learn how to use Gerrit Code Review together with GitHub.

GitHub is the world's largest platform for the free hosting of Git projects, with over 4.5 million registered developers. We will now provide a step-by-step example of how to connect Gerrit to an external GitHub server so as to share the same set of repositories. Additionally, we will provide guidance on how to use the Gerrit Code Review workflow and GitHub concurrently. By the end of this article, we will have our Gerrit installation fully integrated and ready to be used for both open source public projects and private projects on GitHub.

GitHub workflow

GitHub has become the most popular website for open source projects, thanks to the migration of some major projects to Git (for example, Eclipse) and new projects adopting it, along with the introduction of the social aspect of software projects that piggybacks on the Facebook hype. The following diagram shows the GitHub collaboration model:

The key aspects of the GitHub workflow are as follows:

- Each developer pushes to their own repository and pulls from others
- Developers who want to make a change to another repository create a fork on GitHub and work on their own clone
- When forked repositories are ready to be merged, pull requests are sent to the original repository maintainer
- The pull requests include all of the proposed changes and their associated discussion threads
- Whenever a pull request is accepted, the change is merged by the maintainer and pushed to their repository on GitHub

GitHub controversy

The preceding workflow works very effectively for most open source projects; however, when a project gets bigger and more complex, the tools provided by GitHub are too unstructured, and a more defined review process with proper tools, additional security, and governance is needed.

In May 2012, Linus Torvalds, the inventor of the Git version control system, openly criticized GitHub as a commit editing tool directly on a pull request discussion thread: "I consider GitHub useless for these kinds of things. It's fine for hosting, but the pull requests and the online commit editing, are just pure garbage" and additionally, "the way you can clone a (code repository), make changes on the web, and write total crap commit messages, without GitHub in any way making sure that the end result looks good." See https://github.com/torvalds/linux/pull/17#issuecomment-5654674.

Gerrit provides the additional value that Linus Torvalds claimed was missing in the GitHub workflow: Gerrit and GitHub together allow the open source development community to combine the extended hosting reach and social integration of GitHub with the governance power of the Gerrit review engine.

GitHub authentication

The list of authentication backends supported by Gerrit does not include GitHub, and GitHub cannot be used out of the box because it does not support OpenID authentication. However, a GitHub plugin for Gerrit has recently been released in order to fill the gaps and allow a seamless integration.

GitHub implements OAuth 2.0 to allow external applications, such as Gerrit, to integrate using a three-step browser-based authentication. Using this scheme, a user can leverage their existing GitHub account without the need to provision and manage a separate one in Gerrit. Additionally, the Gerrit instance will be able to self-provision the SSH public keys needed for pushing changes for review.
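The first step of the three-step browser flow is a redirect to GitHub's authorization page, carrying the client ID and the callback URL that are registered later in this article. The following Java sketch builds that redirect URL using only the standard library. It illustrates the generic GitHub OAuth web flow rather than the plugin's actual code; the class name GitHubOAuthSketch and the state value are assumptions made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds the step-one redirect of the GitHub OAuth web flow (illustrative, not the plugin's code). */
final class GitHubOAuthSketch {

    static final String AUTHORIZE_URL = "https://github.com/login/oauth/authorize";

    /**
     * clientId and callbackUrl come from the application registered on GitHub;
     * state is a random value chosen by the caller to match the response to the request.
     */
    static String authorizeUrl(String clientId, String callbackUrl, String state) {
        return AUTHORIZE_URL
                + "?client_id=" + encode(clientId)
                + "&redirect_uri=" + encode(callbackUrl)
                + "&state=" + encode(state);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Sample values only; the client ID is the example one shown in the init transcript
        System.out.println(authorizeUrl("384cbe2e8d98192f9799",
                "https://myhost.mydomain:8443/oauth", "random-state-value"));
    }
}
```

In the remaining two steps, GitHub redirects the browser back to the callback URL with a one-time code, and the application exchanges that code for the user's profile data on the server side.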
In order for us to use GitHub OAuth authentication with Gerrit, we need to do the following: Build the Gerrit GitHub plugin Install the GitHub OAuth filter into the Gerrit libraries (/lib under the Gerrit site directory) Reconfigure Gerrit to use the HTTP authentication type Building the GitHub plugin The Gerrit GitHub plugin can be found under the Gerrit plugins/github repository on https://gerrit-review.googlesource.com/#/admin/projects/plugins/github. It is open source under the Apache 2.0 license and can be cloned and built using the Java 6 JDK and Maven. Refer to the following example: $ git clone https://gerrit.googlesource.com/plugins/github $ cd github $ mvn install […] [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------- [INFO] Total time: 9.591s [INFO] Finished at: Wed Jun 19 18:38:44 BST 2013 [INFO] Final Memory: 12M/145M [INFO] ------------------------------------------------------- The Maven build should generate the following artifacts: github-oauth/target/github-oauth*.jar, the GitHub OAuth library for authenticating Gerrit users github-plugin/target/github-plugin*.jar, the Gerrit plugin for integrating with GitHub repositories and pull requests Installing GitHub OAuth library The GitHub OAuth JAR file needs to copied to the Gerrit /lib directory; this is required to allow Gerrit to use it for filtering all HTTP requests and enforcing the GitHub three-step authentication process: $ cp github-oauth/target/github-oauth-*.jar /opt/gerrit/lib/ Installing GitHub plugin The GitHub plugin includes the additional support for the overall configuration, the advanced GitHub repositories replication, and the integration of pull requests into the Code Review process. 
We now need to install the plugin before running the Gerrit init again so that we can benefit from the simplified automatic configuration steps: $ cp github-plugin/target/github-plugin-*.jar /opt/gerrit/plugins/github.jar Register Gerrit as a GitHub OAuth application Before going through the Gerrit init, we need to tell GitHub to trust Gerrit as a partner application. This is done through the generation of a ClientId/ClientSecret pair associated to the exact Gerrit URLs that will be used for initiating the 3-step OAuth authentication. We can register a new application in GitHub through the URL https://github.com/settings/applications/new, where the following three fields are requested: Application name : It is the logical name of the application authorized to access GitHub, for example, Gerrit. Main URL : The Gerrit canonical web URL used for redirecting to GitHub OAuth authentication, for example, https://myhost.mydomain:8443. Callback URL : The URL that GitHub should redirect to when the OAuth authentication is successfully completed, for example, https://myhost.mydomain:8443/oauth. GitHub will automatically generate a unique pair ClientId/ClientSecret that has to be provided to Gerrit identifying them as a trusted authentication partner. ClientId/ClientSecret are not GitHub credentials and cannot be used by an interactive user to access any GitHub data or information. They are only used for authorizing the integration between a Gerrit instance and GitHub. Running Gerrit init to configure GitHub OAuth We now need to stop Gerrit and go through the init steps again in order to reconfigure the Gerrit authentication. We need to enable HTTP authentication by choosing an HTTP header to be used to verify the user's credentials, and to go through the GitHub settings wizard to configure the OAuth authentication. $ /opt/gerrit/bin/gerrit.sh stop Stopping Gerrit Code Review: OK $ cd /opt/gerrit $ java -jar gerrit.war init [...] 
*** User Authentication *** Authentication method []: HTTP RETURN Get username from custom HTTP header [Y/n]? Y RETURN Username HTTP header []: GITHUB_USER RETURN SSO logout URL : /oauth/reset RETURN *** GitHub Integration *** GitHub URL [https://github.com]: RETURN Use GitHub for Gerrit login ? [Y/n]? Y RETURN ClientId []: 384cbe2e8d98192f9799 RETURN ClientSecret []: f82c3f9b3802666f2adcc4 RETURN Initialized /opt/gerrit $ /opt/gerrit/bin/gerrit.sh start Starting Gerrit Code Review: OK Using GitHub login for Gerrit Gerrit is now fully configured to register and authenticate users through GitHub OAuth. When opening the browser to access any Gerrit web page, we are automatically redirected to GitHub for login. If we have already visited and authenticated with GitHub previously, the browser cookie will be automatically recognized and used for the authentication, instead of presenting the GitHub login page. Alternatively, if we do not yet have a GitHub account, we can create a new GitHub profile by clicking on the SignUp button. Once the authentication process is successfully completed, GitHub requests the user's authorization to grant access to their public profile information. The following screenshot shows GitHub OAuth authorization for Gerrit: The authorization status is then stored under the user's GitHub applications preferences on https://github.com/settings/applications. Finally, GitHub redirects back to Gerrit, propagating the user's profile securely using a one-time code, which is used to retrieve the full data profile including username, full name, e-mail, and associated SSH public keys. Replication to GitHub The next step in the Gerrit to GitHub integration is to share the same Git repositories and then keep them up-to-date; this can easily be achieved by using the Gerrit replication plugin. The standard Gerrit replication follows a master-slave model, where Gerrit always plays the role of the master node and pushes to remote slaves.
We will refer to this scheme as push replication because the actual control of the action is given to Gerrit through a git push operation of new commits and branches. Configure Gerrit replication plugin In order to configure push replication we need to enable the Gerrit replication plugin through Gerrit init: $ /opt/gerrit/bin/gerrit.sh stop Stopping Gerrit Code Review: OK $ cd /opt/gerrit $ java -jar gerrit.war init [...] *** Plugins *** Prompt to install core plugins [y/N]? y RETURN Install plugin reviewnotes version 2.7-rc4 [y/N]? RETURN Install plugin commit-message-length-validator version 2.7-rc4 [y/N]? RETURN Install plugin replication version 2.6-rc3 [y/N]? y RETURN Initialized /opt/gerrit $ /opt/gerrit/bin/gerrit.sh start Starting Gerrit Code Review: OK The Gerrit replication plugin relies on the replication.config file under the /opt/gerrit/etc directory to identify the list of target Git repositories to push to. The configuration syntax is a standard .ini format where each group section represents a target replica slave. The following is the simplest replication.config file for replicating to GitHub: [remote "github"] url = git@github.com:myorganisation/${name}.git The preceding configuration enables all of the repositories in Gerrit to be replicated to GitHub under the myorganisation GitHub Team account. Authorizing Gerrit to push to GitHub Now that Gerrit knows where to push, we need GitHub to authorize the write operations to its repositories. To do so, we need to upload the SSH public key of the underlying OS user that Gerrit runs as to one of the accounts in the GitHub myorganisation team that has permission to push to any of the GitHub repositories.
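To make the ${name} placeholder in replication.config concrete, the following sketch shows how a URL template maps to the push target for a single project. It only illustrates the substitution and is not how the replication plugin is implemented; expand_replication_url and the project name myproject are hypothetical, and project names containing | or & would need escaping.

```shell
#!/bin/sh
# expand_replication_url TEMPLATE PROJECT
# Hypothetical helper: replace every literal ${name} in TEMPLATE with PROJECT,
# mimicking the per-project expansion of the url setting in replication.config.
expand_replication_url() {
  template=$1
  project=$2
  # \$ keeps the dollar sign literal, so sed matches the text "${name}".
  printf '%s\n' "$template" | sed "s|\${name}|$project|g"
}

# Single quotes stop the shell from expanding ${name} itself.
expand_replication_url 'git@github.com:myorganisation/${name}.git' 'myproject'
# prints git@github.com:myorganisation/myproject.git
```

This also shows why the repository names on GitHub have to match the project names in Gerrit: the target URL is derived from the project name alone.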
Assuming that Gerrit runs under the OS user gerrit, we can copy and paste the SSH public key value from ~gerrit/.ssh/id_rsa.pub (or ~gerrit/.ssh/id_dsa.pub) into the Add an SSH Key section of the GitHub account at https://github.com/settings/ssh. Start working with Gerrit replication Everything is now ready to start playing with Gerrit to GitHub replication. Whenever a change to a repository is made on Gerrit, it will be automatically replicated to the corresponding GitHub repository. In reality there is one additional operation that is needed on the GitHub side: the actual creation of the empty repositories, using https://github.com/new, that correspond to the ones created in Gerrit. We need to make sure that we select an organization name and repository name consistent with the ones defined in Gerrit and in the replication.config file. Never initialize the repository from GitHub with an empty commit or readme file; otherwise the first replication attempt from Gerrit will result in a conflict and will then fail. Now GitHub and Gerrit are fully connected and whenever a repository in GitHub matches one of the repositories in Gerrit, it will be linked and synchronized with the latest set of commits pushed in Gerrit. Thanks to the Gerrit-GitHub authentication previously configured, Gerrit and GitHub share the same set of users and the commit authors will be automatically recognized and formatted by GitHub. The following screenshot shows Gerrit commits replicated to GitHub: Reviewing and merging to GitHub branches The final goal of the Code Review process is to agree on changes and merge them into their branches. The merging strategies need to be aligned with real-life scenarios that may arise when using Gerrit and GitHub concurrently. During the Code Review process the alignment between Gerrit and GitHub was at the change level, not influenced by the evolution of their target branches.
Gerrit changes and GitHub pull requests are isolated branches managed by their review lifecycle. When a change is merged, it needs to align with the latest status of its target branch using a fast-forward, merge, rebase, or cherry-pick strategy. Using the standard Gerrit merge functionality, we can apply the configured project merge strategy to the current status of the target branch on Gerrit. The situation on GitHub may have changed as well, so even if the Gerrit merge has succeeded there is no guarantee that the actual subsequent synchronization to GitHub will do the same! The GitHub plugin mitigates this risk by implementing a two-phase submit + merge operation for merging opened changes as follows: Phase-1 : The change target branch is checked against its remote peer on GitHub and fast forwarded if needed. If two branches diverge, the submit + merge is aborted and manual merge intervention is requested. Phase-2 : The change is merged on its target branch in Gerrit and an additional ad hoc replication is triggered. If the merge succeeds then the GitHub pull request is marked as completed. At the end of Phase-2 the Gerrit and GitHub statuses will be completely aligned. The pull request author will then receive the notification that his/her commit has been merged. Using Gerrit and GitHub on http://gerrithub.io When using Gerrit and GitHub on the web with public or private repositories, all of the commits are replicated from Gerrit to GitHub, and each one of them has a complete copy of the data. If we are using a Git and collaboration server on GitHub over the Internet, why can't we do the same for its Gerrit counterpart? Can we avoid installing a standalone instance of Gerrit just for the purpose of going through a formal Code Review? 
One hassle-free solution is to use the GerritHub service (http://gerrithub.io), which offers a free Gerrit instance on the cloud already configured and connected with GitHub through the github-plugin and github-oauth authentication library. All of the flows that we have covered in this article are completely automated, including the replication and the automatic conversion of pull requests into changes. As accounts are shared with GitHub, we do not need to register or create another account to use GerritHub; we can just visit http://gerrithub.io and start using Gerrit Code Review with our existing GitHub projects without having to teach our existing community about a new tool. GerritHub also includes an initial setup Wizard for the configuration and automation of the Gerrit projects and the option to configure the Gerrit groups using the existing GitHub. Once Gerrit is configured, the Code Review and GitHub can be used seamlessly for achieving maximum control and social reach within your developer community. Summary We have now integrated our Gerrit installation with GitHub authentication for a seamless Single-Sign-On experience. Using an existing GitHub account we started using Gerrit replication to automatically mirror all the commits to GitHub repositories, allowing our projects to have an extended reach to external users, who are free to fork our repositories and to contribute changes as pull requests. Finally, we have completed our Code Review in Gerrit and managed the merge to GitHub with a two-phase change submit + merge process to ensure that the target branches on both Gerrit and GitHub have been merged and aligned accordingly. Similarly to GitHub, this Gerrit setup can be leveraged for free on the web without having to manage a separate private instance, thanks to the free GerritHub service (http://gerrithub.io) available on the cloud.
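The Phase-1 check of the two-phase submit + merge described earlier (fast-forward if possible, abort if the branches have diverged) can be approximated with plain Git. This is only a sketch of the idea, not the GitHub plugin's actual implementation; the function names are hypothetical and the arguments are any two branch or commit references in the same local repository.

```shell
#!/bin/sh
# can_fast_forward LOCAL REMOTE
# Succeeds when LOCAL is an ancestor of REMOTE (or equal to it),
# that is, when LOCAL can be fast-forwarded to REMOTE without a merge commit.
can_fast_forward() {
  git merge-base --is-ancestor "$1" "$2"
}

# branches_diverged A B
# Succeeds when neither reference is an ancestor of the other,
# which is the case where a manual merge would be required.
branches_diverged() {
  ! git merge-base --is-ancestor "$1" "$2" &&
    ! git merge-base --is-ancestor "$2" "$1"
}
```

For example, after fetching the GitHub remote, can_fast_forward master github/master would succeed when the Gerrit-side branch only lags behind, while branches_diverged master github/master would succeed when both sides have commits the other lacks; the remote name github is an assumption here.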


Packt

04 Sep 2013

7 min read

Rapid Development


Concept of reusability The concept of reusability has its roots in the production process. Typically, most of us go about creating e-learning using a process similar to what is shown in the following screenshot. It works well for large teams and the one-man band, except in the latter case, you become a specialist for all the stages of production. That's a heavy load. It's hard to be good at all things and it demands that you constantly stretch and improve your skills, and find ways to increase the efficiency of what you do. Reusability in Storyline is about leveraging the formatting, look and feel, and interactions you create so that you can repurpose your work and speed up production. Not every project will be an original one-off; in fact, most won't, so the concept is to approach development with a plan to repurpose 80 percent of the media, quizzes, interactions, and designs you create. As you do this, you begin to establish processes, templates, and libraries that can be used to rapidly assemble base courses. With a little tweaking and some minor customization, you'll have a new, original course in no time. Your client doesn't need to know that 80 percent was made from reusable elements with just 20 percent created as original, unique components, but you'll know the difference in terms of time and effort. Leveraging existing assets So how can you leverage existing assets with Storyline? The first things you'll want to look at are the courses you've built with other authoring programs, such as PowerPoint, Quizmaker, Engage, Captivate, Flash, and Camtasia. If there are design themes, elements, or interactions within these courses that you might want to use for future Storyline courses, you should focus your efforts on importing what you can, and further adjusting within Storyline to create a new version of the asset that can be reused for future Storyline courses.
If re-working the asset is too complex or if you don't expect to reuse it in multiple courses, then using Storyline's web object feature to embed the interaction without re-working it in any way may be the better approach. In both cases, you'll save time by reusing content you've already put a lot of time in developing. Importing external content Here are the steps to bring external content into Storyline: From the Articulate Startup screen or by choosing the Insert tab, and then New Slide within a project, select the Import option. There are options to import PowerPoint, Quizmaker, and Storyline. All of these will display the slides within the file to be imported. You can pick and choose which slides to import into a new or the current scene in Storyline. The Engage option displays the entire interaction that can be imported into a single slide in the current or a new scene. Click on Import to complete the process. Considerations when importing Keep the following points in mind when importing: PowerPoint and Quizmaker files can be imported directly into Storyline. Once imported, you can edit the content like you would any other Storyline slide. Master slides come along with the import making it simple to reuse previous designs. Note that 64-bit PowerPoint is not supported and you must have an installed, activated version of Quizmaker for the import to work. The PowerPoint to Storyline conversion is not one-to-one. You can expect some alignment issues with slide objects due to the fact that PowerPoint uses points and Storyline uses pixels. There are 2.66 pixels for each point which is why you'll need to tweak the imported slides just a bit. Same with Quizmaker though the reason why is slightly different; Quizmaker is 686 x 424 in size, whereas Storyline is 720 x 540 by default. Engage files can be imported into Storyline and they are completely functional, but cannot be edited within Storyline. 
Though the option to import Engage appears on the Import screen, what Storyline is really doing is creating a web object to contain the Engage interaction. Once imported into a new scene, clicking on the Engage interaction will display an Options menu where you can make minor adjustments to the behavior of the interaction as well as Preview and Edit in Engage. You can also resize and position the interaction just as you would any web object. Remember that though web objects work in iPad and HTML5 outputs, Engage content is Flash, so it will not play back on an iPad or in an HTML5 browser. Like Quizmaker, you'll need an installed, activated version of Engage for the import to work. Flash, Captivate, and Camtasia files cannot be imported into Storyline and cannot be edited within Storyline. You can, however, use web objects or the Insert Flash option to embed these projects into Storyline. In both cases, the embedded elements appear seamless to the learner while retaining full functionality. Build once, and reuse many times Quizzing is at the heart of many e-learning courses where often the quiz questions need to be randomized or even reused in different sections of a single course (that is, the same questions for a pre and post-test). The concept of building once and reusing many times works well with several aspects of Storyline. We'll start with quizzing and a feature called Question Banks as follows: Question Banks A Question Bank offers a way to pool, reuse, and randomize questions within a project. Slides in a question bank are housed within the project file but are not visible until placed into the story. Question Banks can include groups of quiz slides and regular slides (that is, you might include a regular slide if you need to provide instructions for the quiz or would like to include a post-quiz summary). When you want to include questions from a Question Bank, you just need to insert a new Quizzing slide, and then choose Draw from Bank.
You can then select one or more questions to include and randomize them if desired. Follow along… In this exercise we will be removing three questions from a scene and moving them into a question bank. This will allow you to draw one or more of those questions at any point in the project where the quiz questions are needed, as follows: From the Home tab, choose Question Banks , and then Create Question bank . Title this Identity Theft Questions . Notice that a new tab has opened in Normal View . The Question Bank appears in this tab. Click on the Import link and navigate to question slides 2, 3, and 4. From the Import drop-down menu at the top, select move questions into question bank . Click on the Story View tab and notice the three slides containing the quiz questions are no longer in the story. Click back on the Identity Theft tab and notice that they are located here. The questions will not become a part of the story until the next step, when you draw them from the bank. In Story View, click once on slide 1 to select it, and then from the Home tab, choose Question Banks and New Draw from Question Bank . From the Question Bank drop-down menu, select Identity Theft Questions . All questions will be selected by default and will be randomized after being placed into the story. This means that the learner will need to answer three questions before continuing onto the next slide in the story. Click on Insert . The Question Bank draw has been inserted as slide 2. To see how this works, Preview the scene. Save as Exercise 11 – Identity Theft Quiz. There are multiple ways to get back to the questions that are in a question bank. You can do this by selecting the tab the questions are located in (in this case, Identity Theft ), you can view the question bank slide in Normal View or choose Question Banks from the Home tab and navigate to the name of the question bank you'd like to edit.


How-To Tutorials


Packt

04 Sep 2013

5 min read

Using Virtual Destinations (Advanced)

