How-To Tutorials

article-image-flash-10-multiplayer-game-lobby-and-new-game-screen-implementation

27 Jul 2010

5 min read

Flash 10 Multiplayer Game: The Lobby and New Game Screen Implementation

27 Jul 2010

(For more resources on Flash and Games, see here.) The lobby screen implementation In this section, we will learn how to implement the room display within the lobby. Lobby screen in Hello World Upon login, the first thing the player needs to do is enter the lobby. Once the player has logged into the server successfully, the default behavior of the PulseGame in PulseUI is to call enterLobby API. The following is the implementation within PulseGame: protected function postInit():void { m_netClient.enterLobby();} Once the player has successfully entered the lobby, the client will start listening to all the room updates that happen in the lobby. These updates include any newly created room, any updates to the room objects, for example, any changes to the player count of a game room, host change, etc. Customizing lobby screen In the PulseUI, the lobby screen is the immediate screen that gets displayed after a successful login. The lobby screen is drawn over whatever the outline object has drawn onto the screen. The following is added to the screen when the lobby screen is shown to the player: Search lobby UI Available game rooms Game room scroll buttons Buttons for creating a new game room Navigation buttons to top ten and register screens When the lobby is called to hide, the lobby UI elements are taken off the screen to make way for the incoming screen. For our initial game prototype, we don't need to make any changes. The PulseUI framework already offers all of the essential set of functionalities of a lobby for any kind of multiplayer game. However, the one place you may want to add more details is in what gets display for each room within the lobby. Customizing game room display The room display is controlled by the class RoomsDisplay, an instance of which is contained in GameLobbyScreen. The RoomsDisplay contains a number of RoomDisplay object instances, one for each room being displayed. In order to modify what gets displayed in each room display, we do it inside of the class that is subclassed from RoomDisplay. The following figure shows the containment of the Pulse layer classes and shows what we need to subclass in order to modify the room display: In all cases, we would subclass (MyGame) the PulseGame. In order to have our own subclass of lobby screen, we first need to create class (MyGameLobbyScreen) inherited from GameLobbyScreen. In addition, we also need to override the method initLobbyScreen as shown below: protected override function initLobbyScreen():void { m_gameLobbyScreen = new MyGameLobbyScreen();} In order to provide our own RoomsDisplay, we need to create a subclass (MyRoomsDisplay) inherited from RoomsDisplay class and we need to override the method where it creates the RoomsDisplay in GameLobbyScreen as shown below: protected function createRoomsDisplay():void { m_roomsDisplay = new MyRoomsDisplay();} Finally, we do similar subclassing for MyRoomDisplay and override the method that creates the RoomDisplay in MyRoomsDisplay as follows: protected override function createRoomDisplay (room:GameRoomClient):RoomDisplay { return new MyRoomDisplay(room);} Now that we have hooked up to create our own implementation of RoomDisplay, we are free to add any additional information we like. In order to add additional sprites, we now simply need to override the init method of GameRoom and provide our additional sprites. Filtering rooms to display The choice is up to the game developer to either display all the rooms currently created or just the ones that are available to join. We may override the method shouldShowRoom method in the subclass of RoomsDisplay (MyRoomsDisplay) to change the default behavior. The default behavior is to show rooms that are only available to join as well as rooms that allow players to join even after the game has started. Following is the default method implementation: protected function shouldShowRoom(room:GameRoomClient):Boolean { var show:Boolean; show = (room.getRoomType() == GameConstants.ROOM_ALLOW_POST_START); if(show == true) return true; else { return (room.getRoomStatus() == GameConstants.ROOM_STATE_WAITING); }} Lobby and room-related API Upon successful logging, all game implementation must call the enterLobby method. public function enterLobby(gameLobbyId:String = "DefaultLobby"):void You may pass a null string in case you only wish to have one default lobby. The following notification will be received again by the client whether the request to enter a lobby was successful or not. At this point, the game screen should switch to the lobby screen. function onEnteredLobby(error:int):void If entering a lobby was successful, then the client will start to receive a bunch of onNewGameRoom notifications, one for each room that was found active in the entered lobby. The implementation should draw the corresponding game room with the details on the lobby screen. function onNewGameRoom(room:GameRoomClient):void The client may also receive other lobby-related notifications such as onUpdateGameRoom for any room updates and onRemoveGameRoom for any room objects that no longer exist in lobby. function onUpdateGameRoom(room:GameRoomClient):voidfunction onRemoveGameRoom(room:GameRoomClient):void If the player wishes to join an existing game room in the lobby, you simply call joinGameRoom and pass the corresponding room object. public function joinGameRoom(gameRoom:GameRoomClient):void In response to a join request, the server notifies the requesting client of whether the action was successful or failed via the game client callback method. function onJoinedGameRoom(gameRoomId:int, error:int):void A player already in a game room may leave the room and go back to the lobby, by calling the following API: public function leaveGameRoom():void Note that if the player successfully left the room, the calling game client will receive the notification via the following callback API: function onLeaveGameRoom(error:int):void

0
0
4931

article-image-joomla-15-top-extensions-using-languages

Packt

28 Oct 2010

5 min read

Joomla! 1.5 Top Extensions for Using Languages

Packt

28 Oct 2010

5 min read

Joomla! 1.5 Top Extensions Cookbook Over 80 great recipes for taking control of Joomla! Extensions Set up and use the best extensions available for Joomla! Covers extensions for just about every use of Joomla! Packed with recipes to help you get the most of the Joomla! extensions Part of Packt's Cookbook series: Each recipe is a carefully organized sequence of instructions to complete the task as efficiently as possible Introduction One of the greatest features of Joomla! is that you can build a multilingual website. The Joomla! interface can be displayed in many languages. You can simply download the translation pack for the required language and install that to Joomla!. If you don't have a translation pack for your desired language, you can translate it by editing language files directly or by using the translation manager component. The translation manager component allows you to visually translate your site's interface into any language, right from the Joomla! administration area. After completing the translation, you can pack the translation and share it with others, so that they can install the translation in other Joomla! sites. Besides translating the Joomla! interface, you can translate a site's contents into your desired language. The GTranslate component allows you to translate your site's content into 55 languages using Google's translation service. Adding a language to your site Joomla! can build a multilingual site. A site interface can be in multiple languages, using different locales. In this recipe, you will learn how to add an additional language to a Joomla! site. Getting ready... Joomla! translations are available in major languages. First, decide which language you want to add to your site. For example, we want to add French to a Joomla! website. A French translation for Joomla! 1.5 is available to download at the Joomla! Extensions Directory, http://extensions.joomla.org/extensions/languages/translations-for-joomla. Download this extension from http://joomlacode.org/gf/project/french/frs/, and install it from the Extensions | Install/Uninstall screen. How to do it... After installation, follow the steps as shown: From the Joomla! administration panel, click on Extensions | Language Manager. This will show the Language Manager screen, listing the installed languages for the site: Note that the default language for the site is marked with a yellow star in the Default column. To make the newly-installed language, French (Fr), the default language for your site, select the language and click on the Default button in the toolbar. Preview the site's frontend and you will find the site's interface (not content) in French. For example, the Login Form module will look like the following screenshot: For changing the language of the administration panel, in the Language Manager screen click on Administrator, select a language from the list, and set that as the default language for the administrator backend. See also... Adding a translation will only show the Joomla! interfaces in that language. The content of the site is not translated or displayed in the selected language. Also note that we still don't have a mechanism to select our desired language. All of these things can be done using the Joom!Fish extension, which is discussed in the recipe Manually Translating Your Joomla! Site's Content into Your Desired Language. Translating language files for your site Joomla!'s translations are available in most major languages. However, you may like to change the translations and have your own translation in your desired language. In that case, Joomla! provides a mechanism to translate the Joomla! interface language. In this recipe, you will learn how to translate language files for your site from the administration backend. Getting ready... Translation Manager is a popular extension that can help you translate the site's language files right from the administration backend, without opening a text editor. Download this extension from http://joomlacode.org/gf/project/joomla_1_5_tr1/frs/ and install it from the Extensions | Install/Uninstall screen. How to do it... After installation, follow the steps as shown: From the Joomla! administration panel, click on Components | Translation Manager. This will show the Translation Manager screen, listing all of the installed languages for the site and the administration backend. < For changing any language translation, select that language, for example Site [en-GB] English(United Kingdom), and click on the View Files button. This will show the language files for that language. Now select a file, such as, com_banners, and click on the Edit button. This shows the string editing screen for the com_banners.ini file. Change the strings accordingly, and click on the Save button in the toolbar. For adding a new language, click on New in the Translation Manager screen. This will show the Create New Language screen: In the Language Details section, configure the following: Client: Select who will be the client for this translation—Administrator, Installation, or Site. If you want to translate for the administrator interface, select Administrator. We want to translate the site's frontend, therefore we select Site. Language ISO tag: Type the ISO tag for the language. For example, if we want to translate it into Bengali, type ISO code bn-BD. Name: Type language name, that is Bangla. Description: Type a short description for the translation. Legacy Name: Type the traditional name of the language, for example, bn for bn-BD. Language Locales: Type the locale code for the language. Windows Code page: Specify the code page for the language. The default is iso-8859-1. For the Bangla language it will be utf-8. PDF Font: Specify the font family to be used for displaying the PDF in that language. Right-to-Left: Specify Yes if the language is to be read from right to left (for example, Arabic). In the Author Details section, provide the translator's name (probably your name), e-mail address, website URL, version number for the translation, creation date, the copyright holder's name, and URL to the license document. When done, click on the Save button in the toolbar. This saves the language definition and you will see the language name on the Translation Manager screen:

0
0
4925

article-image-customizing-kubrik-wordpress

Packt

19 Apr 2010

5 min read

Customizing Kubrik with Wordpress

Packt

19 Apr 2010

5 min read

Getting ready The first step is to make a list of the changes that you want to make. Here's what we've come up with: Increase width to 980px Increase font sizes in sidebar Use a graphic header We've picked 980px as our target width because this size is optimized for a 1024 screen resolution and works well in a grid layout. Several CSS adjustments will be necessary to realize this modification, as well as using an image editing program (we will be using Photoshop). How to do it... To increase the page width, the first step is to determine which entries in the CSS stylesheet are controlling the width. Using Firebug to inspect the page (as seen below), we find that the selector #page has a value of 760px for the width property. And #header has a width of 758px (less because there is a 1px left margin). The .narrowcolumn selector gives the main content column a width of 450px. And #sidebar has a width of 190px. Finally, #footer has a width of 760px. So, we will increase #page and #footer to 980px. #header we will increase to 978px. Let's apply all of the additional 220px width to .narrowcolumn. Taking note of the existing 45px left margin, our new value for the width property will be 700px. That means #sidebar width will remain at 190px, but the margin-left will need to be increased from 545px to 765px. Click on Appearance | Editor. In the right-hand column, below the Templates heading, click on style.css. Scroll past the section that says /* Begin Typography & Colors */, until you get to the section that says /* Begin Structure */. Make the following changes to the stylesheet (style.css), commenting as appropriate to document your changes. #page { background-color: white; margin: 20px auto; padding: 0; width: 980px; /* increased from 760px */ border: 1px solid #959596; }#header { background-color: #73a0c5; margin: 0 0 0 1px; padding: 0; height: 200px; width: 978px; /* increased from 758px */ }.narrowcolumn { float: left; padding: 0 0 20px 45px; margin: 0px 0 0; width: 700px; /* increased from 450px */ }#sidebar {margin-left:765px; /* increaseed from 545px */padding:20px 0 10px;width:190px;}#footer { padding: 0; margin: 0 auto; width: 980px; /* increased from 760px */ clear: both; } Adjustments via Photoshop We'll also need to use an image editing program to modify the three background images that create the rounded corners: kubrikbg-ltr.jpg, kubrickheader.jpg, and kubrickfooter.jpg. In this example, we modify kubrik-ltr.jpg (the background image for #page), a 760px image. Open up the image in Photoshop, select all, copy, create a new document (with a white or transparent background), and paste (Ctrl-A, Ctrl-C, Ctrl-N, Ctrl-V). Increase the canvas size (Image | Canvas Size) to 980px, keeping the image centered on the left-hand side by clicking on the left-pointing arrow. Select one half of the image with the Rectangular Marquee Tool, cut and paste. Use the Move Tool to drag the new layer to the right-hand side of the canvas. In this case, it does not matter if you can see the transparent background or if your selection was exactly one half the image. Since the middle of the image is simply a white background, we are really only concerned with the borders on the left and right. The following screenshot shows the background image cut in half and moved over: Save for Web and Devices, exporting as a jpg. Then, replace the existing kubrikbgltr.jpg with your modified version via FTP. The steps are similar for both kubrickheader.jpg and kubrickfooter.jpg. Increase the canvas size and copy/paste from the existing image to increase the image size without stretching or distortion. The only difference is that you need to copy and paste different parts of the image in order to preserve the background gradient and/or top and bottom borders. In order to complete our theme customization, the width of .widecolumn will need to be increased from 450px to 700px (and the 150px margin should be converted to a 45px margin, the same as .narrowcolumn). Also, the kubrikwide.jpg background image will need to be modified with an image editing program to increase the size from 760px to 980px. Then, the individual post view will look as good as the homepage. By following the same steps as above, you should now be prepared to make this final customization yourself. Our next goal is to increase the sizes of the sidebar fonts. Firebug helps us to pinpoint the relevant CSS. #sidebar h2 has a font-size of 1.2em (around line 123 of style.css). Let's change this to 1.75em. #sidebar has font-size of 1em. Let's increase this to 1.25em. To use a graphic in the header, open up kubrickheader.jpg in a new Photoshop document. Use the magic wand tool to select and delete the blue gradient with rounded corners. Now, use the rounded rectangle tool to insert your own custom header area. You can apply another gradient, if desired. We choose to apply a bevel and emboss texture to our grey rectangle. Then, to paste in some photos, decreasing their opacity to 50%. In a short time, we've been able to modify Kubrik by re-writing CSS and using an image-editing program. This is the most basic technique for theme modification. Here is the result:

0
0
4924

article-image-building-financial-functions-excel-2010

Packt

07 Jul 2011

5 min read

Building Financial Functions into Excel 2010

Packt

07 Jul 2011

5 min read

Excel 2010 Financials Cookbook Powerful techniques for financial organization, analysis, and presentation in Microsoft Excel Till now, in the previous articles, we have focused on manipulating data within and outside of Excel in order to prepare to make financial decisions. Now that the data has been prepared, re-arranged, or otherwise adjusted, we are able to leverage the functions within Excel to make actual decisions. Utilizing these functions and the individual scenarios, we will be able to effectively eliminate the uncertainty due to poor analysis. Since this article utilizes financial scenarios for demonstrating the use of the various functions, it is important to note that these scenarios take certain "unknowns" for granted, and makes a number of assumptions in order to minimize the complexity of the calculation. Real-world scenarios will require a greater focus on calculating and accounting for all variables. Determining standard deviation for assessing risk In the recipes mentioned so far, we have shown the importance of monitoring and analyzing frequency to determine the likelihood that an event will occur. Standard deviation will now allow for an analysis of the frequency in a different manner, or more specifically, through variance. With standard deviation, we will be able to determine the basic top and bottom thresholds of data, and plot general movement within that threshold to determine the variance within the data range. This variance will allow the calculation of risk within investments. As a financial manager, you must determine the risk associated with investing capital in order to gain a return. In this particular instance, you will invest in stock. In order to minimize loss of investment capital, you must determine the risk associated between investing between two different stocks, Stock A, and Stock B. In this recipe, we will utilize standard deviation to determine which stock, either A or B, presents a higher risk, and hence a greater risk of loss. How to do it... We will begin by entering the selling prices of Stock A and Stock B in columns A and B, respectively: Within this list of selling prices, at first glance we can see that Stock B has a higher selling price. The stock opening price and selling price over the course of 52 weeks almost always remains above that of Stock A. As an investor looking to gain a higher return, we may wish to choose Stock B based on this cursory review; however, high selling price does not negate the need for consistency. In cell C2, enter the formula =STDEV(A2:A53) and press Enter: In cell C3, enter the formula =STDEV(B2:B53) and press Enter: We can see from the calculation of standard deviation, that Stock B has a deviation range or variance of over $20, whereas Stock A's variance is just over $9: Given this information, we can determine that Stock A presents a lower risk than Stock B. If we invest in Stock A, at any given time, utilizing past performance, our average risk of loss is $9, whereas in Stock B we an average risk of $20. How it works... The function of STDEV or standard deviation in Excel utilizes the given numbers as a complete population. This means that it does not account for any other changes or unknowns. Excel will use this data set as a complete set and determine the greatest change from high to low within the numbers. This range of change is your standard deviation. Excel also includes the function STDEVP that treats the data as a selection of a larger population. This function should be sed if you are calculating standard deviation on a subset of data (for example, six months out of an entire year). If we translate these numbers into a line graph with standard deviation bars, as shown in the following screenshot for Stock A, you can see the selling prices of the stock, and how they travel within the deviation range: If we translate these numbers into a line graph with standard deviation bars, as shown in the following screenshot for Stock B, you can see the selling prices of the stock, and understand how they travel within the deviation range: The bars shown on the graphs represent the standard deviation as calculated by Excel. We can see visually that not only does Stock B represent a greater risk with the larger deviation, but also many of the stock prices fall below our deviation, representing further risk to the investor. With funds to invest as a finance manager, Stock A represents a lower risk investment. There's more... Standard deviation can be calculated for almost any data set. For this recipe, we calculated deviation over the course of one year; however, if we expand the data to include multiple years we can further determine long-term risk. While Stock B represents high short-term risk, in the long-term analysis, Stock B may present as a less risky investment. Combining standard deviation with a five-number summary analysis, we can further gain risk and performance information.

0
0
4923

article-image-chatgpt-for-information-retrieval-and-competitive-intelligence

Valentina Alto

02 Jun 2023

2 min read

ChatGPT for Information Retrieval and Competitive Intelligence

Valentina Alto

02 Jun 2023

2 min read

This article is an excerpt from the book Modern Generative AI with ChatGPT and OpenAI Models, by Valentina Alto. This book will provide you with insights into the inner workings of the LLMs and guide you through creating your own language models. Information retrieval and competitive intelligence are fields where ChatGPT is a game-changer. It can retrieve information from its knowledge base and reframe it in an original way.One example is using ChatGPT as a search engine to provide summaries, reviews, and recommendations for books: Alternatively, we could ask for some suggestions for a new book we wish to read based on our preferences: If we design the prompt with specific information, ChatGPT can serve as a tool for pointing us towards the right references for research or studies. For example, asking ChatGPT to list relevant references for feedforward neural networks: ChatGPT can also be useful for competitive intelligence. For example, generating a list of existing books with similar content: Or providing advice on how to be competitive in the market: ChatGPT can also suggest improvements regarding book content to make it stand out: Overall, ChatGPT can be a valuable assistant for information retrieval and competitive intelligence. However, it's important to remember that the knowledge base cutoff is 2021, so real-time information may not be available. About the AuthorValentina Alto graduated in 2021 in Data Science. Since 2020 she has been working in Microsoft as Azure Solution Specialist and, since 2022, she focused on Data&AI workloads within the Manufacturing and Pharmaceutical industry. She has been working on customers’ projects closely with system integrators to deploy cloud architecture with a focus on datalake house and DWH, data integration and engineering, IoT and real-time analytics, Azure Machine Learning, Azure cognitive services (including Azure OpenAI Service), and PowerBI for dashboarding. She holds a BSc in Finance and an MSc degree in Data Science from Bocconi University, Milan, Italy. Since her academic journey she has been writing Tech articles about Statistics, Machine Learning, Deep Learning and AI on various publications. She has also written a book about the fundamentals of Machine Learning with Python. You can connect with Valentina on:LinkedInMedium

0
0
4923

Packt

20 Feb 2015

13 min read

The Spark Programming Model

Packt

20 Feb 2015

13 min read

In this article by Nick Pentreath, author of the book Machine Learning with Spark, we will delve into a high-level overview of Spark's design, we will introduce the SparkContext object as well as the Spark shell, which we will use to interactively explore the basics of the Spark programming model. While this section provides a brief overview and examples of using Spark, we recommend that you read the following documentation to get a detailed understanding:Spark Quick Start: http://spark.apache.org/docs/latest/quick-start.htmlSpark Programming guide, which covers Scala, Java, and Python: http://spark.apache.org/docs/latest/programming-guide.html (For more resources related to this topic, see here.) SparkContext and SparkConf The starting point of writing any Spark program is SparkContext (or JavaSparkContext in Java). SparkContext is initialized with an instance of a SparkConf object, which contains various Spark cluster-configuration settings (for example, the URL of the master node). Once initialized, we will use the various methods found in the SparkContext object to create and manipulate distributed datasets and shared variables. The Spark shell (in both Scala and Python, which is unfortunately not supported in Java) takes care of this context initialization for us, but the following lines of code show an example of creating a context running in the local mode in Scala: val conf = new SparkConf().setAppName("Test Spark App").setMaster("local[4]")val sc = new SparkContext(conf) This creates a context running in the local mode with four threads, with the name of the application set to Test Spark App. If we wish to use default configuration values, we could also call the following simple constructor for our SparkContext object, which works in exactly the same way: val sc = new SparkContext("local[4]", "Test Spark App") The Spark shell Spark supports writing programs interactively using either the Scala or Python REPL (that is, the Read-Eval-Print-Loop, or interactive shell). The shell provides instant feedback as we enter code, as this code is immediately evaluated. In the Scala shell, the return result and type is also displayed after a piece of code is run. To use the Spark shell with Scala, simply run ./bin/spark-shell from the Spark base directory. This will launch the Scala shell and initialize SparkContext, which is available to us as the Scala value, sc. Your console output should look similar to the following screenshot: To use the Python shell with Spark, simply run the ./bin/pyspark command. Like the Scala shell, the Python SparkContext object should be available as the Python variable sc. You should see an output similar to the one shown in this screenshot: Resilient Distributed Datasets The core of Spark is a concept called the Resilient Distributed Dataset (RDD). An RDD is a collection of "records" (strictly speaking, objects of some type) that is distributed or partitioned across many nodes in a cluster (for the purposes of the Spark local mode, the single multithreaded process can be thought of in the same way). An RDD in Spark is fault-tolerant; this means that if a given node or task fails (for some reason other than erroneous user code, such as hardware failure, loss of communication, and so on), the RDD can be reconstructed automatically on the remaining nodes and the job will still complete. Creating RDDs RDDs can be created from existing collections, for example, in the Scala Spark shell that you launched earlier: val collection = List("a", "b", "c", "d", "e")val rddFromCollection = sc.parallelize(collection) RDDs can also be created from Hadoop-based input sources, including the local filesystem, HDFS, and Amazon S3. A Hadoop-based RDD can utilize any input format that implements the Hadoop InputFormat interface, including text files, other standard Hadoop formats, HBase, Cassandra, and many more. The following code is an example of creating an RDD from a text file located on the local filesystem: val rddFromTextFile = sc.textFile("LICENSE") The preceding textFile method returns an RDD where each record is a String object that represents one line of the text file. Spark operations Once we have created an RDD, we have a distributed collection of records that we can manipulate. In Spark's programming model, operations are split into transformations and actions. Generally speaking, a transformation operation applies some function to all the records in the dataset, changing the records in some way. An action typically runs some computation or aggregation operation and returns the result to the driver program where SparkContext is running. Spark operations are functional in style. For programmers familiar with functional programming in Scala or Python, these operations should seem natural. For those without experience in functional programming, don't worry; the Spark API is relatively easy to learn. One of the most common transformations that you will use in Spark programs is the map operator. This applies a function to each record of an RDD, thus mapping the input to some new output. For example, the following code fragment takes the RDD we created from a local text file and applies the size function to each record in the RDD. Remember that we created an RDD of Strings. Using map, we can transform each string to an integer, thus returning an RDD of Ints: val intsFromStringsRDD = rddFromTextFile.map(line => line.size) You should see output similar to the following line in your shell; this indicates the type of the RDD: intsFromStringsRDD: org.apache.spark.rdd.RDD[Int] = MappedRDD[5] at map at <console>:14 In the preceding code, we saw the => syntax used. This is the Scala syntax for an anonymous function, which is a function that is not a named method (that is, one defined using the def keyword in Scala or Python, for example). The line => line.size syntax means that we are applying a function where the input variable is to the left of the => operator, and the output is the result of the code to the right of the => operator. In this case, the input is line, and the output is the result of calling line.size. In Scala, this function that maps a string to an integer is expressed as String => Int.This syntax saves us from having to separately define functions every time we use methods such as map; this is useful when the function is simple and will only be used once, as in this example. Now, we can apply a common action operation, count, to return the number of records in our RDD: intsFromStringsRDD.count The result should look something like the following console output: 14/01/29 23:28:28 INFO SparkContext: Starting job: count at <console>:17...14/01/29 23:28:28 INFO SparkContext: Job finished: count at <console>:17, took 0.019227 sres4: Long = 398 Perhaps we want to find the average length of each line in this text file. We can first use the sum function to add up all the lengths of all the records and then divide the sum by the number of records: val sumOfRecords = intsFromStringsRDD.sumval numRecords = intsFromStringsRDD.countval aveLengthOfRecord = sumOfRecords / numRecords The result will be as follows: aveLengthOfRecord: Double = 52.06030150753769 Spark operations, in most cases, return a new RDD, with the exception of most actions, which return the result of a computation (such as Long for count and Double for sum in the preceding example). This means that we can naturally chain together operations to make our program flow more concise and expressive. For example, the same result as the one in the preceding line of code can be achieved using the following code: val aveLengthOfRecordChained = rddFromTextFile.map(line => line.size).sum / rddFromTextFile.count An important point to note is that Spark transformations are lazy. That is, invoking a transformation on an RDD does not immediately trigger a computation. Instead, transformations are chained together and are effectively only computed when an action is called. This allows Spark to be more efficient by only returning results to the driver when necessary so that the majority of operations are performed in parallel on the cluster. This means that if your Spark program never uses an action operation, it will never trigger an actual computation, and you will not get any results. For example, the following code will simply return a new RDD that represents the chain of transformations: val transformedRDD = rddFromTextFile.map(line => line.size).filter(size => size > 10).map(size => size * 2) This returns the following result in the console: transformedRDD: org.apache.spark.rdd.RDD[Int] = MappedRDD[8] at map at <console>:14 Notice that no actual computation happens and no result is returned. If we now call an action, such as sum, on the resulting RDD, the computation will be triggered: val computation = transformedRDD.sum You will now see that a Spark job is run, and it results in the following console output: ...14/11/27 21:48:21 INFO SparkContext: Job finished: sum at <console>:16, took 0.193513 scomputation: Double = 60468.0 The complete list of transformations and actions possible on RDDs as well as a set of more detailed examples are available in the Spark programming guide (located at http://spark.apache.org/docs/latest/programming-guide.html#rdd-operations), and the API documentation (the Scala API documentation) is located at http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD). Caching RDDs One of the most powerful features of Spark is the ability to cache data in memory across a cluster. This is achieved through use of the cache method on an RDD: rddFromTextFile.cache Calling cache on an RDD tells Spark that the RDD should be kept in memory. The first time an action is called on the RDD that initiates a computation, the data is read from its source and put into memory. Hence, the first time such an operation is called, the time it takes to run the task is partly dependent on the time it takes to read the data from the input source. However, when the data is accessed the next time (for example, in subsequent queries in analytics or iterations in a machine learning model), the data can be read directly from memory, thus avoiding expensive I/O operations and speeding up the computation, in many cases, by a significant factor. If we now call the count or sum function on our cached RDD, we will see that the RDD is loaded into memory: val aveLengthOfRecordChained = rddFromTextFile.map(line => line.size).sum / rddFromTextFile.count Indeed, in the following output, we see that the dataset was cached in memory on the first call, taking up approximately 62 KB and leaving us with around 270 MB of memory free: ...14/01/30 06:59:27 INFO MemoryStore: ensureFreeSpace(63454) called with curMem=32960, maxMem=31138775014/01/30 06:59:27 INFO MemoryStore: Block rdd_2_0 stored as values to memory (estimated size 62.0 KB, free 296.9 MB)14/01/30 06:59:27 INFO BlockManagerMasterActor$BlockManagerInfo:Added rdd_2_0 in memory on 10.0.0.3:55089 (size: 62.0 KB, free: 296.9 MB)... Now, we will call the same function again: val aveLengthOfRecordChainedFromCached = rddFromTextFile.map(line => line.size).sum / rddFromTextFile.count We will see from the console output that the cached data is read directly from memory: ...14/01/30 06:59:34 INFO BlockManager: Found block rdd_2_0 locally... Spark also allows more fine-grained control over caching behavior. You can use the persist method to specify what approach Spark uses to cache data. More information on RDD caching can be found here: http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence. Broadcast variables and accumulators Another core feature of Spark is the ability to create two special types of variables: broadcast variables and accumulators. A broadcast variable is a read-only variable that is made available from the driver program that runs the SparkContext object to the nodes that will execute the computation. This is very useful in applications that need to make the same data available to the worker nodes in an efficient manner, such as machine learning algorithms. Spark makes creating broadcast variables as simple as calling a method on SparkContext as follows: val broadcastAList = sc.broadcast(List("a", "b", "c", "d", "e")) The console output shows that the broadcast variable was stored in memory, taking up approximately 488 bytes, and it also shows that we still have 270 MB available to us: 14/01/30 07:13:32 INFO MemoryStore: ensureFreeSpace(488) called with curMem=96414, maxMem=31138775014/01/30 07:13:32 INFO MemoryStore: Block broadcast_1 stored as values to memory(estimated size 488.0 B, free 296.9 MB)broadCastAList: org.apache.spark.broadcast.Broadcast[List[String]] = Broadcast(1) A broadcast variable can be accessed from nodes other than the driver program that created it (that is, the worker nodes) by calling value on the variable: sc.parallelize(List("1", "2", "3")).map(x => broadcastAList.value ++ x).collect This code creates a new RDD with three records from a collection (in this case, a Scala List) of ("1", "2", "3"). In the map function, it returns a new collection with the relevant record from our new RDD appended to the broadcastAList that is our broadcast variable. Notice that we used the collect method in the preceding code. This is a Spark action that returns the entire RDD to the driver as a Scala (or Python or Java) collection. We will often use collect when we wish to apply further processing to our results locally within the driver program. Note that collect should generally only be used in cases where we really want to return the full result set to the driver and perform further processing. If we try to call collect on a very large dataset, we might run out of memory on the driver and crash our program.It is preferable to perform as much heavy-duty processing on our Spark cluster as possible, preventing the driver from becoming a bottleneck. In many cases, however, collecting results to the driver is necessary, such as during iterations in many machine learning models. On inspecting the result, we will see that for each of the three records in our new RDD, we now have a record that is our original broadcasted List, with the new element appended to it (that is, there is now either "1", "2", or "3" at the end): ...14/01/31 10:15:39 INFO SparkContext: Job finished: collect at <console>:15, took 0.025806 sres6: Array[List[Any]] = Array(List(a, b, c, d, e, 1), List(a, b, c, d, e, 2), List(a, b, c, d, e, 3)) An accumulator is also a variable that is broadcasted to the worker nodes. The key difference between a broadcast variable and an accumulator is that while the broadcast variable is read-only, the accumulator can be added to. There are limitations to this, that is, in particular, the addition must be an associative operation so that the global accumulated value can be correctly computed in parallel and returned to the driver program. Each worker node can only access and add to its own local accumulator value, and only the driver program can access the global value. Accumulators are also accessed within the Spark code using the value method. For more details on broadcast variables and accumulators, see the Shared Variables section of the Spark Programming Guide: http://spark.apache.org/docs/latest/programming-guide.html#shared-variables. Summary In this article, we learned the basics of Spark's programming model and API using the interactive Scala console. Resources for Article: Further resources on this subject: Ridge Regression [article] Clustering with K-Means [article] Machine Learning Examples Applicable to Businesses [article]

0
0
4922

article-image-leveraging-python-world-big-data

Packt

07 Sep 2015

26 min read

Leveraging Python in the World of Big Data

Packt

07 Sep 2015

26 min read

0
0
4921

How-To Tutorials

article-image-building-queries-visually-mysql-query-browser

Packt

23 Oct 2009

3 min read

Building Queries Visually in MySQL Query Browser

Packt

23 Oct 2009

3 min read

MySQL Query Browser, one of the open source MySQL GUI tools from MySQL AB, is used for building MySQL database queries visually. In MySQL Query Browser, you build database queries using just your mouse—click, drag and drop! MySQL Query Browser has plenty of visual query building functions and features. This article shows two examples, building Join and Master-detail queries. These examples will demonstrate some of these functions and features. Join Query A pop-up query toolbar will appear when you drag a table or column from the Object Browser’s Schemata tab to the Query Area. You drop the table or column on the pop-up query toolbar’s button to build your query. The following example demonstrates the use of the pop-up query toolbar to build a join query that involves three tables and two types of join (equi and left outer). Drag and drop the product table from the Schemata to Add Table(s) button. A SELECT query on the product table is written in the Query Area. Drag and drop the item table from Schemata to the JOIN Table(s) button on the Pop-up Query Toolbar. The two tables are joined on the foreign-key, product_code. If no foreign-key relationship exists, the drag and drop won’t have any effect. Drag and drop the order table from Schemata to the LEFT OUTER JOIN button on the Pop-up Query Toolbar. Maximize query area by pressing F11. You get a larger query area, and your lines are sequentially numbered (for easier identification). Move the FROM clause to its next line, by putting your cursor just before the FROM word and press Enter. Similarly, move the ON clause to its next line. Now, you can see all lines completely, and that the item table is left join to the order table on their foreign-key relationship column, the order_number column. As of now our query is SELECT *, i.e. selecting all columns from all tables. Let’s now select the columns we’d like to show at the query’s output. For example, drag and drop the order_number from the item table, product_name from the product table, and then quantity from the item table. (If necessary, expand the table folders to see their columns). The sequence of the selecting the columns is reflected in the SELECT clause (from left to right). Note that you can’t select column from the left join of the order table (if you try, nothing will happen) Next, add an additional condition. Drag and drop the amount column on the WHERE button in the Pop-up Query Toolbar. The column is added, with an AND, in the WHERE clause of the query. Type in its condition value, for example, > 1000. To finalize our query, drag and drop product_name on the ORDER button, and then, order_number (from item table, not order table) on the GROUP button. You’ll see that the GROUP BY and ORDER clauses are ordered correctly, i.e. the GROUP BY clause first before the ORDER BY, regardless of your drag & drop sequence. To test your query, click the Execute button. Your query should run without any error, and display its output in the query area (below the query).

0
0
4920

article-image-flash-multiplayer-virtual-world-setting-smartfoxserver-third-party-http-and-database-s

Packt

17 Aug 2010

7 min read

Flash Multiplayer Virtual World: Setting up SmartFoxServer with Third-party HTTP and Database Server

Packt

17 Aug 2010

7 min read

0
0
4918

Packt

21 May 2014

19 min read

An Introduction to the Terminal

Packt

21 May 2014

19 min read

0
0
4917

Packt

08 Jul 2015

23 min read

Why Meteor Rocks!

Packt

08 Jul 2015

23 min read

In this article by Isaac Strack, the author of the book, Getting Started with Meteor.js JavaScript Framework - Second Edition, has discussed some really amazing features of Meteor that has contributed a lot to the success of Meteor. Meteor is a disruptive (in a good way!) technology. It enables a new type of web application that is faster, easier to build, and takes advantage of modern techniques, such as Full Stack Reactivity, Latency Compensation, and Data On The Wire. (For more resources related to this topic, see here.) This article explains how web applications have changed over time, why that matters, and how Meteor specifically enables modern web apps through the above-mentioned techniques. By the end of this article, you will have learned: What a modern web application is What Data On The Wire means and how it's different How Latency Compensation can improve your app experience Templates and Reactivity—programming the reactive way! Modern web applications Our world is changing. With continual advancements in displays, computing, and storage capacities, things that weren't even possible a few years ago are now not only possible but are critical to the success of a good application. The Web in particular has undergone significant change. The origin of the web app (client/server) From the beginning, web servers and clients have mimicked the dumb terminal approach to computing where a server with significantly more processing power than a client will perform operations on data (writing records to a database, math calculations, text searches, and so on), transform the data and render it (turn a database record into HTML and so on), and then serve the result to the client, where it is displayed for the user. In other words, the server does all the work, and the client acts as more of a display, or a dumb terminal. This design pattern for this is called…wait for it…the client/server design pattern. The diagrammatic representation of the client-server architecture is shown in the following diagram: This design pattern, borrowed from the dumb terminals and mainframes of the 60s and 70s, was the beginning of the Web as we know it and has continued to be the design pattern that we think of when we think of the Internet. The rise of the machines (MVC) Before the Web (and ever since), desktops were able to run a program such as a spreadsheet or a word processor without needing to talk to a server. This type of application could do everything it needed to, right there on the big and beefy desktop machine. During the early 90s, desktop computers got even more beefy. At the same time, the Web was coming alive, and people started having the idea that a hybrid between the beefy desktop application (a fat app) and the connected client/server application (a thin app) would produce the best of both worlds. This kind of hybrid app—quite the opposite of a dumb terminal—was called a smart app. Many business-oriented smart apps were created, but the easiest examples can be found in computer games. Massively Multiplayer Online games (MMOs), first-person shooters, and real-time strategies are smart apps where information (the data model) is passed between machines through a server. The client in this case does a lot more than just display the information. It performs most of the processing (or acts as a controller) and transforms the data into something to be displayed (the view). This design pattern is simple but very effective. It's called the Model View Controller (MVC) pattern. The model is essentially the data for an application. In the context of a smart app, the model is provided by a server. The client makes requests to the server for data and stores that data as the model. Once the client has a model, it performs actions/logic on that data and then prepares it to be displayed on the screen. This part of the application (talking to the server, modifying the data model, and preparing data for display) is called the controller. The controller sends commands to the view, which displays the information. The view also reports back to the controller when something happens on the screen (a button click, for example). The controller receives the feedback, performs the logic, and updates the model. Lather, rinse, repeat! Since web browsers were built to be "dumb clients", the idea of using a browser as a smart app back then was out of question. Instead, smart apps were built on frameworks such as Microsoft .NET, Java, or Macromedia (now Adobe) Flash. As long as you had the framework installed, you could visit a web page to download/run a smart app. Sometimes, you could run the app inside the browser, and sometimes, you would download it first, but either way, you were running a new type of web app where the client application could talk to the server and share the processing workload. The browser grows up Beginning in the early 2000s, a new twist on the MVC pattern started to emerge. Developers started to realize that, for connected/enterprise "smart apps", there was actually a nested MVC pattern. The server code (controller) was performing business logic against the database (model) through the use of business objects and then sending processed/rendered data to the client application (a "view"). The client was receiving this data from the server and treating it as its own personal "model". The client would then act as a proper controller, perform logic, and send the information to the view to be displayed on the screen. So, the "view" for the server MVC was the "model" for the client MVC. As browser technologies (HTML and JavaScript) matured, it became possible to create smart apps that used the Nested MVC design pattern directly inside an HTML web page. This pattern makes it possible to run a full-sized application using only JavaScript. There is no longer any need to download multiple frameworks or separate apps. You can now get the same functionality from visiting a URL as you could previously by buying a packaged product. A giant Meteor appears! Meteor takes modern web apps to the next level. It enhances and builds upon the nested MVC design pattern by implementing three key features: Data On The Wire through the Distributed Data Protocol (DDP) Latency Compensation with Mini Databases Full Stack Reactivity with Blaze and Tracker Let's walk through these concepts to see why they're valuable, and then, we'll apply them to our Lending Library application. Data On The Wire The concept of Data On The Wire is very simple and in tune with the nested MVC pattern; instead of having a server process everything, render content, and then send HTML across the wire, why not just send the data across the wire and let the client decide what to do with it? This concept is implemented in Meteor using the Distributed Data Protocol, or DDP. DDP has a JSON-based syntax and sends messages similar to the REST protocol. Additions, deletions, and changes are all sent across the wire and handled by the receiving service/client/device. Since DDP uses WebSockets rather than HTTP, the data can be pushed whenever changes occur. But the true beauty of DDP lies in the generic nature of the communication. It doesn't matter what kind of system sends or receives data over DDP—it can be a server, a web service, or a client app—they all use the same protocol to communicate. This means that none of the systems know (or care) whether the other systems are clients or servers. With the exception of the browser, any system can be a server, and without exception, any server can act as a client. All the traffic looks the same and can be treated in a similar manner. In other words, the traditional concept of having a single server for a single client goes away. You can hook multiple servers together, each serving a discreet purpose, or you can have a client connect to multiple servers, interacting with each one differently. Think about what you can do with a system like that: Imagine multiple systems all coming together to create, for example, a health monitoring system. Some systems are built with C++, some with Arduino, some with…well, we don't really care. They all speak DDP. They send and receive data on the wire and decide individually what to do with that data. Suddenly, very difficult and complex problems become much easier to solve. DDP has been implemented in pretty much every major programming language, allowing you true freedom to architect an enterprise application. Latency Compensation Meteor employs a very clever technique called Mini Databases. A mini database is a "lite" version of a normal database that lives in the memory on the client side. Instead of the client sending requests to a server, it can make changes directly to the mini database on the client. This mini database then automatically syncs with the server (using DDP of course), which has the actual database. Out of the box, Meteor uses MongoDB and Minimongo: When the client notices a change, it first executes that change against the client-side Minimongo instance. The client then goes on its merry way and lets the Minimongo handlers communicate with the server over DDP. If the server accepts the change, it then sends out a "changed" message to all connected clients, including the one that made the change. If the server rejects the change, or if a newer change has come in from a different client, the Minimongo instance on the client is corrected, and any affected UI elements are updated as a result. All of this doesn't seem very groundbreaking, but here's the thing—it's all asynchronous, and it's done using DDP. This means that the client doesn't have to wait until it gets a response back from the server. It can immediately update the UI based on what is in the Minimongo instance. What if the change was illegal or other changes have come in from the server? This is not a problem as the client is updated as soon as it gets word from the server. Now, what if you have a slow internet connection or your connection goes down temporarily? In a normal client/server environment, you couldn't make any changes, or the screen would take a while to refresh while the client waits for permission from the server. However, Meteor compensates for this. Since the changes are immediately sent to Minimongo, the UI gets updated immediately. So, if your connection is down, it won't cause a problem: All the changes you make are reflected in your UI, based on the data in Minimongo. When your connection comes back, all the queued changes are sent to the server, and the server will send authorized changes to the client. Basically, Meteor lets the client take things on faith. If there's a problem, the data coming in from the server will fix it, but for the most part, the changes you make will be ratified and broadcast by the server immediately. Coding this type of behavior in Meteor is crazy easy (although you can make it more complex and therefore more controlled if you like): lists = new Mongo.Collection("lists"); This one line declares that there is a lists data model. Both the client and server will have a version of it, but they treat their versions differently. The client will subscribe to changes announced by the server and update its model accordingly. The server will publish changes, listen to change requests from the client, and update its model (its master copy) based on these change requests. Wow, one line of code that does all that! Of course, there is more to it, but that's beyond the scope of this article, so we'll move on. To better understand Meteor data synchronization, see the Publish and subscribe section of the meteor documentation at http://docs.meteor.com/#/full/meteor_publish. Full Stack Reactivity Reactivity is integral to every part of Meteor. On the client side, Meteor has the Blaze library, which uses HTML templates and JavaScript helpers to detect changes and render the data in your UI. Whenever there is a change, the helpers re-run themselves and add, delete, and change UI elements, as appropriate, based on the structure found in the templates. These functions that re-run themselves are called reactive computations. On both the client and the server, Meteor also offers reactive computations without having to use a UI. Called the Tracker library, these helpers also detect any data changes and rerun themselves accordingly. Because both the client and the server are JavaScript-based, you can use the Tracker library anywhere. This is defined as isomorphic or full stack reactivity because you're using the same language (and in some cases the same code!) on both the client and the server. Re-running functions on data changes has a really amazing benefit for you, the programmer: you get to write code declaratively, and Meteor takes care of the reactive part automatically. Just tell Meteor how you want the data displayed, and Meteor will manage any and all data changes. This declarative style is usually accomplished through the use of templates. Templates work their magic through the use of view data bindings. Without getting too deep, a view data binding is a shared piece of data that will be displayed differently if the data changes. Let's look at a very simple data binding—one for which you don't technically need Meteor—to illustrate the point. Let's perform the following set of steps to understand the concept in detail: In LendLib.html, you will see an HTML-based template expression: <div id="categories-container"> {{> categories}} </div> This expression is a placeholder for an HTML template that is found just below it: <template name="categories"> <h2 class="title">my stuff</h2>.. So, {{> categories}} is basically saying, "put whatever is in the template categories right here." And the HTML template with the matching name is providing that. If you want to see how data changes will affect the display, change the h2 tag to an h4 tag and save the change: <template name="categories"> <h4 class="title">my stuff</h4> You'll see the effect in your browser. (my stuff will become itsy bitsy.) That's view data binding at work. Change the h4 tag back to an h2 tag and save the change, unless you like the change. No judgment here...okay, maybe a little bit of judgment. It's ugly, and tiny, and hard to read. Seriously, you should change it back before someone sees it and makes fun of you! Alright, now that we know what a view data binding is, let's see how Meteor uses it. Inside the categories template in LendLib.html, you'll find even more templates: <template name="categories"> <h4 class="title">my stuff</h4> <div id="categories" class="btn-group"> {{#each lists}} <div class="category btn btn-primary"> {{Category}} </div> {{/each}} </div> </template> Meteor uses a template language called Spacebars to provide instructions inside templates. These instructions are called expressions, and they let us do things like add HTML for every record in a collection, insert the values of properties, and control layouts with conditional statements. The first Spacebars expression is part of a pair and is a for-each statement. {{#each lists}} tells the interpreter to perform the action below it (in this case, it tells it to make a new div element) for each item in the lists collection. lists is the piece of data, and {{#each lists}} is the placeholder. Now, inside the {{#each lists}} expression, there is one more Spacebars expression: {{Category}} Since the expression is found inside the #each expression, it is considered a property. That is to say that {{Category}} is the same as saying this.Category, where this is the current item in the for-each loop. So, the placeholder is saying, "add the value of the Category property for the current record." Now, if we look in LendLib.js, we will see the reactive values (called reactive contexts) behind the templates: lists : function () { return lists.find(... Here, Meteor is declaring a template helper named lists. The helper, lists, is found inside the template helpers belonging to categories. The lists helper happens to be a function that returns all the data in the lists collection, which we defined previously. Remember this line? lists = new Mongo.Collection("lists"); This lists collection is returned by the above-mentioned helper. When there is a change to the lists collection, the helper gets updated and the template's placeholder is changed as well. Let's see this in action. On your web page pointing to http://localhost:3000, open the browser console and enter the following line: > lists.insert({Category:"Games"}); This will update the lists data collection. The template will see this change and update the HTML code/placeholder. Each of the placeholders will run one additional time for the new entry in lists, and you'll see the following screen: When the lists collection was updated, the Template.categories.lists helper detected the change and reran itself (recomputed). This changed the contents of the code meant to be displayed in the {{> categories}} placeholder. Since the contents were changed, the affected part of the template was re-run. Now, take a minute here and think about how little we had to do to get this reactive computation to run: we simply created a template, instructing Blaze how we want the lists data collection to be displayed, and we put in a placeholder. This is simple, declarative programming at its finest! Let's create some templates We'll now see a real-life example of reactive computations and work on our Lending Library at the same time. Adding categories through the console has been a fun exercise, but it's not a long-term solution. Let's make it so that we can do that on the page instead as follows: Open LendLib.html and add a new button just before the {{#each lists}} expression: <div id="categories" class="btn-group"> <div class="category btn btn-primary" id="btnNewCat"> <span class="glyphicon glyphicon-plus"></span> </div> {{#each lists}} This will add a plus button on the page, as follows: Now, we want to change the button into a text field when we click on it. So let's build that functionality by using the reactive pattern. We will make it based on the value of a variable in the template. Add the following {{#if…else}} conditionals around our new button: <div id="categories" class="btn-group"> {{#if new_cat}} {{else}} <div class="category btn btn-primary" id="btnNewCat"> <span class="glyphicon glyphicon-plus"></span> </div> {{/if}} {{#each lists}} The first line, {{#if new_cat}}, checks to see whether new_cat is true or false. If it's false, the {{else}} section is triggered, and it means that we haven't yet indicated that we want to add a new category, so we should be displaying the button with the plus sign. In this case, since we haven't defined it yet, new_cat will always be false, and so the display won't change. Now, let's add the HTML code to display when we want to add a new category: {{#if new_cat}} <div class="category form-group" id="newCat"> <input type="text" id="add-category" class="form-control" value="" /> </div> {{else}} ... {{/if}} There's the smallest bit of CSS we need to take care of as well. Open ~/Documents/Meteor/LendLib/LendLib.css and add the following declaration: #newCat { max-width: 250px; } Okay, so now we've added an input field, which will show up when new_cat is true. The input field won't show up unless it is set to true; so, for now, it's hidden. So, how do we make new_cat equal to true? Save your changes if you haven't already done so, and open LendLib.js. First, we'll declare a Session variable, just below our Meteor.isClient check function, at the top of the file: if (Meteor.isClient) { // We are declaring the 'adding_category' flag Session.set('adding_category', false); Now, we'll declare the new template helper new_cat, which will be a function returning the value of adding_category. We need to place the new helper in the Template.categories.helpers() method, just below the declaration for lists: Template.categories.helpers({ lists: function () { ... }, new_cat: function(){ //returns true if adding_category has been assigned //a value of true return Session.equals('adding_category',true); } }); Note the comma (,) on the line above new_cat. It's important that you add that comma, or your code will not execute. Save these changes, and you'll see that nothing has changed. Ta-da! In reality, this is exactly as it should be because we haven't done anything to change the value of adding_category yet. Let's do this now: First, we'll declare our click event handler, which will change the value in our Session variable. To do this, add the following highlighted code just below the Template.categories.helpers() block: Template.categories.helpers({ ... }); Template.categories.events({ 'click #btnNewCat': function (e, t) { Session.set('adding_category', true); Tracker.flush(); focusText(t.find("#add-category")); } }); Now, let's take a look at the following line of code: Template.categories.events({ This line declares that events will be found in the category template. Now, let's take a look at the next line: 'click #btnNewCat': function (e, t) { This tells us that we're looking for a click event on the HTML element with an id="btnNewCat" statement (which we already created in LendLib.html). Session.set('adding_category', true); Tracker.flush(); focusText(t.find("#add-category")); Next, we set the Session variable, adding_category = true, flush the DOM (to clear up anything wonky), and then set the focus onto the input box with the id="add-category" expression. There is one last thing to do, and that is to quickly add the focusText(). helper function. To do this, just before the closing tag for the if (Meteor.isClient) function, add the following code: /////Generic Helper Functions///// //this function puts our cursor where it needs to be. function focusText(i) { i.focus(); i.select(); }; } //<------closing bracket for if(Meteor.isClient){} Now, when you save the changes and click on the plus button, you will see the input box: Fancy! However, it's still not useful, and we want to pause for a second and reflect on what just happened; we created a conditional template in the HTML page that will either show an input box or a plus button, depending on the value of a variable. This variable is a reactive variable, called a reactive context. This means that if we change the value of the variable (like we do with the click event handler), then the view automatically updates because the new_cat helpers function (a reactive computation) will rerun. Congratulations, you've just used Meteor's reactive programming model! To really bring this home, let's add a change to the lists collection (which is also a reactive context, remember?) and figure out a way to hide the input field when we're done. First, we need to add a listener for the keyup event. Or, to put it another way, we want to listen when the user types something in the box and hits Enter. When this happens, we want to add a category based on what the user typed. To do this, let's first declare the event handler. Just after the click handler for #btnNewCat, let's add another event handler: 'click #btnNewCat': function (e, t) { ... }, 'keyup #add-category': function (e,t){ if (e.which === 13) { var catVal = String(e.target.value || ""); if (catVal) { lists.insert({Category:catVal}); Session.set('adding_category', false); } } } We add a "," character at the end of the first click handler, and then add the keyup event handler. Now, let's check each of the lines in the preceding code: This line checks to see whether we hit the Enter/Return key. if (e.which === 13) This line of code checks to see whether the input field has any value in it: var catVal = String(e.target.value || ""); if (catVal) If it does, we want to add an entry to the lists collection: lists.insert({Category:catVal}); Then, we want to hide the input box, which we can do by simply modifying the value of adding_category: Session.set('adding_category', false); There is one more thing to add and then we'll be done. When we click away from the input box, we want to hide it and bring back the plus button. We already know how to do this reactively, so let's add a quick function that changes the value of adding_category. To do this, add one more comma after the keyup event handler and insert the following event handler: 'keyup #add-category': function (e,t){ ... }, 'focusout #add-category': function(e,t){ Session.set('adding_category',false); } Save your changes, and let's see this in action! In your web browser on http://localhost:3000, click on the plus sign, add the word Clothes, and hit Enter. Your screen should now resemble the following screenshot: Feel free to add more categories if you like. Also, experiment by clicking on the plus button, typing something in, and then clicking away from the input field. Summary In this article, you learned about the history of web applications and saw how we've moved from a traditional client/server model to a nested MVC design pattern. You learned what smart apps are, and you also saw how Meteor has taken smart apps to the next level with Data On The Wire, Latency Compensation, and Full Stack Reactivity. You saw how Meteor uses templates and helpers to automatically update content, using reactive variables and reactive computations. Lastly, you added more functionality to the Lending Library. You made a button and an input field to add categories, and you did it all using reactive programming rather than directly editing the HTML code. Resources for Article: Further resources on this subject: Building the next generation Web with Meteor [article] Quick start - creating your first application [article] Meteor.js JavaScript Framework: Why Meteor Rocks! [article]

0
0
4916

How-To Tutorials

article-image-blogger-improving-your-blog-google-analytics-and-search-engine-optimization

Packt

22 Oct 2009

7 min read

Blogger: Improving Your Blog with Google Analytics and Search Engine Optimization

Packt

22 Oct 2009

7 min read

If you've ever wondered how people find your website or how to generate more traffic, then this article tells you more about your visitors. Knowing where they come from, what posts they like, how long they stay, and other site metrics are all valuable information to have as a blogger. You would expect to pay for such a deep look into the underbelly of your blog, but Google wants to give it to you for free. Why for free? The better your site does, the more likely you are to pay for AdWords or use other Google tools. The Google Analytics online statistics application is a delicious carrot to encourage content rich sites and better ad revenue for everyone involved. You also want people to find your blog when they perform a search about your topic. The painful truth is that search engines have to find your blog first before it will show up in their results. There are thousands of new blogs being created everyday. If you want people to be able to find your blog in the increasingly crowded blogosphere, optimizing your blog for search engines will improve the odds. Improving Your Blog with Google Analytics Analytics gives you an overwhelming amount of data to use for measuring the success of your sites, and ads. Once you've had time to analyze that data, you will want to take action to improve the performance of your blog, and ads. We'll now look at how Analytics can help you make decisions about the design, and content of your site. Analyzing Navigation The Navigation section of the Content Overview report reveals how your visitors actually navigate your blog. Visitors move around a site in ways we can't predict. Seeing how they actually navigate a site and where they entered the site are powerful tools we can use to diagnose where we need to improve our blog. Exploring the Navigation Summary The Navigation Summary shows you the path people take through your site, including how they get there and where they go. We can see from the following graphical representation that our visitors entered the site through the main page of the blog most of the time. After reaching that page, over half the time, they went to other pages within the site. Entrance Paths We can see the path, the visitors take to enter our blog using the Entrance Paths report. It will show us from where they entered our site, which pages they looked at, and the last page they viewed before exiting. Visitors don't always enter by the main page of a site, especially if they find the site using search engines or trackbacks. The following screenshot displays a typical entrance path. The visitor comes to the site home page, and then goes to the full page of one of the posts. It looks like our visitors are highly attracted to the recipe posts. Georgia may want to feature more posts about recipes that tie in with her available inventory. Optimizing your Landing Page The Landing Page reports tell you where your visitors are coming from, and if they have used keywords to find you. You have a choice between viewing the source visitors used to get to your blog, or the keywords. Knowing the sources will give you guidance on the areas you should focus your marketing or advertising efforts on. Examining Entrance Sources You can quickly see how visitors are finding your site, whether through a direct link, or a search engine, locally from Blogger, or from social networking applications such as Twitter.com. In the Entrance Sources graph shown in the following screenshot, we can see that the largest among the number of people are coming to the blog using a direct link. Blogger is also responsible for a large share of our visitors, which is over 37%. There is even a visitor drawn to the blog from Twitter.com, where Georgia has an account. Discovering Entrance Keywords When visitors arrive at your site using keywords, the words they use will show up on the report. If they are using words in a pattern that do not match your site content, you may see a high bounce rate. You can use this report to redesign your landing page to better represent the purpose of your site by the words, and phrases that you use. Interpreting Click Patterns When visitors visit your site they show their attraction to links, and interactive content by clicking on them. Click Patterns are the representation of all those mouse clicks over a set time period. Using the Site Overlay reporting feature, you can visually see the mouse clicks represented in a graphical pattern. Much like collared pins stuck on a wall chart they will quickly reveal to you, which areas of your site visitors clicked on the most, and which links they avoided. Understanding Site Overlay Site Overlay shows the number of clicks for your site by laying them transparently in a graphical format on top of your site. Details with the number of clicks, and goal tracking information pop up in a little box when you hover over a click graphic with your mouse. At the top of the screen are options that control the display of the Site Overlay. Clicking the Hide Overlay link will hide the overlay from view. The Displaying drop-down list lets you choose how to view mouse Clicks on the page, or goals. The date range is the last item displayed. The graphical bars shown on top of the page content indicate where visitors clicked, and how many of them did so. You can quickly see what areas of the page interest your visitors the most. Based on the page clicks you see, you will have an idea of the content, and advertising that is most interesting to your visitors. Yes, Site Overlay will show the content areas of the page the visitors clicked on, and the advertisement areas. It will also help you see which links are tied to goals, and whether they are enticing your visitors to click. Optimizing Your Blog for Search Engines We are going to take our earlier checklists and use them as guides on where to make changes to our blog. When the changes are complete, the blog will be more attractive to search engines and visitors. We will start with changes we can make "On-site", and then progress to ways we can improve search engine results with "Off-site" improvements. Optimizing On-site The most crucial improvements we identified earlier were around the blog settings, template, and content. We will start with the easiest fixes, then dive into the template to correct validation issues. Let's begin with the settings in our Blogger blog. Seeding the Blog Title and Description with Keywords When you created your blog, did you take a moment to think about what words potential visitors were likely to type in when searching for your blog? Using keywords in the title and description of your blog gives potential visitors a preview and explanation of the topics they can expect to encounter in your blog. This information is what will also display in search results when potential visitors perform a search. Updating the Blog Title and Description It's never too late to seed your blog title and description with keywords. We will edit the blog title and description to optimize them for search engines. Login to your blog and navigate to Settings | Basic. We are going to replace the current title text with a phrase that more closely fits the blog. Type Organic Fruit for All into the Title field. Now, we are going to change the description of the blog. Type Organic Fruit Recipes, seasonal tips, and guides to healthy living into the description field. Scroll down to the bottom of the screen and click the Save Settings button. Y ou can enter up to 500 characters of descriptive text. What Just Happened? When we changed the title and description of our blog in the Basic Settings section, Blogger saved the changes and updated the template information as well. Now, when search engines crawl our blog, they will see richer descriptions of our blog in the blog title and blog description. The next optimization task is to verify that search engines can index our blog.

0
0
4913

Packt

23 Oct 2009

4 min read

Zen Gift of Education

Packt

23 Oct 2009

4 min read

Zen Gift of Education Many distributions have special releases around Christmas and New Year. I was planning to look at some of these this month like last year's Ubuntu Christmas Edition. But instead I found a release that's useful enough to maintain all year around. ZenEdu is a Live distribution that packs a whole bunch of educational tools on top of the Slackware-based light-weight and zippy Zenwalk Linux. As per Zenwalk's Wiki, ZenEdu was initiated by a user on the distro's French forum last year in December. That time the distro contained mostly French-only educational programs. This year, several members of the Zenwalk Linux community decided to release an international edition of ZenEdu. The distro is a goldmine of open source educational software and also packs a detailed user manual, which shows the developers' serious approach to do things properly. The educational apps included in the distro cover a broad range of subjects. The ZenEdu ISO is about 700 MB and includes apps that'll help users with subjects like Astronomy, Mathematics, and Chemistry. Since learning is the core idea behind the distro, it goes beyond traditional curriculum subjects and also packs tools that'll teach students the basics of programming and music. Some of the tools I particularly like are Stellarium - the popular 3D planetarium, Stardict - a multi-language dictionary, ghemical - a comprehensive computational chemistry package, Little Wizard that introduces the basics of programming to young students, and Maxima, for the manipulation of symbolic and numerical expressions, including differentiation, integration, ordinary differential equations, systems of linear equations, etc. If you want to learn music, train your ears with Solfege, and use TuxGuitar to edit and play guitar tablatures. What sets ZenEdu apart from other educational distros is that it bundles other productivity tools as well. This includes general-purpose applications like the IceWeasel web browser, IceDove email client, Pidgin for instant messaging, Kompozer for authoring web pages, and OpenOffice.org for word processing. Furthermore, the distro packs several other apps, which according to the developers, were chosen based on their usefulness to students while keeping in mind the things that might interest them. This includes a simple program to manage personal tasks and todo lists, a drawing program, a comic book viewer, a video editor, and a program to create a wide array of 3D content. However, there are dozens of free software educational tools that aren't included in this CD due to size considerations. But that's no problem. Since ZenEdu is based on Zenwalk, it too can be expanded with drag-and-drop modules. To create a new customize ZenEdu Live CD, browse and download the modules of educational apps you want and use the remastering application, isomaster to add them to your customized ZenEdu Live CD! The highlight of this distro though is the iTALC tool for teachers. iTALC, which stands for Intelligent Teaching And Learning with Computers, is a powerful cross-platform didactical tool that lets teachers view and control other computers in their network. Using iTALC teachers can see what's going on in computer labs and take snapshots, remote-control computers to support and help students, run a demo on all students' computers in real-time, send text-messages to students, cycle power and rebooting computers remotely, etc. ZenEdu has a special 'teacher' account pre-configured to run iTALC. Once logged in from that user, you can start iTALC and navigate through its interface, first adding student computers, and then controlling or monitoring them. ZenEdu's wiki page advices that if you'll be using the program regularly, you should save the 'teacher' account's iTALC directory (/home/teacher/.italc/) inside zenlive/rootcopy of the Live CD via isomaster. This will load the iTALC configuration the next time you boot the remastered Live CD. If you'll be using iTALC regularly you'd be well off installing ZenEdu on to your hard disk. Unfortunately, ZenEdu isn't installable. It's only a Live CD, and at best can be installed onto a USB Flash stick for portability. Most of the specialized distros I've played with, tend to be too specialized. They do what they are supposed to, but nothing more. ZenEdu is different in that, in a single CD, the developers have managed to squeeze a good number of educational apps as well as everyday tools. I hope members of the Zenwalk community, actively develop and maintain ZenEdu. Some more articles by Mayank Sharma: Meet the Distro guy Making a Complete yet Small Linux Distribution

0
0
4908

How-To Tutorials

article-image-how-to-win-kaggle-competition-with-apache-sparkml

Savia Lobo

27 Feb 2018

11 min read

How to win Kaggle competition with Apache SparkML

Savia Lobo

27 Feb 2018

11 min read

[box type="note" align="" class="" width=""]This article is an excerpt taken from a book Mastering Apache Spark 2.x - Second Edition written by Romeo Kienzler. The book will introduce you to Project Tungsten and Catalyst, two of the major advancements of Apache Spark 2.x.[/box] In today’s tutorial we will show how to take advantage of Apache SparkML to win a Kaggle competition. We'll use an archived competition offered by BOSCH, a German multinational engineering and electronics company, on production line performance data. The data for this competition represents measurement of parts as they move through Bosch's production line. Each part has a unique Id. The goal is to predict which part will fail quality control (represented by a 'Response' = 1). For more details on the competition data you may visit the website: https://www.kaggle.com/c/bosch-production-line-p erformance/data. Data preparation The challenge data comes in three ZIP packages but we only use two of them. One contains categorical data, one contains continuous data, and the last one contains timestamps of measurements, which we will ignore for now. If you extract the data, you'll get three large CSV files. So the first thing that we want to do is re-encode them into parquet in order to be more space-efficient: def convert(filePrefix : String) = { val basePath = "yourBasePath" var df = spark .read .option("header",true) .option("inferSchema", "true") .csv("basePath+filePrefix+".csv") df = df.repartition(1) df.write.parquet(basePath+filePrefix+".parquet") } convert("train_numeric") convert("train_date") convert("train_categorical") First, we define a function convert that just reads the .csv file and rewrites it as a .parquet file. As you can see, this saves a lot of space: Now we read the files in again as DataFrames from the parquet files : var df_numeric = spark.read.parquet(basePath+"train_numeric.parquet") var df_categorical = spark.read.parquet(basePath+"train_categorical.parquet") Here is the output of the same: This is very high-dimensional data; therefore, we will take only a subset of the columns for this illustration: df_categorical.createOrReplaceTempView("dfcat") var dfcat = spark.sql("select Id, L0_S22_F545 from dfcat") In the following picture, you can see the unique categorical values of that column: Now let's do the same with the numerical dataset: df_numeric.createOrReplaceTempView("dfnum") var dfnum = spark.sql("select Id,L0_S0_F0,L0_S0_F2,L0_S0_F4,Response from dfnum") Here is the output of the same: Finally, we rejoin these two relations: var df = dfcat.join(dfnum,"Id") df.createOrReplaceTempView("df") Then we have to do some NA treatment: var df_notnull = spark.sql(""" select Response as label, case when L0_S22_F545 is null then 'NA' else L0_S22_F545 end as L0_S22_F545, case when L0_S0_F0 is null then 0.0 else L0_S0_F0 end as L0_S0_F0, case when L0_S0_F2 is null then 0.0 else L0_S0_F2 end as L0_S0_F2, case when L0_S0_F4 is null then 0.0 else L0_S0_F4 end as L0_S0_F4 from df """) Feature engineering Now it is time to run the first transformer (which is actually an estimator). It is StringIndexer and needs to keep track of an internal mapping table between strings and indexes. Therefore, it is not a transformer but an estimator: import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer} var indexer = new StringIndexer() .setHandleInvalid("skip") .setInputCol("L0_S22_F545") .setOutputCol("L0_S22_F545Index") var indexed = indexer.fit(df_notnull).transform(df_notnull) indexed.printSchema As we can see clearly in the following image, an additional column called L0_S22_F545Index has been created: Finally, let's examine some content of the newly created column and compare it with the source column. We can clearly see how the category string gets transformed into a float index: Now we want to apply OneHotEncoder, which is a transformer, in order to generate better features for our machine learning model: var encoder = new OneHotEncoder() .setInputCol("L0_S22_F545Index") .setOutputCol("L0_S22_F545Vec") var encoded = encoder.transform(indexed) As you can see in the following figure, the newly created column L0_S22_F545Vec contains org.apache.spark.ml.linalg.SparseVector objects, which is a compressed representation of a sparse vector: Note: Sparse vector representations: The OneHotEncoder, as many other algorithms, returns a sparse vector of the org.apache.spark.ml.linalg.SparseVector type as, according to the definition, only one element of the vector can be one, the rest has to remain zero. This gives a lot of opportunity for compression as only the position of the elements that are non-zero has to be known. Apache Spark uses a sparse vector representation in the following format: (l,[p],[v]), where l stands for length of the vector, p for position (this can also be an array of positions), and v for the actual values (this can be an array of values). So if we get (13,[10],[1.0]), as in our earlier example, the actual sparse vector looks like this: (0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0). So now that we are done with our feature engineering, we want to create one overall sparse vector containing all the necessary columns for our machine learner. This is done using VectorAssembler: import org.apache.spark.ml.feature.VectorAssembler import org.apache.spark.ml.linalg.Vectors var vectorAssembler = new VectorAssembler() .setInputCols(Array("L0_S22_F545Vec", "L0_S0_F0", "L0_S0_F2","L0_S0_F4")) setOutputCol("features") var assembled = vectorAssembler.transform(encoded) We basically just define a list of column names and a target column, and the rest is done for us: As the view of the features column got a bit squashed, let's inspect one instance of the feature field in more detail: We can clearly see that we are dealing with a sparse vector of length 16 where positions 0, 13, 14, and 15 are non-zero and contain the following values: 1.0, 0.03, -0.034, and -0.197. Done! Let's create a Pipeline out of these components. Testing the feature engineering pipeline Let's create a Pipeline out of our transformers and estimators: import org.apache.spark.ml.Pipeline import org.apache.spark.ml.PipelineModel //Create an array out of individual pipeline stages var transformers = Array(indexer,encoder,assembled) var pipeline = new Pipeline().setStages(transformers).fit(df_notnull) var transformed = pipeline.transform(df_notnull) Note that the setStages method of Pipeline just expects an array of transformers and estimators, which we had created earlier. As parts of the Pipeline contain estimators, we have to run fit on our DataFrame first. The obtained Pipeline object takes a DataFrame in the transform method and returns the results of the transformations: As expected, we obtain the very same DataFrame as we had while running the stages individually in a sequence. Training the machine learning model Now it's time to add another component to the Pipeline: the actual machine learning algorithm-RandomForest: import org.apache.spark.ml.classification.RandomForestClassifier var rf = new RandomForestClassifier() .setLabelCol("label") .setFeaturesCol("features") var model = new Pipeline().setStages(transformers :+ rf).fit(df_notnull) var result = model.transform(df_notnull) This code is very straightforward. First, we have to instantiate our algorithm and obtain it as a reference in rf. We could have set additional parameters to the model but we'll do this later in an automated fashion in the CrossValidation step. Then, we just add the stage to our Pipeline, fit it, and finally transform. The fit method, apart from running all upstream stages, also calls fit on the RandomForestClassifier in order to train it. The trained model is now contained within the Pipeline and the transform method actually creates our predictions column: As we can see, we've now obtained an additional column called prediction, which contains the output of the RandomForestClassifier model. Of course, we've only used a very limited subset of available features/columns and have also not yet tuned the model, so we don't expect to do very well; however, let's take a look at how we can evaluate our model easily with Apache SparkML. Model evaluation Without evaluation, a model is worth nothing as we don't know how accurately it performs. Therefore, we will now use the built-in BinaryClassificationEvaluator in order to assess prediction performance and a widely used measure called areaUnderROC (going into detail here is beyond the scope of this book): import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator val evaluator = new BinaryClassificationEvaluator() import org.apache.spark.ml.param.ParamMap var evaluatorParamMap = ParamMap(evaluator.metricName -> "areaUnderROC") var aucTraining = evaluator.evaluate(result, evaluatorParamMap) As we can see, there is a built-in class called org.apache.spark.ml.evaluation.BinaryClassificationEvaluator and there are some other classes for other prediction use cases such as RegressionEvaluator or MulticlassClassificationEvaluator. The evaluator takes a parameter map--in this case, we are telling it to use the areaUnderROC metric--and finally, the evaluate method evaluates the result: As we can see, areaUnderROC is 0.5424418446501833. An ideal classifier would return a score of one. So we are only doing a bit better than random guesses but, as already stated, the number of features that we are looking at is fairly limited. Note : In the previous example we are using the areaUnderROC metric which is used for evaluation of binary classifiers. There exist an abundance of other metrics used for different disciplines of machine learning such as accuracy, precision, recall and F1 score. The following provides a good overview http://www.cs.cornell.edu/courses/cs578/2003fa/performance_measures.pdf This areaUnderROC is in fact a very bad value. Let's see if choosing better parameters for our RandomForest model increases this a bit in the next section. This areaUnderROC is in fact a very bad value. Let's see if choosing better parameters for our RandomForest model increases this a bit in the next section. CrossValidation and hyperparameter tuning As explained before, a common step in machine learning is cross-validating your model using testing data against training data and also tweaking the knobs of your machine learning algorithms. Let's use Apache SparkML in order to do this for us, fully automated! First, we have to configure the parameter map and CrossValidator: import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder} var paramGrid = new ParamGridBuilder() .addGrid(rf.numTrees, 3 :: 5 :: 10 :: 30 :: 50 :: 70 :: 100 :: 150 :: Nil) .addGrid(rf.featureSubsetStrategy, "auto" :: "all" :: "sqrt" :: "log2" :: "onethird" :: Nil) .addGrid(rf.impurity, "gini" :: "entropy" :: Nil) .addGrid(rf.maxBins, 2 :: 5 :: 10 :: 15 :: 20 :: 25 :: 30 :: Nil) .addGrid(rf.maxDepth, 3 :: 5 :: 10 :: 15 :: 20 :: 25 :: 30 :: Nil) .build() var crossValidator = new CrossValidator() .setEstimator(new Pipeline().setStages(transformers :+ rf)) .setEstimatorParamMaps(paramGrid) .setNumFolds(5) .setEvaluator(evaluator) var crossValidatorModel = crossValidator.fit(df_notnull) var newPredictions = crossValidatorModel.transform(df_notnull) The org.apache.spark.ml.tuning.ParamGridBuilder is used in order to define the hyperparameter space where the CrossValidator has to search and finally, the org.apache.spark.ml.tuning.CrossValidator takes our Pipeline, the hyperparameter space of our RandomForest classifier, and the number of folds for the CrossValidation as parameters. Now, as usual, we just need to call fit and transform on the CrossValidator and it will basically run our Pipeline multiple times and return a model that performs the best. Do you know how many different models are trained? Well, we have five folds on CrossValidation and five-dimensional hyperparameter space cardinalities between two and eight, so let's do the math: 5 * 8 * 5 * 2 * 7 * 7 = 19600 times! Using the evaluator to assess the quality of the cross-validated and tuned model Now that we've optimized our Pipeline in a fully automatic fashion, let's see how our best model can be obtained: var bestPipelineModel = crossValidatorModel.bestModel.asInstanceOf[PipelineModel] var stages = bestPipelineModel.stages import org.apache.spark.ml.classification.RandomForestClassificationModel val rfStage = stages(stages.length-1).asInstanceOf[RandomForestClassificationModel] rfStage.getNumTrees rfStage.getFeatureSubsetStrategy rfStage.getImpurity rfStage.getMaxBins rfStage.getMaxDepth The crossValidatorModel.bestModel code basically returns the best Pipeline. Now we use bestPipelineModel.stages to obtain the individual stages and obtain the tuned RandomForestClassificationModel using stages(stages.length 1).asInstanceOf[RandomForestClassificationModel]. Note that stages.length-1 addresses the last stage in the Pipeline, which is our RandomForestClassifier. So now, we can basically run evaluator using the best model and see how it performs: You might have noticed that 0.5362224872557545 is less than 0.5424418446501833, as we've obtained before. So why is this the case? Actually, this time we used cross-validation, which means that the model is less likely to over fit and therefore the score is a bit lower. So let's take a look at the parameters of the best model: Note that we've limited the hyperparameter space, so numTrees, maxBins, and maxDepth have been limited to five, and bigger trees will most likely perform better. So feel free to play around with this code and add features, and also use a bigger hyperparameter space, say, bigger trees. Finally, we've applied the concepts that we discussed on a real dataset from a Kaggle competition, which is a good starting point for your own machine learning project with Apache SparkML. If you found our post useful, do check out this book Mastering Apache Spark 2.x - Second Edition to know more about advanced analytics on your Big Data with latest Apache Spark 2.x.

0
0
4908

Packt

31 Dec 2009

3 min read

Five Years of Ubuntu

Packt

31 Dec 2009

3 min read

Community If there is any one word that could sum up Ubuntu, it would be Community. Even the definition of the word "Ubuntu" makes reference to community, and how the betterment of the individual and community are interconnected. Nearly everyone I've met through Ubuntu in the last five years cites the community as the single major reason for their use. In many aspects, Ubuntu is technically equal to its competitors, but nowhere else will you find the same level of community support. Nowhere else will you find the same level of friendship and positive atmosphere. Over the last five years I have tested many alternate Linux distributions and I have yet to find any other community that is as accepting, or that goes out of their way to invite you into the group. The Ubuntu community has so many people actively engaged in trying to provide a positive environment, it is truly amazing. If you find yourself at an Ubuntu event, don't be surprised if you actually see hugs! And watch out, it is contagious! The source of this positive community atmosphere is the guidance of the Ubuntu Code of Conduct. From the beginning, Ubuntu has had a Code of Conduct, and active contributing community members are expected to understand and sign the document. By outlining in clear terms what is expected of a community member, and keeping it forefront in members minds, Ubuntu is able to foster an atmosphere of mutual respect. This one simple document is what sets Ubuntu apart from every other online community. Sure you can find a community of contributors to any project, but nowhere else will you find the same respect and welcoming atmosphere that you'll find in Ubuntu. Simplicity Ubuntu introduced a level of simplicity to the Linux environment that hasn't been seen before. What has historically been a hobbyist operating system has been tuned and refined to the point that truly anyone can install and use it. Before Ubuntu, a user would need to be familiar with partitioning and package selections (at minimum!) in order to install a machine. With Ubuntu a machine can be installed with a very sane set of default tools without any technical decision making from the user. With tools such as Wubi, now included on Ubuntu installation disks, a user is able to install Ubuntu alongside an existing Windows installation and have the two peacefully coexist. Giving the user the ability to try-before-you-buy, and leave current installations and data intact have also drastically improved the userbase and adoption rate. Ubuntu was also the first to promote a single CD installation. Where most other Linux distributions were offering DVD based installations, Ubuntu packaged a core selection onto a single CD and offered those for download. They even offered to send free CDs through the mail to anyone that requested them! It doesn't get much simpler than than! The truly amazing thing about this simplicity is that, despite being limited to a single 700MB CD, Ubuntu comes with a plethora of software. The base installation provides a comprehensive desktop environment, including a full office suite, web browser, mail client, audio and video tools and more! This emphasis on delivering a highly refined, usable environment from the start is a very important aspect of Ubuntu.

0
0
4905

Flash 10 Multiplayer Game: The Lobby and New Game Screen Implementation

Joomla! 1.5 Top Extensions for Using Languages

Customizing Kubrik with Wordpress

Building Financial Functions into Excel 2010

ChatGPT for Information Retrieval and Competitive Intelligence

The Spark Programming Model

Leveraging Python in the World of Big Data

Building Queries Visually in MySQL Query Browser

Flash Multiplayer Virtual World: Setting up SmartFoxServer with Third-party HTTP and Database Server

An Introduction to the Terminal

Trending Topics

Why Meteor Rocks!

Blogger: Improving Your Blog with Google Analytics and Search Engine Optimization

Zen Gift of Education

How to win Kaggle competition with Apache SparkML

Five Years of Ubuntu

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access