Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-getting-store
Packt
28 Oct 2013
21 min read
Save for later

Getting into the Store

Packt
28 Oct 2013
21 min read
(For more resources related to this topic, see here.) This all starts by visiting https://appdev.microsoft.com/StorePortals, which will get you to the store dashboard that you use to submit and manage your applications. If you already have an account you'll just log in here and proceed. If not, we'll take a look at ways of getting it set up. There are a couple of ways to get a store account, which you will need before you can submit any game or application to the store. There are also two different types of accounts: Individual accounts Company accounts In most cases you will only need the first option. It's cheaper and easier to get, and you won't require the enterprise features provided by the company account for a game. For this reason we'll focus on the individual account. To register you'll need a credit card for verification, even if you gain a free account another way. Just follow the registration instructions, pay the fee, and complete verification, after which you'll be ready to go. Free accounts Students and developers with MSDN subscriptions can access registration codes that waive the fee for a minimum of one year. If you meet either of these requirements you can gain a code using the respective methods, and use that code during the registration process to set the fee to zero. Students can access their free accounts using the DreamSpark service that Microsoft runs. To access this you need to create an account on www.dreamspark.com and create an account. From there follow the steps to verify your student status and visit https://www.dreamspark.com/Student/Windows-Store-Access.aspx to get your registration code. If you have access to an MSDN subscription you can use this to gain a store account for free. Just log in to your account and in your account benefits overview you should be able to generate your registration code. Submitting your game So your game is polished and ready to go. What do you need to do to get it in the store? First log in to the dashboard and select Submit an App from the menu on the left. Here you can see the steps required to submit the app. This may look like a lot to do, but don't worry. Most of these are very simple to resolve and can be done before you even start working on the game. The first step is to choose a name for your game, and this can be done whenever you want. By reserving a name and creating the application entry you have a year to submit your application, giving you plenty of time to complete it. This is why it's a good idea to jump in and register your application once you have a name for it. If you change your mind later and want a different name you can always change it. The next step is to choose how and where you will sell your game. The other thing you need to choose here are the markets you want to sell your game in. This can be an area you need to be careful of, because the markets you choose here define the localization or content you need to watch for in your game. Certain markets are restrictive and including content that isn't appropriate for a market you say you want to sell in can cause you to fail the certification process. Once that is done you need to choose when you want to release your game—you can choose to release as soon as certification finishes or on a specific date, and then you choose the app category, which in this case will be Games. Don't forget to specify the genre of your game as the sub-category so players can find it. The final option on the Selling Details page which applies to us is the Hardware requirements section. Here we define the DirectX feature-level required for the game, and the minimum RAM required to run it. This is important because the store can help ensure that players don't try to play your game on systems that cannot run it. The next section allows you to define the in-app offers that will be made available to players. The Age rating and rating certificates section allows you to define the minimum age required to play the game, as well as submit official ratings certificates from ratings boards so that they may be displayed in the store to meet legal requirements. The latter part is optional in some cases, and may affect where you can submit your game depending on local laws. Aside from official ratings, all applications and games submitted to the store require a voluntary rating, chosen from one of the following age options: 3+ 7+ 12+ 16+ 18+ While all content is checked, the 7+ and 3+ ratings both have extra checks because of the extra requirements for those age ranges. The 3+ rating is especially restrictive as apps submitted with that age limit may not contain features that could connect to online services, collect personal information, or use the webcam or microphone. To play it safe it's recommended the 12+ rating is chosen, and if you're still uncertain, higher is safer. GDF Certificates The other entry required here if you have official ratings certificates is a GDF file. This is a Game Definition File, which defines the different ratings in a single location and provides the necessary information to display the rating and inform any parental settings. To do this you need to use the GDFMAKER.exe utility that ships with the Windows 8 SDK, and generate a GDF file which you can submit to the store. Alongside that you need to create a DLL containing that file (as a resource) without any entry point to include in the application package. For full details on how to create the GDF as well as the DLL, view the following MSDN article: http://msdn.microsoft.com/en-us/library/windows/apps/hh465153.aspx The final section before you need to submit your compiled application package is the cryptography declaration. For most games you should be able to declare that you aren't using any cryptography within the game and quickly move through this step. If you are using cryptography, including encrypting game saves or data files, you will need to declare that here and follow the instructions to either complete the step or provide an Export Control Classification Number (ECCN). Now you need to upload the compiled app package before you can continue, so we'll take a look at what it takes to do that before you continue. App packages To submit your game to the store, you need to package it up in a format that makes it easy to upload, and easy for the store to distribute. This is done by compiling the application as an .appx file. But before that happens we need to ensure we have defined all of the required metadata, and fulfill the certification requirements, otherwise we'll be uploading a package only to fail soon after. Part of this is done through the application manifest editor, which is accessible in Visual Studio by double-clicking on the Package.appxmanifest file in solution explorer. This editor is where you specify the name that will be seen in the start menu, as well as the icons used by the application. To pass certification all icons have to be provided at 100 percent DPI, which is referred to as Scale 100 in the editor. Icon/Image Base resolution Required Standard 150 x 150 px Yes Wide 310 x 150 px If Wide Tile Enabled Small 30 x 30 px Yes Store 50 x 50 px Yes Badge 24 x 24 px If Toasts Enabled Splash 620 x 300 px Yes If you wish to provide a higher quality images for people running on high DPI setups, you can do so with a simple filename change. If you add scale-XXX to your filename, just before the extension, and replace XXX with one of the following values, Windows will automatically make use of it at the appropriate DPI. scale-100 scale-140 scale-180 In the following image you can see the options available for editing the visual assets in the application. These all apply to the start menu and application start-up experience, including the splash screen and toast notifications. Toast Notifications in Windows 8 are pop-up notifications that slide in from the edge of the screen and show the users some information for a short period of time. They can click on the toast to open the application if they want. Alongside Live Tiles, Toast Notifications allow you to give the user information when the application is not running (although they work when the application is running). The previous table shows the different images you required for a Windows 8 application, and if they are mandatory or just required in certain situations. Note that this does not include the imagery required for the store, which includes some screenshots of the application and optional promotional art in case you want your application to be featured. You must replace all of the required icons with your own. Automated checks during certification will detect the use of the default "box" icon shown in the previous screenshot and automatically fail the submission. Capabilities Once you have the visual aspects in place, you need to declare the capabilities that the application will receive. Your game may not need any, however you should still only specify what you need to run, as some of these capabilities come with extra implications and non-obvious requirements. Adding a privacy policy One of those requirements is the privacy policy. Even if you are creating a game, there may be situations where you are collecting private information, which requires you to have a privacy policy. The biggest issue here is connecting to the internet. If your game marks any of the Internet capabilities in the manifest, you automatically trigger a check for a privacy policy as private information—in this case an IP address—is being shared. To avoid failing certification for this, you need to put together a privacy policy if you collect privacy information, or use any of the capabilities that would indicate you collect information. These include the Internet capabilities as well as location, webcam, and microphone capabilities. This privacy policy just needs to describe what you will do with the information, and directly mention your game and publisher name. Once you have the policy written, it needs to be posted in two locations. The first is a publicly accessible website, which you will provide a link to when filling out the description after uploading your game. The second is within the game itself. It is recommended you place this policy in the Windows 8 provided settings menu, which you can build using XAML or your own code. If you're going with a completely native Windows 8 application you may want to display the policy in your own way and link to it from options within your game. Declarations Once you've indicated the capabilities you want, you need to declare any operating system integration you've done. For most games you won't use this, however if you're taking advantage of Windows 8 features such as share targets (the destination for data shared using the Share Charm), or you have a Game Definition File, you will need to declare it here and provide the required information for the operating system. In the case of the GDF, you need to provide the file so that the parental controls system can make use of the ratings to appropriately control access. Certification kit The next step is to make sure you aren't going to fail the automated tests during certification. Microsoft provides the same automated tests used when you submit your app in the Windows Application Certification Kit (WACK). WACK is installed by default with Visual Studio 2012 or higher version. There are two ways to run the test: after you build your application package, or by running the kit directly against an installed app. We'll look at the latter first, as you might want to run the test on your deployed test game well before you build anything for the store. This is also the only way to run the WACK on a WinRT device, if you want to cover all bases. If you haven't already deployed or tested your app, deploy it using the Build menu in Visual Studio and then search for the Windows App Cert Kit using the start menu (just start typing). When you run this you will be given an option to choose which type of application you want to validate. In this case we want to select the Windows Store App option, which will then give you access to the list of apps installed on your machine. From there it's just a matter of selecting the app you want and starting the test. At this point you will want to leave your machine alone until the automated tests are complete. Any interference could lead to an incorrect failure of the certification tests. The results will indicate ways you can fix any issues; however, you should be fine for most of the tests. The biggest issues will arise from third party libraries that haven't been developed or ported to Windows 8. In this case the only option is to fix them yourself (if they're open source) or find an alternative. Once you have the test passing, or you feel confident that it won't be an issue, you need to create app packages that are compatible with the store. At this point your game will be associated with the submission you have created in the Windows Store dashboard so that it is prepared for upload. Creating your app packages To do this, right click on your game project in Visual Studio and click on Create App Packages inside the Store menu. Once you do that, you'll be asked if you want to create a package for the store. The difference between the two options comes down to how the package is signed. If you choose No here, you can create a package with your test certificate, which can be distributed for testing. These packages must be manually installed and cannot be submitted to the store. You can, however, use this type of package on other machines to install your game for testers to try out. Choosing No will give you a folder with a .ps1 file (Powershell), which you can run to execute the install script. Choosing Yes at this option will take you to a login screen where you can enter your Windows Store developer account details. Once you've logged in you will be presented with a list of applications that you have registered with the store. If you haven't yet reserved the name of your application, you can click on the Reserve Name link, which will take you directly to the appropriate page in the store dashboard. Otherwise select the name of the game you're trying to build and click on Next. The next screen will allow you to specify which architectures to build for, and the version number of the built package. As this is a C++ game, we need to provide separate packages for the ARM, x86, and x64 builds, depending on what you want to support. Simply providing an x86 and ARM build will cover the entire market; 64 bit can be nice to have if you need a lot of memory, but ultimately it is optional, and some users may not even be able to run x64 code. When you're ready click on Create and Visual Studio will proceed to build your game and compile the requested packages, placing them in the directory specified. If you've built for the store, you will need the .appxupload files from this directory when you proceed to upload your game. Once the build has completed you will be asked if you want to launch the Windows Application Certification Kit. As mentioned previously this will test your game for certification failures, and if you're submitting to the store it's strongly recommended you run this. Doing so at this screen will automatically deploy the built package and run the test, so ensure you have a little bit of time to let it run. Uploading and submitting Now that you have a built app package you can return to the store dashboard to submit your game. Just edit the submission you made previously and enter the Packages section, which will take you to the page where you can upload the appxupload file. Once you have successfully uploaded your game you will gain access to the next section, the Description. This is where you define the details that will be displayed in the store. This is also where your marketing skills come into play as you prepare the content that will hopefully get players to buy your game. You start with the description of your game, and any big feature bullet points you want to emphasize. This is the best place to mention any reviews or praise, as well as give a quick description that will help the players decide if they want to try your game. You can have a number of app features listed; however, like any "back of the box" bullet points, keep them short and exciting. Along with the description, the store requires at least one screenshot to display to the potential player. These screenshots need to be of the entire screen, and that means they need to be at least 1366x768, which is the minimum resolution of Windows 8. These are also one of the best ways to promote your game, so ensure you take some great screenshots that show off the fun and appeal of your game. There are a few ways to take a screenshot of your game. If you're testing in the simulator you can use the screenshot icon on the right toolbar of the simulator to take the screenshot. If not, you can use Windows Key + Prt Scr SysRq to take a screenshot of your entire screen, and then use that (or edit it if you have multiple monitors). Screenshots taken with either of these tools can be found in the Screenshots folder within your Pictures library. There are two other small pieces of information that are required during this stage: Copyright Info and Support contact info. For the support info, an e-mail address will usually suffice. At this point you can also include your website and, if applicable to your game, a link to the privacy policy included in your game. Note that if you require a privacy policy, it must be included in two places: your game, and the privacy policy field on this form. The last items you may want to add here are promotional images. These images are intended for use in store promotions and allow Microsoft to easily feature your game with larger promotional imagery in prominent locations within the store. If you are serious about maximizing the reach of your game, you will want to include these images. If you don't, the number of places your game can be featured will be reduced. At a minimum the 414x180 px image should be included if you want some form of promotion. Now you're almost done! The next section allows you to leave notes for the testing team. This is where you would leave test account details for any features in your game that require an account so that they can test those features. This is also the location to leave any notes about testing in case there are situations where you can point out any features that might not be obvious. In certain situations you may have an exemption from Microsoft for a certification requirement; this would be where you include that exemption. When every step has been completed and you have tick marks in all of the stages, the Submit for Certification button will unlock, allowing you to complete your submission and send it off for certification. At this stage a number of automated tests will run before human testers will try your game on a variety of devices to ensure it fits the requirements for the store. If all goes well, you will receive an email notifying you of your successful certification and, depending on if you set the release date as ASAP, you will find your game in the store a few hours later (it may take a few hours to appear in the store once you receive an email informing you that your game or app is in the store). Certification tips Your first stop should be the certification requirements page, which lists all of the current requirements your game will be tested for: http://msdn.microsoft.com/en-us/library/windows/apps/hh694083.aspx. There are some requirements that you should take note of, and in this section we'll take a look at ways to help ensure you pass those requirements. Privacy The first of course is the privacy policy. As mentioned before, if your game collects any sort of personal information, you will need that policy in two places: In full text within the game Accessible through an Internet link The default app template generated by Visual Studio automatically enables the Internet capability, and by simply having that enabled you require a privacy policy. If you aren't connecting to the Internet at all in your game, you should always ensure that none of the Internet options are enabled before you package your game. If you share any personal information, then you need to provide players with a method of opting in to the sharing. This could be done by gating the functionality behind a login screen. Note that this functionality can be locked away, and the requirement doesn't demand that you find a way to remain fully functional even if the user opts out. Features One requirement is that your game support both touch input and keyboard/mouse input. You can easily support this by using an input system like the one described in this article; however, by supporting touch input you get mouse input for free and technically fulfill this requirement. It's all about how much effort you want to put into the experience your player will have, and that's why including gamepad input is recommended as some players may want to use a connected Xbox 360 gamepad for their input device in games. Legacy APIs Although your game might run while using legacy APIs, it won't pass certification. This is checked through an automated test that also occurs during the WACK testing process, so you can easily check if you have used any illegal APIs. This often arises in third party libraries which make use of parts of the standard IO library such as the console, or the insecure versions of functions such as strcpy or fopen. Some of these APIs don't exist in WinRT for good reason; the console, for example, just doesn't exist, so calling APIs that work directly with the console makes no sense, and isn't allowed. Debug Another issue that may arise through the use of third-party libraries is that some of them may be compiled in debug mode. This could present issues at runtime for your app, and the packaging system will happily include these when compiling your game, unless it has to compile them itself. This is detected by the WACK and can be resolved by finding a release mode version, or recompiling the library. WACK The final tip is: run WACK. This kit quickly and easily finds most of the issues you may encounter during certification, and you see the issues immediately rather than waiting for it to fail during the certification process. Your final step before submitting to the store should be to run WACK, and even while developing it's a good idea to compile in release mode and run the tests to just make sure nothing is broken. Summary By now you should know how to submit your game to the store, and get through certification with little to no issues. We've looked at what you require for the store including imagery and metadata, as well as how to make use of the Windows Application Certification Kit to find problems early on and fix them up without waiting hours or days for certification to fail your game. One area unique to games that we have covered in this article is game ratings. If you're developing your game for certain markets where ratings are required, or if you are developing children's games, you may need to get a rating certificate, and hopefully you have an idea of where to look to do this. Resources for Article: Further resources on this subject: Introduction to Game Development Using Unity 3D [Article] HTML5 Games Development: Using Local Storage to Store Game Data [Article] Unity Game Development: Interactions (Part 1) [Article]
Read more
  • 0
  • 0
  • 4710

article-image-getting-started-codeblocks
Packt
28 Oct 2013
7 min read
Save for later

Getting Started with Code::Blocks

Packt
28 Oct 2013
7 min read
(For more resources related to this topic, see here.) Why Code::Blocks? Before we go on learning more about Code::Blocks let us understand why we shall use Code::Blocks over other IDEs. It is a cross-platform Integrated Development Environment (IDE). It supports Windows, Linux, and Mac operating system. It supports GCC compiler and GNU debugger on all supported platforms completely. It supports numerous other compilers to various degrees on multiple platforms. It is scriptable and extendable. It comes with several plugins that extend its core functionality. It is lightweight on resources and doesn't require a powerful computer to run it. Finally, it is free and open source. Installing Code::Blocks on Windows Our primary focus of this article will be on Windows platform. However, we'll touch upon other platforms wherever possible. Official Code::Blocks binaries are available from www.codeblocks.org. Perform the following steps for successful installation of Code::Blocks: For installation on Windows platform download codeblocks-12.11mingw-setup.exe file from http://www.codeblocks.org/downloads/26 or from sourceforge mirror http://sourceforge.net/projects/codeblocks/files/Binaries/12.11/Windows/codeblocks-12.11mingw-setup.exe/download and save it in a folder. Double-click on this file and run it. You'll be presented with the following screen: As shown in the following screenshot click on the Next button to continue. License text will be presented. The Code::Blocks application is licensed under GNU GPLv3 and Code::Blocks SDK is licensed under GNU LGPLv3. You can learn more about these licenses at this URL—https://www.gnu.org/licenses/licenses.html. Click on I Agree to accept the License Agreement. The component selection page will be presented in the following screenshot: You may choose any of the following options: Default install: This is the default installation option. This will install Code::Block's core components and core plugins. Contrib Plugins: Plugins are small programs that extend Code::Block's functionality. Select this option to install plugins contributed by several other developers. C::B Share Config: This utility can copy all/parts of configuration file. MinGW Compiler Suite: This option will install GCC 4.7.1 for Windows. Select Full Installation and click on Next button to continue. As shown in the following screenshot installer will now prompt to select installation directory: You can install it to default installation directory. Otherwise choose Destination Folder and then click on the Install button. Installer will now proceed with installation. As shown in the following screenshot Code::Blocks will now prompt us to run it after the installation is completed: Click on the No button here and then click on the Next button. Installation will now be completed: Click on the Finish button to complete installation. A shortcut will be created on the desktop. This completes our Code::Blocks installation on Windows. Installing Code::Blocks on Linux Code::Blocks runs numerous Linux distributions. In this section we'll learn about installation of Code::Blocks on CentOS linux. CentOS is a Linux distro based on Red Hat Enterprise Linux and is a freely available, enterprise grade Linux distribution. Perform the following steps to install Code::Blocks on Linux OS: Navigate to Settings | Administration | Add/Remove Software menu option. Enter wxGTK in the Search box and hit the Enter key. As of writing wxGTK-2.8.12 is the latest wxWidgets stable release available. Select it and click on the Apply button to install wxGTK package via the package manager, as shown in the following screenshot. Download packages for CentOS 6 from this URL—http://www.codeblocks.org/downloads/26. Unpack the .tar.bz2 file by issuing the following command in shell: tar xvjf codeblocks-12.11-1.el6.i686.tar.bz2 Right-click on the codeblocks-12.11-1.el6.i686.rpm file as shown in the following screenshot and choose the Open with Package Installer option. The following window will be displayed. Click on the Install button to begin installation, as shown in the following screenshot: You may be asked to enter the root password if you are installing it from a user account. Enter the root password and click on the Authenticate button. Code::Blocks will now be installed. Repeat steps 4 to 6 to install other rpm files. We have now learned to install Code::Blocks on the Windows and Linux platforms. We are now ready for C++ development. Before doing that we'll learn about the Code::Blocks user interface. First run On the Windows platform navigate to the Start | All Programs | CodeBlocks | CodeBlocks menu options to launch Code::Blocks. Alternatively you may double-click on the shortcut displayed on the desktop to launch Code::Blocks, as in the following screenshot: On Linux navigate to Applications | Programming | Code::Blocks IDE menu options to run Code::Blocks. Code::Blocks will now ask the user to select the default compiler. Code::Blocks supports several compilers and hence, is able to detect the presence of other compilers. The following screenshot shows that Code::Blocks has detected GNU GCC Compiler (which was bundled with the installer and has been installed). Click on it to select and then click on Set as default button, as shown in the following screenshot: Do not worry about the items highlighted in red in the previous screenshot. Red colored lines indicate Code::Blocks was unable to detect the presence of a particular compiler. Finally, click on the OK button to continue with the loading of Code::Blocks. After the loading is complete the Code::Blocks window will be shown. The following screenshot shows main window of Code::Blocks. Annotated portions highlight different User Interface (UI) components: Now, let us understand more about different UI components: Menu bar and toolbar: All Code::Blocks commands are available via menu bar. On the other hand toolbars provide quick access to commonly used commands. Start page and code editors: Start page is the default page when Code::Blocks is launched. This contains some useful links and recent project and file history. Code editors are text containers to edit C++ (and other language) source files. These editors offer syntax highlighting—a feature that highlights keywords in different colors. Management pane: This window shows all open files (including source files, project files, and workspace files). This pane is also used by other plugins to provide additional functionalities. In the preceding screenshot FileManager plugin is providing a Windows Explorer like facility and Code Completion plugin is providing details of currently open source files. Log windows: Log messages from different tools, for example, compiler, debugger, document parser, and so on, are shown here. This component is also used by other plugins. Status bar: This component shows various status information of Code::Blocks, for example, file path, file encoding, line numbers, and so on. Introduction to important toolbars Toolbars provide easier access to different functions of Code::Blocks. Amongst the several toolbars following ones are most important. Main toolbar The main toolbar holds core component commands. From left to right there are new file, open file, save, save all, undo, redo, cut, copy, paste, find, and replace buttons. Compiler toolbar The compiler toolbar holds commonly used compiler related commands. From left to right there are build, run, build and run, rebuild, stop build, build target buttons. Compilation of C++ source code is also called a build and this terminology will be used throughout the article. Debugger toolbar The debugger toolbar holds commonly used debugger related commands. From left to right there are debug/continue, run to cursor, next line, step into, step out, next instruction, step into instruction, break debugger, stop debugger, debugging windows, and various info buttons. Summary In this article we have learned to download and install Code::Blocks. We also learnt about different interface elements. Resources for Article: Further resources on this subject: OpenGL 4.0: Building a C++ Shader Program Class [Article] Application Development in Visual C++ - The Tetris Application [Article] Building UI with XAML for Windows 8 Using C [Article]
Read more
  • 0
  • 0
  • 8147

article-image-low-level-index-control
Packt
28 Oct 2013
12 min read
Save for later

Low-Level Index Control

Packt
28 Oct 2013
12 min read
(For more resources related to this topic, see here.) Altering Apache Lucene scoring With the release of Apache Lucene 4.0 in 2012, all the users of this great, full text search library, were given the opportunity to alter the default TF/IDF based algorithm. Lucene API was changed to allow easier modification and extension of the scoring formula. However, that was not the only change that was made to Lucene when it comes to documents score calculation. Lucene 4.0 was shipped with additional similarity models, which basically allows us to use different scoring formula for our documents. In this section we will take a deeper look at what Lucene 4.0 brings and how those features were incorporated into ElasticSearch. Setting per-field similarity Since ElasticSearch 0.90, we are allowed to set a different similarity for each of the fields we have in our mappings. For example, let's assume that we have the following simple mapping that we use, in order to index blog posts (stored in the posts_no_similarity.json file): { "mappings" : { "post" : { "properties" : { "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" }, "name" : { "type" : "string", "store" : "yes", "index" : "analyzed" }, "contents" : { "type" : "string", "store" : "no", "index" : "analyzed" } } } } } What we would like to do is, use the BM25 similarity model for the name field and the contents field. In order to do that, we need to extend our field definitions and add the similarity property with the value of the chosen similarity name. Our changed mappings (stored in the posts_similarity.json file) would appear as shown in the following code: { "mappings" : { "post" : { "properties" : { "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" }, "name" : { "type" : "string", "store" : "yes", "index" : "analyzed", "similarity" : "BM25" }, "contents" : { "type" : "string", "store" : "no", "index" : "analyzed", "similarity" : "BM25" } } } } } And that's all, nothing more is needed. After the preceding change, Apache Lucene will use the BM25 similarity to calculate the score factor for the name and contents fields. In case of the Divergence from randomness and Information based similarity model, we need to configure some additional properties to specify the behavior of those similarities. How to do that is covered in the next part of the current section. Default codec properties When using the default codec we are allowed to configure the following properties: min_block_size: It specifies the minimum block size Lucene term dictionary uses to encode blocks. It defaults to 25. max_block_size: It specifies the maximum block size Lucene term dictionary uses to encode blocks. It defaults to 48. Direct codec properties The direct codec allows us to configure the following properties: min_skip_count: It specifies the minimum number of terms with a shared prefix to allow writing of a skip pointer. It defaults to 8. low_freq_cutoff: The codec will use a single array object to hold postings and positions that have document frequency lower than this value. It defaults to 32. Memory codec properties By using the memory codec we are allowed to alter the following properties: pack_fst: It is a Boolean option that defaults to false and specifies if the memory structure that holds the postings should be packed into the FST. Packing into FST will reduce the memory needed to hold the data. acceptable_overhead_ratio: It is a compression ratio of the internal structure specified as a float value which defaults to 0.2. When using the 0 value, there will be no additional memory overhead but the returned implementation may be slow. When using the 0.5 value, there can be a 50 percent memory overhead, but the implementation will be fast. Values higher than 1 are also possible, but may result in high memory overhead. Pulsing codec properties When using the pulsing codec we are allowed to use the same properties as with the default codec and in addition to them one more property, which is described as follows: freq_cut_off: It defaults to 1. The document frequency at which the postings list will be written into the term dictionary. The documents with the frequency equal to or less than the value of freq_cut_off will be processed. Bloom filter-based codec properties If we want to configure a bloom filter based codec, we can use the bloom_filter type and set the following properties: delegate: It specifies the name of the codec we want to wrap, with the bloom filter. ffp: It is a value between 0 and 1.0 which specifies the desired false positive probability. We are allowed to set multiple probabilities depending on the amount of documents per Lucene segment. For example, the default value of 10k=0.01, 1m=0.03 specifies that the fpp value of 0.01 will be used when the number of documents per segment is larger than 10.000 and the value of 0.03 will be used when the number of documents per segment is larger than one million. For example, we could configure our custom bloom filter based codec to wrap a direct posting format as shown in the following code (stored in posts_bloom_custom.json file): { "settings" : { "index" : { "codec" : { "postings_format" : { "custom_bloom" : { "type" : "bloom_filter", "delegate" : "direct", "ffp" : "10k=0.03, 1m=0.05" } } } } }, "mappings" : { "post" : { "properties" : { "id" : { "type" : "long", "store" : "yes", "precision_step" : "0" }, "name" : { "type" : "string", "store" : "yes", "index" : "analyzed", "postings_format" : "custom_bloom" }, "contents" : { "type" : "string", "store" : "no", "index" : "analyzed" } } } } } NRT, flush, refresh, and transaction log In an ideal search solution, when new data is indexed it is instantly available for searching. At the first glance it is exactly how ElasticSearch works even in multiserver environments. But this is not the truth (or at least not all the truth) and we will show you why it is like this. Let's index an example document to the newly created index by using the following command: curl -XPOST localhost:9200/test/test/1 -d '{ "title": "test" }' Now, we will replace this document and immediately we will try to find it. In order to do this, we'll use the following command chain: curl –XPOST localhost:9200/test/test/1 -d '{ "title": "test2" }' ; curl localhost:9200/test/test/_search?pretty The preceding command will probably result in the response, which is very similar to the following response: {"ok":true,"_index":"test","_type":"test","_id":"1","_version":2}{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.0, "hits" : [ { "_index" : "test", "_type" : "test", "_id" : "1", "_score" : 1.0, "_source" : { "title": "test" } } ] } } The first line starts with a response to the indexing command—the first command. As you can see everything is correct, so the second, search query should return the document with the title field test2, however, as you can see it returned the first document. What happened? But before we give you the answer to the previous question, we should take a step backward and discuss about how underlying Apache Lucene library makes the newly indexed documents available for searching. Updating index and committing changes The segments are independent indices, which means that queries that are run in parallel to indexing, from time to time should add newly created segments to the set of those segments that are used for searching. Apache Lucene does that by creating subsequent (because of write-once nature of the index) segments_N files, which list segments in the index. This process is called committing. Lucene can do this in a secure way—we are sure that all changes or none of them hits the index. If a failure happens, we can be sure that the index will be in consistent state. Let's return to our example. The first operation adds the document to the index, but doesn't run the commit command to Lucene. This is exactly how it works. However, a commit is not enough for the data to be available for searching. Lucene library use an abstraction class called Searcher to access index. After a commit operation, the Searcher object should be reopened in order to be able to see the newly created segments. This whole process is called refresh. For performance reasons ElasticSearch tries to postpone costly refreshes and by default refresh is not performed after indexing a single document (or a batch of them), but the Searcher is refreshed every second. This happens quite often, but sometimes applications require the refresh operation to be performed more often than once every second. When this happens you can consider using another technology or requirements should be verified. If required, there is possibility to force refresh by using ElasticSearch API. For example, in our example we can add the following command: curl –XGET localhost:9200/test/_refresh If we add the preceding command before the search, ElasticSearch would respond as we had expected. Changing the default refresh time The time between automatic Searcher refresh can be changed by using the index.refresh_interval parameter either in the ElasticSearch configuration file or by using the update settings API. For example: curl -XPUT localhost:9200/test/_settings -d '{ "index" : { "refresh_interval" : "5m" } }' The preceding command will change the automatic refresh to be done every 5 minutes. Please remember that the data that are indexed between refreshes won't be visible by queries. As we said, the refresh operation is costly when it comes to resources. The longer the period of refresh is, the faster your indexing will be. If you are planning for very high indexing procedure when you don't need your data to be visible until the indexing ends, you can consider disabling the refresh operation by setting the index.refresh_interval parameter to -1 and setting it back to its original value after the indexing is done. The transaction log Apache Lucene can guarantee index consistency and all or nothing indexing, which is great. But this fact cannot ensure us that there will be no data loss when failure happens while writing data to the index (for example, when there isn't enough space on the device, the device is faulty or there aren't enough file handlers available to create new index files). Another problem is that frequent commit is costly in terms of performance (as you recall, a single commit will trigger a new segment creation and this can trigger the segments to merge). ElasticSearch solves those issues by implementing transaction log. Transaction log holds all uncommitted transactions and from time to time, ElasticSearch creates a new log for subsequent changes. When something goes wrong, transaction log can be replayed to make sure that none of the changes were lost. All of these tasks are happening automatically, so, the user may not be aware of the fact that commit was triggered at a particular moment. In ElasticSearch, the moment when the information from transaction log is synchronized with the storage (which is Apache Lucene index) and transaction log is cleared is called flushing. Please note the difference between flush and refresh operations. In most of the cases refresh is exactly what you want. It is all about making new data available for searching. From the opposite side, the flush operation is used to make sure that all the data is correctly stored in the index and transaction log can be cleared. In addition to automatic flushing, it can be forced manually using the flush API. For example, we can run a command to flush all the data stored in the transaction log for all indices, by running the following command: curl –XGET localhost:9200/_flush Or we can run the flush command for the particular index, which in our case is the one called library: curl –XGET localhost:9200/library/_flush curl –XGET localhost:9200/library/_refresh In the second example we used it together with the refresh, which after flushing the data opens a new searcher. The transaction log configuration If the default behavior of the transaction log is not enough ElasticSearch allows us to configure its behavior when it comes to the transaction log handling. The following parameters can be set in the elasticsearch.yml file as well as using index settings update API to control transaction log behavior: index.translog.flush_threshold_period: It defaults to 30 minutes (30m). It controls the time, after which flush will be forced automatically even if no new data was being written to it. In some cases this can cause a lot of I/O operation, so sometimes it's better to do flush more often with less data being stored in it. index.translog.flush_threshold_ops: It specifies the maximum number of operations after which the flush operation will be performed. It defaults to 5000. index.translog.flush_threshold_size: It specifies the maximum size of the transaction log. If the size of the transaction log is equal to or greater than the parameter, the flush operation will be performed. It defaults to 200 MB. index.translog.disable_flush: This option disables automatic flush. By default flushing is enabled, but sometimes it is handy to disable it temporarily, for example, during import of large amount of documents. All of the mentioned parameters are specified for an index of our choice, but they are defining the behavior of the transaction log for each of the index shards. Of course, in addition to setting the preceding parameters in the elasticsearch.yml file, they can also be set by using Settings Update API. For example: curl -XPUT localhost:9200/test/_settings -d '{ "index" : { "translog.disable_flush" : true } }' The preceding command was run before the import of a large amount of data, which gave us a performance boost for indexing. However, one should remember to turn on flushing when the import is done. Near Real Time GET Transaction log gives us one more feature for free that is, real-time GET operation, which provides the possibility of returning the previous version of the document including non-committed versions. The real-time GET operation fetches data from the index, but first it checks if a newer version of that document is available in the transaction log. If there is no flushed document, data from the index is ignored and a newer version of the document is returned—the one from the transaction log. In order to see how it works, you can replace the search operation in our example with the following command: curl -XGET localhost:9200/test/test/1?pretty ElasticSearch should return the result similar to the following: { "_index" : "test", "_type" : "test", "_id" : "1", "_version" : 2, "exists" : true, "_source" : { "title": "test2" } } If you look at the result, you would see that again, the result was just as we expected and no trick with refresh was required to obtain the newest version of the document.
Read more
  • 0
  • 0
  • 2106

article-image-availability-management
Packt
28 Oct 2013
29 min read
Save for later

Availability Management

Packt
28 Oct 2013
29 min read
(For more resources related to this topic, see here.) Reducing planned and unplanned downtime Whether we are talking about a highly available and critically productive environment or not, any planned or unexpected downtime means financial losses. Historically, solutions that could provide high availability and redundancy were costly and complex. With the virtualization technologies available today, it becomes easier to provide higher levels of availability for environments where they are needed. With VMware products, and vSphere in particular, it's possible to do the following things: Have higher availability that is independent of hardware, operating systems, or applications Choose a planned downtime for many maintenance tasks, and shorten or eliminate them Provide automatic recovery in case of failure Planned downtime Planned downtime usually happens during hardware maintenance, firmware or operating system updates, and server migrations. To reduce the impact of planned downtime, IT administrators are forced to schedule small maintenance windows outside the working hours. vSphere makes it possible to dramatically reduce the planned downtime. With vSphere, IT administrators can perform many maintenance tasks at any point in time as it allows downtime elimination for many common maintenance operations. This is possible mainly because workloads in a vSphere can be dynamically moved between different physical servers and storage resources without any service interruption. The main availability capabilities that are built into vSphere allow the use of HA and redundancy features, and are as follows: Shared storage: Storage resources such as Fibre Channel, iSCSI, Storage Area Network (SAN), or Network Access Storage (NAS) help eliminate the single points of failure. SAN mirroring and replication features can be used to have fresh copies of the virtual disk at disaster recovery sites. Network interface teaming: This feature provides tolerance for individual network card failures. Storage multipathing: This helps to tolerate storage path failures. vSphere vMotion® and Storage vMotion functionalities allow the migration of VMs between ESXi hosts and their underlying storage without service interruption, as shown in the following figure: In other words, vMotion is the live migration of VMs between ESXi hosts, and Storage vMotion is the live migration of VMs between storage LUNs. In both cases, VM retains its network and disk connection. With vSphere 5.1 and the later versions, it's possible to combine vMotion with Storage vMotion into a single migration that simplifies administration. The entire process takes less than two seconds on a GB network. vMotion keeps track of the ongoing memory transaction while memory and system states get copied to the target host. Once copying is done, vMotion suspends the source VM, copies the transactions that happened during the process to the target host, and resumes the VM on the target host. This way, vMotion ensures transaction integrity. vSphere requirements for vMotion vSphere requirements for vMotion are as follows: All the hosts must have the following features: They should be correctly licensed for vMotion Have access to the shared storage Use a GB Ethernet adapter for vMotion, preferably a dedicated one The VMkernel port group is configured for vMotion with the same name (the name is case sensitive) Have access to the same subnets Must be members of all the vSphere distributed switches that VMs use for networking Use jumbo frames for best vMotion performance All the virtual machines that need to be vMotioned must have the following features: Shouldn't use raw disks if migration between storage LUNs is needed Shouldn't use devices that are not available on the destination host (for example, a CD drive or USB devices not enabled for vMotion) Should be located on a shared storage resource Shouldn't use devices connected from the client computer Migration with vMotion Migration with vMotion happens in three stages: vCenter server verifies that the existing VM is in a stable state and that the CPU on the target host is compatible with the CPU this VM is currently using vCenter migrates VM state information such as memory, registers, and network connections to the target host The virtual machine resumes its activities on the new host VMs with snapshots can be vMotioned regardless of their power state as long as their files stay on the same storage. Obviously, this storage has to be accessible for both the source and destination hosts. If migration involves moving configuration files or virtual disks, the following additional requirements apply: Both the source and destination hosts must be of ESX or ESXi version 3.5 or later All the VM files should be kept in a single directory on a shared storage resource To vMotion a VM in vCenter, right-click on a VM and choose Migrate… as shown in the following screenshot: This opens a migration wizard where you can select whether it's going to migrate between hosts or storage or both. The Change hostoption is the standard vMotion, and Change datastore is the Storage vMotion. As you can see, the Change both host and datastore option is not available because this VM is currently running. As mentioned earlier, vSphere 5.1 and later support vMotion and Storage vMotion in one transaction. In the next steps, you are able to choose the destination as well as the priority for this migration. Multiple VMs can be migrated at the same time if you make multiple selections in the Virtual Machines tab for the host or the cluster. VM vMotion is widely used to perform host maintenance such as upgrading the ESX operating system, memory, or any other configuration changes. When maintenance is needed on a host, all the VMs can be migrated to other hosts and this host can be switched into the maintenance mode. This can be accomplished by right-clicking on the host and selecting Enter Maintenance Mode. Unplanned downtime Environments, especially critical ones, need to be protected from any unplanned downtime caused by possible hardware or application failures. vSphere has important capabilities that can address this challenge and help to eliminate unplanned downtime. These vSphere capabilities are transparent to the guest operating system and any applications running inside the VMs; they are also a part of the virtual infrastructure. The following features can be configured for VMs in order to reduce the cost and complexity of HA. More detail on these features will be given in the following sections of this article. High availability (HA) vSphere's HA is a feature that allows a group of hosts connected together to provide high levels of availability for VMs running on these hosts. It protects VMs and their applications in the following ways: In case of ESX server failure, it restarts VMs on the other hosts that are members of the cluster In case of guest OS failure, it resets the VM If application failure is detected, it can reset a VM With vSphere HA, there is no need to install any additional software in a VM. After vSphere HA is configured, all the new VMs will be protected automatically. The HA option can be combined with vSphere DRS to protect against failures and to provide load balancing across the hosts within a cluster The advantages of HA over traditional failover solutions are listed in the VMware article at http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.avail.doc%2FGUID-CB46CEC4-87CD-4704-A9AC-058281CFD8F8.html. Creating a vSphere HA cluster Before HA can be enabled, a cluster itself needs to be created. To create a new cluster, right-click on the datacenter object in the Hosts and Clusters view and select New Cluster... as shown in the following screenshot: The following prerequisites have to be considered before setting up a HA cluster: All the hosts must be licensed for vSphere HA. ESX/ESXi 3.5 hosts are supported for vSphere HA with the following patches installed; these fix an issue involving file locks: ESX 3.5: patch ESX350-201012401-SG and prerequisites ESXi 3.5: patch ESXe350-201012401-I-BG and prerequisites At least two hosts must exist in the cluster. All the hosts' IP addresses need to be assigned statically or configured via DHCP with static reservations to ensure address consistency across host reboots. At least one network should exist that is shared by all the hosts, that is, a management network. It is best practice to have at least two. To ensure VMs can run on any host, all the hosts should also share the same datastores and virtual networks. All the VMs must be stored on shared, and not local, storage. VMware tools must be installed for VM monitoring to work. Host certificate checking should be enabled. Once all of the requirements have been met, vSphere HA can be enabled in vCenter under the cluster settings dialog. In the following screenshot, it appears as PRD-CLUSTER Settings: Once HA is enabled, all the cluster hosts that are running and are not in maintenance mode become a part of HA. HA settings The following HA settings can also be changed at the same time: Host monitoring status is enabled by default Admission control is enabled by default Virtual machine options (restart priority is Medium by default and isolation response by default is set to Leave powered on) VM monitoring is disabled by default Datastore heartbeating is selected by vCenter by default More details on each of these settings can be found in the following sections of this article. Host monitoring status When a HA cluster is created, an agent is uploaded to all the hosts and configured to communicate with other agents within the cluster. One of the hosts becomes the master host, and the rest become slave hosts. There is an election process to choose the master host, and the host that mounts more datastores has an advantage in this election. In cases where we have a tie, the host with the lexically-highest Managed Object ID(MOID) is chosen. MOID, also called MoRef ID, is a value generated by vCenter for each object: host, datastore, VM, and so on. It is guaranteed to be unique across the infrastructure managed by this particular vCenter server. When it comes to the election process for choosing the master host, a host with ID 99 will have higher priority than a host with ID 100. If a master host fails or becomes unavailable, a new election process is initiated. Slave hosts monitor whether the VMs are running locally and report to the master host. In its turn, the master host communicates with vCenter and monitors other hosts for failures. Its main responsibilities are listed as follows: Monitoring the state of the slave hosts and in case of failure, identifying which VMs must be restarted Monitoring the state of all the protected VMs and restarting them in case of failure Managing a list of hosts and protected VMs Communicating with vCenter and reporting the cluster's health state Host availability monitoring is done through a network heartbeat exchange, which happens every second by default. In cases where we lose network heartbeats with a host, before declaring it as failed, the master host checks whether this host communicates with any of the existing datastores using datastore heartbeats and responds to pings sent to its management IP address or not. The master host detects the following types of host failure: Type of failure Network heartbeats ICMP ping Datastore heartbeats Lost connectivity to the master host - + + Network isolation - - + Failure - - - If host failure is detected, the host's VMs will be restarted on other hosts. Host network isolation happens when a host is running but doesn't see any traffic from vSphere HA agents, which means that it's disconnected from the management network. Isolation is handled as a special case of failure in VMware HA. If a host becomes network isolated, the master host continues to monitor this host and the VMs running on it. Depending on the isolation settings chosen for individual VMs, some of them may be restarted on another host. The master host has to communicate with vCenter, therefore, it can't be in the isolation mode. Once that happens, a new master host will be elected. When network isolation happens, certain hosts are not able to communicate with vCenter, which may result in configuration changes not having effect on certain parts of the infrastructure. If a network infrastructure is configured correctly and has redundant network paths, isolation should happen rarely. Datastore heartbeating Datastore heartbeating was introduced in vSphere 5. In the previous versions of vSphere, once a host became unreachable through the management network, HA always initiated VM restart, even if the VMs were still running. This, of course, created unnecessary downtime and additional stress to the host. Datastore heartbeating allows HA to make a distinction between hosts that are isolated or partitioned and hosts that have failed, which adds more stability to the way HA works. vCenter server selects a list of datastores for heartbeat verification to maximize the number of hosts that can be verified. It uses a selection algorithm designed to select datastores that are connected to the highest number of hosts. This algorithm attempts to choose datastores that are hosted on different storage arrays or NFS servers. It also prefers VMFS-formatted LUNs over NFS-hosted datastores. vCenter selects datastores for heartbeating in the following scenarios: When HA is enabled If a new datastore is added If the accessibility to a datastore changes By default, two datastores are selected. This is the minimum amount of datastores needed. It can be changed using the das.heartbeatDsPerHost parameter under Advanced Settings for up to five datastores. The PRD-CLUSTER Settings dialog box can be used to verify or change the datastores selected for heartbeating, as shown in the following screenshot: It is recommended, however, to let vCenter choose the datastores. Only the datastores that are mounted to more than one host are available in the list. Datastore heartbeating leverages the existing VMFS filesystem locking mechanism. There is a so-called heartbeat region that exists on each datastore and is updated as long as the lock on a file exists. A host updates the datastore's heartbeat region if it has at least one file opened on this volume. HA creates a file for datastore heartbeating purposes only to make sure there is at least one file opened on a volume. Each host creates its own file and HA to be able to determine whether an unresponsive host still has connection to a datastore, and simply checks whether the heartbeat region has been updated or not. By default, an isolation response is triggered after 5 seconds for the master host and after approximately 30 seconds if the host was a slave in vSphere 5. This time difference occurs because of the fact that if the host was a slave, it would need to go through the election process to determine whether there are any other hosts that exist, or whether the master host is simply down. This election starts within 10 seconds after the slave host has lost its heartbeats. If there is no response for 15 seconds, the HA agent on this host elects itself as the master. The isolation response time can be increased using the das.config.fdm.isolationPolicyDelaySec parameter under Advanced Settings. This is, however, not recommended as it increases the downtime when a problem occurs. If a host becomes a master in a cluster with more than one host and has no slaves, it continuously starts checking whether it's in the isolation mode or not. It keeps doing so until it becomes a master with slaves or connects to a master as a slave. At this point, the host will ping its isolation address to determine whether the management network is available again. By default, the isolation address is a gateway configured for the management network. This option can be changed using the das.isolationaddress[X] parameter under Advanced Settings. [X] takes values from 1 to 10 and allows configuration of multiple isolation addresses. Additionally, the das.usedefaultisolationaddress parameter can be used to indicate whether the default gateway address should be used as an isolation address or not. This parameter should be set to False if the default gateway is not configured to respond to the ICMP ping packets. Generally, it's recommended to have one isolation address for each management network. If this network uses redundant paths, the isolation address should always be available under normal circumstances. In certain cases, a host may be isolated, that is, not accessible via the management network but still able to receive election traffic. This host is called partitioned. Have a look at the following figure to gain more insight about this: When multiple hosts are isolated but can still communicate with each other, it's called a network partition. This can happen for various reasons; one of them is when a cluster spans multiple sites over a metropolitan area network. This is called the stretched cluster configuration. When a cluster partition occurs, one subset of hosts is able to communicate with the master while the other is not. Depending on the isolation response selected for VMs, they may be left running or restarted. When a network partition happens, the master election process is initiated within the subset of hosts that loses its connection to the master. This is done to make sure that the host failure or isolation results in appropriate action on the VMs. Therefore, a cluster will have multiple masters; each one in a different partition as long as the partition exists. Once the partition is resolved, the masters are able to communicate with each other and discover the multiplicity of master hosts. Each time this happens, one of them becomes a slave. The hosts' HA state is reported by vCenter through the Summary tab for each host as shown in the following screenshot: This is done under the Hosts tab for cluster objects as shown in the following screenshot: Running (Master) indicates that HA is enabled and the host is a master host. Connected (Slave) means that HA is enabled and the host is a slave host. Only the running VMs are protected by HA. Therefore, the master host monitors the VM's state and once it changes from powered off to powered on, the master adds this VM to the list of protected machines. Virtual machine options Each VM's HA behavior can be adjusted under vSphere HA settings or in the Virtual Machine Options option found in the PRD-CLUSTER Settings page as shown in the following screenshot: Restart priority The restart priority setting determines which VMs will be restarted first after the host failure. The default setting is Medium. Depending on the applications running on a VM, it may need to be restarted before other VMs, for example, if it's a database, a DNS, or a DHCP server. It may be restarted after others if it's not a critical VM. If you select Disabled, this VM will never be restarted if there is a host failure. In other words, HA will be disabled for this VM. Isolation response The isolation response setting defines HA actions against a VM if its host loses connection to the management network but is still running. The default setting is Leave powered on. To be able to understand why this setting is important, imagine the situation where a host loses connection to the management network and at the same time or shortly afterwards, to the storage network as well—a so-called split-brain situation. In vSphere, only one host can have access to a VM at a time. For this purpose, the .vmdk file is locked and there is an additional .lck file present in the same folder where .vmdk file is stored. As HA is enabled, VMs will fail over to another host, however, their original instances will keep running on the old host. Once this host comes out of isolation, we will end up with two copies of VMs. Therefore, the isolated host will not have access to the .vmdk file as it's locked. In vCenter, however, this VM will look as if it is flipping between two hosts. With the default settings, the original host is not able to reacquire the disk locks and will be querying the VM. HA will send a reply instead which allows the host to power off the second running copy. If the Power Off option is selected for a VM under the isolation response settings, this VM will be immediately stopped when isolation occurs. This can cause inconsistency with the filesystem on a virtual drive. However, the advantage of this is that VM restart on another host will happen more quickly, thus reducing the downtime. The Shut down option attempts to gracefully shut down a VM. By default, HA waits for 5 minutes for this to happen. When this time is over, if the VM is not off yet, it will be powered off. This timeout is controlled by the das.isolationshutdowntimeout parameter under the Advanced Settings option. VM must have VMware tools installed to be able to shut down gracefully. Otherwise, the shutdown option is equivalent to power off. VM monitoring Under VM Monitoring, the monitoring settings of individual applications can be adjusted as shown in the following screenshot: The default setting is Disabled. However, VM and Application Monitoring can be enabled so that if the VM heartbeat (VMware tool's heartbeat) or its application heartbeat is lost, the VM is restarted. To avoid false positives, the VM monitoring service also monitors VM's I/O activity. If a heartbeat is lost and there was no I/O activity (by default during the last 2 minutes), VM is considered as unresponsive. This feature allows you to power cycle nonresponsive VMs. I/O interval can be changed under the advanced attribute settings (for more details, check the HA Advanced attributes table later in this section). Monitoring sensitivity can be adjusted as well. Sensitivity means the time interval between the loss of heartbeats and restarting of the VM. The available options are listed in the table from the VMware documentation article available at http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.avail.doc_50%2FGUID-62B80D7A-C764-40CB-AE59-752DA6AD78E7.html. To avoid repeating VM resets by default, they will be restarted only three times during the reset period. This can be changed in the Custom mode as shown in the following screenshot: In order to be able to monitor applications within a VM, they need to support VMware application monitoring. Alternatively, you can download the appropriate SDK and set up customized heartbeats for the application that needs to be monitored. Under Advanced Options, the following vSphere HA behaviors can be set. Some of them have already been mentioned in sections of this article. The following screenshot shows the Advanced Options (vSphere HA) window where advanced HA options can be added and set to specific values: vSphere HA advanced options are listed in the article from VMware documentation available at http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.avail.doc_50%2FGUID-E0161CB5-BD3F-425F-A7E0-B F83B005FECA.html. This mentioned article also lists the options that are not supported in vCenter 5. You will get an error message if you try to add one of them. Also, the options will be deleted after being upgraded from the previous versions. Admission control Admission control ensures there are sufficient resources available to provide failover protection as and when VM resource reservations are kept. Admission control is available for the following: Hosts Resource pools vSphere HA Admission control can only be disabled for vSphere HA. The following screenshot shows PRD-CLUSTER Settings with the option to disable admission control: Examples of actions that may not be permitted because of insufficient resources are as follows: Power on a VM Migrate a VM to another host, cluster, or resource pool Increase CPU or memory reservation for a VM Admission control policies There are three possible types of admission control policies available for HA configuration that are as follows: Host failure cluster tolerates: When this option is chosen, HA ensures that a specified number of hosts can fail, but sufficient resources will still be available to accommodate all the VMs from these hosts. The decision to either allow or deny an operation is based on the following calculations: Slot size: A hypothetical VM that has the largest amount of memory and CPU that is assigned to an existing VM in the environment. For example, for the following VMs, the slot size will be 4 GHz and 6 GB: VM CPU RAM VM1 4 GHz 2 GB VM2 2 GHz 4 GB VM3 1 GHz 6 GB Host capacity: It gives the number of slots each host can hold based on the resources available for VMs; not the total host memory and CPU. For example, for the previous slot size, the host capacity will be as given in the following table: Host CPU RAM Slots Host1 4 GHz 128 GB 1 Host2 24 GHz 6 GB 1 Host3 8 GHz 14 GB 2 Cluster failover capacity: This gives the number of hosts that can fail before there aren't enough slots left to accommodate all the VMs. For example, for previous hosts with 1 host failure policy, the failover capacity is 2 slots. In case of Host3 failure (host with larger capacity), the cluster is left with only two slots. But if the current failover capacity is less than the allowed limit, admission control disallows the operation. For example, if we are running two VMs and need to power the third one, it will be denied as the cluster capacity is two and it may not be able to accommodate three VMs. This option is probably not the best one for an environment that has VMs with significantly more of resources assigned than the rest of the VMs. The Host failure cluster tolerates option can be used when all cluster hosts are sized pretty much equally. Otherwise, if you use this option, then excessive capacity is reserved such that the cluster tolerates the largest host failure. When this option is used, VM reservations should be kept similar across the cluster as well. Because vCenter uses the slot sizes model to calculate capacity, and the slot size is based on the largest reservation, having VMs with a large reservation will again result in additional unnecessary capacity being reserved. Percentage of cluster resources: With this policy enabled, HA ensures that a specified percentage of resources are reserved for failover across all the hosts. It also checks that there are at least two hosts available. The calculation happens as follows: The total resource requirement for all the running VMs is calculated. For example, for three VMs in the previous table, the total requirement will be 7 GHz and 12 GB. The total available host resources are calculated. For the previous example, the total is 34 GHz and 148 GB. The current CPU and memory failover capacity for the cluster is calculated as follows: CPU: (1-7/34)*100%=79% RAM: (1-12/148)*100%=92% If the current CPU and memory capacity is less than allowed, the operation is denied. With such different hosts from the example, the CPU and RAM capacity should be configured carefully to avoid a situation when, for example, the host with most amount of RAM fails and the other hosts are not able to accommodate all the VMs because of memory resources. Therefore, RAM should be configured at 87 percent based on the two smallest hosts (#2 and #3) and not 30% based on the number of hosts in the environment: [1-(6+14)/148]*100%=87% In other words, if the host with 128 GB fails, we need to make sure that the total resources needed by the VMs are less than the sum of 6 GB and 14 GB, which is only 13 percent of the total cluster's 148 GB. Therefore, we need to make sure that in all instances, the VMs use only 13 percent of the RAM or that the cluster has 87 percent of RAM that is free. Specified failover hosts: With this policy enabled, HA keeps the chosen failover hosts reserved, doesn't allow the powering on or migrating of any VMs to this host, and restarts VMs on this host only when failure occurs. If for some reason, it's not possible to use a designated failover host to restart the VMs, HA will restart them on other available hosts. It is recommended to use the Percentage of cluster resources reserved option in most cases. This option offers more flexibility in terms of host and VM sizing than other options. HA security and logging vSphere HA configuration files for each host are stored on the host's local storage and are protected by the filesystem permissions. These files are only available to the root user. For security reasons, ESXi 5 hosts log HA activity only to syslog. Therefore, logs are placed at a location where syslog is configured to keep them. Log entries related to HA are prepended with fdm, which stands for fault domain manager. This is what the vSphere HA ESX service is called. Older versions of ESXi write HA activity to fdm logfiles in /var/log/vmware/fdm stored on the local disk. There is also an option to enable syslog logging on these hosts. Older ESX hosts are able to save HA activity only in the fdm local logfile in /var/log/vmware/. HA agent logging configuration also depends on the ESX host version. For ESXi 5 hosts, the logging options that can be configured via the Advanced Options tab under HA are listed in the article under the logging section available at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2033250. The das.config.log.maxFileNum option causes ESXi 5 hosts to maintain two copies of the logfiles: one is a file created by the Version 5 logging mechanism, and the other one is maintained by the pre-5.0 logging mechanism. After any of these options are changed, HA needs to be reconfigured. The following table provides log capacity recommendations according to VMware for environments of different sizes based on the requirement to keep one week of history: Size Minimum log capacity per host in MB 40 VMs in total with 8 VMs per host 4 375 VMs in total with 25 VMs per host 35 1,280 VMs in total with 40 VMs per host 120 3,000 VMs in total with 512 VMs per host 300 These are just recommendations; additional capacity may be needed depending on the environment. Increasing the log capacity involves specifying the number of rotations together with the file size as well as making sure there is enough space on the storage resource where the logfiles are kept. The vCenter server uses the vpxuser account to connect to the HA agents. When HA is enabled for the first time, vCenter creates this account with a random password and makes sure the password is changed periodically. The time period for a password change is controlled by the VirtualCenter.VimPasswordExpirationInDays parameter that can be set under the Advanced Settings option in vCenter. All communication between vCenter and HA agents, as well as agent-to-agent traffic, is secured with SSL. Therefore, for vSphere HA, it's necessary that each host has verified SSL certificates. New certificates require HA to be reconfigured. It will also be reconfigured automatically if a host has been disconnected before the certificate is replaced. SSL certificates are also used to verify election messages so if there is a rogue agent running, it will only be able to affect the host it's running on. This issue, if it occurs, is reported to the administrator. HA uses TCP/8182 and UDP/8182 ports for communication between agents. These ports are opened and closed automatically by the host's firewall. This helps to ensure that these ports are open only when they are needed. Using HA with DRS When vSphere HA restarts VMs on a different host after a failure, the main priority is the immediate availability of VMs. Based on CPU and memory reservations, HA determines which host to use to power the VMs on. This decision is based, of course, on the available capacity of the host. It's quite possible that after all the VMs have been restarted, some hosts become highly loaded while others are relatively lightly loaded. DRS is the load balancing and failover solution that can be enabled in vCenter for better host resource management. vSphere HA, together with DRS, is able to deliver automatic failover and load balancing solutions, which may result in a more balanced cluster. However, there are a few things to consider when it comes to using both features together. In a cluster with DRS, HA, and the admission control enabled; VMs may not be automatically evacuated from a host entering the maintenance mode. This occurs because of resources reserved for VMs that need to be restarted. In this case, the administrator needs to migrate these VMs manually. Some VMs may not fail over because of resource constraints. This can happen in one of the following cases: HA admission control is disabled and DPM is enabled, which may result in insufficient capacity available to perform failover as some hosts may be in the standby mode and therefore, fewer hosts would be available. VM to host affinity rules limit hosts where certain VMs can be placed. Total resources are sufficient but fragmented across multiple hosts. In this case, these resources can't be used by the VMs for failover. DPM is in the manual mode that requires an administrator's confirmation before a host can be powered on from the standby mode. DRS is in the manual mode, and an administrator's confirmation may be needed so that the migration of VMs can be started. What to expect when HA is enabled HA only restarts a VM if there is a host failure. In other words, it will power on all the VMs that were running on a failed host placed on another member of the cluster. Therefore, even with HA enabled, there will still be a short downtime for VMs that are running on faulty hosts. In fast environments, however, VM reboot happens quickly. So if you are using some kind of monitoring system, it may not even trigger an alarm. Therefore, if a bunch of VMs have been rebooted unexpectedly, you know there was an issue with one of the hosts and can review the logs to find out what the issue was. Of course, if you have set up vCenter notifications, you should get an alert. If you need VMs to be up all the time even if the host goes down, there is another feature that can be enabled called Fault Tolerance.
Read more
  • 0
  • 0
  • 2757

article-image-gesture
Packt
25 Oct 2013
5 min read
Save for later

Gesture

Packt
25 Oct 2013
5 min read
(For more resources related to this topic, see here.) Gestures While there are ongoing arguments in the courts of America at the time of writing over who invented the likes of dragging images, it is without a doubt that a key feature of iOS is the ability to use gestures. To put it simply, when you tap the screen to start an app or select a part of an image to enlarge it or anything like that, you are using gestures. A gesture (in terms of iOS) is any touch interaction between the UI and the device. With iOS 6, there are six gestures the user has the ability to use. These gestures, along with brief explanations, have been listed in the following table: Class Name and type Gesture UIPanGesture Recognizer PanGesture; Continuous type Pan images or over-sized views by dragging across the screen UISwipeGesture Recognizer SwipeGesture; Continuous type Similar to panning, except it is a swipe UITapGesture Recognizer TapGesture; Discrete type Tap the screen a number of times (configurable) UILongPressGesture Recognizer LongPressGesture; Discrete type Hold the finger down on the screen UIPinchGesture Recognizer PinchGesture; Continuous type Zoom by pinching an area and moving your fingers in or out UIRotationGesture Recognizer RotationGesture; Continuous type Rotate by moving your fingers in opposite directions Gestures can be added by programming or via Xcode. The available gestures are listed in the following screenshot with the rest of the widgets on the right-hand side of the designer: To add a gesture, drag the gesture you want to use under the view on the View bar (shown in the following screenshot): Design the UI as you want and while pressing the Ctrl key, drag the gesture to what you want to recognize using the gesture. In my example, the object you want to recognize is anywhere on the screen. Once you have connected the gesture to what you want to recognize, you will see the configurable options of the gesture. The Taps field is the number of taps required before the Recognizer is triggered, and the Touches field is the number of points onscreen required to be touched for the Recognizer to be triggered. When you come to connect up the UI, the gesture must also be added. Gesture code When using Xcode, it is simple to code gestures. The class defined in the Xcode design for the tapping gesture is called tapGesture and is used in the following code: private int tapped = 0; public override void ViewDidLoad() { base.ViewDidLoad(); tapGesture.AddTarget(this, new Selector("screenTapped")); View.AddGestureRecognizer(tapGesture); } [Export("screenTapped")]public void SingleTap(UIGestureRecognizer s) { tapped++; lblCounter.Text = tapped.ToString(); } There is nothing really amazing to the code; it just displays how many times the screen has been tapped. The Selector method is called by the code when the tap has been seen. The method name doesn't make any difference as long as the Selector and Export names are the same. Types When the gesture types were originally described, they were given a type. The type reflects the number of messages sent to the Selector method. A discrete one generates a single message. A continuous one generates multiple messages, which requires the Selector method to be more complex. The complexity is added by the Selector method having to check the State of the gesture to decide on what to do with what message and whether it has been completed. Adding a gesture in code It is not a requirement that Xcode be used to add a gesture. To perform the same task in the following code as my preceding code did in Xcode is easy. The code will be as follows: UITapGestureRecognizer t'pGesture = new UITapGestureRecognizer() { NumberOfTapsRequired = 1 }; The rest of the code from AddTarget can then be used. Continuous types The following code, a Pinch Recognizer, shows a simple rescaling. There are a couple of other states that I'll explain after the code. The only difference in the designer code is that I have UIImageView instead of a label and a UIPinchGestureRecognizer class instead of a UITapGestureRecognizer class. public override void ViewDidLoad() { base.ViewDidLoad(); uiImageView.Image =UIImage.FromFile("graphics/image.jpg") Scale(new SizeF(160f, 160f); pinchGesture.AddTarget(this, new Selector("screenTapped")); uiImageView.AddGestureRecognizer(pinchGesture); } [Export("screenTapped")]public void SingleTap(UIGestureRecognizer s) { UIPinchGestureRecognizer pinch = (UIPinchGestureRecognizer)s; float scale = 0f; PointF location; switch(s.State) { case UIGestureRecognizerState.Began: Console.WriteLine("Pinch begun"); location = s.LocationInView(s.View); break; case UIGestureRecognizerState.Changed: Console.WriteLine("Pinch value changed"); scale = pinch.Scale; uiImageView.Image = UIImage FromFile("graphics/image.jpg") Scale(new SizeF(160f, 160f), scale); break; case UIGestureRecognizerState.Cancelled: Console.WriteLine("Pinch cancelled"); uiImageView.Image = UIImage FromFile("graphics/image.jpg") Scale(new SizeF(160f, 160f)); scale = 0f; break; case UIGestureRecognizerState.Recognized: Console.WriteLine("Pinch recognized"); break; } } Other UIGestureRecognizerState values The following table gives a list of other Recognizer states: State Description Notes Possible Default state; gesture hasn't been recognized Used by all gestures Failed Gesture failed No messages sent for this state Translation Direction of pan Used in the pan gesture Velocity Speed of pan Used in the pan gesture In addition to these, it should be noted that discrete types only use Possible and Recognized states. Summary Gestures certainly can add a lot to your apps. They can enable the user to speed around an image, move about a map, enlarge and reduce, as well as select areas of anything on a view. Their flexibility underpins why iOS is recognized as being an extremely versatile device for users to manipulate images, video, and anything else on-screen. Resources for Article: Further resources on this subject: Creating and configuring a basic mobile application [Article] So, what is XenMobile? [Article] Building HTML5 Pages from Scratch [Article]
Read more
  • 0
  • 0
  • 7402

article-image-authenticating-your-application-devise
Packt
25 Oct 2013
11 min read
Save for later

Authenticating Your Application with Devise

Packt
25 Oct 2013
11 min read
(For more resources related to this topic, see here.) Signing in using authentication other than e-mails By default, Devise only allows e-mails to be used for authentication. For some people, this condition will lead to the question, "What if I want to use some other field besides e-mail? Does Devise allow that?" The answer is yes; Devise allows other attributes to be used to perform the sign-in process. For example, I will use username as a replacement for e-mail, and you can change it later with whatever you like, including userlogin, adminlogin, and so on. We are going to start by modifying our user model. Create a migration file by executing the following command inside your project folder: $ rails generate migration add_username_to_users username:string This command will produce a file, which is depicted by the following screenshot: The generated migration file Execute the migrate (rake db:migrate) command to alter your users table, and it will add a new column named username. You need to open the Devise's main configuration file at config/initializers/devise.rb and modify the code: config.authentication_keys = [:username] config.case_insensitive_keys = [:username] config.strip_whitespace_keys = [:username] You have done enough modification to your Devise configuration, and now you have to modify the Devise views to add a username field to your sign-in and sign-up pages. By default, Devise loads its views from its gemset code. The only way to modify the Devise views is to generate copies of its views. This action will automatically override its default views. To do this, you can execute the following command: $ rails generate devise:views It will generate some files, which are shown in the following screenshot: Devise views files As I have previously mentioned, these files can be used to customize another view. But we are going to talk about it a little later in this article. Now, you have the views and you can modify some files to insert the username field. These files are listed as follows: app/views/devise/sessions/new.html.erb: This is a view file for the sign-up page. Basically, all you need to do is change the email field into the username field. #app/views/devise/sessions/new.html.erb <h2>Sign in</h2> <%= notice %> <%= alert %> <%= form_for(resource, :as => resource_name, :url => session_path (resource_name)) do |f| %> <div><%= f.label :username %><br /> <%= f.text_field :username, :autofocus => true %><div> <div><%= f.label :password %><br /> <%= f.password_field :password %></div> <% if devise_mapping.rememberable? -%> <div><%= f.check_box :remember_me %> <%= f.label :remember_me %></div> <% end -%> <div><%= f.submit "Sign in" %></div> <% end %> %= render "devise/shared/links" %> You are now allowed to sign in with your username. The modification will be shown, as depicted in the following screenshot: The sign-in page with username app/views/devise/registrations/new.html.erb: This file is a view file for the registration page. It is a bit different from the sign-up page; in this file, you need to add the username field, so that the user can fill in their username when they perform the registration. #app/views/devise/registrations/new.html.erb <h2>Sign Up</h2> <%= form_for() do |f| %> <%= devise_error_messages! %> <div><%= f.label :email %><br /> <%= f.email_field :email, :autofocus => true %></div> <div><%= f.label :username %><br /> <%= f.text_field :username %></div> <div><%= f.label :password %><br /> <%= f.password_field :password %></div> <div><%= f.label :password_confirmation %><br /> <%= f.password_field :password_confirmation %></div> <div><%= f.submit "Sign up" %></div> <% end %> <%= render "devise/shared/links" %> Especially for registration, you need to perform extra modifications. Mass assignment rules written in the app/controller/application_controller.rb file, and now, we are going to modify them a little. Add username to the sanitizer for sign-in and sign-up, and you will have something as follows: #these codes are written inside configure_permitted_parameters function devise_parameter_sanitizer.for(:sign_in) {|u| u.permit(:email, :username )} devise_parameter_sanitizer.for(:sign_up) {|u| u.permit(:email, :username, :password, :password_confirmation)} These changes will allow you to perform a sign-up along with the username data. The result of the preceding example is shown in the following screenshot: The sign-up page with username I want to add a new case for your sign-in, which is only one field for username and e-mail. This means that you can sign in either with your e-mail ID or username like in Twitter's sign-in form. Based on what we have done before, you already have username and email columns; now, open /app/models/user.rb and add the following line: attr_accessor :signin Next, you need to change the authentication keys for Devise. Open /config/initializers/devise.rb and change the value for config.authentication_keys, as shown in the following code snippet: config.authentication_keys = [ :signin ] Let's go back to our user model. You have to override the lookup function that Devise uses when performing a sign-in. To do this, add the following method inside your model class: def self.find_first_by_auth_conditions(warden_conditions) conditions = warden_conditions.dup where(conditions).where(["lower(username) = :value OR lower(email) = :value", { :value => signin.downcase }]).first end As an addition, you can add a validation for your username, so it will be case insensitive. Add the following validation code into your user model: validates :username, :uniqueness => {:case_sensitive => false} Please open /app/controller/application_controller.rb and make sure you have this code to perform parameter filtering: before_filter :configure_permitted_parameters, if: :devise_controller? protected def configure_permitted_parameters devise_parameter_sanitizer.for(:sign_in) {|u| u.permit(:signin)} devise_parameter_sanitizer.for(:sign_up) {|u| u.permit(:email, : username, :password, :password_confirmation)} end We're almost there! Currently, I assume that you've already stored an account that contains the e-mail ID and username. So, you just need to make a simple change in your sign-in view file (/app/views/devise/sessions/new.html.erb). Make sure that the file contains this code: <h2>Sign in</h2> <%= notice %> <%= alert %> <%= form_for(resource, :as => resource_name, :url => session_path (resource_name)) do |f| %> <div><%= f.label "Username or Email" %><br /> <%= f.text_field :signin, :autofocus => true %></div> <div><%= f.label :password %><br /> <%= f.password_field :password %></div> <% if devise_mapping.rememberable? -%> <div><%= f.check_box :remember_me %> <%= f.label :remember_me %> </div> <% end -%> <div><%= f.submit "Sign in" %></div> <% end %> <%= render "devise/shared/links" %> You can see that you don't have a username or email field anymore. The field is now replaced by a single field named :signin that will accept either the e-mail ID or the username. It's efficient, isn't it? Updating the user account Basically, you are already allowed to access your user account when you activate the registerable module in the model. To access the page, you need to log in first and then go to /users/edit. The page is as shown in the following screenshot: The edit account page But, what if you want to edit your username or e-mail ID? How will you do that? What if you have extra information in your users table, such as addresses, birth dates, bios, and passwords as well? How will you edit these? Let me show you how to edit your user data including your password, or edit your user data without editing your password. Editing your data, including the password: To perform this action, the first thing that you need to do is modify your view. Your view should contain the following code: <div><%= f.label :username %><br /> <%= f.text_field :username %></div> Now, we are going to overwrite Devise's logic. To do this, you have to create a new controller named registrations_controller. Please use the rails command to generate the controller, as shown: $ rails generate controller registrations update It will produce a file located at app/controllers/. Open the file and make sure you write this code within the controller class: class RegistrationsController < Devise::RegistrationsController def update new_params = params.require(:user).permit(:email,:username, : current_password, :password,:password_confirmation) @user = User.find(current_user.id) if @user.update_with_password(new_params) set_flash_message :notice, :updated sign_in @user, :bypass => true redirect_to after_update_path_for(@user) else render "edit" end end end Let's look at the code. Currently, Rails 4 has a new method in organizing whitelist attributes. Therefore, before performing mass assignment attributes, you have to prepare your data. This is done in the first line of the update method. Now, if you see the code, there's a method defined by Devise named update_with_password. This method will use mass assignment attributes with the provided data. Since we have prepared it before we used it, it will be fine. Next, you have to edit your route file a bit. You should modify the rule defined by Devise, so instead of using the original controller, Devise will use the controller you created before. The modification should look as follows: devise_for :users, :controllers => {:registrations => "registrations"} Now you have modified the original user edit page, and it will be a little different. You can turn on your Rails server and see it in action. The view is as depicted in the following screenshot: The modified account edit page Now, try filling up these fields one by one. If you are filling them with different values, you will be updating all the data (e-mail, username, and password), and this sounds dangerous. You can modify the controller to have better data update security, and it all depends on your application's workflows and rules. Editing your data, excluding the password: Actually, you already have what it takes to update data without changing your password. All you need to do is modify your registrations_controller.rb file. Your update function should be as follows: class RegistrationsController < Devise::RegistrationsController def update new_params = params.require(:user).permit(:email,:username, : current_password, :password,:password_confirmation) change_password = true if params[:user][:password].blank? params[:user].delete("password") params[:user].delete("password_confirmation") new_params = params.require(:user).permit(:email,:username) change_password = false end @user = User.find(current_user.id) is_valid = false if change_password is_valid = @user.update_with_password(new_params) else @user.update_without_password(new_params) end if is_valid set_flash_message :notice, :updated sign_in @user, :bypass => true redirect_to after_update_path_for(@user) else render "edit" end end end The main difference from the previous code is now you have an algorithm that will check whether the user intends to update your data with their password or not. If not, the code will call the update_without_password method. Now, you have codes that allow you to edit with/without a password. Now, refresh your browser and try editing with or without a password. It won't be a problem anymore. Summary Now, I believe that you will be able to make your own Rails application with Devise. You should be able to make your own customizations based on your needs. Resources for Article: Further resources on this subject: Integrating typeahead.js into WordPress and Ruby on Rails [Article] Facebook Application Development with Ruby on Rails [Article] Designing and Creating Database Tables in Ruby on Rails [Article]
Read more
  • 0
  • 0
  • 10681
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-social-engineer-toolkit
Packt
25 Oct 2013
11 min read
Save for later

Social-Engineer Toolkit

Packt
25 Oct 2013
11 min read
(For more resources related to this topic, see here.) Social engineering is an act of manipulating people to perform actions that they don't intend to do. A cyber-based, socially engineered scenario is designed to trap a user into performing activities that can lead to the theft of confidential information or some malicious activity. The reason for the rapid growth of social engineering amongst hackers is that it is difficult to break the security of a platform, but it is far easier to trick the user of that platform into performing unintentional malicious activity. For example, it is difficult to break the security of Gmail in order to steal someone's password, but it is easy to create a socially engineered scenario where the victim can be tricked to reveal his/her login information by sending a fake login/phishing page. The Social-Engineer Toolkit is designed to perform such tricking activities. Just like we have exploits and vulnerabilities for existing software and operating systems, SET is a generic exploit of humans in order to break their own conscious security. It is an official toolkit available at https://www.trustedsec.com/, and it comes as a default installation with BackTrack 5. In this article, we will analyze the aspect of this tool and how it adds more power to the Metasploit framework. We will mainly focus on creating attack vectors and managing the configuration file, which is considered the heart of SET. So, let's dive deeper into the world of social engineering. Getting started with the Social-Engineer Toolkit (SET) Let's start our introductory recipe about SET, where we will be discussing SET on different platforms. Getting ready SET can be downloaded for different platforms from its official website: https://www.trustedsec.com/. It has both the GUI version, which runs through the browser, and the command-line version, which can be executed from the terminal. It comes pre-installed in BackTrack, which will be our platform for discussion in this article. How to do it... To launch SET on BackTrack, start the terminal window and pass the following path: root@bt:~# cd /pentest/exploits/set root@bt:/pentest/exploits/set# ./set Copyright 2012, The Social-Engineer Toolkit (SET) All rights reserved. Select from the menu: If you are using SET for the first time, you can update the toolkit to get the latest modules and fix known bugs. To start the updating process, we will pass the svn update command. Once the toolkit is updated, it is ready for use. The GUI version of SET can be accessed by navigating to Applications | BackTrack | Exploitation tools | Social-Engineer Toolkit. How it works... SET is a Python-based automation tool that creates a menu-driven application for us. Faster execution and the versatility of Python make it the preferred language for developing modular tools, such as SET. It also makes it easy to integrate the toolkit with web servers. Any open source HTTP server can be used to access the browser version of SET. Apache is typically considered the preferable server while working with SET. There's more... Sometimes, you may have an issue upgrading to the new release of SET in BackTrack 5 R3. Try out the following steps: You should remove the old SET using the following command: dpkg –r set We can remove SET in two ways. First, we can trace the path to /pentest/exploits/set, making sure we are in the directory and then opt for the 'rm' command for removing all files present there. Or, we can use the method shown previously. Then, for reinstallation, we can download its clone using the following command: Git clone https://github.com/trustedsec/social-engineer-toolkit /set Working with the SET config file In this recipe, we will take a close look at the SET config file, which contains default values for different parameters that are used by the toolkit. The default configuration works fine with most of the attacks, but there can be situations when you have to modify the settings according to the scenario and requirements. So, let's see what configuration settings are available in the config file. Getting ready To launch the config file, move to the config file and open the set_config file: root@bt:/pentest/exploits/set# nano config/set_config The configuration file will be launched with some introductory statements, as shown in the following screenshot: How to do it... Let's go through it step-by-step: First, we will see what configuration settings are available for us: # DEFINE THE PATH TO METASPLOIT HERE, FOR EXAMPLE /pentest/exploits/framework3 METASPLOIT_PATH=/pentest/exploits/framework3 The first configuration setting is related to the Metasploit installation directory. Metasploit is required by SET for proper functioning, as it picks up payloads and exploits from the framework: # SPECIFY WHAT INTERFACE YOU WANT ETTERCAP TO LISTEN ON, IF NOTHING WILL DEFAULT # EXAMPLE: ETTERCAP_INTERFACE=wlan0 ETTERCAP_INTERFACE=eth0 # # ETTERCAP HOME DIRECTORY (NEEDED FOR DNS_SPOOF) ETTERCAP_PATH=/usr/share/ettercap Ettercap is a multipurpose sniffer for switched LAN. Ettercap section can be used to perform LAN attacks like DNS poisoning, spoofing etc. The above SET setting can be used to either set ettercap ON of OFF depending upon the usability. # SENDMAIL ON OR OFF FOR SPOOFING EMAIL ADDRESSES SENDMAIL=OFF The sendmail e-mail server is primarily used for e-mail spoofing. This attack will work only if the target's e-mail server does not implement reverse lookup. By default, its value is set to OFF. The following setting shows one of the most widely used attack vectors of SET. This configuration will allow you to sign a malicious Java applet with your name or with any fake name, and then it can be used to perform a browser-based Java applet infection attack: # CREATE SELF-SIGNED JAVA APPLETS AND SPOOF PUBLISHER NOTE THIS REQUIRES YOU TO # INSTALL ---> JAVA 6 JDK, BT4 OR UBUNTU USERS: apt-get install openjdk-6-jdk # IF THIS IS NOT INSTALLED IT WILL NOT WORK. CAN ALSO DO apt-get install sun-java6-jdk SELF_SIGNED_APPLET=OFF We will discuss this attack vector in detail in a later recipe, that is, the Spear phishing attack vector . This attack vector will also require JDK to be installed on your system. Let's set its value to ON, as we will be discussing this attack in detail: SELF_SIGNED_APPLET=ON # AUTODETECTION OF IP ADDRESS INTERFACE UTILIZING GOOGLE, SET THIS ON IF YOU WANT # SET TO AUTODETECT YOUR INTERFACE AUTO_DETECT=ON The AUTO_DETECT flag is used by SET to auto-discover the network settings. It will enable SET to detect your IP address if you are using NAT/Port forwarding, and it allows you to connect to the external Internet. The following setting is used to set up the Apache web server to perform web-based attack vectors. It is always preferred to set it to ON for better attack performance: # USE APACHE INSTEAD OF STANDARD PYTHON WEB SERVERS, THIS WILL INCREASE SPEED OF # THE ATTACK VECTOR APACHE_SERVER=OFF # # PATH TO THE APACHE WEBROOT APACHE_DIRECTORY=/var/www The following setting is used to set up the SSL certificate while performing web attacks. Several bugs and issues have been reported for the WEBATTACK_SSL setting of SET. So, it is recommended to keep this flag OFF: # TURN ON SSL CERTIFICATES FOR SET SECURE COMMUNICATIONS THROUGH WEB_ATTACK VECTOR WEBATTACK_SSL=OFF The following setting can be used to build a self-signed certificate for web attacks, but there will be a warning message saying Untrusted certificate. Hence, it is recommended to use this option wisely to avoid alerting the target user: # PATH TO THE PEM FILE TO UTILIZE CERTIFICATES WITH THE WEB ATTACK VECTOR (REQUIRED) # YOU CAN CREATE YOUR OWN UTILIZING SET, JUST TURN ON SELF_SIGNED_CERT # IF YOUR USING THIS FLAG, ENSURE OPENSSL IS INSTALLED! # SELF_SIGNED_CERT=OFF The following setting is used to enable or disable the Metasploit listener once the attack is executed: # DISABLES AUTOMATIC LISTENER - TURN THIS OFF IF YOU DON'T WANT A METASPLOIT LISTENER IN THE BACKGROUND. AUTOMATIC_LISTENER=ON The following configuration will allow you to use SET as a standalone toolkit without using Metasploit functionalities, but it is always recommended to use Metasploit along with SET in order to increase the penetration testing performance: # THIS WILL DISABLE THE FUNCTIONALITY IF METASPLOIT IS NOT INSTALLED AND YOU JUST WANT TO USE SETOOLKIT OR RATTE FOR PAYLOADS # OR THE OTHER ATTACK VECTORS. METASPLOIT_MODE=ON These are a few important configuration settings available for SET. Proper knowledge of the config file is essential to gain full control over the SET. How it works... The SET config file is the heart of the toolkit, as it contains the default values that SET will pick while performing various attack vectors. A misconfigured SET file can lead to errors during the operation, so it is essential to understand the details defined in the config file in order to get the best results. The How to do it... section clearly reflects the ease with which we can understand and manage the config file. Working with the spear-phishing attack vector A spear-phishing attack vector is an e-mail attack scenario that is used to send malicious mails to target/specific user(s). In order to spoof your own e-mail address, you will require a sendmail server. Change the config setting to SENDMAIL=ON. If you do not have sendmail installed on your machine, then it can be downloaded by entering the following command: root@bt:~# apt-get install sendmail Reading package lists... Done Getting ready Before we move ahead with a phishing attack, it is imperative for us to know how the e-mail system works. Recipient e-mail servers, in order to mitigate these types of attacks, deploy gray-listing, SPF records validation, RBL verification, and content verification. These verification processes ensure that a particular e-mail arrived from the same e-mail server as its domain. For example, if a spoofed e-mail address, <richyrich@gmail.com>, arrives from the IP 202.145.34.23, it will be marked as malicious, as this IP address does not belong to Gmail. Hence, in order to bypass these security measures, the attacker should ensure that the server IP is not present in the RBL/SURL list. As the spear-phishing attack relies heavily on user perception, the attacker should conduct a recon of the content that is being sent and should ensure that the content looks as legitimate as possible. Spear-phishing attacks are of two types—web-based content and payload-based content. How to do it... The spear-phishing module has three different attack vectors at our disposal: Let's analyze each of them. Passing the first option will start our mass-mailing attack. The attack vector starts with selecting a payload. You can select any vulnerability from the list of available Metasploit exploit modules. Then, we will be prompted to select a handler that can connect back to the attacker. The options will include setting the vnc server or executing the payload and starting the command line, and so on. The next few steps will be starting the sendmail server, setting a template for a malicious file format, and selecting a single or mass-mail attack: Finally, you will be prompted to either choose a known mail service, such as Gmail or Yahoo, or use your own server: 1. Use a gmail Account for your email attack. 2. Use your own server or open relay set:phishing>1 set:phishing> From address (ex: moo@example.com):bigmoney@gmail.com set:phishing> Flag this message/s as high priority? [yes|no]:y Setting up your own server cannot be very reliable, as most of the mail services follow a reverse lookup to make sure that the e-mail has generated from the same domain name as the address name. Let's analyze another attack vector of spear-fishing. Creating a file format payload is another attack vector in which we can generate a file format with a known vulnerability and send it via e-mail to attack our target. It is preferred to use MS Word-based vulnerabilities, as they are difficult to detect whether they are malicious or not, so they can be sent as an attachment via an e-mail: set:phishing> Setup a listener [yes|no]:y [-] *** [-] * WARNING: Database support has been disabled [-] *** At last, we will be prompted on whether we want to set up a listener or not. The Metasploit listener will begin and we will wait for the user to open the malicious file and connect back to the attacking system. The success of e-mail attacks depends on the e-mail client that we are targeting. So, a proper analysis of this attack vector is essential. How it works... As discussed earlier, the spear-phishing attack vector is a social engineering attack vector that targets specific users. An e-mail is sent from the attacking machine to the target user(s). The e-mail will contain a malicious attachment, which will exploit a known vulnerability on the target machine and provide a shell connectivity to the attacker. The SET automates the entire process. The major role that social engineering plays here is setting up a scenario that looks completely legitimate to the target, fooling the target into downloading the malicious file and executing it.
Read more
  • 0
  • 0
  • 17989

Packt
25 Oct 2013
5 min read
Save for later

The solvers – these great unknown

Packt
25 Oct 2013
5 min read
(For more resources related to this topic, see here.) Variable-step versus fixed-step solvers A variable-step solver will choose to use a shorter time step when the signal derivative is changing faster in order to obtain a better approximation, while it will use a longer step when the result changes at a slower pace. The most used variable-step solver is the ode45 solver (this is the default Simulink solver as well). A fixed-step solver will, on the other hand, always use the same step size. The simplest is the ode1 solver (the famous Euler method). We've already run our example with a fixed-step solver. Let's change this to a variable-step solver by opening the Model Configuration Parameters window and setting the ode45 solver. We now have the option to configure more Solver parameters; we'll allow it to use a Max step size of 1 second and a Min step size of 0.2 seconds, setting other parameters to the default value. Running the simulation now will give us the following result: We can see that the step size is indeed variable: the second value has been computed after only 0.2 seconds, and then the step size grew bigger and stayed at 1 second. We got a good approximation in the problematic region while using longer steps in the almost constant region, thus achieving a faster simulation time than a fixed-step solver with a step size of 0.2 seconds. But how does the solver determine when it's necessary to reduce the step size? The answer is by looking at the tolerances defined in the Model Configuration Parameters window. Relative tolerance defines the maximum error as a percentage of the actual value. The default value (1e-3) means that an error of 0.1 percent is accepted. Absolute tolerance defines the maximum error as a scalar value. When it is set to auto, Simulink will assume 1e-6 initially (that is, 0.000001), then change it to the maximum error calculated with the relative tolerance. The Shape preservation option, off ( Disable all ) by default, helps to increase the accuracy when the solution's derivative varies rapidly at the cost of a more computing-intensive simulation (thus increasing the overall time to compute the result). Finally, there's the option Number of consecutive min steps , which defines how many times the solver can use a time step smaller than the minimum step size violations defined before issuing a warning or an error. Let's set the Relative tolerance parameter to 1e-5 and run the simulation. We'll obtain this result: This time, the solver chose to use more shorter steps while the signal was changing to improve the accuracy, thanks to the relative tolerance we've set; then the step size grew gradually back to 1 second. Variable-step solvers give us a good trade-off between result approximation and overall simulation times in models, whereas fixed-step solvers are too computationally intensive to be used efficiently. Fixed-step solvers are required for code generation and real-time control purposes because Simulink can't go back in time to give a more accurate result. Continuous versus discrete A system can be continuous or discrete based on the presence of integrators or derivatives in the model (these are called continuous states). In other words, if the system is represented by the means of one or more differential equations, the system is continuous. Our cruise controller is continuous because it has an integral component. The continuous solver can choose to perform several iteration cycles in a single time step to reach the best possible approximation of the final result. Such solvers fall into the ODE ( Ordinary Differential Equation solvers) category. If a system is discrete, there are no differential equations but only memories and mathematical operands. A discrete solver is unable to perform the numerical integration required by continuous states. The continuous system requires continuous solvers, while Simulink will switch to a discrete solver for discrete systems even if a continuous solver has been specified. Try to set a discrete solver in our example system and run a simulation. An error window will appear, telling us that a discrete solver can't be used because the model contains continuous states. Stiff versus nonstiff In a stiff continuous system, certain solvers are unable to compute the result unless the step size is small enough; thus, a stiff solver is needed. These systems can be identified by an overall slowly changing solution that can vary rapidly in a very small time period. Our example is mildly stiff: the ode45 solver had to use a small step size to compute the result close to t = 0, but it managed to get the job done by changing the time step. A more appropriate choice would have been the ode23t solver (moderately stiff), which would have given the following result: Let's choose another nonstiff solver, the fixed-step ode1 , implementing the Euler method. Set the Fixed step size parameter to 1.5 and run the simulation: That's interesting: Using the Euler method with a long time step shows the result to be oscillating around the mathematical result, but the solution is still convergent. Setting the step size to 2 seconds will show that it can't converge anymore, and any step size higher than 2 seconds will make the simulation diverge from the correct result and go towards infinite oscillation. The following figure shows the result with a step size of 2.1 and a simulation time of 20 seconds: The following article presents stiff solvers in a way that is easy to understand: http://blogs.mathworks.com/seth/2012/07/03/why-do-we-need-stiff-ode-solvers/. The documentation center, accessible by pressing F1 , has detailed information about the solver types and parameters. This information can be found by navigating to Simulink | Simulation | Configure simulation . Summary In this article, we've learned how solvers work and how to choose the right solver to run the simulations Resources for Article: Further resources on this subject: 2-Dimensional Image Filtering [Article] GoboLinux: An Interview with Lucas Villa Real [Article] The NGINX HTTP Server [Article]
Read more
  • 0
  • 0
  • 23587

article-image-managing-test-structure-robot-framework
Packt
25 Oct 2013
5 min read
Save for later

Managing Test Structure with Robot Framework

Packt
25 Oct 2013
5 min read
(For more resources related to this topic, see here.) To manage the increased number of tests and to ensure their execution and quick and effective analysis, robot framework can be used which is a tool written in python which can be used to write different tests that may call other testing tools/software which perform actual tests and report back to the test framework. You should use robot framework because of various advantages that it offers, some of which is detailed in the following section: Uniformity XTest compliant Library support Flexible Distributive Open Uniformity Robot framework tests are uniform and readable from the end user's perspective and even if different technologies are involved in different tests for the same software, the test structure (since it is written in robot framework) for all the tests is consistent across the test project. XTest Compliant Robot framework is also xtest compliant, which means that it follows the same set of testing jargon and process as in other popular testing tools like JUnit, NUnit, and so on; also the developer associated with writing tests in robot framework has the least of surprises. Library Support Robot framework is supported by a vide variety of libraries for third party tools and given its robust community, it is easy to create a custom library as well. Flexible Not only the tests are uniform, but they may also be written according to the needs of the testing project. It is not uncommon to extend robot framework using keywords and associate tests to various keywords. This reduces complexity and brings about changes in writing tests. Test writing strategies like simple, keyword driven or behavior driven tests can be created and the same test can be written in many interesting ways. Distributive Robot framework is network friendly tool, and can also be used remotely to collect and share information. It also works with various network protocols straight out of box and presents advantages of a mature tool that can communicate and control tests present remotely. Open Owing to the open source nature(licensed under Apache version 2.0) of the robot framework, it is possible to see how it has been created and not get limited to its binary. For large scale customization, the source can directly be tweaked to perform work as desired. Environments Being a python based tool, it can work in python and python based environments such as jyton or ironpython which ensures that tests can be executed for newer application platforms like the java and .Net natively. The installations of various environments can be obtained directly or by building the source code . As mentioned by the build script, by using appropriate python instance to call the install.py, robot framework can be installed in different environments where the different executable are generated pertaining to the environment. For python environment, following executable are generated: pybot and rebot. For java environment, following executable are generated: jybot and jyrebot. For .net environment, following executable are generated: ipybot and ipyrebot. Note that in all three environments, the test structure and syntax is the same. What differs is the target applications which could be clr/jvm dependant and running robot framework on the same environment offers it the flexibility to access the software under test (SUT) more clearly. Test Naming Conventions In order to name tests, robot framework is very peculiar; it uses the configuration file and folder names to determine the execution order and test naming. For example, consider the following test hierarchy: application/ tests/ Test1.txt Other tests/ Another test.txt Running the pybot in the application folder will result in the following: The testsuite will be named after the root of test configuration, in this case, tests. Next, the Test1.txt will result in a nested testsuite, Test1 within testsuite tests. Similarly the testsuite another test will be nested under the tests/other tests testsuites. In order to organize the results, it is important to ensure proper naming. So, instead of using spaces, underscore '_' should be used. In order to get proper ordering irrespective of the test names, prefixing the test configuration with numbers can be done. This test hierarchy will therefore result in more understandable tests as well as the execution order of the tests can easily be predetermined. As mentioned, the test hierarchy is similar to any other XUnit based testing solution. Here, note that you can go on inserting test suites with themselves, which is different from other testing tool hierarchies. This approach gives more flexibility over the tests and introduces a new level of nesting. Running tests Running tests in robot framework is also very descriptive and the results of the tests can instantly be seen. In a similar manner, the results of the test can be viewed not only in the form of a detailed report, but also in the form of a log. The test results are also saved in an XML format where the tests can be recreated/merged/compared with other tests at a later date. This is achieved through the rebot command which can recreate the test structure in the generated report and log files through reading the metadata present in the XML file. Summary Through this article, you can clearly see the benefits of grouping and structuring the tests using robot framework and allowing for scalability in the tests by making the tests encapsulated within one another through the use of nested test suites. Robot Framework will not only integrated with your testing tool/solution, but also act as an assistant for your tests and manage, run and report their results back to you in a concise manner. Resources for Article: Further resources on this subject: Cross-browser-distributed testing [Article] Testing Backbone.js Application [Article] Python Testing: Installing the Robot Framework [Article]
Read more
  • 0
  • 0
  • 8155

article-image-scratching-surface-zend-framework-2
Packt
25 Oct 2013
11 min read
Save for later

Scratching the Surface of Zend Framework 2

Packt
25 Oct 2013
11 min read
Bootstrap your app There are two ways to bootstrap your ZF2 app. The default is less flexible but handles the entire configuration, and the manual is really flexible but you have to take care of everything. The goal of the bootstrap is to provide to the application, ZendMvcApplication, with all the components and dependencies needed to successfully handle a request. A Zend Framework 2 application relies on the following six components: Configuration array ServiceManager instance EventManager instance ModuleManager instance Request object Response object As these are the pillars of a ZF2 application, we will take a look at how these components are configured to bootstrap the app. To begin with, we will see how the components interact from a high perspective and then we will jump into details of how each one works. When a new request arrives to our application, ZF2 needs to set up the environment to be able to fulfill it. This process implies reading configuration files and creating the required objects and services; attach them to the events that are going to be used and finally create the request object based on the request data. Once we have the request object, ZF2 will tell the router to do his job and will inspect the request object to determine who is responsible for processing the data. Once a controller and action has been identified as the one in charge of the request, ZF2 dispatches it and gives the controller/action the control of the program in order to execute the code that will interpret the request and will do something with it. This can be from accepting an uploaded image to showing a sign-up form and also changing data on an external database. When the controller processes the data, sometimes a view object is generated to encapsulate the data that we should send to the client who made the request, and a response object is created. After we have a response object, ZF2 sends it to the browser and the request ends. Now that we have seen a very simple overview of the lifecycle of a request we will jump into the details of how each object works, the options available and some examples of each one. Configuration array Let's dissect the first component of the list by taking a look at the index.php file: chdir(dirname(__DIR__)); // Setup autoloading require 'init_autoloader.php'; // Run the application! ZendMvcApplication::init(require 'config/application.config.php')->run(); As you can see we only do three things. The first thing is we change the current folder for the convenience of making everything relative to the root folder. Then we require the autoloader file; we will examine this file later. Finally, we initialize a ZendMvcApplication object by passing a configuration file and only then does the run method get called. The configuration file looks like the following code snippet: return array( 'modules' => array( 'Application', ), 'module_listener_options' => array( 'config_glob_paths' => array( 'config/autoload/{,*.}{global,local}.php', ), 'module_paths' => array( './module', './vendor', ), ), ); This file will return an array containing the configuration options for the application. Two options are used: modules and module_listener_options. As ZF2 uses a module organization approach, we should add the modules that we want to use on the application here. The second option we are using is passed as configuration to the ModuleManager object. The config_glob_path array is used when scanning the folders in search of config files and the module_paths array is used to tell ModuleManager a set of paths where the module resides. ZF2 uses a module approach to organize files. A module can contain almost anything, simple PHP files, view scripts, images, CSS, JavaScript, and so on. This approach will allow us to build reusable blocks of functionality and we will adhere to this while developing our project. PSR-0 and autoloaders Before continuing with the key components, let's take a closer look at the init_autoloader.php file used in the index.php file. As is stated on the first block comment, this file is more complicated than it's supposed to be. This is because ZF2 will try to set up different loading mechanisms and configurations. if (file_exists('vendor/autoload.php')) { $loader = include 'vendor/autoload.php'; } The first thing is to check if there is an autoload.php file inside the vendor folder; if it's found, we will load it. This is because the user might be using composer, in which case composer will provide a PSR-0 class loader. Also, this will register the namespaces defined by composer on the loader. PSR-0 is an autoloading standard proposed by the PHP Framework Interop Group (http://www.php-fig.org/) that describes the mandatory requirements for autoloader interoperability between frameworks. Zend Framework 2 is one of the projects that adheres to it. if (getenv('ZF2_PATH')) { $zf2Path = getenv('ZF2_PATH'); } elseif (get_cfg_var('zf2_path')) { $zf2Path = get_cfg_var('zf2_path'); } elseif (is_dir('vendor/ZF2/library')) { $zf2Path = 'vendor/ZF2/library'; } In the next section we will try to get the path of the ZF2 files from different sources. We will first try to get it from the environment, if not, we'll try from a directive value in the php.ini file. Finally, if the previous methods fail the code, we will try to check whether a specific folder exists inside the vendor folder. if ($zf2Path) { if (isset($loader)) { $loader->add('Zend', $zf2Path); } else { include $zf2Path . '/Zend/Loader/AutoloaderFactory.php'; ZendLoaderAutoloaderFactory::factory(array( 'ZendLoaderStandardAutoloader' => array( 'autoregister_zf' => true ) )); } } Finally, if the framework is found by any of these methods, based on the existence of the composer autoloader, the code will just add the Zend namespace or will instantiate an internal autoloader, ZendLoaderAutoloader, and use it as a default. As you can see, there are multiple ways to set up the autoloading mechanism on ZF2 and at the end what matters is which one you prefer, as all of them in essence will behave the same. ServiceManager After all this execution of code, we arrive at the last section of the index.php file where we actually instantiate the ZendMvcApplication object. As we said, there are two methods of creating an instance of ZendMvcApplication. In the default approach, we call the static method init of the class by passing an optional configuration as the first parameter. This method will take care of instantiating a new ServiceManager object, storing the configuration inside, loading the modules specified in the configuration, and getting a configured ZendMvcApplication. ServiceManager is a service/object locator that implements the Service Locator design pattern; its responsibility is to retrieve other objects. $serviceManager = new ServiceManager( new ServiceServiceManagerConfig($smConfig) ); $serviceManager->setService('ApplicationConfig', $configuration); $serviceManager->get('ModuleManager')->loadModules(); return $serviceManager->get('Application')->bootstrap(); As you can see, the init method calls the bootstrap() method of the ZendMvcApplication instance. Service Locator is a design pattern used in software development to encapsulate the process of obtaining other objects. The concept is based on a central repository that stores the objects and also knows how to create them if required. EventManager This component is designed to provide multiple functionalities. It can be used to implement simple observer patterns, and also can be used to do aspect-oriented design or even create event-driven architectures. The basic operations you can do over these components is attaching and detaching listeners to named events, trigger events, and interrupting the execution of listeners when an event is fired. Let's see a couple of examples on how to attach to an event and how to fire them: //Registering an event listener $events = new EventManager(); $events->attach(array('EVENT_NAME'), $callback); //Triggering an event $events->trigger('EVENT_NAME', $this, $params); Inside the bootstrap method of ZendMvcApplication, we are registering the events of RouteListener, DispatchListener, and ViewManager. After that, the code is instantiating a new custom event called MvcEvent that will be used as the target when firing events. Finally, this piece of code will fire the bootstrap event. ModuleManager Zend Framework 2 introduces a completely redesigned ModuleManager. This new module has been built with simplicity, flexibility, and reuse in mind. These modules can hold everything from PHP to images, CSS, library code, views, and so on. The responsibility of this component in the bootstrap process of an app is loading the available modules specified by the config file. This is accomplished by the following code line located in the init method of ZendMvcApplication: $serviceManager->get('ModuleManager')->loadModules(); This line, when executed, will retrieve the list of modules located at the config file and will load each module. Each module has to contain a file called Module.php with the initialization of the components of the module if needed. This will allow the module manager to retrieve the configuration of the module. Let's see the usual content of this file: namespace MyModule; class Module { public function getAutoloaderConfig() { return array( 'ZendLoaderClassMapAutoloader' => array( __DIR__ . '/autoload_classmap.php', ), 'ZendLoaderStandardAutoloader' => array( 'namespaces' => array( __NAMESPACE__ => __DIR__ . '/src/' . __NAMESPACE__, ), ), ); } public function getConfig() { return include __DIR__ . '/config/module.config.php'; } } As you can see we are defining a method called getAutoloaderConfig() that provides the configuration for the autoloader to ModuleManager. The last method getConfig() is used to provide the configuration of the module to ModuleManager; for example, this will contain the routes handled by the module. Request object This object encapsulates all data related to a request and allows the developer to interact with the different parts of a request. This object is used in the constructor of ZendMvcApplication and also is set inside MvcEvent to be able to retrieve when some events are fired. Response object This object encapsulates all the parts of an HTTP response and provides the developer with a fluent interface to set all the response data. This object is used in the same way as the request object. Basically it is instantiated on the constructor and added to MvcEvent to be able to interact with it across all the events and classes. The request object As we said, the request object will encapsulate all the data related to a request and provide the developer with a fluent API to access the data. Let's take a look at the details of the request object in order to understand how to use it and what it can offer to us: use ZendHttpRequest; $string = "GET /foo HTTP/1.1rnrnSome Content"; $request = Request::fromString($string); $request->getMethod(); $request->getUri(); $request->getUriString(); $request->getVersion(); $request->getContent(); This example comes directly from the documentation and shows how a request object can be created from a string, and then access some data related with the request using the methods provided. So, every time we need to know something related to the request, we will access this object to get the data we need. If we check the code on ZendHttpPhpEnvironmentRequest.php, the first thing we can notice is that the data is populated on the constructor using the superglobal arrays. All this data is processed and then populated inside the object to be able to expose it in a standard way using methods. To manipulate the URI of the request you can get/set the data with three methods, two getters and one setter. The only difference with the getters is that one returns a plain string and the other returns an HttpUri object. getUri() and getUriString() setUri() To retrieve the data passed in the request, there are a few specialized methods depending on the data you want to get: getQuery() getPost() getFiles() getHeader() and getHeaders() About the request method, the object has a general way to know the method used, returning a string or nine specialized functions that will test specific methods based on the RFC 2616, which defines the standard methods for an HTTP request. getMethod() isOptions() isGet() isHead() isPost() isPut() isDelete() isTrace() isConnect() isPatch() Finally, two more methods are available in this object that will test special requests such as AJAX and requests made by a flash object. isXmlHttpRequest() isFlashRequest() Notice that the data stored on the superglobal arrays when populated on the object are converted from an Array to a Parameters object. The Parameters object lives in the Stdlib section of ZF2, a folder where common objects can be found and used across the framework. In this case, the Parameters class is an extension of ArrayObject and implements ParametersInterface that will bring ArrayAccess, Countable, Serializable, and Traversable functionality to the parameters stored inside the object. The goal with this object is to provide a common interface to access data stored in the superglobal arrays. This expands the ways you can interact with the data in an object-oriented approach.
Read more
  • 0
  • 0
  • 1839
article-image-designing-sample-applications-and-creating-reports
Packt
25 Oct 2013
6 min read
Save for later

Designing Sample Applications and Creating Reports

Packt
25 Oct 2013
6 min read
(For more resources related to this topic, see here.) Creating a new application We will now start using Microsoft Visual Studio, which we installed in the previous article: From the Start menu, select All Programs, then select the Microsoft Visual Studio 2012 folder, and then select Visual Studio 2012. Since we are running Visual Studio for the first time, we will see the following screenshot. We can select the default environment setting. Also, we can change this setting in our application. We will use C# as the programming language, so we will select Visual C# Development Settings and click on the Start Visual Studio button. Now we will see the Start Page of Microsoft Visual Studio 2012. Select New Project or open it by navigating to File | New | Project. This is shown in the following screenshot: As shown in the following screenshot, we will select Windows as the application template. We will then select Windows Forms Application as the application type. Let us name our application Monitor. Choose where you want to save the application and click on OK. Now we will see the following screenshot. Let's start to design our application interface. Start by right-clicking on the form and navigating to Properties, as shown in the following screenshot: The Properties explorer will open as shown in the following screenshot. Change Text property from Form1 to monitor, change the Name property to MainForm, and then change Size to 322, 563 because we will add many controls to our form. After I finished the design process, I found that this was the suitable form size; you can change form size and all controls sizes as you wish in order to better suit your application interface. Adding controls to the form In this step, we will start designing the interface of our application by adding controls to the Main Form. We will divide the Main Form into three main parts as follows: Employees Products and orders Employees and products Employees In this section, we will see the different ways in which we can search for our employees and display the result in a report. Beginning your design process Let's begin to design the Employees section of our application using the following steps: From the Toolbox explorer, drag-and-drop GroupBox into the Main Form. Change the GroupBox Size property to 286, 181, the Location property to 12,12, and Text property to Employees. Add Label to the Main Form. Change the Text property to ID and the Location property to 12,25. Add Textbox to the Main Form and change the Name property to txtEmpId. Change the Location property to 70, 20. Add GroupBox to the Main Form. Change the Groupbox Size property to 250,103, the Location property to 6, 40, and the Text property to filters. Add Label to the Main Form. Change the Text property to Title and the Location property to 6,21. Add ComboBox to Main Form. Change the Name property to cbEmpTitle and the Location property to 56,17. Add Label to Main Form. Change the Text property to City and the Location property to 6,50. Add ComboBox to Main Form. Change the Name property to cbEmpCity and the Location property to 56,46. Add Label to Main Form. Change the Text property to Country and the Location property to 6,76. Add ComboBox to Main Form. Change the Name property to cbEmpCountry and the Location property to 56,72. Add Button to Main Form. Change the Name property to btnEmpAll, the Location property to 6,152, and the Text property to All. Add Button to Main Form. Change the Name property to btnEmpById, the Location property to 99,152, and the Text property to By Id. Add Button to Main Form. Change the Name property to btnEmpByFilters, the Location property to 196,152 and the Text property to By filters. The final design will look like the following screenshot: Products and orders In this part, we will see the relationship between products and orders in order to demonstrate, which products have higher orders. Because we have a lot of product categories and orders from different countries, we filter our report by products category and countries. Beginning your design process After we finished the design of employees section, let's start designing the products and orders section. From Toolbox explorer, drag-and-drop GroupBox to the Main Form. Change the GroupBox Size property to 284,136, the Location property to 12,204 and the Text property to Products – Orders. Add GroupBox to the Main Form. Change the GroupBox Size property to 264,77, the Location property to 6,20, and the Text property to filters. Add Label to the Main Form, change the Text property to Category, and the Location property to 6,25. Add ComboBox to the Main Form. Change the Name Property to cbProductsCategory, and the Location property to 61,20. Add Label to the Main Form. Change the Text property to Country and the Location property to 6,51. Add ComboBox to the Main Form. Change the Name property to cbOrdersCountry and the Location property to 61,47. Add Button to the Main Form. Change the Name property to btnProductsOrdersByFilters, the Location property to 108,103, and the Text property to By filters. The final design looks like the following screenshot: Employees and orders In this part we will see the relationship between employees and orders in order to evaluate the performance of each employee. In this section we will filter data by a period of time. Beginning your design process In this section we will start to design the last section in our application employees and orders. From Toolbox explorer, drag -and-drop GroupBox to the Main Form. Change the Size property of GroupBox to 284,175, the Location property to 14,346, and the Text property to Employees - Orders. Add GroupBox to the Main Form. Change the GroupBox Size property to 264,77, the Location property to 6,25, and the Text property to filters. Add Label to the Main Form. Change the Text property to From and the Location property to 7,29. Add DateTimePicker to the Main Form and change the Name property to dtpFrom. Change the Location property to 41,25, the Format property to Short, and the Value property to 1/1/1996. Add Label to the Main Form. Change the Text property to To and the Location property to 7,55. Add DateTimePicker to the Main Form. Change the Name property to dtpTo, the Location property to 41,51, the Format property to Short, and the Value property to 12/30/1996. Add Button to the Main Form. Change the Name property to btnBarChart, the Location property to 15,117, and the Text property to Bar chart. Add Button to the Main Form and change the Name property to btnPieChart. Change the Location property to 99,117 and the Text property to Pie chart. Add Button to the Main Form. Change the Name Property to btnGaugeChart, the Location property to 186, 117, and the Text property to Gauge chart. Add Button to the Main Form. Change the Name property to btnAllInOne, the Location property to 63,144, and the Text property to All In One. The final design looks like the following screenshot: You can change Location, Size, and Text properties as you wish; but don't change the Name property because we will use it in code in the next articles.
Read more
  • 0
  • 0
  • 1715

article-image-about-cassandra
Packt
25 Oct 2013
28 min read
Save for later

About Cassandra

Packt
25 Oct 2013
28 min read
(For more resources related to this topic, see here.) So, if Cassandra is so good at everything, why not everyone drop whatever database they are using and jump start with Cassandra? This is a natural question. Some applications require strong ACID compliance, such as a booking system. If you are a person who goes by statistics, you'd ask how Cassandra fares with other existing data stores. TilmannRabl et al in their paper, Solving Big Data Challenges for Enterprise Application Performance Management (http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf), told that, "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput from one to 12 nodes. This comes at the price of a high write and read latency. Cassandra's performance is best for high insertion rates." If you go through the paper, Cassandra wins in almost all the criteria. Equipped with proven concepts of distributed computing, made to reliably serve from commodity servers, and simple and easy maintenance, Cassandra is one of the most scalable, fastest, and very robust NoSQL database. So, the next natural question is what makes Cassandra so blazing fast? Let us dive deeper into the Cassandra architecture. Cassandra architecture Cassandra is a relative latecomer in the distributed data-store war. It takes advantage of two proven and closely similar data-store mechanisms, namely Google BigTable, a distributed storage system for structured data, 2006 (http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf [2006]), and Amazon Dynamo, Amazon's highly available key-value store, 2007 (http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf [2007]). Figure 2.3: Read throughputs shows linear scaling of Cassandra Like BigTable, it has tabular data presentation. It is not tabular in the strictest sense. It is rather a dictionary-like structure where each entry holds another sorted dictionary/map. This model is more powerful than the usual key-value store and it is named as column family. The properties such as Eventual Consistency and decentralization are taken from Dynamo. For now, assume a column family as a giant spreadsheet, such as MS Excel. But unlike spreadsheets, each row is identified by a row key with a number (token), and unlike spreadsheets, each cell can have its own unique name within the row. The columns in the rows are sorted by this unique column name. Also, since the number of rows is allowed to be very large (1.7*(10)^38), we distribute the rows uniformly across all the available machines by dividing the rows in equal token groups. These rows create a Keyspace. Keyspace is a set of all the tokens (row IDs). Ring representation Cassandra cluster is denoted as a ring. The idea behind this representation is to show token distribution. Let's take an example. Assume that there is a partitioner that generates tokens from zero to 127 and you have four Cassandra machines to create a cluster. To allocate equal load, we need to assign each of the four nodes to bear an equal number of tokens. So, the first machine will be responsible for tokens one to 32, the second will hold 33 to 64, the third, 65 to 96, and the fourth, 97 to 127 and 0. If you mark each node with the maximum token number that it can hold, the cluster looks like a ring. (Figure 2.22) Partitioner is the hash function that determines the range of possible row keys. Cassandra uses a partitioner to calculate the token equivalent to a row key (row ID). Figure 2.4: Token ownership and distribution in a balanced Cassandra ring When you start to configure Cassandra, one thing that you may want to set is the maximum token number that a particular machine could hold. This property can be set in the Cassandra.yaml file as initial_token. One thing that may confuse a beginner is that the value of the initial token is what the last token owns. Be aware that nodes can be rebalanced and these tokens can be changed as the new nodes join or old nodes get discarded. This is the initial token because this is just the initial value, and it may be changed later. How Cassandra works Diving into various components of Cassandra without having a context is really a frustrating experience. It does not makes sense why you are studying SSTable, MemTable, and Log Structured Merge (LSM) tree without being able to see how they fit into functionality and performance guarantees that Cassandra gives. So, first, we will see Cassandra's write and read mechanism. It is possible that some of the terms that we encounter during this discussion may not be immediately understandable. A rough overview of the Cassandra components is as shown in the following figure: Figure 2.5: Main components of the Cassandra service The main class of Storage Layer is StorageProxy. It handles all the requests. Messaging Layer is responsible for internode communications like gossip. Apart from this, process-level structures keep a rough idea about the actual data containers and where they live. There are four data buckets that you need to know. MemTable is a hash table-like structure that stays in memory. It contains actual column data. SSTable is the disk version of MemTables. When MemTables are full, SSTables are persisted to the hard disk. Bloom filters is a probabilistic data structure that lives in memory. It helps Cassandra to quickly detect which SSTable does not have the requested data. CommitLog is the usual commit log that contains all the mutations that are to be applied. It lives on the disk and helps to replay uncommitted changes. With this primer, we can start looking into how write and read works in Cassandra. We will see more explanation later. Write in action To write, clients need to connect to any of the Cassandra nodes and send a write request. This node is called as the coordinator node. When a node in Cassandra cluster receives a write request, it delegates it to a service called StorageProxy. This node may or may not be the right place to write the data to. The task of StorageProxy is to get the nodes (all the replicas) that are responsible to hold the data that is going to be written. It utilizes a replication strategy to do that. Once the replica nodes are identified, it sends the RowMutation message to them, the node waits for replies from these nodes, but it does not wait for all the replies to come. It only waits for as many responses as are enough to satisfy the client's minimum number of successful writes defined by ConsistencyLevel. So, the following figure and steps after that show all that can happen during a write mechanism: Figure 2.6: A simplistic representation of the write mechanism. The figure on the left represents the node-local activities on receipt of the write request If FailureDetector detects that there aren't enough live nodes to satisfy ConsistencyLevel, the request fails. If FailureDetector gives a green signal, but writes time-out after the request is sent due to infrastructure problems or due to extreme load, StorageProxy writes a local hint to replay when the failed nodes come back to life. This is called hinted handoff. One might think that hinted handoff may be responsible for Cassandra's eventual consistency. But it's not entirely true. If the coordinator node gets shut down or dies due to hardware failure and hints on this machine cannot be forwarded, eventual consistency will not occur. The Anti-entropy mechanism is responsible for consistency rather than hinted handoff. Anti-entropy makes sure that all replicas are in sync. If the replica nodes are distributed across datacenters, it will be a bad idea to send individual messages to all the replicas in other datacenters. It rather sends the message to one replica in each datacenter with a header instructing it to forward the request to other replica nodes in that datacenter. Now, the data is received by the node that should actually store that data. The data first gets appended to CommitLog, and pushed to a MemTable for the appropriate column family in the memory. When MemTable gets full, it gets flushed to the disk in a sorted structure named SSTable. With lots of flushes, the disk gets plenty of SSTables. To manage SSTables, a compaction process runs. This process merges data from smaller SSTables to one big sorted file. Read in action Similar to a write case, when StorageProxy of the node that a client is connected to gets the request, it gets a list of nodes containing this key based on Replication Strategy. StorageProxy then sorts the nodes based on their proximity to itself. The proximity is determined by the Snitch function that is set up for this cluster. Basically, there are the following types of Snitch: SimpleSnitch: A closer node is the one that comes first when moving clockwise in the ring. (A ring is when all the machines in the cluster are placed in a circular fashion with each having a token number. When you walk clockwise, the token value increases. At the end, it snaps back to the first node.) AbstractNetworkTopologySnitch: Implementation of the Snitch function works like this: nodes on the same rack are closest. The nodes in the same datacenter but in different rack are closer than those in other datacenters, but farther than the nodes in the same rack. Nodes in different datacenters are the farthest. To a node, the nearest node will be the one on the same rack. If there is no node on the same rack, the nearest node will be the one that lives in the same datacenter, but on a different rack. If there is no node in the datacenter, any nearest neighbor will be the one in the other datacenter. DynamicSnitch: This Snitch determines closeness based on recent performance delivered by a node. So, a quick-responding node is perceived closer than a slower one, irrespective of their location closeness or closeness in the ring. This is done to avoid overloading a slow-performing node. Now that we have the list of nodes that have desired row keys, it's time to pull data from them. The coordinator node (the one that the client is connected to) sends a command to the closest node to perform read (we'll discuss local read in a minute) and return the data. Now, based on ConsistencyLevel, other nodes will send a command to perform a read operation and send just the digest of the result. If we have Read Repair (discussed later) enabled, the remaining replica nodes will be sent a message to compute the digest of the command response. Let's take an example: say you have five nodes containing a row key K (that is, replication factor (RF) equals 5). Your read ConsistencyLevel is three. Then the closest of the five nodes will be asked for the data. And the second and third closest nodes will be asked to return the digest. We still have two left to be queried. If read Repair is not enabled, they will not be touched for this request. Otherwise, these two will be asked to compute digest. The request to the last two nodes is done in the background, after returning the result. This updates all the nodes with the most recent value, making all replicas consistent. So, basically, in all scenarios, you will have a maximum one wrong response. But with correct read and write consistency levels, we can guarantee an up-to-date response all the time. Let's see what goes within a node. Take a simple case of a read request looking for a single column within a single row. First, the attempt is made to read from MemTable, which is rapid-fast since there exists only one copy of data. This is the fastest retrieval. If the data is not found there, Cassandra looks into SSTable. Now, remember from our earlier discussion that we flush MemTables to disk as SSTables and later when compaction mechanism wakes up, it merges those SSTables. So, our data can be in multiple SSTables. Figure 2.7: A simplified representation of the read mechanism. The bottom image shows processing on the read node. Numbers in circles shows the order of the event. BF stands for Bloom Filter Each SSTable is associated with its Bloom Filter built on the row keys in the SSTable. Bloom Filters are kept in memory, and used to detect if an SSTable may contain (false positive) the row data. Now, we have the SSTables that may contain the row key. The SSTables get sorted in reverse chronological order (latest first). Apart from Bloom Filter for row keys, there exists one Bloom Filter for each row in the SSTable. This secondary Bloom Filter is created to detect whether the requested column names exist in the SSTable. Now, Cassandra will take SSTables one by one from younger to older. And use the index file to locate the offset for each column value for that row key and the Bloom filter associated with the row (built on the column name). On Bloom filter being positive for the requested column, it looks into the SSTable file to read the column value. Note that we may have a column value in other yet-to-be-read SSTables, but that does not matter, because we are reading the most recent SSTables first, and any value that was written earlier to it does not matter. So, the value gets returned as soon as the first column in the most recent SSTable is allocated. Components of Cassandra We have gone through how read and write takes place in highly distributed Cassandra clusters. It's time to look into individual components of it a little deeper. Messaging service Messaging service is the mechanism that manages internode socket communication in a ring. Communications, for example, gossip, read, read digest, write, and so on, processed via a messaging service, can be assumed as a gateway messaging server running at each node. To communicate, each node creates two socket connections per node. This implies that if you have 101 nodes, there will be 200 open sockets on each node to handle communication with other nodes. The messages contain a verb handler within them that basically tells the receiving node a couple of things: how to deserialize the payload message and what handler to execute for this particular message. The execution is done by the verb handlers (sort of an event handler). The singleton that orchestrates the messaging service mechanism is org.apache.cassandra.net.MessagingService. Gossip Cassandra uses the gossip protocol for internode communication. As the name suggests, the protocol spreads information in the same way an office rumor does. It can also be compared to a virus spread. There is no central broadcaster, but the information (virus) gets transferred to the whole population. It's a way for nodes to build the global map of the system with a small number of local interactions. Cassandra uses gossip to find out the state and location of other nodes in the ring (cluster). The gossip process runs every second and exchanges information with at the most three other nodes in the cluster. Nodes exchange information about themselves and other nodes that they come to know about via some other gossip session. This causes all the nodes to eventually know about all the other nodes. Like everything else in Cassandra, gossip messages have a version number associated with it. So, whenever two nodes gossip, the older information about a node gets overwritten with a newer one. Cassandra uses an Anti-entropy version of gossip protocol that utilizes Merkle trees (discussed later) to repair unread data. Implementation-wise the gossip task is handled by the org.apache.cassandra.gms.Gossiper class. Gossiper maintains a list of live and dead endpoints (the unreachable endpoints). At every one-second interval, this module starts a gossip round with a randomly chosen node. A full round of gossip consists of three messages. A node X sends a syn message to a node Y to initiate gossip. Y, on receipt of this syn message, sends an ack message back to X. To reply to this ack message, X sends an ack2 message to Y completing a full message round. Figure 2.8: Two nodes gossiping Failure detection Failure detection is one of the fundamental features of any robust and distributed system. A good failure detection mechanism implementation makes a fault-tolerant system such as Cassandra. The failure detector that Cassandra uses is a variation of The ϕ accrual failure detector (2004) by Xavier Défago et al. (The phi accrual detector research paper is available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.106.3350.) The idea behind a FailureDetector is to detect a communication failure and take appropriate actions based on the state of the remote node. Unlike traditional failure detectors, phi accrual failure detector does not emit a Boolean alive or dead (true or false, trust or suspect) value. Instead, it gives a continuous value to the application and the application is left to decide the level of severity and act accordingly. This continuous suspect value is called phi (ϕ). Partitioner Cassandra is a distributed database management system. This means it takes a single logical database and distributes it over one or more machines in the database cluster. So, when you insert some data in Cassandra, it assigns each row to a row key; and based on that row key, Cassandra assigns that row to one of the nodes that's responsible for managing it. Let's try to understand this. Cassandra inherits the data model from Google's BigTable (BigTable research paper can be found at http://research.google.com/archive/bigtable.html.). This means we can roughly assume that the data is stored in some sort of a table that has an unlimited number of columns (not unlimited, Cassandra limits the maximum number of columns to be two billion) with rows binded with a unique key, namely, row key. Now, your terabytes of data on one machine will be restrictive from multiple points of views. One is disk space, another being limited parallel processing, and if not duplicated, a source of single point of failure. What Cassandra does is, it defines some rules to slice data across rows and assigns which node in the cluster is responsible for holding which slice. This task is done by a partitioner. There are several types of partitioners to choose from. In short, Cassandra (as of Version 1.2) offers three partitioners as follows: RandomPartitioner: It uses MD5 hashing to distribute data across the cluster. Cassandra 1.1.x and precursors have this as the default partitioner. Murmur3Partitioner: It uses Murmur hash to distribute the data. It performs better than RandomPartitioner. It is the default partitioner from Cassandra Version 1.2 onwards. ByteOrderPartitioner: Keeps keys distributed across the cluster by key bytes. This is an ordered distribution, so the rows are stored in lexical order. This distribution is commonly discouraged because it may cause a hotspot. Replication Cassandra runs on commodity hardware, and works reliably in network partitions. However, this comes with a cost: replication. To avoid data inaccessibility in case a node goes down or becomes unavailable, one must replicate data to more than one node. Replication brings features such as fault tolerance and no single point of failure to the system. Cassandra provides more than one strategy to replicate the data, and one can configure the replication factor while creating keyspace. Log Structured Merge tree Cassandra (also, HBase) is heavily influenced by Log Structured Merge (LSM). It uses an LSM tree-like mechanism to store data on a disk. The writes are sequential (in append fashion) and the data storage is contiguous. This makes writes in Cassandra superfast, because there is no seek involved. Contrast this with an RBDMS system that is based on the B+ Tree (http://en.wikipedia.org/wiki/B%2B_tree) implementation. LSM tree advocates the following mechanism to store data: note down the arriving modification into a log file (CommitLog), push the modification/new data into memory (MemTable) for faster lookup, and when the system has gathered enough updates in memory or after a certain threshold time, flush this data to a disk in a structured store file (SSTable). The logs corresponding to the updates that are flushed can now be discarded. Figure 2.12: Log Structured Merge (LSM) Trees CommitLog One of the promises that Cassandra makes to the end users is durability. In conventional terms (or in ACID terminology), durability guarantees that a successful transaction (write, update) will survive permanently. This means once Cassandra says write successful that means the data is persisted and will survive system failures. It is done the same way as in any DBMS that guarantees durability: by writing the replayable information to a file before responding to a successful write. This log is called the CommitLog in the Cassandra realm. This is what happens. Any write to a node gets tracked by org.apache.cassandra.db.commitlog.CommitLog, which writes the data with certain metadata into the CommitLog file in such a manner that replaying this will recreate the data. The purpose of this exercise is to ensure there is no data loss. If due to some reason the data could not make it into MemTable or SSTable, the system can replay the CommitLog to recreate the data. MemTable MemTable is an in-memory representation of column family. It can be thought of as a cached data. MemTable is sorted by key. Data in MemTable is sorted by row key. Unlike CommitLog, which is append-only, MemTable does not contain duplicates. A new write with a key that already exists in the MemTable overwrites the older record. This being in memory is both fast and efficient. The following is an example: Write 1: {k1: [{c1, v1}, {c2, v2}, {c3, v3}]} In CommitLog (new entry, append): {k1: [{c1, v1},{c2, v2}, {c3, v3}]} In MemTable (new entry, append): {k1: [{c1, v1}, {c2, v2}, {c3, v3}]} Write 2: {k2: [{c4, v4}]} In CommitLog (new entry, append): {k1: [{c1, v1}, {c2, v2}, {c3, v3}]} {k2: [{c4, v4}]} In MemTable (new entry, append): {k1: [{c1, v1}, {c2, v2}, {c3, v3}]} {k2: [{c4, v4}]} Write 3: {k1: [{c1, v5}, {c6, v6}]} In CommitLog (old entry, append): {k1: [{c1, v1}, {c2, v2}, {c3, v3}]} {k2: [{c4, v4}]} {k1: [{c1, v5}, {c6, v6}]} In MemTable (old entry, update): {k1: [{c1, v5}, {c2, v2}, {c3, v3}, {c6, v6}]} {k2: [{c4, v4}]} Cassandra Version 1.1.1 uses SnapTree (https://github.com/nbronson/snaptree) for MemTable representation, which claims it to be "... a drop-in replacement for ConcurrentSkipListMap, with the additional guarantee that clone() is atomic and iteration has snapshot isolation." See also copy-on-write and compare-and-swap (http://en.wikipedia.org/wiki/Copy-on-write, http://en.wikipedia.org/wiki/Compare-and-swap). Any write gets written first to CommitLog and then to MemTable. SSTable SSTable is a disk representation of the data. MemTables gets flushed to disk to immutable SSTables. All the writes are sequential, which makes this process fast. So, the faster the disk speed, the quicker the flush operation. The SSTables eventually get merged in the compaction process and the data gets organized properly into one file. This extra work in compaction pays off during reads. SSTables have three components: Bloom filter, index files, and datafiles. Bloom filter Bloom filter is a litmus test for the availability of certain data in storage (collection). But unlike a litmus test, a Bloom filter may result in false positives: that is, it says that a data exists in the collection associated with the Bloom filter, when it actually does not. A Bloom filter never results in a false negative. That is, it never states that a data is not there while it is. The reason to use Bloom filter, even with its false-positive defect, is because it is superfast and its implementation is really simple. Cassandra uses Bloom filters to determine whether an SSTable has the data for a particular row key. Bloom filters are unused for range scans, but they are good candidates for index scans. This saves a lot of disk I/O that might take in a full SSTable scan, which is a slow process. That's why it is used in Cassandra, to avoid reading many, many SSTables, which can become a bottleneck. Index files Index files are companion files of SSTables. The same as Bloom filter, there exists one index file per SSTable. It contains all the row keys in the SSTable and its offset at which the row starts in the datafile. At startup, Cassandra reads every 128th key (configurable) into the memory (sampled index). When the index is looked for a row key (after Bloom filter hinted that the row key might be in this SSTable), Cassandra performs a binary search on the sampled index in memory. Followed by a positive result from the binary search, Cassandra will have to read a block in the index file from the disk starting from the nearest value lower than the value that we are looking for. Datafiles Datafiles are the actual data. They contain row keys, metadata, and columns (partial or full). Reading data from datafiles is just one disk seek followed by a sequential read, as offset to a row key is already obtained from the associated index file. Compaction As mentioned earlier, a read require may require Cassandra to read across multiple SSTables to get a result. This is wasteful, costs multiple (disk) seeks, may require a conflict resolution, and if there are too many, SSTables were created. To handle this problem, Cassandra has a process in place, namely, compaction. Compaction merges multiple SSTable files into one. Off the shelf, Cassandra offers two types of compaction mechanism: Size Tiered Compaction Strategy and Level Compaction Strategy. The compaction process starts when the number of SSTables on disk reaches a certain threshold (N: configurable). Although the merge process is a little I/O intensive, it benefits in the long term with a lower number of disk seeks during reads. Apart from this, there are a few other benefits of compaction as follows: Removal of expired tombstones (Cassandra v0.8+) Merging row fragments Rebuilds primary and secondary indexes Tombstones Cassandra is a complex system with its data distributed among CommitLogs, MemTables, and SSTables on a node. The same data is then replicated over replica nodes. So, like everything else in Cassandra, deletion is going to be eventful. Deletion, to an extent, follows an update pattern except Cassandra tags the deleted data with a special value, and marks it as a tombstone. This marker helps future queries, compaction, and conflict resolution. Let's step further down and see what happens when a column from a column family is deleted. A client connected to a node (coordinator node, but it may not be the one holding the data that we are going to mutate), issues a delete command for a column C, in a column family CF. If the consistency level is satisfied, the delete command gets processed. When a node, containing the row key, receives a delete request, it updates or inserts the column in MemTable with a special value, namely, tombstone. The tombstone basically has the same column name as the previous one; the value is set to UNIX epoch. The timestamp is set to what the client has passed. When a MemTable is flushed to SSTable, all tombstones go into it as any regular column. On the read side, when the data is read locally on the node and it happens to have multiple versions of it in different SSTables, they are compared and the latest value is taken as the result of reconciliation. If a tombstone turns out to be a result of reconciliation, it is made a part of the result that this node returns. So, at this level, if a query has a deleted column, this exists in the result. But the tombstones will eventually be filtered out of the result before returning it back to the client. So, a client can never see a value that is a tombstone. Hinted handoff When we last talked about durability, we observed Cassandra provides CommitLogs to provide write durability. This is good. But what if the node, where the writes are going to be, is itself dead? No communication will keep anything new to be written to the node. Cassandra, inspired by Dynamo, has a feature called hinted handoff. In short, it's the same as taking a quick note locally that X cannot be contacted. Here is the mutation (operation that requires modification of the data such as insert, delete, and update) M that will be required to be replayed when it comes back. The coordinator node (the node which the client is connected to) on receipt of a mutation/write request, forwards it to appropriate replicas that are alive. If this fulfills the expected consistency level, write is assumed successful. The write requests to a node that does not respond to a write request or is known to be dead (via gossip) and is stored locally in the system.hints table. This hint contains the mutation. When a node comes to know via gossip that a node is recovered, it replays all the hints it has in store for that node. Also, every 10 minutes, it keeps checking any pending hinted handoffs to be written. Why worry about hinted handoff when you have written to satisfy consistency level? Wouldn't it eventually get repaired? Yes, that's right. Also, hinted handoff may not be the most reliable way to repair a missed write. What if the node that has hinted handoff dies? This is a reason why we do not count on hinted handoff as a mechanism to provide consistency (except for the case of the consistency level, ANY) guarantee; it's a single point of failure. The purpose of hinted handoff is, one, to make restored nodes quickly consistent with the other live ones; and two, to provide extreme write availability when consistency is not required. Read repair and Anti-entropy Cassandra promises eventual consistency and read repair is the process which does that part. Read repair, as the name suggests, is the process of fixing inconsistencies among the replicas at the time of read. What does that mean? Let's say we have three replica nodes A, B, and C that contain a data X. During an update, X is updated to X1 in replicas A and B. But it is failed in replica C for some reason. On a read request for data X, the coordinator node asks for a full read from the nearest node (based on the configured Snitch) and digest of data X from other nodes to satisfy consistency level. The coordinator node compares these values (something like digest(full_X) == digest_from_node_C). If it turns out that the digests are the same as the digests of full read, the system is consistent and the value is returned to the client. On the other hand, if there is a mismatch, full data is retrieved and reconciliation is done and the client is sent the reconciled value. After this, in background, all the replicas are updated with the reconciled value to have a consistent view of data on each node. See Figure 2.1 as shown: Figure 2.16: Image showing read repair dynamics. 1. Client queries for data x, from a node C (coordinator). 2. C gets data from replicas R1, R2, and R3; reconciles. 3. Sends reconciled data to client. 4. If there is a mismatch across replicas, repair is invoked. Merkle tree Merkle tree (A digital signature Based On A Conventional Encryption Function by Merkle, R. (1988), available at http://www.cse.msstate.edu/~ramkumar/merkle2.pdf.) is a hash tree where leaves of the tree hashes hold actual data in a column family and non-leaf nodes hold hashes of their children. The unique advantage of Merkle tree is a whole subtree can be validated just by looking at the value of the parent node. So, if nodes on two replica servers have the same hash values, then the underlying data is consistent and there is no need to synchronize. If one node passes the whole Merkle tree of a column family to another node, it can determine all the inconsistencies. Figure 2.17: Merkle tree to determine mismatch in hash values at parent nodes due to the difference in underlying data Summary By now, you are familiar with all the nuts and bolts of Cassandra. It is understandable that it may be a lot to take in for someone new to NoSQL systems. It is okay if you do not have complete clarity at this point. As you start working with Cassandra, tweaking it, experimenting with it, and going through the Cassandra mailing list discussions or talks, you will start to come across stuff that you have read in this article and it will start to make sense, and perhaps you may want to come back and refer to this article to improve clarity. It is not required to understand this article fully to be able to write queries, set up clusters, maintain clusters, or do anything else related to Cassandra. A general sense of this article will take you far enough to work extremely well with Cassandra-based projects. Resources for Article: Further resources on this subject: Apache Cassandra: Libraries and Applications [Article] Apache Felix Gogo [Article] Migration from Apache to Lighttpd [Article]
Read more
  • 0
  • 0
  • 3608

article-image-image-classification-and-feature-extraction-images
Packt
25 Oct 2013
3 min read
Save for later

Image classification and feature extraction from images

Packt
25 Oct 2013
3 min read
(For more resources related to this topic, see here.) Classifying images Automated Remote Sensing ( ARS ) is rarely ever done in the visible spectrum. The most commonly available wavelengths outside of the visible spectrum are infrared and near-infrared. The following scene is a thermal image (band 10) from a fairly recent Landsat 8 flyover of the US Gulf Coast from New Orleans, Louisiana to Mobile, Alabama. Major natural features in the image are labeled so you can orient yourself: Because every pixel in that image has a reflectance value, it is information. Python can "see" those values and pick out features the same way we intuitively do by grouping related pixel values. We can colorize pixels based on their relation to each other to simplify the image and view related features. This technique is called classification. Classifying can range from fairly simple groupings based only on some value distribution algorithm derived from the histogram to complex methods involving training data sets and even computer learning and artificial intelligence. The simplest forms are called unsupervised classifications, whereas methods involving some sort of training data to guide the computer are called supervised. It should be noted that classification techniques are used across many fields, from medical doctors trying to spot cancerous cells in a patient's body scan, to casinos using facial-recognition software on security videos to automatically spot known con-artists at blackjack tables. To introduce remote sensing classification we'll just use the histogram to group pixels with similar colors and intensities and see what we get. First you'll need to download the Landsat 8 scene here: http://geospatialpython.googlecode.com/files/thermal.zip Instead of our histogram() function from previous examples, we'll use the version included with NumPy that allows you to easily specify a number of bins and returns two arrays with the frequency as well as the ranges of the bin values. We'll use the second array with the ranges as our class definitions for the image. The lut or look-up table is an arbitrary color palette used to assign colors to classes. You can use any colors you want. import gdalnumeric # Input file name (thermal image) src = "thermal.tif" # Output file name tgt = "classified.jpg" # Load the image into numpy using gdal srcArr = gdalnumeric.LoadFile(src) # Split the histogram into 20 bins as our classes classes = gdalnumeric.numpy.histogram(srcArr, bins=20)[1] # Color look-up table (LUT) - must be len(classes)+1. # Specified as R,G,B tuples lut = [[255,0,0],[191,48,48],[166,0,0],[255,64,64], [255,115,115],[255,116,0],[191,113,48],[255,178,115], [0,153,153],[29,115,115],[0,99,99],[166,75,0], [0,204,0],[51,204,204],[255,150,64],[92,204,204],[38,153,38], [0,133,0],[57,230,57],[103,230,103],[184,138,0]] # Starting value for classification start = 1 # Set up the RGB color JPEG output image rgb = gdalnumeric.numpy.zeros((3, srcArr.shape[0], srcArr.shape[1],), gdalnumeric.numpy.float32) # Process all classes and assign colors for i in range(len(classes)): mask = gdalnumeric.numpy.logical_and(start <= srcArr, srcArr <= classes[i]) for j in range(len(lut[i])): rgb[j] = gdalnumeric.numpy.choose(mask, (rgb[j], lut[i][j])) start = classes[i]+1 # Save the image gdalnumeric.SaveArray(rgb.astype(gdalnumeric.numpy.uint8), tgt, format="JPEG") The following image is our classification output, which we just saved as a JPEG. We didn't specify the prototype argument when saving as an image, so it has no georeferencing information. This result isn't bad for a very simple unsupervised classification. The islands and coastal flats show up as different shades of green. The clouds were isolated as shades of orange and dark blues. We did have some confusion inland where the land features were colored the same as the Gulf of Mexico. We could further refine this process by defining the class ranges manually instead of just using the histogram.
Read more
  • 0
  • 0
  • 13101
article-image-general-considerations
Packt
25 Oct 2013
9 min read
Save for later

General Considerations

Packt
25 Oct 2013
9 min read
(For more resources related to this topic, see here.) Building secure Node.js applications will require an understanding of the many different layers that it is built upon. Starting from the bottom, we have the language specification that defines what JavaScript consists of. Next, the virtual machine executes your code and may have differences from the specification. Following that, the Node.js platform and its API have details in their operation that affect your applications. Lastly, third-party modules interact with our own code and need to be audited for secure programming practices. First, JavaScript's official name is ECMAScript. The international European Computer Manufacturers Association (ECMA) first standardized the language as ECMAScript in 1997. This ECMA-262 specification defines what comprises JavaScript as a language, including its features, and even some of its bugs. Even some of its general quirkiness has remained unchanged in the specification to maintain backward compatibility. While I won't say the specification itself is required reading, I will say that it is worth considering. Second, Node.js uses Google's V8 virtual machine to interpret and execute your source code. While developing for the browser, you have to consider all the other virtual machines (not to mention versions), when it comes to available features. In a Node.js application, your code only runs on the server, so you have much more freedom, and you can use all the features available to you in V8. Additionally, you can also optimize for the V8 engine exclusively. Next, Node.js handles setting up the event loop, and it takes your code to register callbacks for events and executes them accordingly. There are some important details regarding how Node.js responds to exceptions and other errors that you will need to be aware of while developing your applications. Atop Node.js is the developer API. This API is written mostly in JavaScript which allows you, as a JavaScript developer, to read it for yourself, and understand how it works. There are many provided modules that you will likely end up using, and it's important for you to know how they work, so you can code defensively. Last, but not least, the third-party modules that npm gives you access to, are in great abundance, which can be a double-edged sword. On one hand, you have many options to pick from that suit your needs. On the other hand, having a third-party code is a potential security liability, as you will be expected to support and audit each of these modules (in addition to their own dependencies) for security vulnerabilities. JavaScript security One of the biggest security risks in JavaScript itself, both on the client and now on the server, is the use of the eval() function. This function, and others like it, takes a string argument, which can represent an expression, statement, or a series of statements, and it is executed as any other JavaScript source code. This is demonstrated in the following code: // these variables are available to eval()'d code // assume these variables are user input from a POST request var a = req.body.a; // => 1 var b = req.body.b; // => 2 var sum = eval(a + "+" + b); // same as '1 + 2' This code has full access to the current scope, and can even affect the global object, giving it an alarming amount of control. Let's look at the same code, but imagine if someone malicious sent arbitrary JavaScript code instead of a simple number. The result is shown in the following code: var a = req.body.a; // => 1 var b = req.body.b; // => 2; console.log("corrupted"); var sum = eval(a + "+" + b); // same as '1 + 2; console.log("corrupted"); Due to how eval() is exploited here, we are witnessing a "remote code execution" attack! When executed directly on the server, an attacker could gain access to server files and databases. There are a few cases where eval() can be useful, but if the user input is involved in any step of the process, it should likely be avoided at all costs! There are other features of JavaScript that are functional equivalents to eval(), and should likewise be avoided unless absolutely necessary. First is the Function constructor that allows you to create a callable function from strings, as shown in the following code: // creates a function that returns the sum of 2 arguments var adder = new Function("a", "b", "return a + b"); adder(1, 2); // => 3 While very similar to the eval() function, it is not exactly the same. This is because it does not have access to the current scope. However, it does still have access to the global object, and should be avoided whenever a user input is involved. If you find yourself in a situation where there is an absolute need to execute an arbitrary code that involves user input, you do have one secure option. Node.js platform's API includes a vm module that is meant to give you the ability to compile and run code in a sandbox, preventing manipulation of the global object and even the current scope. It should be noted that the vm module has many known issues and edge cases. You should read the documentation, and understand all the implications of what you are doing to make sure you don't get caught off-guard. ES5 features ECMAScript5 included an extensive batch of changes to JavaScript, including the following changes: Strict mode for removing unsafe features from the language. Property descriptors that give you control over object and property access. Functions for changing object mutability. Strict mode Strict mode changes the way JavaScript code runs in select cases. First, it causes errors to be thrown in cases that were silent before. Second, it removes and/or change features that made optimizations for JavaScript engines either difficult or impossible. Lastly, it prohibits some syntax that is likely to show up in future versions of JavaScript. Additionally, strict mode is opt-in only, and can be applied either globally or for an individual function scope. For Node.js applications, to enable strict mode globally, add the –use_strict command line flag, while executing your program. While dealing with third-party modules that may or may not be using strict mode, this can potentially have negative side effects on your overall application. With that said, you could potentially make strict mode compliance a requirement for any audits on third-party modules. Strict mode can be enabled by adding the "use strict" pragma at the beginning of a function, before any other expressions as shown in the following code: function sayHello(name) { "use strict"; // enables strict mode for this function scope console.log("hello", name); } In Node.js, all the required files are wrapped with a function expression that handles the CommonJS module API. As a result, you can enable strict mode for an entire file, by simply putting the directive at the top of the file. This will not enable strict mode globally, as it would in an environment like the browser. Strict mode makes many changes to the syntax and runtime behavior, but for the sake of brevity we will only discuss changes relevant to application security. First, scripts run via eval() in strict mode cannot introduce new variables to the enclosing scope. This prevents leaking new and possibly conflicting variables into your code, when you run eval() as shown in the following code: "use strict"; eval("var a = true"); console.log(a); // ReferenceError thrown – a does not exist In addition, the code run via eval() is not given access to the global object through its context. This is similar, if not related, to other changes for function scope, which will be explained shortly, but this is specifically important for eval(), as it can no longer use the global object to perform additional black magic. It turns out that the eval() function is able to be overridden in JavaScript. It can be accomplished by creating a new global variable called eval, and assigning something else to it, which could be malicious. Strict mode prohibits this type of operation. It is treated more like a language keyword than a variable, and attempting to modify it will result in a syntax error as shown in the following code: // all of the examples below are syntax errors "use strict"; eval = 1; ++eval; var eval; function eval() { } Next, the function objects are more tightly secured. Some common extensions to ECMAScript add the function.caller and function.arguments references to each function, after it is invoked. Effectively, you can "walk" the call stack for a specific function by traversing these special references. This potentially exposes information that would normally appear to be out of scope. Strict mode simply makes these properties throw a TypeError remark, while attempting to read or write them, as shown in the following code: "use strict"; function restricted() { restricted.caller; // TypeError thrown restricted.arguments; // TypeError thrown } Next, arguments.callee is removed in strict mode (such as function.caller and function.arguments shown previously). Normally, arguments.callee refers to the current function, but this magic reference also exposes a way to "walk" the call stack, and possibly reveal information that previously would have been hidden or out of scope. In addition, this object makes certain optimizations difficult or impossible for JavaScript engines. Thus, it also throws a TypeError exception, when an access is attempted, as shown in the following code: "use strict"; function fun() { arguments.callee; // TypeError thrown } Lastly, functions executed with null or undefined as the context no longer coerce the global object as the context. This applies to eval() as we saw earlier, but goes further to prevent arbitrary access to the global object in other function invocations, as shown in the following code: "use strict"; (function () { console.log(this); // => null }).call(null); Strict mode can help make the code far more secure than before, but ECMAScript 5 also includes access control through the property descriptor APIs. A JavaScript engine has always had the capability to define property access, but ES5 includes these APIs to give that same power to application developers. Summary In this article, we examined the security features that applied generally to the language of JavaScript itself, including how to use static code analysis to check for many of the aforementioned pitfalls. Also, we looked at some of the inner workings of a Node.js application, and how it differs from typical browser development, when it comes to security. Resources for Article: Further resources on this subject: Setting up Node [Article] So, what is Node.js? [Article] Learning to add dependencies [Article]
Read more
  • 0
  • 0
  • 11391

article-image-advanced-system-management
Packt
25 Oct 2013
11 min read
Save for later

Advanced System Management

Packt
25 Oct 2013
11 min read
(For more resources related to this topic, see here.) Beyond backups Of course, backups are not the only issue with managing multiple, remote systems. In particular, managing such multiple configurations using a centralized application is often desirable. Configuration management One of the issues frequently faced by administrators is that of having multiple, remote systems all with similar software for the most part, but with minor differences in what is installed or running. Debian provides several packages that can help manage such an environment in a unified manner. Two of the more popular packages, both available in Debian, are FAI and Puppet. While we don't have the space to go into details, both applications are described briefly here. Fully Automated Installation Fully Automated Installation (FAI) focuses on managing Linux installations, and is developed using Debian, although it works with many different distributions, not just Debian. FAI uses a class concept for categorizing similar systems, and provides a good deal of flexibility and customization via hooks. FAI provides for unattended, automatic installation as well as tools for monitoring and updating groups of systems. FAI is frequently used for creating and maintaining clusters. More information is available at http://fai-project.org/. Puppet Probably the best known application for distributed management is Puppet, developed by Puppet Labs. Unlike FAI, only the Open Source edition is free, the Enterprise edition, which has many additional features, is not. Puppet does include support for environments other than Linux. The desired configuration is described in a custom, high-level definition language, and distributed to systems with installed clients. Unlike FAI, Puppet does not provide its own bare metal remote installation method, but does use existing methods (such as kickstart) to provide this function. A number of companies that make heavy use of distributed and clustered systems use Puppet to manage their environments. More information is available at http://puppetlabs.com/. Other packages There are other packages that can be used to manage a distributed environment, such as Chef and BCFG2. While simpler than Puppet or FAI, they support similar functions and have been used in some distributed and clustered environments. The use of FAI, Puppet, and others in cluster management warrants a brief look at clustering next, and what packages in Debian support clustering. Clusters A cluster is a group of systems that work together in such a way that the whole functions as a single unit. Such clusters can be loosely coupled or tightly coupled. A loosely coupled environment, each system is complete in itself, and can handle all of the tasks any of the other systems can handle. The environment provides mechanisms for redundancy, load sharing, and fail-over between systems, and is often called a High Availability (HA) cluster. In a tightly coupled environment, the systems involved are highly dependent on one another, often sharing memory and disk storage, and all work on the same task together. The environment provides mechanisms for data sharing, avoiding storage conflicts, keeping the systems in synchronization, and splitting up tasks appropriately. This design is often used in super-computing environments. Clustering is an advanced technique that involves more than just installing and configuring software. It also involves hardware integration, and systems and network design, and implementation. Along with the URLs mentioned below, a good text on the subject is Building Clustered Linux Systems, by Robert W. Lucke, Prentice Hall. Here we will only touch the very basics, along with what tools Debian provides. Let's take a brief look at each environment, and some of the tools used to create them. High Availability clusters Two primary functions are required to implement a high availability cluster: A way to handle load balancing and individual host fail-over. A way to synchronize storage so that all servers provide the same view of the data they serve. Debian includes meta packages that bring together software from the Linux High Availability project, including cluster-agents and resource-agents, two of the higher-level meta packages. These packages install various agents that are useful in coordinating and managing load balancing and fail-over. In some cases, a master server is designated to distribute the processing load among other servers. Data synchronization is handled by using shared storage and any of the filesystems that provide for multiple accesses and shared files, such as NFS or AFS. High Availability clusters generally use standard software, along with software that is readily available to manage the dynamics of such environments. Beowulf clusters In addition to the considerations for High Availability clusters, more tightly coupled environments such as Beowulf clusters also require an infrastructure to manage and distribute computing tasks. There are several web pages devoted to creating a Beowulf cluster using Debian as well as packages that aid in creating such a cluster. One such page is https://wiki.debian.org/StartaBeowulf, a Debian Wiki page on Beowulf basics. The manual for FAI also has a section on creating a Beowulf cluster. Books are available as well. Debian provides several packages that are helpful in building such a cluster, such as the OpenMPI libraries for message passing, and various utilities that run commands on multiple systems, such as those in the kadif package. There are even projects that have released scripts and live CDs that allow you to set up a cluster quickly (one such project is the PelicanHPC project, developed for Debian Lenny, hosted at http://www.pelicanhpc.org/. This type of cluster is not something that you can set up and go. Beowulf and other tightly coupled clusters are intended for highly parallel computing, and the programs that do the actual computing must be designed specifically for such an environment. That said, some packages for specific parallel computations do exist in Debian, such as nwchem, which provides several applications for computational chemistry that take advantage of parallelism. Common tools Some common components of clusters have already been mentioned, such as the OpenMPI libraries. Aside from the meta-packages already mentioned, the redhat-cluster suite of tools is available in Debian, as well as many useful libraries, scheduling tools, and failover tools such as booth. All of these can be found using apt-cache or Synaptic by searching for "cluster". Webmin Many administrators will never have to administer a cluster, and many won't be responsible for a large number of systems requiring central backup solutions. However, even administering a single system using command line tools and text editors can be a chore. Even clusters sometimes require administrative tasks on individual systems. Fortunately, there is an application that can ease many administrative tasks, is easy to use, and can handle many aspects of Linux administration. It is called Webmin. Up until Debian Sarge, Webmin was a part of Debian distributions. However, the Debian developer in charge of packaging it had difficulty keeping up with the frequent releases, and it was eventually dropped from Debian. However, the upstream Webmin developers maintain current packages that install cleanly. Some users have reported issues because Webmin does not always handle configuration files exactly as Debian intends, but it most certainly attempts to handle them in a compatible manner, and while some users have experienced problems with upgrades, many administrators are quite happy with Webmin. As long as you are willing to deal with conflicts during upgrades, or restrict use of modules that have major configuration impacts, you will find Webmin quite useful. Installing Webmin Webmin may be installed by adding the following lines to your apt sources file: deb http://download.webmin.com/download/repository sarge contrib deb http://webmin.mirror.somersettechsolutions.co.uk/repository sarge contrib Usually, this is added to a separate webmin.list file in /etc/apt/sources.list.d. The use of 'sarge' for the release name in the configuration is not a mistake. Since Webmin was dropped after the Sarge release (Debian 3.1), the developers update the repository as it is and haven't bothered changing it to keep up with the Debian code names. However, the versions available in the repository are compatible with any Debian release since 3.1. After updating your cache file, Webmin can be installed and maintained using apt-get, aptitude, or Synaptic. Also, if you request a Webmin upgrade from within Webmin itself on a Debian system, it will use the proper Debian package to upgrade. Using Webmin Webmin runs in the background, and provides an HTTP or HTTPS server on localhost port 10,000. You can use any web browser to connect to http://localhost:10000/ to access Webmin. Upon first installation, only the root user or those in a group allowed to use sudo to access the root account, may log in but Webmin users can be managed separately or in conjunction with local users. Webmin provides extensive and easy to understand menus and icons for various configuration tasks. Webmin is also highly modular and extensible, and an extensive list of standard modules is included with the base package. It is not possible to cover Webmin as fully here as it deserves, but a short list of some of its capabilities includes: Configuration of Webmin itself (the server, users, modules, and security) Local system user and password management Filesystem management Bootup and service management CRON job management Software updates Basic filesystem backups Authentication and security configuration APACHE, DNS, SSH, and FTP (if you're using ProFTP) configuration User mail management Qmail or sendmail configuration Network and Firewall configuration and management Bandwidth monitoring Printer management There are even modules that apply to clusters. Also, Webmin can search and allow access to other Webmin servers on the local network or you can define remote servers manually. This allows a central Webmin server, installed on a particular system, to be the gateway to all of the other servers in your environment, essentially providing a single point of access to manage all Webmin enabled servers. Webmin and Debian Webmin understands the configuration file layout of many distributions. The main problem is when a particular module does not handle certain types of configuration in the way the Debian developers prefer, which can make package upgrades somewhat difficult. This can be handled in a couple of ways. Most modules provide a means to edit configuration files directly, so if you have read the Debian documentation you can modify the configuration appropriately to use Debian specific configuration techniques. Or, you may choose to allow Webmin to modify files as it sees fit, and handle any conflicts manually when you upgrade the software involved. Finally, you can avoid those modules involved with specific software that are more likely to cause problems. One such module is Apache, which doesn't use links from sites-enabled to sites-available. Rather, it configures directly in the sites-enabled directory. Some administrators create the configuration in Webmin, and then move and link the files. Others prefer to manually configure Apache outside of Webmin. Webmin modules are constantly changing, and some actually recognize the Debian file layouts well, so it is not possible to give a comprehensive list of modules to avoid at this time. Best practice when using Webmin is to read the documentation and check the configuration files for specific software prior to using Webmin. Then, after configuring with Webmin, check the files again to determine whether changes may be required to work within the particular package's Debian configuration framework. Based upon this, you can decide whether to continue to configure using Webmin or switch back to manual configuration of that particular software. Webmin security Security is always a concern when remote access to a system is involved. Webmin handles this by requiring authentication and providing for detailed access restrictions that provide a layer of control beyond the firewall. Webmin users can be defined separately, or certain local users can be designated. Access to the various modules in Webmin can be restricted to certain users or groups of users, and detailed logs of Webmin actions are kept. Usermin In addition to Webmin, there is a server called Usermin which may be installed from the same repository as Webmin. It allows individual users to perform a number of functions more easily, such as changing their password, accessing their files, read and manage their email, and managing some aspects of their user profile. It is also modular and has the same security features as Webmin. Summary Several powerful and flexible central backup solutions exist that help manage backups for multiple remote servers and sites. Debian provides packages that assist in building High Availability and Beowulf style multiprocessing clusters as well. And, whether you are involved in managing clusters or not, or even a single system, Webmin can ease an administrator's tasks. Resources for Article: Further resources on this subject: Customizing a Linux kernel [Article] Microsoft SharePoint 2010 Administration: Farm Governance [Article] Testing Workflows for Microsoft Dynamics AX 2009 Administration [Article]
Read more
  • 0
  • 0
  • 5659
Modal Close icon
Modal Close icon