Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7019 Articles
article-image-introduction-parallel-programming-and-cuda-sample-code
Packt
16 Sep 2010
3 min read
Save for later

Introduction to Parallel Programming and CUDA with Sample Code

Packt
16 Sep 2010
3 min read
To give an example, let’s say we have an array that contains thousands of floating-point integers and each value needs to be run through a lengthy algorithm. Instead of running each value through the algorithm consecutively (i.e. one at a time), parallelism allows multiple values to be processed simultaneously (i.e. running many values through the algorithm at the same time), reducing overall processing time and producing fast and accurate results. There are some restrictions with using parallelism and not every program can be done in parallel. For instance, let’s say we have that same program from before but this time as we process a value we want to then check the currently processed value against all the previously calculated values in the array, before going to the next. We can confidently say all previous values in the array have been processed and are available to be accessed for the check. If we tried to do this in parallel, we could have incorrect data because multiple values are calculated at the same time and some may be ready for checking while others are not. Extra checks and steps are needed to prevent these types of concurrency issues. However, the results could still prove to be worth the extra steps. One of the major breakthroughs in parallel programming technology today goes beyond the scope of just multi-core CPU’s. Although they do offer a lot more power and potential than single-core units, another common computer component, the GPU, offers even more power, and NVIDIA’s flagship product, called CUDA, offers this technology to all developers easily and for free. CUDA was developed by NVIDIA to provide simple access to GPGPU (General-Purpose Computation on Graphics Processing Units) and parallel computing on their own GPU’s. The logic behind the idea is that GPU’s have much more processing power than CPU’s and have numerous cores that operate in parallel to run intensive graphics operations. By allowing developers to utilize this power for their own projects, it can create fast solutions to some heavy and time-consuming programs, specifically those that run the same process recursively and independently of other processes. The learning curve is not very steep for most developers. CUDA accomplishes making GPGPU easily usable by adding functionality to the standard C and C++ programming languages. This allows for fast adoption by almost any programmer and helps with cross-platform integration. To get started with CUDA, you will need a recent NVIDIA GPU (Geforce 8 series and beyond, or you can check on the NVIDIA website to see which GPU’s are CUDA enabled). CUDA works on Windows, Mac OSX, and certain Linux distributions. You will need to download and install the developer drivers, the CUDA toolkit, and the CUDA SDK off the Nvidia website, respectively. NVIDIA provides an installation guide on their website which provides more details about the installation process, as well as a method of checking the installation to see if it is working.
Read more
  • 0
  • 0
  • 3555

article-image-watching-multiple-threads-c
Packt
15 Oct 2009
6 min read
Save for later

Watching Multiple Threads in C#

Packt
15 Oct 2009
6 min read
We can use the BackgroundWorker component and then the Thread class to create new threads independent of the main application thread. The applications can respond to UI events, while the processing continues, and take full advantage of multiple cores, and can thus run faster. However, we are used to debugging applications that run in just one thread (the main thread), and there are many changes in the debugging process that generate great confusion when following the classic procedures running many concurrent threads. How can we successfully debug applications that are running many concurrent threads? Time for action – Understanding the difficulty in debugging concurrent threads Your cellular phone rings! The FBI agents have detected a problem with an encryption engine. When the application receives the same messages many times during a certain period, the encryption process generates exactly the same results, as shown in the following image: Thus, hackers could easily break the code once they discover this important bug. They ask for your help. Of course, you want to cooperate because you do not want the FBI agents to get angry with you. However, you need to debug the multithreaded encryption engine, and you have never done that! Let's create a solution for this problem! First, we are going to try to debug the multithreaded application the same way we do with a single-threaded application to understand the new problems we might face: Open the project, SMSEncryption. Define a breakpoint in the line int liThreadNumber = (int)poThreadParameter; in the ThreadEncryptProcedure procedure code. Press F5 or select Debug | Start Debugging in the main menu. Enter or copy and paste a long text (with more than 5,000 lines) in the Textbox labeled Original SMS Messages and click on the Run in a thread button. The line with the breakpoint defined is shown highlighted as the next statement that will be executed. Press F10 or select, Debug | Step Over in the main menu two or three times (depending on the number of cores you have in the computer). As you can see, the next statement that gets executed is the same even when you try to go on with the next one. It seems that the statement is not being executed. However, inspecting the value of poThreadParameter (the parameter passed to the ThreadEncryptProcedure procedure) shows that it changes each time you step over the statement, as shown in the following image: Stop the application and repeat the steps 1 to 5 to make sure you are not crazy because of parallelism, multithreading, and the FBI agents! What just happened? You are getting nervous about the debugging process! Do not worry. We will learn how to debug your encryption engine while the FBI agents kindly prepare a cup of fresh cappuccino for you. The debugger executed each new Thread class instance call to the Start method, with this line: prloThreadList[liThreadNumber].Start(liThreadNumber); Then, it entered in the ThreadEncryptProcedure method (we have used the same method for every created encryption thread) with different values for the poThreadParameter parameter. Therefore, you stayed in the same statement as many times as the threads were created (equivalent to the number of cores available in the computer) in the following line: int liThreadNumber = (int)poThreadParameter; As we can see, debugging this way is very confusing, because the IDE switches from one thread to another, and you loose control over the statements that are going to be executed next. In a debugging process, you need to know in which part of the application you are. As we tested our first attempt to debug a multithreaded application, we tried the same technique as with single-threaded applications. There are new subjects to learn and new techniques to use. Debugging concurrent threads When we need to inspect values, execute a procedure step-by-step, and find solutions to problems related to some specific code, the best way to achieve that with a multithreaded application is to work with it as a single-threaded application. But, how can we do that? It is very simple. We must run one thread at a time and freeze the other concurrent threads while we are debugging the thread in which we are interested and on which we are focusing. When we debug single-threaded applications, we are aware of the method in which we are positioned and its context. In multithreaded applications, we must also be aware of the thread in which we are positioned. If we do not know in which thread we are executing statements, we will be completely confused in just a few seconds, as happened in our previous activity. We must tailor our multithreaded applications to simplify the debugging process. If we do not do this, the debugging process will be a nightmare. Indeed, we do not want that to happen! Time for action – Finding the threads You wonder where the threads are. How can you guess in which thread you are working while executing the application step-by-step? You are an excellent C# programmer, but multithreaded debugging is very confusing. You do not want the FBI agents to realize that you are in trouble. However, you must hurry up, because they have a great training in detecting nervous people in the course of their usual interrogations. Now, we are going to use the IDE features to help us find the threads in a multithreaded application: Using the same project that we used in the previous example, with the same breakpoint defined, press F5 or select Debug | Start Debugging in the main menu. Enter or copy and paste a long text (with more than 5,000 lines) in the Textbox labeled Original SMS Messages and click on the Run in a thread button. The line with the breakpoint defined is shown highlighted as the next statement that will be executed. Select Debug | Windows | Threads in the main menu or press Ctrl + Alt + H. The Threads window will be shown, displaying all the threads created by the application process, as shown in the following image: The yellow arrow in the left of the thread list points out the current thread—the thread for which the IDE is showing the current statement. Press F10 or select Debug | Step Over in the main menu. As you can see, the next statement is the same again, but the current thread pointed out in the thread list changes, as shown in the following image: Go on running the application step-by-step and watch how the current thread changes. Observe the Threads window throughout your debugging process. What just happened? You found the threads in the debugging process. Now, you believe you will be able to make the necessary changes to the application if you learn a few debugging techniques quickly. The Threads window displays the list of threads created by the application process. Many of them are created automatically by the C# runtime. The others are created by the Thread class instances and the BackgroundWorker component we have in the application. Using the Threads window, we can easily determine in which thread we are executing when debugging a multithreaded application. It is indeed very helpful. Remember that each thread has its own stack.  
Read more
  • 0
  • 0
  • 3554

article-image-file-information-perforce-crm
Packt
24 Sep 2013
17 min read
Save for later

File Information in Perforce CRM

Packt
24 Sep 2013
17 min read
(For more resources related to this topic, see here.) File properties Every file has properties. Some of these properties relate to how files are represented in your workspace. Some relate to the history of the file. Still others will impact how Perforce can manage that file. While various file properties are exposed in different parts of the P4V interface, the Files tab in the view pane provides you with the maximum amount of detail with the minimum effort. The following screenshot shows a file selected in the tree panel along with the information that would be presented in the Files tab: The Files tab presentation has two primary sections. The upper section presents information for the files in a folder using a tabular format. This provides a useful, easy to use, and comparative presentation. The lower section provides information specific to a single file selected in the upper tabular section. This provides you with the ability to focus on additional details once you've identified files of interest using the tabular upper section. The tabular section presents a selected set of the information available for a file. As you'll see shortly, the information selected is under your control. The Details tab provides all of the available information, and shows in addition the explicit mapping between the Workspace and Depot locations of the file. These locations can be seen in various parts of the P4V interface. The Checked Out By tab provides details about all of the workspaces that have a file checked out. These are the full details. You will see summaries of this information in the tree panel hover tips. The Preview tab shows the contents of text files. This is particularly useful when browsing repository files that are not currently present in your workspace. Customizing the tabular display While the tabular section of the Files tab is useful, without order it is still just a collection of data. P4V allows you to sort on any of the displayed columns. It also allows you to select which columns are displayed to help you focus on important factors. You can reorder the columns by clicking on a column header, dragging it, and dropping it in the position you desire. You can also select the column displayed in the table by right-clicking on the header bar and then checking the columns you want displayed, as shown in the following screenshot: Note the triangle indicator in the Name column header. This specifies that the table entries are sorted based on the value in the Name column. A triangle pointing upwards sorts in ascending order, a triangle pointing downwards sorts in descending order. Sorting on either the Latest Changelist or Date Last Submitted columns allows you to quickly identify more recently or least recently changed files. Explaining the # characters The # character is the Perforce notation indicating the file revision. So foo.txt#3 indicates revision (or version) 3 of the file foo.txt. Using this notation saves a lot of screen space and reduces visual clutter. It's also the versioning notation you'll see in the log panel or which you would use on the command line. But what about things like #5/5, #3/5, #0/5 or even #25 of 25? Not to worry, long division and set theory are not involved. This is simply a notation that indicates the version of the file in your workspace and the maximum number of file versions the server knows about. So #5/5 would indicate that you have version 5 of the file in your workspace and that is the most up-to-date version the server knows about. Likewise, #3/5 would indicate that you have version 3 of the file in your workspace, yet the server knows about two more recent versions: 4 and 5. While #0/5 indicates that you have no version of the file in your workspace. Finally, the #25 of 25 notation has replaced the "/" with "of" to increase readability. 0 is the workspace version of deleted files and files marked for add, but not yet submitted. Showing deleted files The deleted files are visible in the depot tree only if you turn on the filter to show them, as we see in the following screenshot: The filter Show Deleted Depot Files has been checked. As a result, we can see a file execmac.cwhere the latest revision is 2, and the tooltip (as well as the icon) shows us that it is deleted at head revision. As we will discuss later in this article, the head revision is the latest revision in the repository. This filter is not available if you have selected the Workspace tab since deleted files do not exist locally. Type and filetype P4V understands two separate type concepts relating to files. Knowing the difference and how they apply will come in handy. One concept is known as type and refers to an association based on the file name extension. This is managed by the operating system on the local workstation. For example, Windows might associate the .docx extension with the Microsoft Word application. The server maintains no information about type. By default, this relationship is used by P4V to select the tool to use when it opens a file for you. For example, if you ask P4V to open a file called foo.html and you have a web development tool installed, it may launch that tool. If you don't have such a tool, then it might launch a browser. Not to worry, you can override this behavior. The other concept is known as filetype, and refers to the information Perforce uses to define workspace population characteristic, control P4V processing, and define server storage parameters. The server maintains filetype information on a per-revision basis. When a file is first added it is assigned a filetype. That filetype persists until it is explicitly changed. Most of the time, you will not need to worry about specifying a particular filetype. P4V uses several techniques to assure that the appropriate filetype is established when a file is first added (including defaults configured by your administrator). Of course you can always explicitly specify a filetype, but that is for advanced users. The two basic filetypes are text and binary. Perforce knows that text files have line endings. When it populates your workspace with a text file it automatically adapts the line endings within the file to the encoding appropriate for your operating system. It is also understood that text files can be compared in ways that humans can understand. On the other hand, binary files are treated as a collection of bytes. They receive no special processing when populating your workspace, and it is understood that they can't usually be compared in ways that humans would understand. Filetype can also specify other attributes of files, such as making files always writable in workspaces. They can also set the executable bit on Linux/Unix for shell scripts or executable binaries. Understanding file versions and history In this article, we are going to first look at how to get older revisions of files into our workspace and how to understand the history of a file or a set of files. That includes understanding how file revisions relate to changelists, and the state of the repository as of a particular date and time. We often need to have a look at older versions of files. When looking at a project, you may need to reproduce an older release, or understand a particular baseline which is a set of files or a complete folder or tree. Getting different revisions of files On the right-click menu for a file or folder there is a Get Revision…option, as shown in the following screenshot: Clicking on the preceding option brings up the following dialog: Various actions are possible with this dialog. For now, we will focus on choosing between Get latest revision and Specify revision using:. Get latest revision updates the workspace files with the highest revision of the files known to the server. This is also known as getting or syncing to the head revision. The head version is a special version with its own reference #head. You will see #head in the log pane and in various error and status reports. Specify revision using: updates the workspace files with either explicit revisions or revisions that are implied from a context such as date/time, label, or changelist. Don't worry that you don't know what all of the Specify revision using: choices are or what they do. Take a moment to scan through the choices. Note the Browse… results. They are designed to reduce typing errors to get you the specific revision you're interested in. How file revisions relate to changelists The following screenshot shows the history of fileos2.c using the Revision Graph tool, which is available on the right-click menu for the currently selected file within the tree pane (right-click, then select Revision Graph). Note that for clarity we have turned off various panes within the tool to show the minimum necessary, and included the Legend option to explain the shapes: This shows us the association between revisions of a file and the changelists in which those revisions were submitted to the repository. In this example, revision 3 was part of changelist 294, and revision 2 was part of changelist 279. The file was first created and thus added to the repository as revision 1 in changelist 220. It is fairly clear that a Get latest would give us revision 3 of the file in our workspace. Similarly, if we Specify revision using: changelist 279 we would get revision 2 of the file. So what would you expect if we chose changelist 280? Doing a get which specifies a changelist gives us the state of the repository at the point of that changelist. In this case, it will give us revision 2. Revision 2 is the latest revision up to or including the specified changelist. In fact for this file, getting any changelist between 279 and 293 inclusive, will give us revision 2. What would we expect if you do a get which specifies changelist 100? The preceding history shows us that this file did not exist in the repository as of that changelist. Therefore, a get of that changelist would remove the file from our workspace because it didn't exist as of that changelist. Using Perforce terminology, we would have revision 0, or the revision which does not exist, in our workspace. Removing version 0 files from a workspace is designed to avoid the inevitable user errors that would occur if you had to delete the file yourself. Potentially surprising get revision results! Following on from the preceding examples, what would you expect if you tried to get changelist 9999999 or some equally large number, for which a changelist doesn't yet exist? Hopefully, it is not a big surprise that we just get the latest revision of the file. In the preceding case, revision 3. As we saw previously, even though the changelist doesn't directly contain this file, the file still associates revisions with every changelist, even those that don't yet exist. This large changelist technique is actually just an obscure version of get latest. Obscure is not usually a good engineering practice. But watch out! What will happen if we try to get revision 5 of this file? For this file, revision 5 does not exist. Therefore the get request treats this as if you requested revision 0 so it will remove the file from the workspace. Be very careful getting more than one file as of a specific version number. When more than one file is involved, this gets that version of every file that has that many versions or more and removes (#0) the rest. This is usually not what you want. Changelists and folders People don't always realize that the Revision Graph… right-click option can also apply to folders. The folder view shows all of the files at one time, as seen in the following screenshot: In the preceding screenshot, the cursor is hovering over changelist 370, and the 3 files in the folder that had new versions created by that changelist are highlighted. What would we get if we synced the entire folder to changelist 370? By default, we would get the files in that particular changelist and the version for every other file in the folder as of that changelist. In the preceding screenshot that would be version 3 of Jamfile.html, which was submitted in changelist 369. But, index.html at version 0 would be removed from the workspace because it didn't exist until changelist 383. Therefore, we can see that a changelist can be used in two ways: To refer to the set of files submitted in that changelist To refer to the state of the entire repository (or any part of it such as a folder), up to and including that changelist When waxing philosophical, we sometimes refer to this as the particle nature versus the wave nature of changelists! Get revision options We haven't yet covered the options in the Get Revision… dialog shown in the following screenshot. They provide functionality that can be difficult to achieve using other techniques. The preceding screenshot shows the output on clicking Preview. We see what the results would be, without actually affecting anything in our workspace. This can help avoid surprises and perhaps prevent unnecessary mistakes. The Force Operation (replace file even if you already have the revision specified)checkbox, if checked, will update the files in our workspace to the specified revisions, even if P4V thinks we already have those revisions. You need to force operations when files have been removed from the workspace without coordinating that removal with Perforce. Following on from the discussion in the previous section about a Get Revision, always getting files up to or including the changelist, we can modify this behavior by checking Only get revisions for files listed in changelist. With this option you update your workspace with only the file versions created by that changelist. If the following changelist is discovered to cause a problem, then it can be very useful to get a workspace into the state immediately before that changelist. Referencing a specific date and/or time The Get Revision dialog allows us to specify a date and time, as shown in the following screenshot: Every submitted changelist has an associated date and time of submission. The value specified in this dialog will give the same results as if you were to do a get specifying the changelist that was submitted precisely at or most closely before the specified date and time. Most people prefer to reference using changelist numbers rather than date and time. Changelists are unique and unambiguous. More than one changelist may be submitted within the one second granularity of date and time tracking. And communicating date and time across time zones is error prone at best. Referencing a label As stated in the preceding section, we can also do a get relative to a particular label: You can type the name of the label in directly. However, most people prefer to select an appropriate label from the choices presented when you click on the Browse…button. Most Perforce users don't use labels. They use changelists. Files in another workspace You can also reference file versions in another workspace: As previously stated, we can type the workspace name in the field or browse to select a workspace. This feature is most useful when you're trying to resolve scenarios where it works/ fails in my workspace. Most people use it with Preview so that they don't impact their current workspace. However, it does require that both workspaces reference the same depot files. This is only likely to be true for continuous build and code review environments. Depot paths The path of currently selected files or folders is shown in the address bar at the top of the screen, as per the following screenshot: You can turn off the address bar by right-clicking on it and unchecking the option. There is also a shortcut key (Ctrl + C) which copies the full path of the currently selected file or folder to the clipboard. This can be useful for documentation purposes. Depending on whether you have the depot or workspace tab showing, you might get: //depot/Jam/MAIN/src/Build.com Or, depending on your workspace definition: c:workbruno_mainJamMAINsrcBuild.com Finding files – an introduction to wildcards It is easy to find files with certain characters in their names by going to the Search | Find File… menu option. This will give you the Find File tab, as shown in the following screenshot: In the preceding screenshot, we had a particular folder selected, and that is copied to the Search in: field. Sharp eyed readers will have noticed that the full path contains some extra characters: //depot/Jam/MAIN/.... In this instance, P4V has added the ... wildcard to the directory path (this wildcard is also known as an ellipsis). This wildcard means that all subfolders should be included in the search. We will look at more wildcard options in the next section of this article. You can also drag-and-drop directory paths from the tree pane on the left-hand side to the Search in field and the contents will be appropriately updated. If we enter, for example, unix into the Names matches any of the following field and click on Find, we might get results as shown in the following screenshot: In this particular example, we can see that there are three files in that folder (or in any subfolders) containing unix in their filename. By default, the results do not include any files where the latest revision is marked as deleted. You can change this by checking Include deleted depot files. Note that it is possible to filter the results further using the Submission date or changelist option:, for example: When working with dates you can search for particular periods of time, as seen here: We invite you to explore this powerful feature on your own! Showing history The history of both files and folders can be seen via the History tab. You can go to the View | History menu option or click on the appropriate toolbar icon. This tab can show the history for the currently selected file or folder in the tree pane. You can right-click on a file in the tree pane and select File History. If you have a folder selected the menu option is Folder History. File history The history for a file might be: In this example, we can see the basic history information for the selected file. For each revision we see information such as Changelist and Date Submitted. As previously seen with the File Properties tab, by right-clicking the column headers you can select other columns to be displayed. When this tab is displayed, clicking to select another file in the tree pane will cause the tab to be updated dynamically. The icon for revision 8 in the preceding screenshot has a highlight around it, this shows that this revision is the one currently present in the workspace. The History tab also has a details tab which can be shown if you wish, by dragging the splitter bar upwards with the mouse, as shown here: Folder history For a folder, the History tab shows all changelists that have affected files in that folder or any of its subfolders. This acts in the same way for a depot. An example is shown here: Notice that there is no column for revisions as they only apply to files, not to folders. If you drag-and-drop a column header onto another one (left or right), you can change the order they are displayed in.
Read more
  • 0
  • 0
  • 3547

article-image-create-quick-application-cakephp-part-1
Packt
17 Nov 2009
9 min read
Save for later

Create a Quick Application in CakePHP: Part 1

Packt
17 Nov 2009
9 min read
The ingredients are fresh, sliced up, and in place. The oven is switched on, heated, and burning red. It is time for us to put on the cooking hat, and start making some delicious cake recipes. So, are you ready, baker? In this article, we are going to develop a small application that we'll call the "CakeTooDoo". It will be a simple to-do-list application, which will keep record of the things that we need to do. A shopping list, chapters to study for an exam, list of people you hate, and list of girls you had a crush on are all examples of lists. CakeTooDoo will allow us to keep an updated list. We will be able to view all the tasks, add new tasks, and tick the tasks that are done and much more. Here's another example of a to-do list, things that we are going to cover in this article: Make sure Cake is properly installed for CakeTooDoo Understand the features of CakeTooDoo Create and configure the CakeTooDoo database Write our first Cake model Write our first Cake controller Build a list that shows all the tasks in CakeTooDoo Create a form to add new tasks to CakeTooDoo Create another form to edit tasks in the to-do list Have a data validation rule to make sure users do not enter empty task title Add functionality to delete a task from the list Make separate lists for completed and pending Tasks Make the creation and modification time of a task look nicer Create a homepage for CakeTooDoo Making Sure the Oven is Ready Before we start with CakeTooDoo, let's make sure that our oven is ready. But just to make sure that we do not run into any problem later, here is a check list of things that should already be in place: Apache is properly installed and running in the local machine. MySQL database server is installed and running in the local machine. PHP, version 4.3.2 or higher, is installed and working with Apache. The latest 1.2 version of CakePHP is being used. Apache mod_rewrite module is switched on. AllowOverride is set to all for the web root directory in the Apache configuration file httpd.conf. CakePHP is extracted and placed in the web root directory of Apache. Apache has write access for the tmp directory of CakePHP. In this case, we are going to rename the Cake directory to it CakeTooDoo. CakeTooDoo: a Simple To-do List Application As we already know, CakeTooDoo will be a simple to-do list. The list will consist of many tasks that we want to do. Each task will consist of a title and a status. The title will indicate the thing that we need to do, and the status will keep record of whether the task has been completed or not. Along with the title and the status, each task will also record the time when the task has been created and last modified. Using CakeTooDoo, we will be able to add new tasks, change the status of a task, delete a task, and view all the tasks. Specifically, CakeTooDoo will allow us to do the following things: View all tasks in the list Add a new task to the list Edit a task to change its status View all completed tasks View all pen Delete a task A homepage that will allow access to all the features. You may think that there is a huge gap between knowing what to make and actually making it. But wait! With Cake, that's not true at all! We are just 10 minutes away from the fully functional and working CakeTooDoo. Don't believe me? Just keep reading and you will find it out yourself. Configuring Cake to Work with a Database The first thing we need to do is to create the database that our application will use. Creating database for Cake applications are no different than any other database that you may have created before. But, we just need to follow a few simple naming rules or conventions while creating tables for our database. Once the database is in place, the next step is to tell Cake to use the database. Time for Action: Creating and Configuring the Database Create a database named caketoodoo in the local machine's MySQL server. In your favourite MySQL client, execute the following code: CREATE DATABASE caketoodoo; In our newly created database, create a table named tasks, by running the following code in your MySQL client: USE caketoodoo; CREATE TABLE tasks ( id int(10) unsigned NOT NULL auto_increment, title varchar(255) NOT NULL, done tinyint(1) default NULL, created datetime default NULL, modified datetime default NULL, PRIMARY KEY (id) ); Rename the main cake directory to CakeTooDoo, if you haven't done that yet. Move inside the directory CakeTooDoo/app/config. In the config directory, there is a file named database.php.default. Rename this file to database.php. Open the database.php file with your favourite editor, and move to line number 73, where we will find an array named $default. This array contains database connection options. Assign login to the database user you will be using and password to the password of that user. Assign database to caketoodoo. If we are using the database user ahsan with password sims, the configuration will look like this: var $default = array( 'driver' => 'mysql', 'persistent' => false, 'host' => 'localhost', 'port' => '', 'login' => 'ahsan', 'password' => 'sims', 'database' => 'caketoodoo', 'schema' => '', 'prefix' => '', 'encoding' => '' ); Now, let us check if Cake is being able to connect to the database. Fire up a browser, and point to http://localhost/CakeTooDoo/. We should get the default Cake page that will have the following two lines: Your database configuration file is present and Cake is able to connect to the database, as shown in the following screen shot. If you get the lines, we have successfully configured Cake to use the caketoodoo database. What Just Happened? We just created our first database, following Cake convention, and configured Cake to use that database. Our database, which we named caketoodoo, has only one table named task. It is a convention in Cake to have plural words for table names. Tasks, users, posts, and comments are all valid names for database tables in Cake. Our table tasks has a primary key named id. All tables in Cake applications' database must have id as the primary key for the table. Conventions in CakePHPDatabase tables used with CakePHP should have plural names. All database tables should have a field named id as the primary key of the table. We then configured Cake to use the caketoodoo database. This was achieved by having a file named database.php in the configuration directory of the application. In database.php, we set the default database to caketoodoo. We also set the database username and password that Cake will use to connect to the database server. Lastly, we made sure that Cake was able to connect to our database, by checking the default Cake page. Conventions in Cake are what make the magic happen. By favoring convention over configuration, Cake makes productivity increase to a scary level without any loss to flexibility. We do not need to spend hours setting configuration values to just make the application run. Setting the database name is the only configuration that we will need, everything else will be figured out "automagically" by Cake. Throughout this article, we will get to know more conventions that Cake follows.   Writing our First Model Now that Cake is configured to work with the caketoodoo database, it's time to write our first model. In Cake, each database table should have a corresponding model. The model will be responsible for accessing and modifying data in the table. As we know, our database has only one table named tasks. So, we will need to define only one model. Here is how we will be doing it: Time for Action: Creating the Task Model Move into the directory CakeTooDoo/app/models. Here, create a file named task.php. In the file task.php, write the following code: <?php class Task extends AppModel { var $name = 'Task'; } ?> Make sure there are no white spaces or tabs before the <?php tag and after the ?> tag. Then save the file. What Just Happened? We just created our first Cake model for the database table tasks. All the models in a CakePHP application are placed in the directory named models in the app directory. Conventions in CakePHP: All model files are kept in the directory named models under the app directory. Normally, each database table will have a corresponding file (model) in this directory. The file name for a model has to be singular of the corresponding database table name followed by the .php extension. The model file for the tasks database table is therefore named task.php. Conventions in CakePHP: The model filename should be singular of the corresponding database table name. Models basically contain a PHP class. The name of the class is also singular of the database table name, but this time it is CamelCased. The name of our model is therefore Task. Conventions in CakePHP: A model class name is also singular of the name of the database table that it represents. You will notice that this class inherits another class named AppModel. All models in CakePHP must inherit this class. The AppModel class inherits another class called Model. Model is a core CakePHP class that has all the basic functions to add, modify, delete, and access data from the database. By inheriting this class, all the models will also be able to call these functions, thus we do not need to define them separately each time we have a new model. All we need to do is to inherit the AppModel class for all our models. We then defined a variable named $name in the Task'model, and assigned the name of the model to it. This is not mandatory, as Cake can figure out the name of the model automatically. But, it is a good practice to name it manually.
Read more
  • 0
  • 0
  • 3547

article-image-dwr-java-ajax-user-interface-basic-elements-part-1
Packt
20 Oct 2009
16 min read
Save for later

DWR Java AJAX User Interface: Basic Elements (Part 1)

Packt
20 Oct 2009
16 min read
  Creating a Dynamic User Interface The idea behind a dynamic user interface is to have a common "framework" for all samples. We will create a new web application and then add new features to the application as we go on. The user interface will look something like the following figure: The user interface has three main areas: the title/logo that is static, the tabs that are dynamic, and the content area that shows the actual content. The idea behind this implementation is to use DWR functionality to generate tabs and to get content for the tab pages. The tabbed user interface is created using a CSS template from the Dynamic Drive CSS Library (http://dynamicdrive.com/style/csslibrary/item/css-tabs-menu). Tabs are read from a properties file, so it is possible to dynamically add new tabs to the web page. The following screenshot shows the user interface. The following sequence diagram shows the application flow from the logical perspective. Because of the built-in DWR features we don't need to worry very much about how asynchronous AJAX "stuff" works. This is, of course, a Good Thing. Now we will develop the application using the Eclipse IDE and the Geronimo test environment Creating a New Web Project First, we will create a new web project. Using the Eclipse IDE we do the following: select the menu File | New | Dynamic Web Project. This opens the New Dynamic Web Project dialog; enter the project name DWREasyAjax and click Next, and accept the defaults on all the pages till the last page, where Geronimo Deployment Plan is created as shown in the following screenshot: Enter easyajax as Group Id and DWREasyAjax as Artifact Id. On clicking Finish, Eclipse creates a new web project. The following screen shot shows the generated project and the directory hierarchy. Before starting to do anything else, we need to copy DWR to our web application. All DWR functionality is present in the dwr.jar file, and we just copy that to the WEB-INF | lib directory. A couple of files are noteworthy: web.xml and geronimo-web.xml. The latter is generated for the Geronimo application server, and we can leave it as it is. Eclipse has an editor to show the contents of geronimo-web.xml when we double-click the file. Configuring the Web Application The context root is worth noting (visible in the screenshot above). We will need it when we test the application. The other XML file, web.xml, is very important as we all know. This XML will hold the DWR servlet definition and other possible initialization parameters. The following code shows the full contents of the web.xml file that we will use: <?xml version="1.0" encoding="UTF-8"?> <web-app xsi_schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web- app_2_5.xsd" id="WebApp_ID" version="2.5"> <display-name>DWREasyAjax</display-name> <servlet> <display-name>DWR Servlet</display-name> <servlet-name>dwr-invoker</servlet-name> <servlet-class> org.directwebremoting.servlet.DwrServlet </servlet-class> <init-param> <param-name>debug</param-name> <param-value>true</param-value> </init-param> </servlet> <servlet-mapping> <servlet-name>dwr-invoker</servlet-name> <url-pattern>/dwr/*</url-pattern> </servlet-mapping> <welcome-file-list> <welcome-file>index.html</welcome-file> <welcome-file>index.htm</welcome-file> <welcome-file>index.jsp</welcome-file> <welcome-file>default.html</welcome-file> <welcome-file>default.htm</welcome-file> <welcome-file>default.jsp</welcome-file> </welcome-file-list> </web-app> DWR cannot function without the dwr.xml configuration file. So we need to create the configuration file. We use Eclipse to create a new XML file in the WEB-INF directory. The following is required for the user interface skeleton. It already includes the allow-element for our DWR based menu. <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE dwr PUBLIC "-//GetAhead Limited//DTD Direct Web Remoting 2.0//EN" "http://getahead.org/dwr/dwr20.dtd"> <dwr> <allow> <create creator="new" javascript="HorizontalMenu"> <param name="class" value="samples.HorizontalMenu" /> </create> </allow> </dwr> In the allow element, there is a creator for the horizontal menu Java class that we are going to implement here. The creator that we use here is the new creator, which means that DWR will use an empty constructor to create Java objects for clients. The parameter named class holds the fully qualified class name. Developing the Web Application Since we have already defined the name of the Java class that will be used for creating the menu, the next thing we do is implement it. The idea behind the HorizontalMenu class is that it is used to read a properties file that holds the menus that are going to be on the web page. We add properties to a file named dwrapplication.properties, and we create it in the same samples-package as the HorizontalMenu-class. The properties file for the menu items is as follows: menu.1=Tables and lists,TablesAndLists menu.2=Field completion,FieldCompletion The syntax for the menu property is that it contains two elements separated by a comma. The first element is the name of the menu item. This is visible to user. The second is the name of HTML template file that will hold the page content of the menu item. The class contains just one method, which is used from JavaScript and via DWR to retrieve the menu items. The full class implementation is shown here: package samples; import java.io.IOException; import java.io.InputStream; import java.util.List; import java.util.Properties; import java.util.Vector; public class HorizontalMenu { public HorizontalMenu() { } public List<String> getMenuItems() throws IOException { List<String> menuItems = new Vector<String>(); InputStream is = this.getClass().getClassLoader().getResourceAsStream( "samples/dwrapplication.properties"); Properties appProps = new Properties(); appProps.load(is); is.close(); for (int menuCount = 1; true; menuCount++) { String menuItem = appProps.getProperty("menu." + menuCount); if (menuItem == null) { break; } menuItems.add(menuItem); } return menuItems; } } The implementation is straightforward. The getMenuItems() method loads properties using the ClassLoader.getResourceAsStream() method, which searches the class path for the specified resource. Then, after loading properties, a for loop is used to loop through menu items and then a List of String-objects is returned to the client. The client is the JavaScript callback function that we will see later. DWR automatically converts the List of String objects to JavaScript arrays, so we don't have to worry about that. Testing the Web Application We haven't completed any client-side code now, but let's test the code anyway. Testing uses the Geronimo test environment. The Project context menu has the Run As menu that we use to test the application as shown in the following screenshot: Run on Server opens a wizard to define a new server runtime. The following screenshot shows that the Geronimo test environment has already been set up, and we just click Finish to run the application. If the test environment is not set up, we can manually define a new one in this dialog: After we click Finish, Eclipse starts the Geronimo test environment and our application with it. When the server starts, the Console tab in Eclipse informs us that it's been started. The Servers tab shows that the server is started and all the code has been synchronized, that is, the code is the most recent (Synchronization happens whenever we save changes on some deployed file.) The Servers tab also has a list of deployed applications under the server. Just the one application that we are testing here is visible in the Servers tab. Now comes the interesting part—what are we going to test if we haven't really implemented anything? If we take a look at the web.xml file, we will find that we have defined one initialization parameter. The Debug parameter is true, which means that DWR generates test pages for our remoted Java classes. We just point the browser (Firefox in our case) to the URL http://127.0.0.1:8080/DWREasyAjax/dwr and the following page opens up: This page will show a list of all the classes that we allow to be remoted. When we click the class name, a test page opens as in the following screenshot: This is an interesting page. We see all the allowed methods, in this case, all public class methods since we didn't specifically include or exclude anything. The most important ones are the script elements, which we need to include in our HTML pages. DWR does not automatically know what we want in our web pages, so we must add the script includes in each page where we are using DWR and a remoted functionality. Then there is the possibility of testing remoted methods. When we test our own method, getMenuItems(), we see a response in an alert box: The array in the alert box in the screenshot is the JavaScript array that DWR returns from our method. Developing Web Pages The next step is to add the web pages. Note that we can leave the test environment running. Whenever we change the application code, it is automatically published to test the environment, so we don't need to stop and start the server each time we make some changes and want to test the application. The CSS style sheet is from the Dynamic Drive CSS Library. The file is named styles.css, and it is in the WebContent directory in Eclipse IDE. The CSS code is as shown: /*URL: http://www.dynamicdrive.com/style/ */ .basictab{ padding: 3px 0; margin-left: 0; font: bold 12px Verdana; border-bottom: 1px solid gray; list-style-type: none; text-align: left; /*set to left, center, or right to align the menu as desired*/ } .basictab li{ display: inline; margin: 0; } .basictab li a{ text-decoration: none; padding: 3px 7px; margin-right: 3px; border: 1px solid gray; border-bottom: none; background-color: #f6ffd5; color: #2d2b2b; } .basictab li a:visited{ color: #2d2b2b; } .basictab li a:hover{ background-color: #DBFF6C; color: black; } .basictab li a:active{ color: black; } .basictab li.selected a{ /*selected tab effect*/ position: relative; top: 1px; padding-top: 4px; background-color: #DBFF6C; color: black; } This CSS is shown for the sake of completion, and we will not go into details of CSS style sheets. It is sufficient to say that CSS provides an excellent method to create websites with good presentation. The next step is the actual web page. We create an index.jsp page, in the WebContent directory, which will have the menu and also the JavaScript functions for our samples. It should be noted that although all JavaScript code is added to a single JSP page here in this sample, in "real" projects it would probably be more useful to create a separate file for JavaScript functions and include the JavaScript file in the HTML/JSP page using a code snippet such as this: <script type="text/javascript" src="myjavascriptcode/HorizontalMenu.js"/>. We will add JavaScript functions later for each sample. The following is the JSP code that shows the menu using the remoted HorizontalMenu class. <%@ page language="java" contentType="text/html; charset=ISO-8859-1" pageEncoding="ISO-8859-1"%> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> <link href="styles.css" rel="stylesheet" type="text/css"/> <script type='text/javascript' src='/DWREasyAjax/dwr/engine.js'></script> <script type='text/javascript' src='/DWREasyAjax/dwr/util.js'></script> <script type='text/javascript' src='/DWREasyAjax/dwr/interface/HorizontalMenu.js'></script> <title>DWR samples</title> <script type="text/javascript"> function loadMenuItems() { HorizontalMenu.getMenuItems(setMenuItems); } function getContent(contentId) { AppContent.getContent(contentId,setContent); } function menuItemFormatter(item) { elements=item.split(','); return '<li><a href="#" onclick="getContent(''+elements[1]+'');return false;">'+elements[0]+'</a></li>'; } function setMenuItems(menuItems) { menu=dwr.util.byId("dwrMenu"); menuItemsHtml=''; for(var i=0;i<menuItems.length;i++) { menuItemsHtml=menuItemsHtml+menuItemFormatter(menuItems[i]); } menu.innerHTML=menuItemsHtml; } function setContent(htmlArray) { var contentFunctions=''; var scriptToBeEvaled=''; var contentHtml=''; for(var i=0;i<htmlArray.length;i++) { var html=htmlArray[i]; if(html.toLowerCase().indexOf('<script')>-1) { if(html.indexOf('TO BE EVALED')>-1) { scriptToBeEvaled=html.substring(html.indexOf('>')+1,html.indexOf('</')); } else { eval(html.substring(html.indexOf('>')+1,html.indexOf('</'))); contentFunctions+=html; } } else { contentHtml+=html; } } contentScriptArea=dwr.util.byId("contentAreaFunctions"); contentScriptArea.innerHTML=contentFunctions; contentArea=dwr.util.byId("contentArea"); contentArea.innerHTML=contentHtml; if(scriptToBeEvaled!='') { eval(scriptToBeEvaled); } } </script> </head> <body onload="loadMenuItems()"> <h1>DWR Easy Java Ajax Applications</h1> <ul class="basictab" id="dwrMenu"> </ul> <div id="contentAreaFunctions"> </div> <div id="contentArea"> </div> </body> </html> This JSP is our user interface. The HTML is just normal HTML with a head element and a body element. The head includes reference to a style sheet and to DWR JavaScript files, engine.js, util.js, and our own HorizontalMenu.js. The util.js file is optional, but as it contains very useful functions, it could be included in all the web pages where we use the functions in util.js. The body element has a contentArea place holder for the content pages just below the menu. It also contains the content area for JavaScript functions for a particular content. The body element onload-event executes the loadMenuItems() function when the page is loaded. The loadMenuItems() function calls the remoted method of the HorizontalMenu Java class. The parameter of the HorizontalMenu. getMenuItems() JavaScript function is the callback function that is called by DWR when the Java method has been executed and it returns menu items. The setMenuItems() function is a callback function for the loadMenuItems() function mentioned in the previous paragraph. While loading menu items, the Horizontal.getMenuItems() remoted method returns menu items as a List of Strings as a parameter to the setMenuItems() function. The menu items are formatted using the menuItemFormatter() helper function. The menuItemFormatter() function creates li elements of menu texts. Menus are formatted as links, (a href) and they have an onclick event that has a function call to the getContent-function, which in turn calls the AppContent.getContent() function. The AppContent is a remoted Java class, which we haven't implemented yet, and its purpose is to read the HTML from a file based on the menu item that the user clicked. Implementation of AppContent and the content pages are described in the next section. The setContent() function sets the HTML content to the content area and also evaluates JavaScript options that are within the content to be inserted in the content area (this is not used very much, but it is there for those who need it). Our dynamic user interface looks like this: Note the Firebug window at the bottom of the browser screen. The Firebug console in the screenshot shows one POST request to our HorizontalMenu.getMenuItems() method. Other Firebug features are extremely useful during development work, and we find it useful that Firebug has been enabled throughout the development work. Callback Functions We saw our first callback function as a parameter in the HorizontalMenu.getMenuItems(setMenuItems) function, and since callbacks are an important concept in DWR, it would be good to discuss a little more about them now that we have seen their first usage. Callbacks are used to operate on the data that was returned from a remoted method. As DWR and AJAX are asynchronous, typical return values in RPCs (Remote Procedure Calls), as in Java calls, do not work. DWR hides the details of calling the callback functions and handles everything internally from the moment we return a value from the remoted Java method to receiving the returned value to the callback function. Two methods are recommended while using callback functions. We have already seen the first method in the HorizontalMenu.getMenuItems(setMenuItems) function call. Remember that there are no parameters in the getMenuItems()Java method, but in the JavaScript call, we added the callback function name at the end of the parameter list. If the Java method has parameters, then the JavaScript call is similar to CountryDB.getCountries(selectedLetters,setCountryRows), where selectedLetters is the input parameter for the Java method and setCountryRows is the name of the callback function (we see the implementation later on). The second method to use callbacks is a meta-data object in the remote JavaScript call. An example (a full implementation is shown later in this article) is shown here: CountryDB.saveCountryNotes(ccode,newNotes, { callback:function(newNotes) { //function body here } }); Here, the function is anonymous and its implementation is included in the JavaScript call to the remoted Java method. One advantage here is that it is easy to read the code, and the code is executed immediately after we get the return value from the Java method. The other advantage is that we can add extra options to the call. Extra options include timeout and error handler as shown in the following example: CountryDB.saveCountryNotes(ccode,newNotes, { callback:function(newNotes) { //function body here }, timeout:10000, errorHandler:function(errorMsg) { alert(errorMsg);} }); It is also possible to add a callback function to those Java methods that do not return a value. Adding a callback to methods with no return values would be useful in getting a notification when a remote call has been completed. Afterword Our first sample is ready, and it is also the basis for the following samples. We also looked at how applications are tested in the Eclipse environment. Using DWR, we can look at JavaScript code on the browser and Java code on the server as one. It may take a while to get used to it, but it will change the way we develop web applications. Logically, there is no longer a client and a server but just a single run time platform that happens to be physically separate. But in practice, of course, applications using DWR, JavaScript on the client and Java in the server, are using the typical client-server interaction. This should be remembered when writing applications in the logically single run-time platform.
Read more
  • 0
  • 0
  • 3544

article-image-silverstripe-24-adding-some-spice-widgets-and-short-codes
Packt
02 May 2011
10 min read
Save for later

SilverStripe 2.4: Adding Some Spice with Widgets and Short Codes

Packt
02 May 2011
10 min read
SilverStripe 2.4 Module Extension, Themes, and Widgets: Beginner's Guide: RAW Create smashing SilverStripe applications by extending modules, creating themes, and adding widgets Why can't we simply use templates and $Content to accomplish the task? Widgets and short codes generally don't display their information directly like a placeholder does They can be used to fetch external information for you—we'll use Google and Facebook services in our examples Additionally they can aggregate internal information—for example displaying a tag cloud based on the key words you've added to pages or articles Widget or short code? Both can add more dynamic and/or complex content to the page than the regular fields. What’s the difference? Widgets are specialized content areas that can be dynamically dragged and dropped in a predefined area on a page in the CMS. You can't insert a widget into a rich-text editor field, it needs to be inserted elsewhere to a template. Additionally widgets can be customised from within the CMS. Short codes are self-defined tags in squared brackets that are entered anywhere in a content or rich-text area. Configuration is done through parameters, which are much like attributes in HTML. So the main difference is where you want to use the advanced content. Creating our own widget Let's create our first widget to see how it works. The result of this section should look like this: Time for action – embracing Facebook Facebook is probably the most important communication and publicity medium in the world at the moment. Our website is no exception and we want to publish the latest news on both our site and Facebook, but we definitely don't want to do that manually. You can either transmit information from your website to Facebook or you can grab information off Facebook and put it into your website. We'll use the latter approach, so let's hack away: In the Page class add a relation to the WidgetArea class and make it available in the CMS: public static $has_one = array( 'SideBar' => 'WidgetArea', ); public function getCMSFields(){ $fields = parent::getCMSFields(); $fields->addFieldToTab( 'Root.Content.Widgets', new WidgetAreaEditor('SideBar') ); return $fields; } Add $SideBar to templates/Layout/Page.ss in the theme directory, wrapping it inside another element for styling later on (the first and third line are already in the template, they are simply there for context): $Form <aside id="sidebar">$SideBar</aside> </section> <aside> is one of the new HTML5 tags. It's intended for content that is only "tangentially" related to the page's main content. For a detailed description see the official documentation at http://www.w3.org/TR/html-markup/aside.html. Create the widget folder in the base directory. We'll simply call it widget_facebookfeed/. Inside that folder, create an empty _config.php file. Additionally, create the folders code/ and templates/. Add the following PHP class—you'll know the filename and where to store it by now. The Controller's comments haven't been stripped this time, but are included to encourage best practice and provide a meaningful example: <?php class FacebookFeedWidget extends Widget { public static $db = array( 'Identifier' => 'Varchar(64)', 'Limit' => 'Int', ); public static $defaults = array( 'Limit' => 1, ); public static $cmsTitle = 'Facebook Messages'; public static $description = 'A list of the most recent Facebook messages'; public function getCMSFields(){ return new FieldSet( new TextField( 'Identifier', 'Identifier of the Facebook account to display' ), new NumericField( 'Limit', 'Maximum number of messages to display' ) ); } public function Feeds(){ /** * URL for fetching the information, * convert the returned JSON into an array. */ $url = 'http://graph.facebook.com/' . $this->Identifier . '/feed?limit=' . ($this->Limit + 5); $facebook = json_decode(file_get_contents($url), true); /** * Make sure we received some content, * create a warning in case of an error. */ if(empty($facebook) || !isset($facebook['data'])){ user_error( 'Facebook message error or API changed', E_USER_WARNING ); return; } /** * Iterate over all messages and only fetch as many as needed. */ $feeds = new DataObjectSet(); $count = 0; foreach($facebook['data'] as $post){ if($count >= $this->Limit){ break; } /** * If no such messages exists, log a warning and exit. */ if(!isset($post['from']['id']) || !isset($post['id'] || !isset($post['message'])){ user_error( 'Facebook detail error or API changed', E_USER_WARNING ); return; } /** * If the post is from the user itself and not someone * else, add the message and date to our feeds array. */ if(strpos($post['id'], $post['from']['id']) === 0){ $posted = date_parse($post['created_time']); $feeds->push(new ArrayData(array( 'Message' => DBField::create( 'HTMLText', nl2br($post['message']) ), 'Posted' => DBField::create( 'SS_Datetime', $posted['year'] . '-' . $posted['month'] . '-' . $posted['day'] . ' ' . $posted['hour'] . ':' . $posted['minute'] . ':' . $posted['second'] ), ))); $count++; } } return $feeds; } Define the template, use the same filename as for the previous file, but make sure that you use the correct extension. So the file widget_facebookfeed/templates/FacebookFeedWidget.ss should look like this: <% if Limit == 0 %> <% else %> <div id="facebookfeed" class="rounded"> <h2>Latest Facebook Update<% if Limit == 1 %> <% else %>s<% end_if %></h2> <% control Feeds %> <p> $Message <small>$Posted.Nice</small> </p> <% if Last %><% else %><hr/><% end_if %> <% end_control %> </div> <% end_if %> Also create a file widget_facebookfeed/templates/WidgetHolder.ss with just this single line of content: $Content We won't cover the CSS as it's not relevant to our goal. You can either copy it from the final code provided or simply roll your own. Rebuild the database with /dev/build?flush=all. Log into /admin. On each page you should now have a Widgets tab that looks similar to the next screenshot. In this example, the widget has already been activated by clicking next to the title in the left-hand menu. If you have more than one widget installed, you can simply add and reorder all of them on each page by drag-and-drop. So even novice content editors can add useful and interesting features to the pages very easily. Enter the Facebook ID and change the number of messages to display, if you want to. Save and Publish the page. Reload the page in the frontend and you should see something similar to the screenshot at the beginning of this section. allow_url_fopen must be enabled for this to work, otherwise you're not allowed to use remote objects such as local files. Due to security concerns it may be disabled, and you'll get error messages if there's a problem with this setting. For more details see http://www.php.net/manual/en/filesystem.configuration.php#ini.allow-url-fopen. What just happened? Quite a lot happened, so let's break it down into digestible pieces. Widgets in general Every widget is actually a module, although a small one, and limited in scope. The basic structure is the same: residing in the root folder, having a _config.php file (even if it's empty) and containing folders for code, templates, and possibly also JavaScript or images. Nevertheless, a widget is limited to the sidebar, so it's probably best described as an add-on. We'll take a good look at its bigger brother, the module, a little later. You're not required to name the folder widget_*, but it's a common practice and you should have a good reason for not sticking to it. Common use cases for widgets include tag clouds, Twitter integration, showing a countdown, and so forth. If you want to see what others have been doing with widgets or you need some of that functionality, visit http://www.silverstripe.org/widgets/. Keeping widgets simple In general widgets should work with default settings and if there are additional settings they should be both simple and few in number. While we'll be able to stick to the second part, we can't provide meaningful default settings for a Facebook account. Still, keep this idea in mind and try to adhere to it where possible. Facebook graph API We won't go into details of the Facebook Graph API, but it's a powerful tool—we've just scratched the surface with our example. Looking at the URL http://graph.facebook.com/<username>/feed?limit=5 you only need to know that it fetches the last five items from the user's feed, which consists of the wall posts (both by the user himself and others). <username> must obviously be replaced by the unique Facebook ID—either a number or an alias name the user selected. If you go to the user's profile, you should be able to see it in the URL. For example, SilverStripe Inc's Facebook profile is located at https://www.facebook.com/pages/silverstripe/44641219945?ref=ts&v=wall—so the ID is 44641219945. That's also what we've used for the example in the previous screenshot. For more details on the Graph API see http://developers.facebook.com/docs/api. Connecting pages and widgets First we need to connect our pages and widgets in general. You'll need to do this step whenever you want to use widgets. You'll need to do two things to make this connection: Reference the WidgetArea class in the base page's Model and make it available in the CMS through getCMSFields(). Secondly, we need to place the widget in our page. $SideBar You're not required to call the widget placeholder $SideBar, but it's a convention as widgets are normally displayed on a website's sidebar. If you don't have a good reason to do it otherwise, stick to it. You're not limited to a single sidebar As we define the widget ourselves, we can also create more than one for some or all pages. Simply add and rename the $SideBar in both the View and Model with something else and you're good to go. You can use multiple sidebars in the same region or totally different ones—for example creating header widgets and footer widgets. Also, take the name "sidebar" with a grain of salt, it can really have any shape you want. What about the intro page? Right. We've only added $SideBar to the standard templates/Layout/Page.ss. Shouldn't we proceed and put the PHP code into ContentPage.php? We could, but if we wanted to add the widget to another page type, which we'll create later, we'd have to copy the code. Not DRY, so let's keep it in the general Page.php. The intro page is a bit confusing right now. While you can add widgets in the backend, they can’t be displayed as the placeholder is missing in the template. To clean this up, let's simply remove the Widget tab from the intro page. It's not strictly required, but it prevents content authors from having a field in the CMS that does nothing visible on the website. To do this, simply extend the getCMSFields() in the IntroPage.php file, like this: function getCMSFields() { $fields = parent::getCMSFields(); $fields->removeFieldFromTab('Root.Content.Main', 'Content'); $fields->removeFieldFromTab('Root.Content', 'Widgets'); return $fields; }
Read more
  • 0
  • 0
  • 3543
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-meteorjs-javascript-framework-why-meteor-rocks
Packt
08 Feb 2013
18 min read
Save for later

Meteor.js JavaScript Framework: Why Meteor Rocks!

Packt
08 Feb 2013
18 min read
(For more resources related to this topic, see here.) Modern web applications Our world is changing. With continual advancements in display, computing, and storage capacities, what wasn't possible just a few years ago is now not only possible, but critical to the success of a good application. The Web in particular has undergone significant change. The origin of the web app (client/server) From the beginning, web servers and clients have mimicked the dumb terminal approach to computing, where a server with significantly more processing power than a client will perform operations on data (writing records to a database, math calculations, text searches, and so on), transform the data into a readable format (turn a database record into HTML, and so on), and then serve the result to the client, where it's displayed for the user. In other words, the server does all the work, and the client acts as more of a display, or dumb terminal. The design pattern for this is called...wait for it…the client/server design pattern: This design pattern, borrowed from the dumb terminals and mainframes of the 60s and 70s, was the beginning of the Web as we know it, and has continued to be the design pattern we think of, when we think of the Internet. The rise of the machines (MVC) Before the Web (and ever since), desktops were able to run a program such as a spreadsheet or a word processor without needing to talk to a server. This type of application could do everything it needed to, right there on the big and beefy desktop machine. During the early 90s, desktop computers got faster and better. Even more and more beefy. At the same time, the Web was coming alive. People started having the idea that a hybrid between the beefy desktop application (a fat app) and the connected client/server application (a thin app) would produce the best of both worlds. This kind of hybrid app — quite the opposite of a dumb terminal — was called a smart app. There were many business-oriented smart apps created, but the easiest examples are found in computer games. Massively Multiplayer Online games (MMOs), first-person shooters, and real-time strategies are smart apps where information (the data model) is passed between machines through a server. The client in this case does a lot more than just display the information. It performs most of the processing (or controls) and transforms the data into something to be displayed (the view). This design pattern is simple, but very effective. It's called the Model View Controller (MVC) pattern. The model is all the data. In the context of a smart app, the model is provided by a server. The client makes requests for the model from the server. Once the client gets the model, it performs actions/logic on this data, and then prepares it to be displayed on the screen. This part of the application (talk to the server, modify the data model, and prep data for display) is called the controller. The controller sends commands to the view, which displays the information, and reports back to the controller when something happens on the screen (a button click, for example). The controller receives that feedback, performs logic, and updates the model. Lather, rinse, repeat. Because web browsers were built to be "dumb clients" the idea of using a browser as a smart app was out of the question. Instead, smart apps were built on frameworks such as Microsoft .NET, Java, or Macromedia (now Adobe) Flash. As long as you had the framework installed, you could visit a web page to download/run a smart app. Sometimes you could run the app inside the browser, sometimes you could download it first, but either way, you were running a new type of web app, where the application could talk to the server and share the processing workload. The browser grows up (MVVM) Beginning in the early 2000s, a new twist on the MVC pattern started to emerge. Developers started to realize that, for connected/enterprise "smart apps", there was actually a nested MVC pattern. The server (controller) was performing business logic on the database information (model) through the use of business objects, and then passing that information on to a client application (a "view"). The client was receiving this information from the server, and treating it as its own personal "model." The client would then act as a proper controller, perform logic, and send the information to the view to be displayed on the screen. So, the "view" for the server MVC was the "model" for the second MVC. Then came the thought, "why stop at two?" There was no reason an application couldn't have multiple nested MVCs, with each view becoming the model for the next MVC. In fact, on the client side, there's actually a good reason to do so. Separating actual display logic (such as "this submit button goes here" and "the text area changed value") from the client-side object logic (such as "user can submit this record" and "the phone # has changed") allows a large majority of the code to be reused. The object logic can be ported to another application, and all you have to do is change out the display logic to extend the same model and controller code to a different application or device. From 2004-2005, this idea was refined and modified for smart apps (called the presentation model) by Martin Fowler and Microsoft (called the Model View View-Model). While not strictly the same thing as a nested MVC, the MVVM design pattern applied the concept of a nested MVC to the frontend application. As browser technologies (HTML and JavaScript) matured, it became possible to create smart apps that use the MVVM design pattern directly inside an HTML web page. This pattern makes it possible to run a full-sized application directly from a browser. No more downloading multiple frameworks or separate apps. You can now get the same functionality from visiting a URL as you previously could from buying a packaged product. A giant Meteor appears! Meteor takes the MVVM pattern to the next level. By applying templating through handlebars.js (or other template libraries) and using instant updates, it truly enables a web application to act and perform like a complete, robust smart application. Let's walk through some concepts of how Meteor does this, and then we'll begin to apply this to our Lending Library application. Cached and synchronized data (the model) Meteor supports a cached-and-synchronized data model that is the same on the client and the server. When the client notices a change to the data model, it first caches the change locally, and then tries to sync with the server. At the same time, it is listening to changes coming from the server. This allows the client to have a local copy of the data model, so it can send the results of any changes to the screen quickly, without having to wait for the server to respond. In addition, you'll notice that this is the beginning of the MVVM design pattern, within a nested MVC. In other words, the server publishes data changes, and treats those data changes as the "view" in its own MVC pattern. The client subscribes to those changes, and treats the changes as the "model" in its MVVM pattern. A code example of this is very simple inside of Meteor (although you can make it more complex and therefore more controlled if you'd like): var lists = new Meteor.Collection("lists"); What this one line does is declare that there is a lists data model. Both the client and server will have a version of it, but they treat their versions differently. The client will subscribe to changes announced by the server, and update its model accordingly. The server will publish changes, and listen to change requests from the client, and update its model (its master copy) based on those change requests. Wow. One line of code that does all that! Of course there is more to it, but that's beyond the scope of this article, so we'll move on. To better understand Meteor data synchronization, see the Publish and subscribe section of the Meteor documentation at http://docs.meteor.com/#publishandsubscribe. Templated HTML (the view) The Meteor client renders HTML through the use of templates. Templates in HTML are also called view data bindings. Without getting too deep, a view data binding is a shared piece of data that will be displayed differently if the data changes. The HTML code has a placeholder. In that placeholder different HTML code will be placed, depending on the value of a variable. If the value of that variable changes, the code in the placeholder will change with it, creating a different view. Let's look at a very simple data binding – one that you don't technically need Meteor for – to illustrate the point. In LendLib.html, you will see an HTML (Handlebar) template expression: <div id="categories-container"> {{> categories}} </div> That expression is a placeholder for an HTML template, found just below it: <template name="categories"> <h2 class="title">my stuff</h2>... So, {{> categories}} is basically saying "put whatever is in the template categories right here." And the HTML template with the matching name is providing that. If you want to see how data changes will change the display, change the h2 tag to an h4 tag, and save the change: <template name="categories"> <h4 class="title">my stuff</h4>... You'll see the effect in your browser ("my stuff" become itsy bitsy). That's a template – or view data binding – at work! Change the h4 back to an h2 and save the change. Unless you like the change. No judgment here...okay, maybe a little bit of judgment. It's ugly, and tiny, and hard to read. Seriously, you should change it back before someone sees it and makes fun of you!! Alright, now that we know what a view data binding is, let's see how Meteor uses them. Inside the categories template in LendLib.html, you'll find even more Handlebars templates: <template name="categories"> <h4 class="title">my stuff</h4> <div id="categories" class="btn-group"> {{#each lists}} <div class="category btn btn-inverse"> {{Category}} </div> {{/each}} </div> </template> The first Handlebars expression is part of a pair, and is a for-each statement. {{#each lists}} tells the interpreter to perform the action below it (in this case, make a new div) for each item in the lists collection. lists is the piece of data. {{#each lists}} is the placeholder. Now, inside the #each lists expression, there is one more Handlebars expression. {{Category}} Because this is found inside the #each expression, Category is an implied property of lists. That is to say that {{Category}} is the same as saying this.Category, where this is the current item in the for each loop. So the placeholder is saying "Add the value of this.Category here." Now, if we look in LendLib.js, we will see the values behind the templates. Template.categories.lists = function () { return lists.find(... Here, Meteor is declaring a template variable named lists, found inside a template called categories. That variable happens to be a function. That function is returning all the data in the lists collection, which we defined previously. Remember this line? var lists = new Meteor.Collection("lists"); That lists collection is returned by the declared Template.categories.lists, so that when there's a change to the lists collection, the variable gets updated, and the template's placeholder is changed as well. Let's see this in action. On your web page pointing to http://localhost:3000, open the browser console and enter the following line: > lists.insert({Category:"Games"}); This will update the lists data collection (the model). The template will see this change, and update the HTML code/placeholder. The for each loop will run one additional time, for the new entry in lists, and you'll see the following screen: In regards to the MVVM pattern, the HTML template code is part of the client's view. Any changes to the data are reflected in the browser automatically. Meteor's client code (the View-Model) As discussed in the preceding section, LendLib.js contains the template variables, linking the client's model to the HTML page, which is the client's view. Any logic that happens inside of LendLib.js as a reaction to changes from either the view or the model is part of the View-Model. The View-Model is responsible for tracking changes to the model and presenting those changes in such a way that the view will pick up the changes. It's also responsible for listening to changes coming from the view. By changes, we don't mean a button click or text being entered. Instead, we mean a change to a template value. A declared template is the View-Model, or the model for the view. That means that the client controller has its model (the data from the server) and it knows what to do with that model, and the view has its model (a template) and it knows how to display that model. Let's create some templates We'll now see a real-life example of the MVVM design pattern, and work on our Lending Library at the same time. Adding categories through the console has been a fun exercise, but it's not a long -t term solution. Let's make it so we can do that on the page instead. Open LendLib.html and add a new button just before the {{#each lists}} expression. <div id="categories" class="btn-group"> <div class="category btn btn-inverse" id="btnNewCat">+</div> {{#each lists}} This will add a plus button to the page. Now, we'll want to change out that button for a text field if we click on it. So let's build that functionality using the MVVM pattern, and make it based on the value of a variable in the template. Add the following lines of code: <div id="categories" class="btn-group"> {{#if new_cat}} {{else}} <div class="category btn btn-inverse" id="btnNewCat">+</div> {{/if}} {{#each lists}} The first line {{#if new_cat}} checks to see if new_cat is true or false. If it's false, the {{else}} section triggers, and it means we haven't yet indicated we want to add a new category, so we should be displaying the button with the plus sign. In this case, since we haven't defined it yet, new_cat will be false, and so the display won't change. Now let's add the HTML code to display, if we want to add a new category: <div id="categories" class="btn-group"> {{#if new_cat}} <div class="category"> <input type="text" id="add-category" value="" /> </div> {{else}} <div class="category btn btn-inverse" id="btnNewCat">+</div> {{/if}} {{#each lists}} Here we've added an input field, which will show up when new_cat is true. The input field won't show up unless it is, so for now it's hidden. So how do we make new_cat equal true? Save your changes if you haven't already, and open LendingLib.js. First, we'll declare a Session variable, just below our lists template declaration. Template.categories.lists = function () { return lists.find({}, {sort: {Category: 1}}); }; // We are declaring the 'adding_category' flag Session.set('adding_category', false); Now, we declare the new template variable new_cat, which will be a function returning the value of adding_category: // We are declaring the 'adding_category' flag Session.set('adding_category', false); // This returns true if adding_category has been assigned a value //of true Template.categories.new_cat = function () { return Session.equals('adding_category',true); }; Save these changes, and you'll see that nothing has changed. Ta-daaa! In reality, this is exactly as it should be, because we haven't done anything to change the value of adding_category yet. Let's do that now. First, we'll declare our click event, which will change the value in our Session variable. Template.categories.new_cat = function () { return Session.equals('adding_category',true); }; Template.categories.events({ 'click #btnNewCat': function (e, t) { Session.set('adding_category', true); Meteor.flush(); focusText(t.find("#add-category")); } }); Let's take a look at the following line: Template.categories.events({ This line is declaring that there will be events found in the category template. Now let's take a look at the next line: 'click #btnNewCat': function (e, t) { This line tells us that we're looking for a click event on the HTML element with an id="btnNewCat" (which we already created on LendingLib.html). Session.set('adding_category', true); Meteor.flush(); focusText(t.find("#add-category")); We set the Session variable adding_category = true, we flush the DOM (clear up anything wonky), and then we set the focus onto the input box with the expression id="add-category". One last thing to do, and that is to quickly add the helper function focusText(). Just before the closing tag for the if (Meteor.isClient) function, add the following code: /////Generic Helper Functions///// //this function puts our cursor where it needs to be. function focusText(i) { i.focus(); i.select(); }; } //------closing bracket for if(Meteor.isClient){} Now when you save the changes, and click on the plus [ ] button, you'll see the following input box: Fancy! It's still not useful, but we want to pause for a second and reflect on what just happened. We created a conditional template in the HTML page that will either show an input box or a plus button, depending on the value of a variable. That variable belongs to the View-Model. That is to say that if we change the value of the variable (like we do with the click event), then the view automatically updates. We've just completed an MVVM pattern inside a Meteor application! To really bring this home, let's add a change to the lists collection (also part of the View-Model, remember?) and figure out a way to hide the input field when we're done. First, we need to add a listener for the keyup event. Or to put it another way, we want to listen when the user types something in the box and hits Enter. When that happens, we want to have a category added, based on what the user typed. First, let's declare the event handler. Just after the click event for #btnNewCat, let's add another event handler: focusText(t.find("#add-category")); }, 'keyup #add-category': function (e,t){ if (e.which === 13) { var catVal = String(e.target.value || ""); if (catVal) { lists.insert({Category:catVal}); Session.set('adding_category', false); } } } }); We add a "," at the end of the click function, and then added the keyup event handler. if (e.which === 13) This line checks to see if we hit the Enter/return key. var catVal = String(e.target.value || ""); if (catVal) This checks to see if the input field has any value in it. lists.insert({Category:catVal}); If it does, we want to add an entry to the lists collection. Session.set('adding_category', false); Then we want to hide the input box, which we can do by simply modifying the value of adding_category. One more thing to add, and we're all done. If we click away from the input box, we want to hide it, and bring back the plus button. We already know how to do that inside the MVVM pattern by now, so let's add a quick function that changes the value of adding_category. Add one more comma after the keyup event handler, and insert the following event handler: Session.set('adding_category', false); } } }, 'focusout #add-category': function(e,t){ Session.set('adding_category',false); } }); Save your changes, and let's see this in action! In your web browser, on http://localhost:3000 , click on the plus sign — add the word Clothes and hit Enter. Your screen should now resemble the following: Feel free to add more categories if you want. Also, experiment with clicking on the plus button, typing something in, and then clicking away from the input field. Summary In this article you've learned about the history of web applications, and seen how we've moved from a traditional client/server model to a full-fledged MVVM design pattern. You've seen how Meteor uses templates and synchronized data to make things very easy to manage, providing a clean separation between our view, our view logic, and our data. Lastly, you've added more to the Lending Library, making a button to add categories, and you've done it all using changes to the View-Model, rather than directly editing the HTML.   Resources for Article : Further resources on this subject: How to Build a RSS Reader for Windows Phone 7 [Article] Applying Special Effects in 3D Game Development with Microsoft Silverlight 3: Part 2 [Article] Top features of KnockoutJS [Article]
Read more
  • 0
  • 0
  • 3543

article-image-play-framework-data-validation-using-controllers
Packt
21 Jul 2011
15 min read
Save for later

Play Framework: Data Validation Using Controllers

Packt
21 Jul 2011
15 min read
Play Framework Cookbook Over 60 incredibly effective recipes to take you under the hood and leverage advanced concepts of the Play framework Introduction This article will help you to keep your controllers as clean as possible, with a well defined boundary to your model classes. Always remember that controllers are really only a thin layer to ensure that your data from the outside world is valid before handing it over to your models, or something needs to be specifically adapted to HTTP. URL routing using annotation-based configuration If you do not like the routes file, you can also describe your routes programmatically by adding annotations to your controllers. This has the advantage of not having any additional config file, but also poses the problem of your URLs being dispersed in your code. You can find the source code of this example in the examples/chapter2/annotationcontroller directory. How to do it... Go to your project and install the router module via conf/dependencies.yml: require: - play - play -> router head Then run playdeps and the router module should be installed in the modules/ directory of your application. Change your controller like this: @StaticRoutes({ @ServeStatic(value="/public/", directory="public") }) public class Application extends Controller { @Any(value="/", priority=100) public static void index() { forbidden("Reserved for administrator"); } @Put(value="/", priority=2, accept="application/json") public static void hiddenIndex() { renderText("Secret news here"); } @Post("/ticket") public static void getTicket(String username, String password) { String uuid = UUID.randomUUID().toString(); renderJSON(uuid); } } How it works... Installing and enabling the module should not leave any open questions for you at this point. As you can see in the controller, it is now filled with annotations that resemble the entries in the routes.conf file, which you could possibly have deleted by now for this example. However, then your application will not start, so you have to have an empty file at least. The @ServeStatic annotation replaces the static command in the routes file. The @StaticRoutes annotation is just used for grouping several @ServeStatic annotations and could be left out in this example. Each controller call now has to have an annotation in order to be reachable. The name of the annotation is the HTTP method, or @Any, if it should match all HTTP methods. Its only mandatory parameter is the value, which resembles the URI—the second field in the routes. conf. All other parameters are optional. Especially interesting is the priority parameter, which can be used to give certain methods precedence. This allows a lower prioritized catchall controller like in the preceding example, but a special handling is required if the URI is called with the PUT method. You can easily check the correct behavior by using curl, a very practical command line HTTP client: curl -v localhost:9000/ This command should give you a result similar to this: > GET / HTTP/1.1 > User-Agent: curl/7.21.0 (i686-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18 > Host: localhost:9000 > Accept: */* > < HTTP/1.1 403 Forbidden < Server: Play! Framework;1.1;dev < Content-Type: text/html; charset=utf-8 < Set-Cookie: PLAY_FLASH=;Path=/ < Set-Cookie: PLAY_ERRORS=;Path=/ < Set-Cookie: PLAY_SESSION=0c7df945a5375480993f51914804284a3bb ca726-%00___ID%3A70963572-b0fc-4c8c-b8d5-871cb842c5a2%00;Path=/ < Cache-Control: no-cache < Content-Length: 32 < <h1>Reserved for administrator</h1> You can see the HTTP error message and the content returned. You can trigger a PUT request in a similar fashion: curl -X PUT -v localhost:9000/ > PUT / HTTP/1.1 > User-Agent: curl/7.21.0 (i686-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.18 > Host: localhost:9000 > Accept: */* > < HTTP/1.1 200 OK < Server: Play! Framework;1.1;dev < Content-Type: text/plain; charset=utf-8 < Set-Cookie: PLAY_FLASH=;Path=/ < Set-Cookie: PLAY_ERRORS=;Path=/ < Set-Cookie: PLAY_SESSION=f0cb6762afa7c860dde3fe1907e8847347 6e2564-%00___ID%3A6cc88736-20bb-43c1-9d43-42af47728132%00;Path=/ < Cache-Control: no-cache < Content-Length: 16 Secret news here As you can see now, the priority has voted the controller method for the PUT method which is chosen and returned. There's more... The router module is a small, but handy module, which is perfectly suited to take a first look at modules and to understand how the routing mechanism of the Play framework works at its core. You should take a look at the source if you need to implement custom mechanisms of URL routing. Mixing the configuration file and annotations is possible You can use the router module and the routes file—this is needed when using modules as they cannot be specified in annotations. However, keep in mind that this is pretty confusing. You can check out more info about the router module at http://www.playframework.org/modules/router. Basics of caching Caching is quite a complex and multi-faceted technique, when implemented correctly. However, implementing caching in your application should not be complex, but rather the mindwork before, where you think about what and when to cache, should be. There are many different aspects, layers, and types (and their combinations) of caching in any web application. This recipe will give a short overview about the different types of caching and how to use them. You can find the source code of this example in the chapter2/caching-general directory. Getting ready First, it is important that you understand where caching can happen—inside and outside of your Play application. So let's start by looking at the caching possibilities of the HTTP protocol. HTTP sometimes looks like a simple protocol, but is tricky in the details. However, it is one of the most proven protocols in the Internet, and thus it is always useful to rely on its functionalities. HTTP allows the caching of contents by setting specific headers in the response. There are several headers which can be set: Cache-Control: This is a header which must be parsed and used by the client and also all the proxies in between. Last-Modified: This adds a timestamp, explaining when the requested resource had been changed the last time. On the next request the client may send an If-Modified- Since header with this date. Now the server may just return a HTTP 304 code without sending any data back. ETag: An ETag is basically the same as a Last-Modified header, except it has a semantic meaning. It is actually a calculated hash value resembling the resource behind the requested URL instead of a timestamp. This means the server can decide when a resource has changed and when it has not. This could also be used for some type of optimistic locking. So, this is a type of caching on which the requesting client has some influence on. There are also other forms of caching which are purely on the server side. In most other Java web frameworks, the HttpSession object is a classic example, which belongs to this case. Play has a cache mechanism on the server side. It should be used to store big session data, in this case any data exceeding the 4KB maximum cookie size. Be aware that there is a semantic difference between a cache and a session. You should not rely on the data being in the cache and thus need to handle cache misses. You can use the Cache class in your controller and model code. The great thing about it is that it is an abstraction of a concrete cache implementation. If you only use one node for your application, you can use the built-in ehCache for caching. As soon as your application needs more than one node, you can configure a memcached in your application.conf and there is no need to change any of your code. Furthermore, you can also cache snippets of your templates. For example, there is no need to reload the portal page of a user on every request when you can cache it for 10 minutes. This also leads to a very simple truth. Caching gives you a lot of speed and might even lower your database load in some cases, but it is not free. Caching means you need RAM, lots of RAM in most cases. So make sure the system you are caching on never needs to swap, otherwise you could read the data from disk anyway. This can be a special problem in cloud deployments, as there are often limitations on available RAM. The following examples show how to utilize the different caching techniques. We will show four different use cases of caching in the accompanying test. First test: public class CachingTest extends FunctionalTest { @Test public void testThatCachingPagePartsWork() { Response response = GET("/"); String cachedTime = getCachedTime(response); assertEquals(getUncachedTime(response), cachedTime); response = GET("/"); String newCachedTime = getCachedTime(response); assertNotSame(getUncachedTime(response), newCachedTime); assertEquals(cachedTime, newCachedTime); } @Test public void testThatCachingWholePageWorks() throws Exception { Response response = GET("/cacheFor"); String content = getContent(response); response = GET("/cacheFor"); assertEquals(content, getContent(response)); Thread.sleep(6000); response = GET("/cacheFor"); assertNotSame(content, getContent(response)); } @Test public void testThatCachingHeadersAreSet() { Response response = GET("/proxyCache"); assertIsOk(response); assertHeaderEquals("Cache-Control", "max-age=3600", response); } @Test public void testThatEtagCachingWorks() { Response response = GET("/etagCache/123"); assertIsOk(response); assertContentEquals("Learn to use etags, dumbass!", response); Request request = newRequest(); String etag = String.valueOf("123".hashCode()); Header noneMatchHeader = new Header("if-none-match", etag); request.headers.put("if-none-match", noneMatchHeader); DateTime ago = new DateTime().minusHours(12); String agoStr = Utils.getHttpDateFormatter().format(ago. toDate()); Header modifiedHeader = new Header("if-modified-since", agoStr); request.headers.put("if-modified-since", modifiedHeader); response = GET(request, "/etagCache/123"); assertStatus(304, response); } private String getUncachedTime(Response response) { return getTime(response, 0); } private String getCachedTime(Response response) { return getTime(response, 1); } private String getTime(Response response, intpos) { assertIsOk(response); String content = getContent(response); return content.split("n")[pos]; } } The first test checks for a very nice feature. Since play 1.1, you can cache parts of a page, more exactly, parts of a template. This test opens a URL and the page returns the current date and the date of such a cached template part, which is cached for about 10 seconds. In the first request, when the cache is empty, both dates are equal. If you repeat the request, the first date is actual while the second date is the cached one. The second test puts the whole response in the cache for 5 seconds. In order to ensure that expiration works as well, this test waits for six seconds and retries the request. The third test ensures that the correct headers for proxy-based caching are set. The fourth test uses an HTTP ETag for caching. If the If-Modified-Since and If-None- Match headers are not supplied, it returns a string. On adding these headers to the correct ETag (in this case the hashCode from the string 123) and the date from 12 hours before, a 302 Not-Modified response should be returned. How to do it... Add four simple routes to the configuration as shown in the following code: GET / Application.index GET /cacheFor Application.indexCacheFor GET /proxyCache Application.proxyCache GET /etagCache/{name} Application.etagCache The application class features the following controllers: public class Application extends Controller { public static void index() { Date date = new Date(); render(date); } @CacheFor("5s") public static void indexCacheFor() { Date date = new Date(); renderText("Current time is: " + date); } public static void proxyCache() { response.cacheFor("1h"); renderText("Foo"); } @Inject private static EtagCacheCalculator calculator; public static void etagCache(String name) { Date lastModified = new DateTime().minusDays(1).toDate(); String etag = calculator.calculate(name); if(!request.isModified(etag, lastModified.getTime())) { throw new NotModified(); } response.cacheFor(etag, "3h", lastModified.getTime()); renderText("Learn to use etags, dumbass!"); } } As you can see in the controller, the class to calculate ETags is injected into the controller. This is done on startup with a small job as shown in the following code: @OnApplicationStart public class InjectionJob extends Job implements BeanSource { private Map<Class, Object>clazzMap = new HashMap<Class, Object>(); public void doJob() { clazzMap.put(EtagCacheCalculator.class, new EtagCacheCalculator()); Injector.inject(this); } public <T> T getBeanOfType(Class<T>clazz) { return (T) clazzMap.get(clazz); } } The calculator itself is as simple as possible: public class EtagCacheCalculator implements ControllerSupport { public String calculate(String str) { return String.valueOf(str.hashCode()); } } The last piece needed is the template of the index() controller, which looks like this: Current time is: ${date} #{cache 'mainPage', for:'5s'} Current time is: ${date} #{/cache} How it works... Let's check the functionality per controller call. The index() controller has no special treatment inside the controller. The current date is put into the template and that's it. However, the caching logic is in the template here because not the whole, but only a part of the returned data should be cached, and for that a #{cache} tag used. The tag requires two arguments to be passed. The for parameter allows you to set the expiry out of the cache, while the first parameter defines the key used inside the cache. This allows pretty interesting things. Whenever you are in a page where something is exclusively rendered for a user (like his portal entry page), you could cache it with a key, which includes the user name or the session ID, like this: #{cache 'home-' + connectedUser.email, for:'15min'} ${user.name} #{/cache} This kind of caching is completely transparent to the user, as it exclusively happens on the server side. The same applies for the indexCacheFor() controller. Here, the whole page gets cached instead of parts inside the template. This is a pretty good fit for nonpersonalized, high performance delivery of pages, which often are only a very small portion of your application. However, you already have to think about caching before. If you do a time consuming JPA calculation, and then reuse the cache result in the template, you have still wasted CPU cycles and just saved some rendering time. The third controller call proxyCache() is actually the most simple of all. It just sets the proxy expire header called Cache-Control. It is optional to set this in your code, because your Play is configured to set it as well when the http.cacheControl parameter in your application.conf is set. Be aware that this works only in production, and not in development mode. The most complex controller is the last one. The first action is to find out the last modified date of the data you want to return. In this case it is 24 hours ago. Then the ETag needs to be created somehow. In this case, the calculator gets a String passed. In a real-world application you would more likely pass the entity and the service would extract some properties of it, which are used to calculate the ETag by using a pretty-much collision-safe hash algorithm. After both values have been calculated, you can check in the request whether the client needs to get new data or may use the old data. This is what happens in the request.isModified() method. If the client either did not send all required headers or an older timestamp was used, real data is returned; in this case, a simple string advising you to use an ETag the next time. Furthermore, the calculated ETag and a maximum expiry time are also added to the response via response.cacheFor(). A last specialty in the etagCache() controller is the use of the EtagCacheCalculator. The implementation does not matter in this case, except that it must implement the ControllerSupport interface. However, the initialization of the injected class is still worth a mention. If you take a look at the InjectionJob class, you will see the creation of the class in the doJob() method on startup, where it is put into a local map. Also, the Injector.inject() call does the magic of injecting the EtagCacheCalculator instance into the controllers. As a result of implementing the BeanSource interface, the getBeanOfType() method tries to get the corresponding class out of the map. The map actually should ensure that only one instance of this class exists. There's more... Caching is deeply integrated into the Play framework as it is built with the HTTP protocol in mind. If you want to find out more about it, you will have to examine core classes of the framework. More information in the ActionInvoker If you want to know more details about how the @CacheFor annotation works in Play, you should take a look at the ActionInvoker class inside of it. Be thoughtful with ETag calculation Etag calculation is costly, especially if you are calculating more then the last-modified stamp. You should think about performance here. Perhaps it would be useful to calculate the ETag after saving the entity and storing it directly at the entity in the database. It is useful to make some tests if you are using the ETag to ensure high performance. In case you want to know more about ETag functionality, you should read RFC 2616. You can also disable the creation of ETags totally, if you set http.useETag=false in your application.conf. Use a plugin instead of a job The job that implements the BeanSource interface is not a very clean solution to the problem of calling Injector.inject() on start up of an application. It would be better to use a plugin in this case.
Read more
  • 0
  • 0
  • 3542

article-image-jboss-plug-and-eclipse-web-tools-platform
Packt
23 Oct 2009
4 min read
Save for later

JBoss AS plug-in and the Eclipse Web Tools Platform

Packt
23 Oct 2009
4 min read
In this article, we recommend that you use the JBoss AS (version 4.2), which is a free J2EE Application Server that can be downloaded from http://www.jboss.org/jbossas/downloads/ (complete documentation can be downloaded from http://www.jboss.org/jbossas/docs/). JBoss AS plug-in and the Eclipse WTP JBoss AS plug-in can be treated as an elegant method of connecting a J2EE Application Server to the Eclipse IDE. It's important to know that JBoss AS plug-in does this by using the WTP support, which is a project included by default in the Eclipse IDE. WTP is a major project that extends the Eclipse platform with a strong support for Web and J2EE applications. In this case, WTP will sustain important operations, like starting the server in run/debug mode, stopping the server, and delegating WTP projects to their runtimes. For now, keep in mind that Eclipse supports a set of WTP servers and for every WTP server you may have one WTP runtime. Now, we will see how to install and configure the JBoss 4.2.2 runtime and server. Adding a WTP runtime in Eclipse In case of JBoss Tools, the main scope of Server Runtimes is to point to a server installation somewhere on your machine. By runtimes, we can use different configurations of the same server installed in different physical locations. Now, we will create a JBoss AS Runtime (you can extrapolate the steps shown below for any supported server): From the Window menu, select Preferences. In the Preferences window, expand the Server node and select the Runtime Environments child-node. On the right side of the window, you can see a list of currently installed runtimes, as shown in the following screenshot, where you can see that an Apache Tomcat runtime is reported (this is just an example, the Apache Tomcat runtime is not a default one). Now, if you want to install a new runtime, you should click the Add button from the top-right corner. This will bring in front the New Server Runtime Environment window as you can see in the following screenshot. Because we want to add a JBoss 4.2.2 runtime, we will select the JBoss 4.2 Runtime option (for other adapters proceed accordingly). After that, click Next for setting the runtime parameters. In the runtimes list, we have runtimes provided by WTP and runtimes provided by JBoss Tools (see the section marked in red on the previous screenshot). Because this article is about JBoss Tools, we will further discuss only the runtimes from this category. Here, we have five types of runtimes with the mention that the JBoss Deploy-Only Runtime type is for developers who start/stop/debug applications outside Eclipse. In this step, you will configure the JBoss runtime by indicating the runtime's name (in the Name field), the runtime's home directory (in the Home Directory field), the Java Runtime Environment associated with this runtime (in the JRE field), and the configuration type (in the Configuration field).In the following screenshot, we have done all these settings for our JBoss 4.2 Runtime. The official documentation of JBoss AS 4.2.2 recommends using JDK version 5. If you don't have this version in the JRE list, you can add it like this: Display the Preferences window by clicking the JRE button. In this window, click the Add button to display the Add JRE window. Continue by selecting the Standard VM option and click on the Next button. On the next page, use the Browse button to navigate to the JRE 5 home directory. Click on the Finish button and you should see a new entry in the Installed JREs field of the Preferences window (as shown in the following screenshot). Just check the checkbox of this new entry and click OK. Now, JRE 5 should be available in the JRE list of the New Server Runtime Environment window. After this, just click on the Finish button and the new runtime will be added, as shown in the following screenshot: From this window, you can also edit or remove a runtime by using the Edit and Remove buttons. These are automatically activated when you select a runtime from the list. As a final step, it is recommended to restart the Eclipse IDE.
Read more
  • 0
  • 0
  • 3541

article-image-indexing-and-performance-tuning
Packt
10 Oct 2014
40 min read
Save for later

Indexing and Performance Tuning

Packt
10 Oct 2014
40 min read
In this article by Hans-Jürgen Schönig, author of the book PostgreSQL Administration Essentials, you will be guided through PostgreSQL indexing, and you will learn how to fix performance issues and find performance bottlenecks. Understanding indexing will be vital to your success as a DBA—you cannot count on software engineers to get this right straightaway. It will be you, the DBA, who will face problems caused by bad indexing in the field. For the sake of your beloved sleep at night, this article is about PostgreSQL indexing. (For more resources related to this topic, see here.) Using simple binary trees In this section, you will learn about simple binary trees and how the PostgreSQL optimizer treats the trees. Once you understand the basic decisions taken by the optimizer, you can move on to more complex index types. Preparing the data Indexing does not change user experience too much, unless you have a reasonable amount of data in your database—the more data you have, the more indexing can help to boost things. Therefore, we have to create some simple sets of data to get us started. Here is a simple way to populate a table: test=# CREATE TABLE t_test (id serial, name text);CREATE TABLEtest=# INSERT INTO t_test (name) SELECT 'hans' FROM   generate_series(1, 2000000);INSERT 0 2000000test=# INSERT INTO t_test (name) SELECT 'paul' FROM   generate_series(1, 2000000);INSERT 0 2000000 In our example, we created a table consisting of two columns. The first column is simply an automatically created integer value. The second column contains the name. Once the table is created, we start to populate it. It's nice and easy to generate a set of numbers using the generate_series function. In our example, we simply generate two million numbers. Note that these numbers will not be put into the table; we will still fetch the numbers from the sequence using generate_series to create two million hans and rows featuring paul, shown as follows: test=# SELECT * FROM t_test LIMIT 3;id | name----+------1 | hans2 | hans3 | hans(3 rows) Once we create a sufficient amount of data, we can run a simple test. The goal is to simply count the rows we have inserted. The main issue here is: how can we find out how long it takes to execute this type of query? The timing command will do the job for you: test=# timingTiming is on. As you can see, timing will add the total runtime to the result. This makes it quite easy for you to see if a query turns out to be a problem or not: test=# SELECT count(*) FROM t_test;count---------4000000(1 row)Time: 316.628 ms As you can see in the preceding code, the time required is approximately 300 milliseconds. This might not sound like a lot, but it actually is. 300 ms means that we can roughly execute three queries per CPU per second. On an 8-Core box, this would translate to roughly 25 queries per second. For many applications, this will be enough; but do you really want to buy an 8-Core box to handle just 25 concurrent users, and do you want your entire box to work just on this simple query? Probably not! Understanding the concept of execution plans It is impossible to understand the use of indexes without understanding the concept of execution plans. Whenever you execute a query in PostgreSQL, it generally goes through four central steps, described as follows: Parser: PostgreSQL will check the syntax of the statement. Rewrite system: PostgreSQL will rewrite the query (for example, rules and views are handled by the rewrite system). Optimizer or planner: PostgreSQL will come up with a smart plan to execute the query as efficiently as possible. At this step, the system will decide whether or not to use indexes. Executor: Finally, the execution plan is taken by the executor and the result is generated. Being able to understand and read execution plans is an essential task of every DBA. To extract the plan from the system, all you need to do is use the explain command, shown as follows: test=# explain SELECT count(*) FROM t_test;                             QUERY PLAN                            ------------------------------------------------------Aggregate (cost=71622.00..71622.01 rows=1 width=0)   -> Seq Scan on t_test (cost=0.00..61622.00                         rows=4000000 width=0)(2 rows)Time: 0.370 ms In our case, it took us less than a millisecond to calculate the execution plan. Once you have the plan, you can read it from right to left. In our case, PostgreSQL will perform a sequential scan and aggregate the data returned by the sequential scan. It is important to mention that each step is assigned to a certain number of costs. The total cost for the sequential scan is 61,622 penalty points (more details about penalty points will be outlined a little later). The overall cost of the query is 71,622.01. What are costs? Well, costs are just an arbitrary number calculated by the system based on some rules. The higher the costs, the slower a query is expected to be. Always keep in mind that these costs are just a way for PostgreSQL to estimate things—they are in no way a reliable number related to anything in the real world (such as time or amount of I/O needed). In addition to the costs, PostgreSQL estimates that the sequential scan will yield around four million rows. It also expects the aggregation to return just a single row. These two estimates happen to be precise, but it is not always so. Calculating costs When in training, people often ask how PostgreSQL does its cost calculations. Consider a simple example like the one we have next. It works in a pretty simple way. Generally, there are two types of costs: I/O costs and CPU costs. To come up with I/O costs, we have to figure out the size of the table we are dealing with first: test=# SELECT pg_relation_size('t_test'),   pg_size_pretty(pg_relation_size('t_test'));pg_relation_size | pg_size_pretty------------------+----------------       177127424 | 169 MB(1 row) The pg_relation_size command is a fast way to see how large a table is. Of course, reading a large number (many digits) is somewhat hard, so it is possible to fetch the size of the table in a much prettier format. In our example, the size is roughly 170 MB. Let's move on now. In PostgreSQL, a table consists of 8,000 blocks. If we divide the size of the table by 8,192 bytes, we will end up with exactly 21,622 blocks. This is how PostgreSQL estimates I/O costs of a sequential scan. If a table is read completely, each block will receive exactly one penalty point, or any number defined by seq_page_cost: test=# SHOW seq_page_cost;seq_page_cost---------------1(1 row) To count this number, we have to send four million rows through the CPU (cpu_tuple_cost), and we also have to count these 4 million rows (cpu_operator_cost). So, the calculation looks like this: For the sequential scan: 21622*1 + 4000000*0.01 (cpu_tuple_cost) = 61622 For the aggregation: 61622 + 4000000*0.0025 (cpu_operator_cost) = 71622 This is exactly the number that we see in the plan. Drawing important conclusions Of course, you will never do this by hand. However, there are some important conclusions to be drawn: The cost model in PostgreSQL is a simplification of the real world The costs can hardly be translated to real execution times The cost of reading from a slow disk is the same as the cost of reading from a fast disk It is hard to take caching into account If the optimizer comes up with a bad plan, it is possible to adapt the costs either globally in postgresql.conf, or by changing the session variables, shown as follows: test=# SET seq_page_cost TO 10;SET This statement inflated the costs at will. It can be a handy way to fix the missed estimates, leading to bad performance and, therefore, to poor execution times. This is what the query plan will look like using the inflated costs: test=# explain SELECT count(*) FROM t_test;                     QUERY PLAN                              -------------------------------------------------------Aggregate (cost=266220.00..266220.01 rows=1 width=0)   -> Seq Scan on t_test (cost=0.00..256220.00          rows=4000000 width=0)(2 rows) It is important to understand the PostgreSQL code model in detail because many people have completely wrong ideas about what is going on inside the PostgreSQL optimizer. Offering a basic explanation will hopefully shed some light on this important topic and allow administrators a deeper understanding of the system. Creating indexes After this introduction, we can deploy our first index. As we stated before, runtimes of several hundred milliseconds for simple queries are not acceptable. To fight these unusually high execution times, we can turn to CREATE INDEX, shown as follows: test=# h CREATE INDEXCommand:     CREATE INDEXDescription: define a new indexSyntax:CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ]ON table_name [ USING method ]   ( { column_name | ( expression ) }[ COLLATE collation ] [ opclass ][ ASC | DESC ] [ NULLS { FIRST | LAST } ]   [, ...] )   [ WITH ( storage_parameter = value [, ... ] ) ]   [ TABLESPACE tablespace_name ]   [ WHERE predicate ] In the most simplistic case, we can create a normal B-tree index on the ID column and see what happens: test=# CREATE INDEX idx_id ON t_test (id);CREATE INDEXTime: 3996.909 ms B-tree indexes are the default index structure in PostgreSQL. Internally, they are also called B+ tree, as described by Lehman-Yao. On this box (AMD, 4 Ghz), we can build the B-tree index in around 4 seconds, without any database side tweaks. Once the index is in place, the SELECT command will be executed at lightning speed: test=# SELECT * FROM t_test WHERE id = 423423;   id   | name--------+------423423 | hans(1 row)Time: 0.384 ms The query executes in less than a millisecond. Keep in mind that this already includes displaying the data, and the query is a lot faster internally. Analyzing the performance of a query How do we know that the query is actually a lot faster? In the previous section, you saw EXPLAIN in action already. However, there is a little more to know about this command. You can add some instructions to EXPLAIN to make it a lot more verbose, as shown here: test=# h EXPLAINCommand:     EXPLAINDescription: show the execution plan of a statementSyntax:EXPLAIN [ ( option [, ...] ) ] statementEXPLAIN [ ANALYZE ] [ VERBOSE ] statement In the preceding code, the term option can be one of the following:    ANALYZE [ boolean ]   VERBOSE [ boolean ]   COSTS [ boolean ]   BUFFERS [ boolean ]   TIMING [ boolean ]   FORMAT { TEXT | XML | JSON | YAML } Consider the following example: test=# EXPLAIN (ANALYZE TRUE, VERBOSE true, COSTS TRUE,   TIMING true) SELECT * FROM t_test WHERE id = 423423;               QUERY PLAN        ------------------------------------------------------Index Scan using idx_id on public.t_test(cost=0.43..8.45 rows=1 width=9)(actual time=0.016..0.018 rows=1 loops=1)   Output: id, name   Index Cond: (t_test.id = 423423)Total runtime: 0.042 ms(4 rows)Time: 0.536 ms The ANALYZE function does a special form of execution. It is a good way to figure out which part of the query burned most of the time. Again, we can read things inside out. In addition to the estimated costs of the query, we can also see the real execution time. In our case, the index scan takes 0.018 milliseconds. Fast, isn't it? Given these timings, you can see that displaying the result actually takes a huge fraction of the time. The beauty of EXPLAIN ANALYZE is that it shows costs and execution times for every step of the process. This is important for you to familiarize yourself with this kind of output because when a programmer hits your desk complaining about bad performance, it is necessary to dig into this kind of stuff quickly. In many cases, the secret to performance is hidden in the execution plan, revealing a missing index or so. It is recommended to pay special attention to situations where the number of expected rows seriously differs from the number of rows really processed. Keep in mind that the planner is usually right, but not always. Be cautious in case of large differences (especially if this input is fed into a nested loop). Whenever a query feels slow, we always recommend to take a look at the plan first. In many cases, you will find missing indexes. The internal structure of a B-tree index Before we dig further into the B-tree indexes, we can briefly discuss what an index actually looks like under the hood. Understanding the B-tree internals Consider the following image that shows how things work: In PostgreSQL, we use the so-called Lehman-Yao B-trees (check out http://www.cs.cmu.edu/~dga/15-712/F07/papers/Lehman81.pdf). The main advantage of the B-trees is that they can handle concurrency very nicely. It is possible that hundreds or thousands of concurrent users modify the tree at the same time. Unfortunately, there is not enough room in this book to explain precisely how this works. The two most important issues of this tree are the facts that I/O is done in 8,000 chunks and that the tree is actually a sorted structure. This allows PostgreSQL to apply a ton of optimizations. Providing a sorted order As we stated before, a B-tree provides the system with sorted output. This can come in quite handy. Here is a simple query to make use of the fact that a B-tree provides the system with sorted output: test=# explain SELECT * FROM t_test ORDER BY id LIMIT 3;                   QUERY PLAN                                    ------------------------------------------------------Limit (cost=0.43..0.67 rows=3 width=9)   -> Index Scan using idx_id on t_test(cost=0.43..320094.43 rows=4000000 width=9)(2 rows) In this case, we are looking for the three smallest values. PostgreSQL will read the index from left to right and stop as soon as enough rows have been returned. This is a very common scenario. Many people think that indexes are only about searching, but this is not true. B-trees are also present to help out with sorting. Why do you, the DBA, care about this stuff? Remember that this is a typical use case where a software developer comes to your desk, pounds on the table, and complains. A simple index can fix the problem. Combined indexes Combined indexes are one more source of trouble if they are not used properly. A combined index is an index covering more than one column. Let's drop the existing index and create a combined index (make sure your seq_page_cost variable is set back to default to make the following examples work): test=# DROP INDEX idx_combined;DROP INDEXtest=# CREATE INDEX idx_combined ON t_test (name, id);CREATE INDEX We defined a composite index consisting of two columns. Remember that we put the name before the ID. A simple query will return the following execution plan: test=# explain analyze SELECT * FROM t_test   WHERE id = 10;               QUERY PLAN                                              -------------------------------------------------Seq Scan on t_test (cost=0.00..71622.00 rows=1   width=9)(actual time=181.502..351.439 rows=1 loops=1)   Filter: (id = 10)   Rows Removed by Filter: 3999999Total runtime: 351.481 ms(4 rows) There is no proper index for this, so the system will fall back to a sequential scan. Why is there no proper index? Well, try to look up for first names only in the telephone book. This is not going to work because a telephone book is sorted by location, last name, and first name. The same applies to our index. A B-tree works basically on the same principles as an ordinary paper phone book. It is only useful if you look up the first couple of values, or simply all of them. Here is an example: test=# explain analyze SELECT * FROM t_test   WHERE id = 10 AND name = 'joe';     QUERY PLAN                                                    ------------------------------------------------------Index Only Scan using idx_combined on t_test   (cost=0.43..6.20 rows=1 width=9)(actual time=0.068..0.068 rows=0 loops=1)   Index Cond: ((name = 'joe'::text) AND (id = 10))   Heap Fetches: 0Total runtime: 0.108 ms(4 rows) In this case, the combined index comes up with a high speed result of 0.1 ms, which is not bad. After this small example, we can turn to an issue that's a little bit more complex. Let's change the costs of a sequential scan to 100-times normal: test=# SET seq_page_cost TO 100;SET Don't let yourself be fooled into believing that an index is always good: test=# explain analyze SELECT * FROM t_testWHERE id = 10;                   QUERY PLAN                ------------------------------------------------------Index Only Scan using idx_combined on t_test(cost=0.43..91620.44 rows=1 width=9)(actual time=0.362..177.952 rows=1 loops=1)   Index Cond: (id = 10)   Heap Fetches: 1Total runtime: 177.983 ms(4 rows) Just look at the execution times. We are almost as slow as a sequential scan here. Why does PostgreSQL use the index at all? Well, let's assume we have a very broad table. In this case, sequentially scanning the table is expensive. Even if we have to read the entire index, it can be cheaper than having to read the entire table, at least if there is enough hope to reduce the amount of data by using the index somehow. So, in case you see an index scan, also take a look at the execution times and the number of rows used. The index might not be perfect, but it's just an attempt by PostgreSQL to avoid the worse to come. Keep in mind that there is no general rule (for example, more than 25 percent of data will result in a sequential scan) for sequential scans. The plans depend on a couple of internal issues, such as physical disk layout (correlation) and so on. Partial indexes Up to now, an index covered the entire table. This is not always necessarily the case. There are also partial indexes. When is a partial index useful? Consider the following example: test=# CREATE TABLE t_invoice (   id     serial,   d     date,   amount   numeric,   paid     boolean);CREATE TABLEtest=# CREATE INDEX idx_partial   ON   t_invoice (paid)   WHERE   paid = false;CREATE INDEX In our case, we create a table storing invoices. We can safely assume that the majority of the invoices are nicely paid. However, we expect a minority to be pending, so we want to search for them. A partial index will do the job in a highly space efficient way. Space is important because saving on space has a couple of nice side effects, such as cache efficiency and so on. Dealing with different types of indexes Let's move on to an important issue: not everything can be sorted easily and in a useful way. Have you ever tried to sort circles? If the question seems odd, just try to do it. It will not be easy and will be highly controversial, so how do we do it best? Would we sort by size or coordinates? Under any circumstances, using a B-tree to store circles, points, or polygons might not be a good idea at all. A B-tree does not do what you want it to do because a B-tree depends on some kind of sorting order. To provide end users with maximum flexibility and power, PostgreSQL provides more than just one index type. Each index type supports certain algorithms used for different purposes. The following list of index types is available in PostgreSQL (as of Version 9.4.1): btree: These are the high-concurrency B-trees gist: This is an index type for geometric searches (GIS data) and for KNN-search gin: This is an index type optimized for Full-Text Search (FTS) sp-gist: This is a space-partitioned gist As we mentioned before, each type of index serves different purposes. We highly encourage you to dig into this extremely important topic to make sure that you can help software developers whenever necessary. Unfortunately, we don't have enough room in this book to discuss all the index types in greater depth. If you are interested in finding out more, we recommend checking out information on my website at http://www.postgresql-support.de/slides/2013_dublin_indexing.pdf. Alternatively, you can look up the official PostgreSQL documentation, which can be found at http://www.postgresql.org/docs/9.4/static/indexes.html. Detecting missing indexes Now that we have covered the basics and some selected advanced topics of indexing, we want to shift our attention to a major and highly important administrative task: hunting down missing indexes. When talking about missing indexes, there is one essential query I have found to be highly valuable. The query is given as follows: test=# xExpanded display (expanded) is on.test=# SELECT   relname, seq_scan, seq_tup_read,     idx_scan, idx_tup_fetch,     seq_tup_read / seq_scanFROM   pg_stat_user_tablesWHERE   seq_scan > 0ORDER BY seq_tup_read DESC;-[ RECORD 1 ]-+---------relname       | t_user  seq_scan     | 824350      seq_tup_read | 2970269443530idx_scan     | 0      idx_tup_fetch | 0      ?column?     | 3603165 The pg_stat_user_tables option contains statistical information about tables and their access patterns. In this example, we found a classic problem. The t_user table has been scanned close to 1 million times. During these sequential scans, we processed close to 3 trillion rows. Do you think this is unusual? It's not nearly as unusual as you might think. In the last column, we divided seq_tup_read through seq_scan. Basically, this is a simple way to figure out how many rows a typical sequential scan has used to finish. In our case, 3.6 million rows had to be read. Do you remember our initial example? We managed to read 4 million rows in a couple of hundred milliseconds. So, it is absolutely realistic that nobody noticed the performance bottleneck before. However, just consider burning, say, 300 ms for every query thousands of times. This can easily create a heavy load on a totally unnecessary scale. In fact, a missing index is the key factor when it comes to bad performance. Let's take a look at the table description now: test=# d t_user                         Table "public.t_user"Column | Type   |       Modifiers                    ----------+---------+-------------------------------id      | integer | not null default               nextval('t_user_id_seq'::regclass)email   | text   |passwd   | text   |Indexes:   "t_user_pkey" PRIMARY KEY, btree (id) This is really a classic example. It is hard to tell how often I have seen this kind of example in the field. The table was probably called customer or userbase. The basic principle of the problem was always the same: we got an index on the primary key, but the primary key was never checked during the authentication process. When you log in to Facebook, Amazon, Google, and so on, you will not use your internal ID, you will rather use your e-mail address. Therefore, it should be indexed. The rules here are simple: we are searching for queries that needed many expensive scans. We don't mind sequential scans as long as they only read a handful of rows or as long as they show up rarely (caused by backups, for example). We need to keep expensive scans in mind, however ("expensive" in terms of "many rows needed"). Here is an example code snippet that should not bother us at all: -[ RECORD 1 ]-+---------relname       | t_province  seq_scan     | 8345345      seq_tup_read | 100144140idx_scan     | 0      idx_tup_fetch | 0      ?column?     | 12 The table has been read 8 million times, but in an average, only 12 rows have been returned. Even if we have 1 million indexes defined, PostgreSQL will not use them because the table is simply too small. It is pretty hard to tell which columns might need an index from inside PostgreSQL. However, taking a look at the tables and thinking about them for a minute will, in most cases, solve the riddle. In many cases, things are pretty obvious anyway, and developers will be able to provide you with a reasonable answer. As you can see, finding missing indexes is not hard, and we strongly recommend checking this system table once in a while to figure out whether your system works nicely. There are a couple of tools, such as pgbadger, out there that can help us to monitor systems. It is recommended that you make use of such tools. There is not only light, there is also some shadow. Indexes are not always good. They can also cause considerable overhead during writes. Keep in mind that when you insert, modify, or delete data, you have to touch the indexes as well. The overhead of useless indexes should never be underestimated. Therefore, it makes sense to not just look for missing indexes, but also for spare indexes that don't serve a purpose anymore. Detecting slow queries Now that we have seen how to hunt down tables that might need an index, we can move on to the next example and try to figure out the queries that cause most of the load on your system. Sometimes, the slowest query is not the one causing a problem; it is a bunch of small queries, which are executed over and over again. In this section, you will learn how to track down such queries. To track down slow operations, we can rely on a module called pg_stat_statements. This module is available as part of the PostgreSQL contrib section. Installing a module from this section is really easy. Connect to PostgreSQL as a superuser, and execute the following instruction (if contrib packages have been installed): test=# CREATE EXTENSION pg_stat_statements;CREATE EXTENSION This module will install a system view that will contain all the relevant information we need to find expensive operations: test=# d pg_stat_statements         View "public.pg_stat_statements"       Column       |       Type       | Modifiers---------------------+------------------+-----------userid             | oid             |dbid               | oid             |queryid             | bigint          |query               | text             |calls               | bigint           |total_time         | double precision |rows               | bigint           |shared_blks_hit     | bigint           |shared_blks_read   | bigint           |shared_blks_dirtied | bigint           |shared_blks_written | bigint           |local_blks_hit     | bigint           |local_blks_read     | bigint           |local_blks_dirtied | bigint           |local_blks_written | bigint           |temp_blks_read     | bigint           |temp_blks_written   | bigint           |blk_read_time       | double precision |blk_write_time     | double precision | In this view, we can see the queries we are interested in, the total execution time (total_time), the number of calls, and the number of rows returned. Then, we will get some information about the I/O behavior (more on caching later) of the query as well as information about temporary data being read and written. Finally, the last two columns will tell us how much time we actually spent on I/O. The final two fields are active when track_timing in postgresql.conf has been enabled and will give vital insights into potential reasons for disk wait and disk-related speed problems. The blk_* prefix will tell us how much time a certain query has spent reading and writing to the operating system. Let's see what happens when we want to query the view: test=# SELECT * FROM pg_stat_statements;ERROR: pg_stat_statements must be loaded via   shared_preload_libraries The system will tell us that we have to enable this module; otherwise, data won't be collected. All we have to do to make this work is to add the following line to postgresql.conf: shared_preload_libraries = 'pg_stat_statements' Then, we have to restart the server to enable it. We highly recommend adding this module to the configuration straightaway to make sure that a restart can be avoided and that this data is always around. Don't worry too much about the performance overhead of this module. Tests have shown that the impact on performance is so low that it is even too hard to measure. Therefore, it might be a good idea to have this module activated all the time. If you have configured things properly, finding the most time-consuming queries should be simple: SELECT *FROM   pg_stat_statementsORDER   BY total_time DESC; The important part here is that PostgreSQL can nicely group queries. For instance: SELECT * FROM foo WHERE bar = 1;SELECT * FROM foo WHERE bar = 2; PostgreSQL will detect that this is just one type of query and replace the two numbers in the WHERE clause with a placeholder indicating that a parameter was used here. Of course, you can also sort by any other criteria: highest I/O time, highest number of calls, or whatever. The pg_stat_statement function has it all, and things are available in a way that makes the data very easy and efficient to use. How to reset statistics Sometimes, it is necessary to reset the statistics. If you are about to track down a problem, resetting can be very beneficial. Here is how it works: test=# SELECT pg_stat_reset();pg_stat_reset---------------(1 row)test=# SELECT pg_stat_statements_reset();pg_stat_statements_reset--------------------------(1 row) The pg_stat_reset command will reset the entire system statistics (for example, pg_stat_user_tables). The second call will wipe out pg_stat_statements. Adjusting memory parameters After we find the slow queries, we can do something about them. The first step is always to fix indexing and make sure that sane requests are sent to the database. If you are requesting stupid things from PostgreSQL, you can expect trouble. Once the basic steps have been performed, we can move on to the PostgreSQL memory parameters, which need some tuning. Optimizing shared buffers One of the most essential memory parameters is shared_buffers. What are shared buffers? Let's assume we are about to read a table consisting of 8,000 blocks. PostgreSQL will check if the buffer is already in cache (shared_buffers), and if it is not, it will ask the underlying operating system to provide the database with the missing 8,000 blocks. If we are lucky, the operating system has a cached copy of the block. If we are not so lucky, the operating system has to go to the disk system and fetch the data (worst case). So, the more data we have in cache, the more efficient we will be. Setting shared_buffers to the right value is more art than science. The general guideline is that shared_buffers should consume 25 percent of memory, but not more than 16 GB. Very large shared buffer settings are known to cause suboptimal performance in some cases. It is also not recommended to starve the filesystem cache too much on behalf of the database system. Mentioning the guidelines does not mean that it is eternal law—you really have to see this as a guideline you can use to get started. Different settings might be better for your workload. Remember, if there was an eternal law, there would be no setting, but some autotuning magic. However, a contrib module called pg_buffercache can give some insights into what is in cache at the moment. It can be used as a basis to get started on understanding what is going on inside the PostgreSQL shared buffer. Changing shared_buffers can be done in postgresql.conf, shown as follows: shared_buffers = 4GB In our example, shared buffers have been set to 4GB. A database restart is needed to activate the new value. In PostgreSQL 9.4, some changes were introduced. Traditionally, PostgreSQL used a classical System V shared memory to handle the shared buffers. Starting with PostgreSQL 9.3, mapped memory was added, and finally, it was in PostgreSQL 9.4 that a config variable was introduced to configure the memory technique PostgreSQL will use, shown as follows: dynamic_shared_memory_type = posix# the default is the first option     # supported by the operating system:     #   posix     #   sysv     #   windows     #   mmap     # use none to disable dynamic shared memory The default value on the most common operating systems is basically fine. However, feel free to experiment with the settings and see what happens performance wise. Considering huge pages When a process uses RAM, the CPU marks this memory as used by this process. For efficiency reasons, the CPU usually allocates RAM by chunks of 4,000 bytes. These chunks are called pages. The process address space is virtual, and the CPU and operating system have to remember which process belongs to which page. The more pages you have, the more time it takes to find where the memory is mapped. When a process uses 1 GB of memory, it means that 262.144 blocks have to be looked up. Most modern CPU architectures support bigger pages, and these pages are called huge pages (on Linux). To tell PostgreSQL that this mechanism can be used, the following config variable can be changed in postgresql.conf: huge_pages = try                     # on, off, or try Of course, your Linux system has to know about the use of huge pages. Therefore, you can do some tweaking, as follows: grep Hugepagesize /proc/meminfoHugepagesize:     2048 kB In our case, the size of the huge pages is 2 MB. So, if there is 1 GB of memory, 512 huge pages are needed. The number of huge pages can be configured and activated by setting nr_hugepages in the proc filesystem. Consider the following example: echo 512 > /proc/sys/vm/nr_hugepages Alternatively, we can use the sysctl command or change things in /etc/sysctl.conf: sysctl -w vm.nr_hugepages=512 Huge pages can have a significant impact on performance. Tweaking work_mem There is more to PostgreSQL memory configuration than just shared buffers. The work_mem parameter is widely used for operations such as sorting, aggregating, and so on. Let's illustrate the way work_mem works with a short, easy-to-understand example. Let's assume it is an election day and three parties have taken part in the elections. The data is as follows: test=# CREATE TABLE t_election (id serial, party text);test=# INSERT INTO t_election (party)SELECT 'socialists'   FROM generate_series(1, 439784);test=# INSERT INTO t_election (party)SELECT 'conservatives'   FROM generate_series(1, 802132);test=# INSERT INTO t_election (party)SELECT 'liberals'   FROM generate_series(1, 654033); We add some data to the table and try to count how many votes each party has: test=# explain analyze SELECT party, count(*)   FROM   t_election   GROUP BY 1;       QUERY PLAN                                                        ------------------------------------------------------HashAggregate (cost=39461.24..39461.26 rows=3     width=11) (actual time=609.456..609.456   rows=3 loops=1)     Group Key: party   -> Seq Scan on t_election (cost=0.00..29981.49     rows=1895949 width=11)   (actual time=0.007..192.934 rows=1895949   loops=1)Planning time: 0.058 msExecution time: 609.481 ms(5 rows) First of all, the system will perform a sequential scan and read all the data. This data is passed on to a so-called HashAggregate. For each party, PostgreSQL will calculate a hash key and increment counters as the query moves through the tables. At the end of the operation, we will have a chunk of memory with three values and three counters. Very nice! As you can see, the explain analyze statement does not take more than 600 ms. Note that the real execution time of the query will be a lot faster. The explain analyze statement does have some serious overhead. Still, it will give you valuable insights into the inner workings of the query. Let's try to repeat this same example, but this time, we want to group by the ID. Here is the execution plan: test=# explain analyze SELECT id, count(*)   FROM   t_election     GROUP BY 1;       QUERY PLAN                                                          ------------------------------------------------------GroupAggregate (cost=253601.23..286780.33 rows=1895949     width=4) (actual time=1073.769..1811.619     rows=1895949 loops=1)     Group Key: id   -> Sort (cost=253601.23..258341.10 rows=1895949   width=4) (actual time=1073.763..1288.432   rows=1895949 loops=1)         Sort Key: id       Sort Method: external sort Disk: 25960kB         -> Seq Scan on t_election         (cost=0.00..29981.49 rows=1895949 width=4)     (actual time=0.013..235.046 rows=1895949     loops=1)Planning time: 0.086 msExecution time: 1928.573 ms(8 rows) The execution time rises by almost 2 seconds and, more importantly, the plan changes. In this scenario, there is no way to stuff all the 1.9 million hash keys into a chunk of memory because we are limited by work_mem. Therefore, PostgreSQL has to find an alternative plan. It will sort the data and run GroupAggregate. How does it work? If you have a sorted list of data, you can count all equal values, send them off to the client, and move on to the next value. The main advantage is that we don't have to keep the entire result set in memory at once. With GroupAggregate, we can basically return aggregations of infinite sizes. The downside is that large aggregates exceeding memory will create temporary files leading to potential disk I/O. Keep in mind that we are talking about the size of the result set and not about the size of the underlying data. Let's try the same thing with more work_mem: test=# SET work_mem TO '1 GB';SETtest=# explain analyze SELECT id, count(*)   FROM t_election   GROUP BY 1;         QUERY PLAN                                                        ------------------------------------------------------HashAggregate (cost=39461.24..58420.73 rows=1895949     width=4) (actual time=857.554..1343.375   rows=1895949 loops=1)   Group Key: id   -> Seq Scan on t_election (cost=0.00..29981.49   rows=1895949 width=4)   (actual time=0.010..201.012   rows=1895949 loops=1)Planning time: 0.113 msExecution time: 1478.820 ms(5 rows) In this case, we adapted work_mem for the current session. Don't worry, changing work_mem locally does not change the parameter for other database connections. If you want to change things globally, you have to do so by changing things in postgresql.conf. Alternatively, 9.4 offers a command called ALTER SYSTEM SET work_mem TO '1 GB'. Once SELECT pg_reload_conf() has been called, the config parameter is changed as well. What you see in this example is that the execution time is around half a second lower than before. PostgreSQL switches back to the more efficient plan. However, there is more; work_mem is also in charge of efficient sorting: test=# explain analyze SELECT * FROM t_election ORDER BY id DESC;     QUERY PLAN                                                          ------------------------------------------------------Sort (cost=227676.73..232416.60 rows=1895949 width=15)   (actual time=695.004..872.698 rows=1895949   loops=1)   Sort Key: id   Sort Method: quicksort Memory: 163092kB   -> Seq Scan on t_election (cost=0.00..29981.49   rows=1895949 width=15) (actual time=0.013..188.876rows=1895949 loops=1)Planning time: 0.042 msExecution time: 995.327 ms(6 rows) In our example, PostgreSQL can sort the entire dataset in memory. Earlier, we had to perform a so-called "external sort Disk", which is way slower because temporary results have to be written to disk. The work_mem command is used for some other operations as well. However, sorting and aggregation are the most common use cases. Keep in mind that work_mem should not be abused, and work_mem can be allocated to every sorting or grouping operation. So, more than just one work_mem amount of memory might be allocated by a single query. Improving maintenance_work_mem To control the memory consumption of administrative tasks, PostgreSQL offers a parameter called maintenance_work_mem. It is used to handle index creations as well as VACUUM. Usually, creating an index (B-tree) is mostly related to sorting, and the idea of maintenance_work_mem is to speed things up. However, things are not as simple as they might seem. People might assume that increasing the parameter will always speed things up, but this is not necessarily true; in fact, smaller values might even be beneficial. We conducted some research to solve this riddle. The in-depth results of this research can be found at http://www.cybertec.at/adjusting-maintenance_work_mem/. However, indexes are not the only beneficiaries. The maintenance_work_mem command is also here to help VACUUM clean out indexes. If maintenance_work_mem is too low, you might see VACUUM scanning tables repeatedly because dead items cannot be stored in memory during VACUUM. This is something that should basically be avoided. Just like all other memory parameters, maintenance_work_mem can be set per session, or it can be set globally in postgresql.conf. Adjusting effective_cache_size The number of shared_buffers assigned to PostgreSQL is not the only cache in the system. The operating system will also cache data and do a great job of improving speed. To make sure that the PostgreSQL optimizer knows what to expect from the operation system, effective_cache_size has been introduced. The idea is to tell PostgreSQL how much cache there is going to be around (shared buffers + operating system side cache). The optimizer can then adjust its costs and estimates to reflect this knowledge. It is recommended to always set this parameter; otherwise, the planner might come up with suboptimal plans. Summary In this article, you learned how to detect basic performance bottlenecks. In addition to this, we covered the very basics of the PostgreSQL optimizer and indexes. At the end of the article, some important memory parameters were presented. Resources for Article: Further resources on this subject: PostgreSQL 9: Reliable Controller and Disk Setup [article] Running a PostgreSQL Database Server [article] PostgreSQL: Tips and Tricks [article]
Read more
  • 0
  • 0
  • 3541
article-image-setting-and-cleaning
Packt
11 Apr 2016
34 min read
Save for later

Setting Up and Cleaning Up

Packt
11 Apr 2016
34 min read
This article, by Mani Tadayon, author of the book, RSpec Essentials, discusses support code to set tests up and clean up after them. Initialization, configuration, cleanup, and other support code related to RSpec specs are important in real-world RSpec usage. We will learn how to cleanly organize support code in real-world applications by learning about the following topics: Configuring RSpec with spec_helper.rb Initialization and configuration of resources Preventing tests from accessing the Internet with WebMock Maintaining clean test state Custom helper code Loading support code on demand with tags (For more resources related to this topic, see here.) Configuring RSpec with spec_helper.rb The RSpec specs that we've seen so far have functioned as standalone units. Specs in the real world, however, almost never work without supporting code to prepare the test environment before tests are run and ensure it is cleaned up afterwards. In fact, the first line of nearly every real-world RSpec spec file loads a file that takes care of initialization, configuration, and cleanup: require 'spec_helper' By convention, the entry point for all support code for specs is in a file called spec_helper.rb. Another convention is that specs are located in a folder called spec in the root folder of the project. The spec_helper.rb file is located in the root of this spec folder. Now that we know where it goes, what do we actually put in spec_helper.rb? Let's start with an example: # spec/spec_helper.rb require 'rspec'   RSpec.configure do |config|   config.order            = 'random'   config.profile_examples = 3    end To see what these two options do, let's create a couple of dummy spec files that include our spec_helper.rb. Here's the first spec file: # spec/first_spec.rb require 'spec_helper'   describe 'first spec' do   it 'sleeps for 1 second' do     sleep 1   end     it 'sleeps for 2 seconds' do     sleep 2   end      it 'sleeps for 3 seconds' do     sleep 3   end  end And here's our second spec file: # spec/second_spec.rb require 'spec_helper'   describe 'second spec' do   it 'sleeps for 4 second' do     sleep 4   end     it 'sleeps for 5 seconds' do     sleep 5   end      it 'sleeps for 6 seconds' do     sleep 6   end  end Now let's run our two spec files and see what happens: We note that we used --format documentation when running RSpec so that we see the order in which the tests were run (the default format just outputs a green dot for each passing test). From the output, we can see that the tests were run in a random order. We can also see the three slowest specs. Although this was a toy example, I would recommend using both of these configuration options for RSpec. Running examples in a random order is very important, as it is the only reliable way of detecting bad tests which sometimes pass and sometimes fail based on the order the in which overall test suite is run. Also, keeping tests running fast is very important for maintaining a productive development flow, and seeing which tests are slow on every test run is the most effective way of encouraging developers to make the slow tests fast, or remove them from the test run. We'll return to both test order and test speed later. For now, let us just note that RSpec configuration is very important to keeping our specs reliable and fast. Initialization and configuration of resources Real-world applications rely on resources, such as databases, and external services, such as HTTP APIs. These must be initialized and configured for the application to work properly. When writing tests, dealing with these resources and services can be a challenge because of two opposing fundamental interests. First, we would like the test environment to match as closely as possible the production environment so that tests that interact with resources and services are realistic. For example, we may use a powerful database system in production that runs on many servers to provide the best performance. Should we spend money and effort to create and maintain a second production-grade database environment just for testing purposes? Second, we would like the test environment to be simple and relatively easy to understand, so that we understand what we are actually testing. We would also like to keep our code modular so that components can be tested in isolation, or in simpler environments that are easier to create, maintain, and understand. If we think of the example of the system that relies on a database cluster in production, we may ask ourselves whether we are better off using a single-server setup for our test database. We could even go so far as to use an entirely different database for our tests, such as the file-based SQLite. As always, there are no easy answers to such trade-offs. The important thing is to understand the costs and benefits, and adjust where we are on the continuum between production faithfulness and test simplicity as our system evolves, along with the goals it serves. For example, for a small hobbyist application or a project with a limited budget, we may choose to completely favor test simplicity. As the same code grows to become a successful fan site or a big-budget project, we may have a much lower tolerance for failure, and have both the motivation and resources to shift towards production faithfulness for our test environment. Some rules of thumb to keep in mind: Unit tests are better places for test simplicity Integration tests are better places for production faithfulness Try to cleverly increase production faithfulness in unit tests Try to cleverly increase test simplicity in integration tests In between unit and integration tests, be clear what is and isn't faithful to the production environment A case study of test simplicity with an external service Let's put these ideas into practice. I haven't changed the application code, except to rename the module OldWeatherQuery. The test code is also slightly changed to require a spec_helper file and to use a subject block to define an alias for the module name, which makes it easier to rename the code without having to change many lines of test code. So let's look at our three files now. First, here's the application code: # old_weather_query.rb   require 'net/http' require 'json' require 'timeout'   module OldWeatherQuery   extend self     class NetworkError < StandardError   end     def forecast(place, use_cache=true)     add_to_history(place)       if use_cache       cache[place] ||= begin         @api_request_count += 1         JSON.parse( http(place) )       end     else       JSON.parse( http(place) )     end   rescue JSON::ParserError     raise NetworkError.new("Bad response")   end     def api_request_count     @api_request_count ||= 0   end     def history     (@history || []).dup   end     def clear!     @history           = []     @cache             = {}     @api_request_count = 0   end     private     def add_to_history(s)     @history ||= []     @history << s   end     def cache     @cache ||= {}   end     BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='   def http(place)     uri = URI(BASE_URI + place)       Net::HTTP.get(uri)   rescue Timeout::Error     raise NetworkError.new("Request timed out")   rescue URI::InvalidURIError     raise NetworkError.new("Bad place name: #{place}")   rescue SocketError     raise NetworkError.new("Could not reach #{uri.to_s}")   end end Next is the spec file: # spec/old_weather_query_spec.rb   require_relative 'spec_helper' require_relative '../old_weather_query'   describe OldWeatherQuery do   subject(:weather_query) { described_class }     describe 'caching' do     let(:json_response) do       '{"weather" : { "description" : "Sky is Clear"}}'     end       around(:example) do |example|       actual = weather_query.send(:cache)       expect(actual).to eq({})         example.run         weather_query.clear!     end       it "stores results in local cache" do       weather_query.forecast('Malibu,US')         actual = weather_query.send(:cache)       expect(actual.keys).to eq(['Malibu,US'])       expect(actual['Malibu,US']).to be_a(Hash)     end       it "uses cached result in subsequent queries" do       weather_query.forecast('Malibu,US')       weather_query.forecast('Malibu,US')       weather_query.forecast('Malibu,US')     end   end     describe 'query history' do     before do       expect(weather_query.history).to eq([])       allow(weather_query).to receive(:http).and_return("{}")     end     after do       weather_query.clear!     end       it "stores every place requested" do       places = %w(         Malibu,US         Beijing,CN         Delhi,IN         Malibu,US         Malibu,US         Beijing,CN       )         places.each {|s| weather_query.forecast(s) }         expect(weather_query.history).to eq(places)     end       it "does not allow history to be modified" do       expect {         weather_query.history = ['Malibu,CN']       }.to raise_error         weather_query.history << 'Malibu,CN'       expect(weather_query.history).to eq([])     end   end     describe 'number of API requests' do     before do       expect(weather_query.api_request_count).to eq(0)       allow(weather_query).to receive(:http).and_return("{}")     end       after do       weather_query.clear!     end       it "stores every place requested" do       places = %w(         Malibu,US         Beijing,CN         Delhi,IN         Malibu,US         Malibu,US         Beijing,CN       )         places.each {|s| weather_query.forecast(s) }         expect(weather_query.api_request_count).to eq(3)     end       it "does not allow count to be modified" do       expect {         weather_query.api_request_count = 100       }.to raise_error         expect {         weather_query.api_request_count += 10       }.to raise_error         expect(weather_query.api_request_count).to eq(0)     end   end end And last but not least, our spec_helper file, which has also changed only slightly: we only configure RSpec to show one slow spec (to keep test results uncluttered) and use color in the output to distinguish passes and failures more easily: # spec/spec_helper.rb   require 'rspec'   RSpec.configure do |config|   config.order            = 'random'   config.profile_examples = 1   config.color            = true end When we run these specs, something unexpected happens. Most of the time the specs pass, but sometimes they fail. If we keep running the specs with the same command, we'll see the tests pass and fail apparently at random. These are flaky tests, and we have exposed them because of the random order configuration we chose. If our tests run in a certain order, they fail. The problem could be simply in our tests. For example, we could have forgotten to clear state before or after a test. However, there could also be a problem with our code. In any case, we need to get to the bottom of the situation: We first notice that at the end of the failing test run, RSpec tells us "Randomized with seed 318". We can use this information to run the tests in the order that caused the failure and start to debug and diagnose the problem. We do this by passing the --seed parameter with the value 318, as follows: $ rspec spec/old_weather_query_spec.rb --seed 318 The problem has to do with the way that we increment @api_request_count without ensuring it has been initialized. Looking at our code, we notice that the only place we initialize @api_request_count is in OldWeatherQuery.api_request_count and OldWeatherQuery.clear!. If we don't call either of these methods first, then OldWeatherQuery.forecast, the main method in this module, will always fail. Our tests sometimes pass because our setup code calls one of these methods first when tests are run in a certain order, but that is not at all how our code would likely be used in production. So basically, our code is completely broken, but our specs pass (sometimes). Based on this, we can create a simple spec that will always fail: describe 'api_request is not initialized' do   it "does not raise an error" do     weather_query.forecast('Malibu,US')   end    end At least now our tests fail deterministically. But this is not the end of our troubles with these specs. If we run our tests many times with the seed value of 318, we will start seeing a second failing test case that is even more random than the first. This is an OldWeatherQuery::NetworkError and it indicates that our tests are actually making HTTP requests to the Internet! Let's do an experiment to confirm this. We'll turn off our Wi-Fi access, unplug our Ethernet cables, and run our specs. When we run our tests without any Internet access, we will see three errors in total. One of them is the error with the uninitialized @api_request_count instance variable, and two of them are instances of OldWeatherQuery::NetworkError, which confirms that we are indeed making real HTTP requests in our code. What's so bad about making requests to the Internet? After all, the test failures are indeed very random and we had to purposely shut off our Internet access to replicate the errors. Flaky tests are actually the least of our problems. First, we could be performing destructive actions that affect real systems, accounts, and people! Imagine if we were testing an e-commerce application that charged customer credit cards by using a third-party payment API via HTTP. If our tests actually hit our payment provider's API endpoint over HTTP, we would get a lot of declined transactions (assuming we are not storing and using real credit card numbers), which could lead to our account being suspended due to suspicions of fraud, putting our e-commerce application out of service. Also, if we were running a continuous integration (CI) server such as Jenkins, which did not have access to the public Internet, we would get failures in our CI builds due to failing tests that attempted to access the Internet. There are a few approaches to solving this problem. In our tests, we attempted to mock our HTTP requests, but obviously failed to do so effectively. A second approach is to allow actual HTTP requests but to configure a special server for testing purposes. Let's focus on figuring out why our HTTP mocks were not successful. In a small set of tests like in this example, it is not hard to hunt down the places where we are sending actual HTTP requests. In larger code bases with a lot of test support code, it may be harder. Also, it would be nice to prevent access to the Internet altogether so we notice these issues as soon as we run the offending tests. Fortunately, Ruby has many excellent tools for testing, and there is one that addresses our needs exactly: WebMock (https://github.com/bblimke/webmock). We simply install the gem and add a couple of lines to our spec helper file to disable all network connections in our tests: require 'rspec'   # require the webmock gem require 'webmock/rspec'   RSpec.configure do |config|   # this is done by default, but let's make it clear   WebMock.disable_net_connect!     Config.order            = 'random'   config.profile_examples = 1   config.color            = true end When we run our tests again, we'll see one or more instances of WebMock::NetConnectNotAllowedError, along with a backtrace to lead us to the point in our tests where the HTTP request was made: If we examine our test code, we'll notice that we mock the OldWeatherQuery.http method in a few places. However, we forgot to set up the mock in the first describe block for caching where we defined a json_response object, but never mocked the OldWeatherQuery.http method to return json_response. We can solve the problem by mocking OldWeatherQuery.http throughout the entire test file. We'll also take this opportunity to clean up the initialization of @api_request_count in our code. Here's what we have now: # new_weather_query.rb   require 'net/http' require 'json' require 'timeout'   module NewWeatherQuery   extend self     class NetworkError < StandardError   end     def forecast(place, use_cache=true)     add_to_history(place)     if use_cache       cache[place] ||= begin         increment_api_request_count         JSON.parse( http(place) )       end     else       JSON.parse( http(place) )     end   rescue JSON::ParserError => e     raise NetworkError.new("Bad response: #{e.inspect}")   end     def increment_api_request_count     @api_request_count ||= 0     @api_request_count += 1   end     def api_request_count     @api_request_count ||= 0   end     def history     (@history || []).dup   end     def clear!     @history           = []     @cache             = {}     @api_request_count = 0   end     private     def add_to_history(s)     @history ||= []     @history << s   end     def cache     @cache ||= {}   end     BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='   def http(place)     uri = URI(BASE_URI + place)       Net::HTTP.get(uri)   rescue Timeout::Error     raise NetworkError.new("Request timed out")   rescue URI::InvalidURIError     raise NetworkError.new("Bad place name: #{place}")   rescue SocketError     raise NetworkError.new("Could not reach #{uri.to_s}")   end end And here is the spec file to go with it: # spec/new_weather_query_spec.rb   require_relative 'spec_helper' require_relative '../new_weather_query'   describe NewWeatherQuery do   subject(:weather_query) { described_class }     after { weather_query.clear! }     let(:json_response) { '{}' }   before do     allow(weather_query).to receive(:http).and_return(json_response)        end     describe 'api_request is initialized' do     it "does not raise an error" do       weather_query.forecast('Malibu,US')     end      end   describe 'caching' do     let(:json_response) do       '{"weather" : { "description" : "Sky is Clear"}}'     end       around(:example) do |example|       actual = weather_query.send(:cache)       expect(actual).to eq({})             example.run     end       it "stores results in local cache" do       weather_query.forecast('Malibu,US')         actual = weather_query.send(:cache)       expect(actual.keys).to eq(['Malibu,US'])       expect(actual['Malibu,US']).to be_a(Hash)     end       it "uses cached result in subsequent queries" do       weather_query.forecast('Malibu,US')       weather_query.forecast('Malibu,US')       weather_query.forecast('Malibu,US')     end   end     describe 'query history' do     before do       expect(weather_query.history).to eq([])     end       it "stores every place requested" do       places = %w(         Malibu,US         Beijing,CN         Delhi,IN         Malibu,US         Malibu,US         Beijing,CN       )         places.each {|s| weather_query.forecast(s) }         expect(weather_query.history).to eq(places)     end       it "does not allow history to be modified" do       expect {         weather_query.history = ['Malibu,CN']       }.to raise_error         weather_query.history << 'Malibu,CN'       expect(weather_query.history).to eq([])     end   end     describe 'number of API requests' do     before do       expect(weather_query.api_request_count).to eq(0)     end       it "stores every place requested" do       places = %w(         Malibu,US         Beijing,CN         Delhi,IN         Malibu,US         Malibu,US         Beijing,CN       )         places.each {|s| weather_query.forecast(s) }         expect(weather_query.api_request_count).to eq(3)     end       it "does not allow count to be modified" do       expect {         weather_query.api_request_count = 100       }.to raise_error         expect {         weather_query.api_request_count += 10       }.to raise_error         expect(weather_query.api_request_count).to eq(0)     end   end end Now we've fixed a major bug with our code that slipped through our specs and used to pass randomly. We've made it so that our tests always pass, regardless of the order in which they are run, and without needing to access the Internet. Our test code and application code has also become clearer as we've reduced duplication in a few places. A case study of production faithfulness with a test resource instance We're not done with our WeatherQuery example just yet. Let's take a look at how we would add a simple database to store our cached values. There are some serious limitations to the way we are caching with instance variables, which persist only within the scope of a single Ruby process. As soon as we stop or restart our app, the entire cache will be lost. In a production app, we would likely have many processes running the same code in order to serve traffic effectively. With our current approach, each process would have a separate cache, which would be very inefficient. We could easily save many HTTP requests if we were able to share the cache between processes and across restarts. Economizing on these requests is not simply a matter of improved response time. We also need to consider that we cannot make unlimited requests to external services. For commercial services, we would pay for the number of requests we make. For free services, we are likely to get throttled if we exceed some threshold. Therefore, an effective caching scheme that reduces the number of HTTP requests we make to our external services is of vital importance to the function of a real-world app. Finally, our cache is very simplistic and has no expiration mechanism short of clearing all entries. For a cache to be effective, we need to be able to store entries for individual locations for some period of time within which we don't expect the weather forecast to change much. This will keep the cache small and up to date. We'll use Redis (http://redis.io) as our database since it is very fast, simple, and easy to set up. You can find instructions on the Redis website on how to install it, which is an easy process on any platform. Once you have Redis installed, you simply need to start the server locally, which you can do with the redis-server command. We'll also need to install the Redis Ruby client as a gem (https://github.com/redis/redis-rb). Let's start with a separate configuration file to set up our Redis client for our tests: # spec/config/redis.rb   require 'rspec' require 'redis'   ENV['WQ_REDIS_URL'] ||= 'redis://localhost:6379/15'   RSpec.configure do |config|   if ! ENV['WQ_REDIS_URL'].is_a?(String)     raise "WQ_REDIS_URL environment variable not set"   end   ::REDIS_CLIENT = Redis.new( :url => ENV['WQ_REDIS_URL'] )     config.after(:example) do         ::REDIS_CLIENT.flushdb   end end Note that we place this file in a new config folder under our main spec folder. The idea is to configure each resource separately in its own file to keep everything isolated and easy to understand. This will make maintenance easy and prevent problems with configuration management down the road. We don't do much in this file, but we do establish some important conventions. There is a single environment variable, which takes care of the Redis connection URL. By using an environment variable, we make it easy to change configuration and also allow flexibility in how these configurations are stored. Our code doesn't care if the Redis connection URL is stored in a simple .env file with key-value pairs or loaded from a configuration database. We can also easily override this value manually simply by setting it when we run RSpec, like so: $ WQ_REDIS_URL=redis://1.2.3.4:4321/0 rspec spec Note that we also set a sensible default value, which is to run on the default Redis port of 6379 on our local machine, on database number 15, which is less likely to be used for local development. This prevents our tests from relying on our development database, or from polluting or destroying it. It is also worth mentioning that we prefix our environment variable with WQ (short for weather query). Small details like this are very important for keeping our code easy to understand and to prevent dangerous clashes. We could imagine the kinds of confusion and clashes that could be caused if we relied on REDIS_URL and we had multiple apps running on the same server, all relying on Redis. It would be very easy to break many applications if we changed the value of REDIS_URL for a single app to point to a different instance of Redis. We set a global constant, ::REDIS_CLIENT, to point to a Redis client. We will use this in our code to connect to Redis. Note that in real-world code, we would likely have a global namespace for the entire app and we would define globals such as REDIS_CLIENT under that namespace rather than in the global Ruby namespace. Finally, we configure RSpec to call the flushdb command after every example tagged with :redis to empty the database and keep state clean across tests. In our code, all tests interact with Redis, so this tag seems pointless. However, it is very likely that we would add code that had nothing to do with Redis, and using tags helps us to constrain the scope of our configuration hooks only to where they are needed. This will also prevent confusion about multiple hooks running for the same example. In general, we want to prevent global hooks where possible and make configuration hooks explicitly triggered where possible. So what does our spec look like now? Actually, it is almost exactly the same. Only a few lines have changed to work with the new Redis cache. See if you can spot them! # spec/redis_weather_query_spec.rb   require_relative 'spec_helper' require_relative '../redis_weather_query'   describe RedisWeatherQuery, redis: true do   subject(:weather_query) { described_class }     after { weather_query.clear! }     let(:json_response) { '{}' }   before do     allow(weather_query).to receive(:http).and_return(json_response)        end     describe 'api_request is initialized' do     it "does not raise an error" do       weather_query.forecast('Malibu,US')     end      end           describe 'caching' do     let(:json_response) do       '{"weather" : { "description" : "Sky is Clear"}}'     end       around(:example) do |example|       actual = weather_query.send(:cache).all       expect(actual).to eq({})             example.run     end     it "stores results in local cache" do       weather_query.forecast('Malibu,US')         actual = weather_query.send(:cache).all       expect(actual.keys).to eq(['Malibu,US'])       expect(actual['Malibu,US']).to be_a(Hash)     end       it "uses cached result in subsequent queries" do       weather_query.forecast('Malibu,US')       weather_query.forecast('Malibu,US')       weather_query.forecast('Malibu,US')     end   end     describe 'query history' do     before do       expect(weather_query.history).to eq([])     end       it "stores every place requested" do       places = %w(         Malibu,US         Beijing,CN         Delhi,IN         Malibu,US         Malibu,US         Beijing,CN       )         places.each {|s| weather_query.forecast(s) }         expect(weather_query.history).to eq(places)     end       it "does not allow history to be modified" do       expect {         weather_query.history = ['Malibu,CN']       }.to raise_error         weather_query.history << 'Malibu,CN'       expect(weather_query.history).to eq([])     end   end   describe 'number of API requests' do     before do       expect(weather_query.api_request_count).to eq(0)     end       it "stores every place requested" do       places = %w(         Malibu,US         Beijing,CN         Delhi,IN         Malibu,US         Malibu,US         Beijing,CN       )         places.each {|s| weather_query.forecast(s) }         expect(weather_query.api_request_count).to eq(3)     end       it "does not allow count to be modified" do       expect {         weather_query.api_request_count = 100       }.to raise_error         expect {         weather_query.api_request_count += 10       }.to raise_error         expect(weather_query.api_request_count).to eq(0)     end   end end So what about the actual WeatherQuery code? It changes very little as well: # redis_weather_query.rb   require 'net/http' require 'json' require 'timeout'   # require the new cache module require_relative 'redis_weather_cache' module RedisWeatherQuery   extend self     class NetworkError < StandardError   end     # ... same as before ...     def clear!     @history           = []     @api_request_count = 0           # no more clearing of cache here   end     private     # ... same as before ...       # the new cache module has a Hash-like interface   def cache     RedisWeatherCache   end     # ... same as before ...     end We can see that we've preserved pretty much the same code and specs as before. Almost all of the new functionality is accomplished in a new module that caches with Redis. Here is what it looks like: # redis_weather_cache.rb   require 'redis'   module RedisWeatherCache   extend self     CACHE_KEY             = 'weather_query:cache'   EXPIRY_ZSET_KEY       = 'weather_query:expiry_tracker'   EXPIRE_FORECAST_AFTER = 300 # 5 minutes       def redis_client     if ! defined?(::REDIS_CLIENT)       raise("No REDIS_CLIENT defined!")     end         ::REDIS_CLIENT   end     def []=(location, forecast)     redis_client.hset(CACHE_KEY, location, JSON.generate(forecast))     redis_client.zadd(EXPIRY_ZSET_KEY, Time.now.to_i, location)   end     def [](location)     remove_expired_entries         raw_value = redis_client.hget(CACHE_KEY, location)         if raw_value       JSON.parse(raw_value)     else       nil     end   end     def all     redis_client.hgetall(CACHE_KEY).inject({}) do |memo, (location, forecast_json)|       memo[location] = JSON.parse(forecast_json)       memo     end   end     def clear!     redis_client.del(CACHE_KEY)   end     def remove_expired_entries     # expired locations have a score, i.e. creation timestamp, less than a certain threshold     expired_locations = redis_client.zrangebyscore(EXPIRY_ZSET_KEY, 0, Time.now.to_i - EXPIRE_FORECAST_AFTER)       if ! expired_locations.empty?       # remove the cache entry       redis_client.hdel(CACHE_KEY, expired_locations)                  # also clear the expiry entry       redis_client.zrem(EXPIRY_ZSET_KEY, expired_locations)      end   end end We'll avoid a detailed explanation of this code. We simply note that we accomplish all of the design goals we discussed at the beginning of the section: a persistent cache with expiration of individual values. We've accomplished this using some simple Redis functionality along with ZSET or sorted set functionality, which is a bit more complex, and which we needed because Redis does not allow individual entries in a Hash to be deleted. We can see that by using method names such as RedisWeatherCache.[] and RedisWeatherCache.[]=, we've maintained a Hash-like interface, which made it easy to use this cache instead of the simple in-memory Ruby Hash we had in our previous iteration. Our tests all pass and are still pretty simple, thanks to the modularity of this new cache code, the modular configuration file, and the previous fixes we made to our specs to remove Internet and run-order dependencies. Summary In this article, we delved into setting up and cleaning up state for real-world specs that interact with external services and local resources by extending our WeatherQuery example to address a big bug, isolate our specs from the Internet, and cleanly configure a Redis database to serve as a better cache. Resources for Article: Further resources on this subject: Creating your first heat map in R [article] Probability of R? [article] Programming on Raspbian [article]
Read more
  • 0
  • 0
  • 3541

article-image-article-setting-development-environment
Packt
13 Jan 2012
14 min read
Save for later

Setting Up a Development Environment

Packt
13 Jan 2012
14 min read
Selecting your virtual environment Prisoners serving life sentences (in Canada) have what is known as a faint hope clause where you have a glimmer of a chance of getting parole after 15 years. However, those waiting for Microsoft to provide us a version of Virtual PC that can run Virtual Hard Drives (VHDs) hosting 64-bit operating systems (such as Windows Server 2008 R2), have no such hope of ever seeing that piece of software. But miracles do happen, and I hope that the release of a 64-bit capable Virtual PC renders this section of the article obsolete. If this has in fact happened, go with it and proceed to the following section.   Getting ready Head into your computer's BIOS settings and enable the virtualization setting. The exact setting you are looking for varies widely, so please consult with your manufacturer's documentation. This setting seems universally defaulted to off, so I am very sure you will need to perform this action.   How to do it... Since you are still reading, however, it is safe to say that a miracle has not yet happened. Your first task is to select a suitable virtualization technology that can support a 64-bit guest operating system. The recipe here is to consider the choices in this order, with the outcome of your virtual environment being selected: Microsoft Virtualization: Hyper-V certainly has the ability to create and run Virtual Hard Disks (VHDs) with 64-bit operating systems. It's free—that is, you can install the Hyper-V role , but it requires the base operating system to be Windows Server 2008 R2. It can be brutal to get it running properly on something like a laptop (for example, because of driver issues). It won't be a good idea to get Windows 2008 Server running on a laptop, primarily because of driver issues. I recommend that if your laptop is running Windows 7, look at creating a dual boot, and a boot to VHD where this other boot option / partition is Windows Server 2008 R2. The main disadvantage is coming up with an (preferably licensed) installation of Windows Server 2008 R2 as the main computer operating system (or as a dual boot option). Or perhaps your company runs Hyper-V on their server farm and would be willing to host your development environment for you? Either way, if you have managed to get access to a Hyper-V server, you are good to go! VMware Workstation: Go to http://www.vmware.com and download my absolute favorite virtualization technology—VMware Workstation—fully featured, powerful, and can run on Windows 7. I have used it for years and love it. You must of course pay for a license, but please believe me, it is a worthwhile investment. You can sign up for a 30 day trial to explore the benefits. Note that you only need one copy of VMware Workstation to create a virtual machine. Once you have created it, you can run it anywhere using the freely available VMware Player. Oracle Virtual Box: Go to http://www.virtualbox.org/ and download this free software that will run on Windows 7 and create and host 64-bit guest operating systems. The reason that this is at the bottom of the list is that I personally do not have experience using this software. However, I have colleagues who have used it and have had no problems with it. Give this a try and see if it works as equally well as a paid version of VMware. With your selected virtualization technology in hand, head to the next section to install and configure Windows Server 2008 R2, which is the base operating system required for an installation of SharePoint Server 2010. Installing and configuring Windows Server 2008 R2 SharePoint 2010 requires the Windows Server 2008 R2 operating system in order to run. In this recipe, we will confi gure the components of Windows Server 2008 necessary in order to get ready to install SQL Server 2008 and SharePoint 2010. Getting ready Download Windows Server 2008 R2 from your MSDN subscription, or type in windows server 2008 R2 trial download into your favorite search engine to download the 180-day trial from the Microsoft site. This article does not cover actually installing the base operating system. The specific instructions to do so will be dependent upon the virtualization software. Generally, it will be provided as an ISO image (the file extension will be .iso). ISO means a compressed disk image, and all virtualization software that I am aware of will let you mount (attach) an ISO image to the virtual machine as a CD Drive. This means that when you elect to create a new virtual machine, you will normally be prompted for the ISO image, and the installation of the operating system should proceed in a familiar and relatively automated fashion. So for this recipe, ready means that you have your virtualization software up and running, the Windows Server 2008 R2 base operating system is installed, and you are able to log in as the Administrator (and that you are effectively logging in for the first time). How to do it... Log in as the Administrator. You will be prompted to change the password the first time—I suggest choosing a very commonly used Microsoft password—Password1. However, feel free to select a password of your choice, but use it consistently throughout. The Initial configuration tasks screen will come up automatically. On this screen: Activate windows using your 180 day trial key or using your MSDN key. Select Provide computer name and domain. Change the computer name to a simpler one of your choice. In my case, I named the machine OPENHIGHWAY. Leave the Member of option as Workgroup. The computer will require a reboot. In the Update this server section, choose Download and install updates. Click on the Change settings link and select the option Never check for updates and click OK. Click the Check for updates link. The important updates will be selected. Click on Install Updates. Now is a good time for a coffee break! You will need to reboot the server when the updates complete. In the Customize this server section, click on Add Features. Select the Desktop Experience, Windows, PowerShell, Integrated, Scripting, and Environment options. Choose Add Required Features when prompted to do so. Reboot the server when prompted to do so. If the Initial configuration tasks screen appears now, or in the future, you may now select the checkbox for Do not show this window at logon. We will continue configuration from the Server Manager, which should be displayed on your screen. If not, launch the Server Manager using the icon on the taskbar. We return to Server Manager to continue the confi guration: OPTIONAL: Click on Configure Remote Desktop if you have a preference for accessing your virtual machine using Remote Desktop (RDP) instead of using the virtual machine's console software. In the Security Information section, click Go to Windows Firewall. Click on the Windows Firewall Properties link. From the dialog, go to each of the tabs, namely, Domain Profi le, Private Profi le, and Public Profi le and set the Firewall State to Off on each tab and click OK. Click on the Server Manager node, and from the main screen, click on the Configure IE ESC link. Set both options to Off and click OK. From the Server Manager, expand the Configuration node and then expand Local Users and Groups node, and then click on the Users folder. Right-click on the Administrator account and select Properties. Select the option for Password never expires and click OK. From the Server Manager, click the Roles node . Click the Add Roles link. Now, click on the Introductory screen and select the checkbox for Active Directory Domain Services. Click Next, again click on Next, and then click Install. After completion, click the Close this wizard and launch the Active Directory Domain Services Installation Wizard (dcpromo.exe) link. Now, carry out the following steps: From the new wizard that pops up, from the welcome screen, select the checkbox Use advanced mode installation, click Next, and again click on Next on the Operating System Compatibility screen. Select the option Create a new domain in a new forest and click Next. Choose your domain (FQDN)! This is completely internal to your development server and does not have to be real. For article purposes, I am using theopenhighway.net, as shown in the following screenshot. Then click Next: From the Set Forest Functional Level drop-down, choose Windows Server 2008 R2 and click Next. Click Next on the Additional Domain Controller Option screen. Select Yes on the Static IP assignment screen. Click Yes on the Dns Delegation Warning screen. Click Next on the Location for Database, Log Files, and SYSVOL screen. On the Directory Services Restore Mode Administrator Password screen, enter the same password that you used for the Administrator account, in my case, Password1. Click Next. Click Next on the Summary screen. Click on the Reboot On Completion screen. Otherwise reboot the server after the installation completes You will now confi gure a user account that will run the application pools for the SharePoint web applications in IIS. From the Server Manager, expand the Roles node. Keep expanding the Active Directory Domain Services until you see the Users folder. Click on the Users folder. Now carry out the following: Right-click on the Users folder and select New | User Enter SP_AppPool in the full name field and also enter SP_AppPool in the user logon field and click Next. Enter the password as Password1 (or the same as you had selected for the Administrator account). Deselect the option for User must change password at next logon and select the option for Password never expires. Click Next and then click Finish. A loopback check is a security feature to mitigate against reflection attacks, introduced in Windows Server 2003 SP1. You will likely encounter connection issues with your local websites and it is therefore universally recommended that you disable the loopback check on a development server. This is done from the registry editor: Click the Start menu button, choose Run…, enter Regedit, and click OK to bring up the registry editor. Navigate to HKEY_LOCAL_MACHINE | SYSTEM | CurrentControlSet | Control | Lsa Right-click the Lsa node and select New | DWORD (32-bit) Value In the place of New Value #1 type DisableLoopbackCheck. Right-click DisableLoopbackCheck, select Modify, change the value to 1, and click OK Congratulations! You have successfully confi gured Windows Server 2008 R2. There's more... The Windows Shutdown Event Tracker is simply annoying on a development machine. To turn this feature off, click the Start button, select Run…, enter gpedit.msc, and click OK. Scroll down, right-click on Display Shutdown Event Tracker, and select Edit. Select the Disabled option and click OK, as shown in the following screenshot: Installing and configuring SQL Server 2008 R2 SharePoint 2010 requires Microsoft SQL Server as a fundamental component of the overall SharePoint architecture. The content that you plan to manage in SharePoint, including web content and documents, literally is stored within and served from SQL Server databases. The SharePoint 2010 architecture itself relies on information stored in SQL Server databases, such as confi guration and the many service applications. In this recipe, we will install and configure the components of SQL Server 2008 necessary to install SharePoint 2010. Getting ready I do not recommend SQL Server Express for your development environment, although this is a possible, free, and valid choice for the installation of SharePoint 2010. In my personal experience, I have valued the full power and fl exibility of the full version of SQL Server as well as not having to live with the constraints and limitations of SQL Express. Besides, there is another little reason too! The Enterprise edition of SQL Server is either readily available with your MSDN subscription or downloadable as a trial from the Microsoft site. Download SQL Server 2008 R2 Enterprise from your MSDN subscription, or type in sql server 2008 enterprise R2 trial download into your favorite search engine to download the 180-day trial from the Microsoft site. For SQL Server 2008 R2 Enterprise, if you have MSDN software, then you will be provided with an ISO image that you can attach to the virtual machine. If you download your SQL Server from the Microsoft site as a trial, extract the software (it is a self-extracting EXE) on your local machine, and then share the folder with your virtual machine. Finallly, run the Setup.exe fi le. How to do it... Here is your recipe for installing SQL Server 2008 R2 Enterprise. Carry out the following steps to complete this recipe: You will be presented with the SQL Server Installation Center; on the left side of the screen, select Installation, as shown in the following screenshot: For the choices presented on the Installation screen, select New installation or add features to an existing installation. The Setup Support Rules (shown in the following screenshot) will run to identify any possible problems that might occur when installing SQL Server. All rules should pass. Click OK to continue: You will be presented with the SQL Server 2008 R2 Setup screen. On the fi rst screen, you can select an evaluation or use your product key (from, for example, MSDN) and then click Next. Accept the terms in the license, but do not check the Send feature usage data to Microsoft checkbox, and click Next. On the Setup Support Files screen, click Install. All tests will pass except for a warning that you can safely ignore (the one noting we are installing on a domain controller), and click Next, as shown in the following screenshot: On the Setup Role screen, select SQL Server Feature Installation and click Next. On the Feature Selection, as shown in the following screenshot, carry out the following tasks: In Instance Features, select Database Engine Services (and both SQL Server Replication and Full Text Search), Analysis Services, and Reporting Services In Shared Features, select Business Intelligence Development Studio, Management Tools Basic (and Management Tools Complete), and Microsoft Sync Framework Finally, click Next. On the Installation Rules screen, click Next On the Instance Confi guration screen, click Next. On the Disk Space Requirements screen, click Next On the Server Confi guration screen: Set the Startup Type for SQL Server Agent to be Automatic Click on the button Use the same account for all SQL Server services. Select the account NT AUTHORITYSYSTEM and click OK. Finally, click Next. On the Database Configuration Engine screen: Look for the Account Provisioning tab and click the Add Current User button under Specify SQL Server administrators. Finally, click Next On the Analysis Services Confi guration screen: Look for the Account Provisioning tab and click the Add Current User button under Specify which users have administrative permissions for Analysis Services. Finally, click Next. On the Reporting Services Configuration screen, select the option to Install but do not configure the report server. Now, click Next. On the Error Reporting Screen, click Next. On the Installation Confi guration Rules screen, click Next. On the Ready to Install screen, click Install. Your patience will be rewarded with the Complete screen! Finally, click Close. The Complete screen is shown in the following screenshot: You can close the SQL Server Installation Center. Confi gure SQL Server security for the SP_AppPool account: Click Start | All Programs | SQL Server 2008 R2 | SQL Server Management Studio. On Connect to server, type a period (.) in the Server Name field and click Connect. Expand the Security node. Right-click Logins and select New Login. Use the Search function and enter SP_AppPool in the box Enter object name to select. Click the check names button and then click OK. In my case, you see the properly formatted THEOPENHIGHWAYSP_AppPool in the login name text box. On the Server Roles tab, ensure that the dbcreator and securityadmin roles are selected (in addition to the already selected public role). Finally, click OK. Congratulations! You have successfully installed and confi gured SQL Server 2008 R2 Enterprise.
Read more
  • 0
  • 0
  • 3541

article-image-test-all-things-python
Packt
18 Feb 2016
20 min read
Save for later

Test all the things with Python

Packt
18 Feb 2016
20 min read
The first testing tool we're going to look at is called doctest. The name is short for "document testing" or perhaps a "testable document". Either way, it's a literate tool designed to make it easy to write tests in such a way that computers and humans both benefit from them. Ideally, doctest tests both, informs human readers, and tells the computer what to expect. Mixing tests and documentation helps us: Keeps the documentation up-to-date with reality Make sure that the tests express the intended behavior Reuse some of the efforts involved in the documentation and test creation (For more resources related to this topic, see here.) Where doctest performs best The design decisions that went into doctest make it particularly well suited to writing acceptance tests at the integration and system testing levels. This is because doctest mixes human-only text with examples that both humans and computers can read. This structure doesn't support or enforce any of the formalizations of testing, but it conveys information beautifully and it still provides the computer with the ability to say that works or that doesn't work. As an added bonus, it is about the easiest way to write tests you'll ever see. In other words, a doctest file is a truly excellent program specification that you can have the computer check against your actual code any time you want. API documentation also benefits from being written as doctests and checked alongside your other tests. You can even include doctests in your docstrings. The basic idea you should be getting from all this is that doctest is ideal for uses where humans and computers will both benefit from reading them. The doctest language Like program source code, doctest tests are written in plain text. The doctest module extracts the tests and ignores the rest of the text, which means that the tests can be embedded in human-readable explanations or discussions. This is the feature that makes doctest suitable for uses such as program specifications. Example – creating and running a simple doctest We are going to create a simple doctest file, to show the fundamentals of using the tool. Perform the following steps: Open a new text file in your editor, and name it test.txt. Insert the following text into the file: This is a simple doctest that checks some of Python's arithmetic >>> 2 + 2 4 >>> 3 * 3 10 We can now run the doctest. At the command prompt, change to the directory where you saved test.txt. Type the following command: $ python3 ‑m doctest test.txt When the test is run, you should see output like this: Result – three times three does not equal ten You just wrote a doctest file that describes a couple of arithmetic operations, and ran it to check whether Python behaved as the tests said it should. You ran the tests by telling Python to execute doctest on the file containing the tests. In this case, Python's behavior differed from the tests because, according to the tests, three times three equals ten. However, Python disagrees on that. As doctest expected one thing and Python did something different, doctest presented you with a nice little error report showing where to find the failed test, and how the actual result differed from the expected result. At the bottom of the report is a summary showing how many tests failed in each file tested, which is helpful when you have more than one file containing tests. The syntax of doctests You might have already figured it out from looking at the previous example: doctest recognizes tests by looking for sections of text that look like they've been copied and pasted from a Python interactive session. Anything that can be expressed in Python is valid within a doctest. Lines that start with a >>> prompt are sent to a Python interpreter. Lines that start with a ... prompt are sent as continuations of the code from the previous line, allowing you to embed complex block statements into your doctests. Finally, any lines that don't start with >>> or ..., up to the next blank line or >>> prompt, represent the output expected from the statement. The output appears as it would in an interactive Python session, including both the return value and anything printed to the console. If you don't have any output lines, doctest assumes it to mean that the statement is expected to have no visible result on the console, which usually means that it returns None. The doctest module ignores anything in the file that isn't part of a test, which means that you can put explanatory text, HTML, line-art diagrams, or whatever else strikes your fancy in between your tests. We took advantage of this in the previous doctest to add an explanatory sentence before the test itself. Example – a more complex test Add the following code to your test.txt file, separated from the existing code by at least one blank line: Now we're going to take some more of doctest's syntax for a spin.   >>> import sys >>> def test_write(): ...     sys.stdout.write("Hellon") ...     return True >>> test_write() Hello True Now take a moment to consider before running the test. Will it pass or fail? Should it pass or fail? Result – five tests run? Just as we discussed before, run the test using the following command: python3 -m doctest test.txt You should see a result like this: Because we added the new tests to the same file containing the tests from before, we still see the notification that three times three does not equal 10. Now, though, we also see that five tests were run, which means our new tests ran and were successful. Why five tests? As far as doctest is concerned, we added the following three tests to the file: The first one says that, when we import sys, nothing visible should happen The second test says that, when we define the test_write function, nothing visible should happen The third test says that, when we call the test_write function, Hello and True should appear on the console, in that order, on separate lines Since all three of these tests pass, doctest doesn't bother to say much about them. All it did was increase the number of tests reported at the bottom from two to five. Expecting exceptions That's all well and good for testing that things work as expected, but it is just as important to make sure that things fail when they're supposed to fail. Put another way: sometimes your code is supposed to raise an exception, and you need to be able to write tests that check that behavior as well. Fortunately, doctest follows nearly the same principle in dealing with exceptions as it does with everything else; it looks for text that looks like a Python interactive session. This means it looks for text that looks like a Python exception report and traceback, and matches it against any exception that gets raised. The doctest module does handle exceptions a little differently from the way it handles other things. It doesn't just match the text precisely and report a failure if it doesn't match. Exception tracebacks tend to contain many details that are not relevant to the test, but that can change unexpectedly. The doctest module deals with this by ignoring the traceback entirely: it's only concerned with the first line, Traceback (most recent call last):, which tells it that you expect an exception, and the part after the traceback, which tells it which exception you expect. The doctest module only reports a failure if one of these parts does not match. This is helpful for a second reason as well: manually figuring out what the traceback will look like, when you're writing your tests, would require a significant amount of effort and would gain you nothing. It's better to simply omit them. Example – checking for an exception This is yet another test that you can add to test.txt, this time testing some code that ought to raise an exception. Insert the following text into your doctest file, as always separated by at least one blank line: Here we use doctest's exception syntax to check that Python is correctly enforcing its grammar. The error is a missing ) on the def line. >>> def faulty(: ...     yield from [1, 2, 3, 4, 5] Traceback (most recent call last): SyntaxError: invalid syntax The test is supposed to raise an exception, so it will fail if it doesn't raise the exception or if it raises the wrong exception. Make sure that you have your mind wrapped around this: if the test code executes successfully, the test fails, because it expected an exception. Run the tests using the following doctest: python3 -m doctest test.txt Result – success at failing The code contains a syntax error, which means this raises a SyntaxError exception, which in turn means that the example behaves as expected; this signifies that the test passes. When dealing with exceptions, it is often desirable to be able to use a wildcard matching mechanism. The doctest provides this facility through its ellipsis directive that we'll discuss shortly. Expecting blank lines The doctest uses the first blank line after  >>> to identify the end of the expected output, so what do you do when the expected output actually contains a blank line? The doctest handles this situation by matching a line that contains only the text <BLANKLINE> in the expected output with a real blank line in the actual output. Controlling doctest behavior with directives Sometimes, the default behavior of doctest makes writing a particular test inconvenient. For example, doctest might look at a trivial difference between the expected and real outputs and wrongly conclude that the test has failed. This is where doctest directives come to the rescue. Directives are specially formatted comments that you can place after the source code of a test and that tell doctest to alter its default behavior in some way. A directive comment begins with # doctest:, after which comes a comma-separated list of options that either enable or disable various behaviors. To enable a behavior, write a + (plus symbol) followed by the behavior name. To disable a behavior, white a – (minus symbol) followed by the behavior name. We'll take a look at the several directives in the following sections. Ignoring part of the result It's fairly common that only part of the output of a test is actually relevant to determining whether the test passes. By using the +ELLIPSIS directive, you can make doctest treat the text ... (called an ellipsis) in the expected output as a wildcard that will match any text in the output. When you use an ellipsis, doctest will scan until it finds text matching whatever comes after the ellipsis in the expected output, and continue matching from there. This can lead to surprising results such as an ellipsis matching against a 0-length section of the actual output, or against multiple lines. For this reason, it needs to be used thoughtfully. Example – ellipsis test drive We're going to use the ellipsis in a few different tests to better get a feel of how it works. As an added bonus, these tests also show the use of doctest directives. Add the following code to your test.txt file: Next up, we're exploring the ellipsis. >>> sys.modules # doctest: +ELLIPSIS {...'sys': <module 'sys' (built-in)>...}   >>> 'This is an expression that evaluates to a string' ... # doctest: +ELLIPSIS 'This is ... a string'   >>> 'This is also a string' # doctest: +ELLIPSIS 'This is ... a string'   >>> import datetime >>> datetime.datetime.now().isoformat() # doctest: +ELLIPSIS '...-...-...T...:...:...' Result – ellipsis elides The tests all pass, where they would all fail without the ellipsis. The first and last tests, in which we checked for the presence of a specific module in sys.modules and confirmed a specific formatting while ignoring the contents of a string, demonstrate the kind of situation where ellipsis is really useful, because it lets you focus on the part of the output that is meaningful and ignore the rest of the test. The middle tests demonstrate how different outputs can match the same expected result when ellipsis is in play. Look at the last test. Can you imagine any output that wasn't an ISO-formatted time stamp, but that would match the example anyway? Remember that the ellipsis can match any amount of text. Ignoring white space Sometimes, white space (spaces, tabs, newlines, and their ilk) is more trouble than it's worth. Maybe you want to be able to break a single line of expected output across several lines in your test file, or maybe you're testing a system that uses lots of white space but doesn't convey any useful information with it. The doctest gives you a way to "normalize" white space, turning any sequence of white space characters, in both the expected output and in the actual output, into a single space. It then checks whether these normalized versions match. Example – invoking normality We're going to write a couple of tests that demonstrate how whitespace normalization works. Insert the following code into your doctest file: Next, a demonstration of whitespace normalization. >>> [1, 2, 3, 4, 5, 6, 7, 8, 9] ... # doctest: +NORMALIZE_WHITESPACE [1, 2, 3,  4, 5, 6,  7, 8, 9]   >>> sys.stdout.write("This textn contains weird     spacing.n") ... # doctest: +NORMALIZE_WHITESPACE This text contains weird spacing. 39 Result – white space matches any other white space Both of these tests pass, in spite of the fact that the result of the first one has been wrapped across multiple lines to make it easy for humans to read, and the result of the second one has had its strange newlines and indentations left out, also for human convenience. Notice how one of the tests inserts extra whitespace in the expected output, while the other one ignores extra whitespace in the actual output? When you use +NORMALIZE_WHITESPACE, you gain a lot of flexibility with regard to how things are formatted in the text file. You may have noted the value 39 on the last line of the last example. Why is that there? It's because the write() method returns the number of bytes that were written, which in this case happens to be 39. If you're trying this example in an environment that maps ASCII characters to more than one byte, you will see a different number here; this will cause the test to fail until you change the expected number of bytes. Skipping an example On some occasions, doctest will recognize some text as an example to be checked, when in truth you want it to be simply text. This situation is rarer than it might at first seem, because usually there's no harm in letting doctest check everything it can. In fact, usually it's very helpful to have doctest check everything it can. For those times when you want to limit what doctest checks, though, there's the +SKIP directive. Example – humans only Append the following code to your doctest file: Now we're telling doctest to skip a test   >>> 'This test would fail.' # doctest: +SKIP If it were allowed to run. Result – it looks like a test, but it's not Before we added this last example to the file, doctest reported thirteen tests when we ran the file through it. After adding this code, doctest still reports thirteen tests. Adding the skip directive to the code completely removed it from consideration by doctest. It's not a test that passes, nor a test that fails. It's not a test at all. The other directives There are a number of other directives that can be issued to doctest, should you find the need. They're not as broadly useful as the ones already mentioned, but the time might come when you require one or more of them. The full documentation for all of the doctest directives can be found at http://docs.python.org/3/library/doctest.html#doctest-options. The remaining directives of doctest in the Python 3.4 version are as follows: DONT_ACCEPT_TRUE_FOR_1: This makes doctest differentiate between boolean values and numbers DONT_ACCEPT_BLANKLINE: This removes support for the <BLANKLINE> feature IGNORE_EXCEPTION_DETAIL: This makes doctest only care that an exception is of the expected type Strictly speaking, doctest supports several other options that can be set using the directive syntax, but they don't make any sense as directives, so we'll ignore them here. The execution scope of doctest tests When doctest is running the tests from text files, all the tests from the same file are run in the same execution scope. This means that, if you import a module or bind a variable in one test, that module or variable is still available in later tests. We took advantage of this fact several times in the tests written so far in this article: the sys module was only imported once, for example, although it was used in several tests. This behavior is not necessarily beneficial, because tests need to be isolated from each other. We don't want them to contaminate each other because, if a test depends on something that another test does, or if it fails because of something that another test does, these two tests are in some sense combined into one test that covers a larger section of your code. You don't want that to happen, because then knowing which test has failed doesn't give you as much information about what went wrong and where it happened. So, how can we give each test its own execution scope? There are a few ways to do it. One would be to simply place each test in its own file, along with whatever explanatory text that is needed. This works well in terms of functionality, but running the tests can be a pain unless you have a tool to find and run all of them for you. Another problem with this approach is that this breaks the idea that the tests contribute to a human-readable document. Another way to give each test its own execution scope is to define each test within a function, as follows: >>> def test1(): ...     import frob ...     return frob.hash('qux') >>> test1() 77 By doing this, the only thing that ends up in the shared scope is the test function (named test1 here). The frob module and any other names bound inside the function are isolated with the caveat that things that happen inside imported modules are not isolated. If the frob.hash() method changes a state inside the frob module, that state will still be changed if a different test imports the frob module again. The third way is to exercise caution with the names you create, and be sure to set them to known values at the beginning of each test section. In many ways this is the easiest approach, but this is also the one that places the most burden on you, because you have to keep track of what's in the scope. Why does doctest behave in this way, instead of isolating tests from each other? The doctest files are intended not just for computers to read, but also for humans. They often form a sort of narrative, flowing from one thing to the next. It would break the narrative to be constantly repeating what came before. In other words, this approach is a compromise between being a document and being a test framework, a middle ground that works for both humans and computers. Check your understanding Once you've decided on your answers to these questions, check them by writing a test document and running it through doctest: How does doctest recognize the beginning of a test in a document? How does doctest know when a test continues to further lines? How does doctest recognize the beginning and end of the expected output of a test? How would you tell doctest that you want to break the expected output across several lines, even though that's not how the test actually outputs it? Which parts of an exception report are ignored by doctest? When you assign a variable in a test file, which parts of the file can actually see that variable? Why do we care what code can see the variables created by a test? How can we make doctest not care what a section of output contains? Exercise – English to doctest Time to stretch your wings a bit. I'm going to give you a description of a single function in English. Your job is to copy the description into a new text file, and then add tests that describe all the requirements in a way that the computer can understand and check. Try to make the doctests so that they're not just for the computer. Good doctests tend to clarify things for human readers as well. By and large, this means that you present them to human readers as examples interspersed with the text. Without further ado, here is the English description: The fib(N) function takes a single integer as its only parameter N. If N is 0 or 1, the function returns 1. If N is less than 0, the function raises a ValueError. Otherwise, the function returns the sum of fib(N – 1) and fib(N – 2). The returned value will never be less than 1. A naïve implementation of this function would get very slow as N increased. I'll give you a hint and point out that the last sentence about the function being slow, isn't really testable. As computers get faster, any test you write that depends on an arbitrary definition of "slow" will eventually fail. Also, there's no good way to test the difference between a slow function and a function stuck in an infinite loop, so there's not much point in trying. If you find yourself needing to do that, it's best to back off and try a different solution. Not being able to tell whether a function is stuck or just slow is called the halting problem by computer scientists. We know that it can't be solved unless we someday discover a fundamentally better kind of computer. Faster computers won't do the trick, and neither will quantum computers, so don't hold your breath. The next-to-last sentence also provides some difficulty, since to test it completely would require running every positive integer through the fib() function, which would take forever (except that the computer will eventually run out of memory and force Python to raise an exception). How do we deal with this sort of thing, then? The best solution is to check whether the condition holds true for a random sample of viable inputs. The random.randrange() and random.choice() functions in the Python standard library make that fairly easy to do. Summary We learned the syntax of doctest, and went through several examples describing how to use it. After that, we took a real-world specification for the AVL tree, and examined how to formalize it as a set of doctests, so that we could use it to automatically check the correctness of an implementation. Specifically, we covered doctest's default syntax and the directives that alter it, how to write doctests in text files, how to write doctests in Python docstrings, and what it feels like to use doctest to turn a specification into tests. If, you want to learn more about Python Testing then you can refer the following books: Expert Python Programming: https://www.packtpub.com/application-development/expert-python-programming Python Testing Cookbook: https://www.packtpub.com/application-development/python-testing-cookbook Resources for Article:   Further resources on this subject: Façade Pattern – Being Adaptive with Façade [article] Predicting Sports Winners with Decision Trees and pandas [article] Gradient Descent at Work [article]
Read more
  • 0
  • 0
  • 3540
article-image-coreos-networking-and-flannel-internals
Packt
03 Feb 2016
8 min read
Save for later

CoreOS Networking and Flannel Internals

Packt
03 Feb 2016
8 min read
In this article by Sreenivas Makam, author of the book Mastering CoreOS explains how microservices has increased the need to have lots of containers and also connectivity between containers across hosts. It is necessary to have a robust Container networking scheme to achieve this goal. This article will cover the basics of Container networking with a focus on how CoreOS does Container networking with Flannel. (For more resources related to this topic, see here.) Container networking basics The following are the reasons why we need Container networking: Containers need to talk to the external world. Containers should be reachable from the external world so that the external world can use the services that Containers provide. Containers need to talk to the host machine. An example can be sharing volumes. There should be inter-container connectivity in the same host and across hosts. An example is a WordPress container in one host talking to a MySQL container in another host. Multiple solutions are currently available to interconnect Containers. These solutions are pretty new and actively under development. Docker, until release 1.8, did not have a native solution to interconnect Containers across hosts. Docker release 1.9 introduced a Libnetwork-based solution to interconnect containers across hosts as well as do service discovery. CoreOS is using Flannel for container networking in CoreOS clusters. There are projects such as Weave and Calico that are developing Container networking solutions, and they plan to be a networking container plugin for any Container runtime such as Docker or Rkt. Flannel Flannel is an open source project that provides a Container networking solution for CoreOS clusters. Flannel can also be used for non-CoreOS clusters. Kubernetes uses Flannel to set up networking between the Kubernetes pods. Flannel allocates a separate subnet for every host where a Container runs, and the Containers in this host get allocated an individual IP address from the host subnet. An overlay network is set up between each host that allows Containers on different hosts to talk to each other. In Chapter 1, CoreOS Overview covered an overview of the Flannel control and data path. This section will delve into the Flannel internals. Manual installation Flannel can be installed manually or using the systemd unit, flanneld.service. The following command will install flannel in the CoreOS node using a container to build the flannel binary. The flanneld Flannel binary will be available in /home/core/flannel/bin after executing the following commands: git clone https://github.com/coreos/flannel.git docker run -v /home/core/flannel:/opt/flannel -i -t google/golang /bin/bash -c "cd /opt/flannel && ./build" The following is the Flannel version after we build flannel in our CoreOS node: Installation using flanneld.service Flannel is not installed by default in CoreOS. This is done to keep the CoreOS image size to a minimum. Docker requires flannel to configure the network and flannel requires docker to download the flannel container. To avoid this chicken-and-egg problem, early-docker.service is started by default in CoreOS, whose primary purpose is to download the flannel container and start it. A regular docker.service starts the Docker daemon with the flannel network. The following image shows you the sequence in flanneld.service, where early Docker daemon starts the flannel container, which, in turn starts docker.service with the subnet created by flannel: The following is the relevant section of flanneld.service that downloads the flannel container from the Quay repository: The following output shows the early docker's running containers. Early-docker will manage Flannel only: The following is the relevant section of flanneld.service that updates the docker options to use the subnet created by flannel: The following is the content of flannel_docker_opts.env—in my case—after flannel was started. The address, 10.1.60.1/24, is chosen by this CoreOS node for its containers: Docker will be started as part of docker.service, as shown in the following image, with the preceding environment file: Control path There is no central controller in flannel, and it uses etcd for internode communication. Each node in the CoreOS cluster runs a flannel agent and they communicate with each other using etcd. As part of starting the Flannel service, we specify the Flannel subnet that can be used by the individual nodes in the network. This subnet is registered with etcd so that every CoreOS node in the cluster can see it. Each node in the network picks a particular subnet range and registers atomically with etcd. The following is the relevant section of cloud-config that starts flanneld.service along with specifying the configuration for Flannel. Here, we have specified the subnet to be used for flannel as 10.1.0.0/16 along with the encapsulation type as vxlan: The preceding configuration will create the following etcd key as seen in the node. This shows that 10.1.0.0/16 is allocated for flannel to be used across the CoreOS cluster and that the encapsulation type is vxlan: Once each node gets a subnet, containers started in this node will get an IP address from the IP address pool allocated to the node. The following is the etcd subnet allocation per node. As we can see, all the subnets are in the 10.1.0.0/16 range that was configured earlier with etcd and with a 24-bit mask. The subnet length per host can also be controlled as a flannel configuration option: Let's look at ifconfig of the Flannel interface created in this node. The IP address is in the address range of 10.1.0.0/16: Data path Flannel uses the Linux bridge to encapsulate the packets using an overlay protocol specified in the Flannel configuration. This allows for connectivity between containers in the same host as well as across hosts. The following are the major backends currently supported by Flannel and specified in the JSON configuration file. The JSON configuration file can be specified in the Flannel section of cloud-config: UDP: In UDP encapsulation, packets from containers are encapsulated in UDP with the default port number 8285. We can change the port number if needed. VXLAN: From an encapsulation overhead perspective, VXLAN is efficient when compared to UDP. By default, port 8472 is used for VXLAN encapsulation. If we want to use an IANA-allocated VXLAN port, we need to specify the port field as 4789. AWS-VPC: This is applicable to using Flannel in the AWS VPC cloud. Instead of encapsulating the packets using an overlay, this approach uses a VPC route table to communicate across containers. AWS limits each VPC route table entry to 50, so this can become a problem with bigger clusters. The following is an example of specifying the AWS type in the flannel configuration: GCE: This is applicable to using Flannel in the GCE cloud. Instead of encapsulating the packets using an overlay, this approach uses the GCE route table to communicate across containers. GCE limits each VPC route table entry to 100, so this can become a problem with bigger clusters. The following is an example of specifying the GCE type in the Flannel configuration: Let's create containers in two different hosts with a VXLAN encapsulation and check whether the connectivity is fine. The following example uses a Vagrant CoreOS cluster with the Flannel service enabled. Host 1: Let's start a busybox container: Let's check the IP address allotted to the container. This IP address comes from the IP pool allocated to this CoreOS node by the flannel agent. 10.1.19.0/24 was allocated to host 1 and this container got the 10.1.19.2 address: Host 2: Let's start a busybox container: Let's check the IP address allotted to this container. This IP address comes from the IP pool allocated to this CoreOS node by the flannel agent. 10.1.1.0/24 was allocated to host 2 and this container got the 10.1.1.2 address: The following output shows you the ping being successful between container 1 and container 2. This ping packet is travelling across the two CoreOS nodes and is encapsulated using VXLAN: Flannel as a CNI plugin As explained in Chapter 1, CoreOS Overview, APPC defines a Container specification that any Container runtime can use. For Container networking, APPC defines a Container Network Interface (CNI) specification. With CNI, the Container networking functionality can be implemented as a plugin. CNI expects plugins to support APIs with a set of parameters and the implementation is left to the plugin. Example APIs add a container to a network and remove the container from the network with a defined parameter list. This allows the implementation of network plugins by different vendors and also the reuse of plugins across different Container runtimes. The following image shows the relationship between the RKT container runtime, CNI layer, and Plugin like Flannel. The IPAM Plugin is used to allocate an IP address to the containers and this is nested inside the initial networking plugin: Summary In this chapter, we covered different Container networking technologies with a focus on Container networking in CoreOS. There are many companies trying to solve this Container networking problem. Resources for Article: Further resources on this subject: Network and Data Management for Containers [article] Deploying a Play application on CoreOS and Docker [article] CoreOS – Overview and Installation [article]
Read more
  • 0
  • 0
  • 3537

article-image-hadoop-and-hdinsight-heartbeat
Packt
30 Sep 2013
6 min read
Save for later

Hadoop and HDInsight in a Heartbeat

Packt
30 Sep 2013
6 min read
(For more resources related to this topic, see here.) Apache Hadoop is the leading Big Data platform that allows to process large datasets efficiently and at low cost. Other Big Data 0platforms are MongoDB, Cassandra, and CouchDB. This section describes Apache Hadoop core concepts and its ecosystem. Core components The following image shows core Hadoop components: At the core, Hadoop has two key components: Hadoop Distributed File System (HDFS) Hadoop MapReduce (distributed computing for batch jobs) For example, say we need to store a large file of 1 TB in size and we only have some commodity servers each with limited storage. Hadoop Distributed File System can help here. We first install Hadoop, then we import the file, which gets split into several blocks that get distributed across all the nodes. Each block is replicated to ensure that there is redundancy. Now we are able to store and retrieve the 1 TB file. Now that we are able to save the large file, the next obvious need would be to process this large file and get something useful out of it, like a summary report. To process such a large file would be difficult and/or slow if handled sequentially. Hadoop MapReduce was designed to address this exact problem statement and process data in parallel fashion across several machines in a fault-tolerant mode. MapReduce programing models use simple key-value pairs for computation. One distinct feature of Hadoop in comparison to other cluster or grid solutions is that Hadoop relies on the "share nothing" architecture. This means when the MapReduce program runs, it will use the data local to the node, thereby reducing network I/O and improving performance. Another way to look at this is when running MapReduce, we bring the code to the location where the data resides. So the code moves and not the data. HDFS and MapReduce together make a powerful combination, and is the reason why there is so much interest and momentum with the Hadoop project. Hadoop cluster layout Each Hadoop cluster has three special master nodes (also known as servers): NameNode: This is the master for the distributed filesystem and maintains a metadata. This metadata has the listing of all the files and the location of each block of a file, which are stored across the various slaves (worker bees). Without a NameNode HDFS is not accessible. Secondary NameNode: This is an assistant to the NameNode. It communicates only with the NameNode to take snapshots of the HDFS metadata at intervals configured at cluster level. JobTracker: This is the master node for Hadoop MapReduce. It determines the execution plan of the MapReduce program, assigns it to various nodes, monitors all tasks, and ensures that the job is completed by automatically relaunching any task that fails. All other nodes of the Hadoop cluster are slaves and perform the following two functions: DataNode: Each node will host several chunks of files known as blocks. It communicates with the NameNode. TaskTracker: Each node will also serve as a slave to the JobTracker by performing a portion of the map or reduce task, as decided by the JobTracker. The following image shows a typical Apache Hadoop cluster: The Hadoop ecosystem As Hadoop's popularity has increased, several related projects have been created that simplify accessibility and manageability to Hadoop. I have organized them as per the stack, from top to bottom. The following image shows the Hadoop ecosystem: Data access The following software are typically used access mechanisms for Hadoop: Hive: It is a data warehouse infrastructure that provides SQL-like access on HDFS. This is suitable for the ad hoc queries that abstract MapReduce. Pig: It is a scripting language such as Python that abstracts MapReduce and is useful for data scientists. Mahout: It is used to build machine learning and recommendation engines. MS Excel 2013: With HDInsight, you can connect Excel to HDFS via Hive queries to analyze your data. Data processing The following are the key programming tools available for processing data in Hadoop: MapReduce: This is the Hadoop core component that allows distributed computation across all the TaskTrackers Oozie: It enables creation of workflow jobs to orchestrate Hive, Pig, and MapReduce tasks The Hadoop data store The following are the common data stores in Hadoop: HBase: It is the distributed and scalable NOSQL (Not only SQL) database that provides a low-latency option that can handle unstructured data HDFS: It is a Hadoop core component, which is the foundational distributed filesystem Management and integration The following are the management and integration software: Zookeeper: It is a high-performance coordination service for distributed applications to ensure high availability Hcatalog: It provides abstraction and interoperability across various data processing software such as Pig, MapReduce, and Hive Flume: Flume is distributed and reliable software for collecting data from various sources for Hadoop Sqoop: It is designed for transferring data between HDFS and any RDBMS Hadoop distributions Apache Hadoop is an open-source software and is repackaged and distributed by vendors offering enterprise support. The following is the listing of popular distributions: Amazon Elastic MapReduce (cloud, http://aws.amazon.com/elasticmapreduce/) Cloudera (http://www.cloudera.com/content/cloudera/en/home.html) EMC PivitolHD (http://gopivotal.com/) Hortonworks HDP (http://hortonworks.com/) MapR (http://mapr.com/) Microsoft HDInsight (cloud, http://www.windowsazure.com/) HDInsight distribution differentiator HDInsight is an enterprise-ready distribution of Hadoop that runs on Windows servers and on Azure HDInsight cloud service. It is 100 percent compatible with Apache Hadoop. HDInsight was developed in partnership with Hortonworks and Microsoft. Enterprises can now harness the power of Hadoop on Windows servers and Windows Azure cloud service. The following are the key differentiators for HDInsight distribution: Enterprise-ready Hadoop: HDInsight is backed by Microsoft support, and runs on standard Windows servers. IT teams can leverage Hadoop with Platform as a Service ( PaaS ) reducing the operations overhead. Analytics using Excel: With Excel integration, your business users can leverage data in Hadoop and analyze using PowerPivot. Integration with Active Directory: HDInsight makes Hadoop reliable and secure with its integration with Windows Active directory services. Integration with .NET and JavaScript: .NET developers can leverage the integration, and write map and reduce code using their familiar tools. Connectors to RDBMS: HDInsight has ODBC drivers to integrate with SQL Server and other relational databases. Scale using cloud offering: Azure HDInsight service enables customers to scale quickly as per the project needs and have seamless interface between HDFS and Azure storage vault. JavaScript console: It consists of easy-to-use JavaScript console for configuring, running, and post processing of Hadoop MapReduce jobs. Summary In this article, we reviewed the Apache Hadoop components and the ecosystem of projects that provide a cost-effective way to deal with Big Data problems. We then looked at how Microsoft HDInsight makes the Apache Hadoop solution better by simplified management, integration, development, and reporting. Resources for Article : Further resources on this subject: Making Big Data Work for Hadoop and Solr [Article] Understanding MapReduce [Article] Advanced Hadoop MapReduce Administration [Article]
Read more
  • 0
  • 0
  • 3534
Modal Close icon
Modal Close icon