
Groups and Cohorts in Moodle

Packt
06 Jul 2015
20 min read
In this article by William Rice, author of the book Moodle E-Learning Course Development - Third Edition, you will learn how to use groups to separate the students in a course into teams, and how to use cohorts to mass-enroll students into courses.

Groups versus cohorts

Groups and cohorts are both collections of students, but there are several differences between them. We can sum up these differences in one sentence: cohorts enable administrators to enroll and unenroll students en masse, whereas groups enable teachers to manage students during a class.

Think of a cohort as a group of students working through the same academic curriculum, for example, a set of students all enrolled in the same course. Think of a group as a subset of the students enrolled in a course; groups are used to manage the various activities within that course. A cohort is a system-wide or course-category-wide collection of students.

There is a small amount of overlap between what you can do with a cohort and a group. However, the differences are large enough that you would not want to substitute one for the other.

Cohorts

In this article, we'll look at how to create and use cohorts. You can perform many operations with cohorts in bulk, affecting many students at once.

Creating a cohort

To create a cohort, perform the following steps:

1. From the main menu, select Site administration | Users | Accounts | Cohorts.
2. On the Cohorts page, click on the Add button. The Add New Cohort page is displayed.
3. Enter a Name for the cohort. This is the name that you will see when you work with the cohort.
4. Enter a Cohort ID for the cohort. If you upload students in bulk to this cohort, you will specify the cohort using this identifier. You can use any characters you want in the Cohort ID; however, keep in mind that the file you upload to the cohort can come from a different computer system. To be safe, consider using only ASCII characters, such as letters, numbers, and a few special characters, with no spaces in the Cohort ID, for example, Spring_2012_Freshmen.
5. Enter a Description that will help you and other administrators remember the purpose of the cohort.
6. Click on Save changes.

Now that the cohort is created, you can begin adding users to it.

Adding students to a cohort

Students can be added to a cohort manually by searching for and selecting them. They can also be added in bulk by uploading a file to Moodle.

Manually adding and removing students to a cohort

If you add a student to a cohort, that student is enrolled in all the courses to which the cohort is synchronized. If you remove a student from a cohort, that student is unenrolled from all the courses to which the cohort is synchronized. We will look at how to synchronize cohorts and course enrollments later. For now, here is how to manually add and remove students from a cohort:

1. From the main menu, select Site administration | Users | Accounts | Cohorts.
2. On the Cohorts page, for the cohort to which you want to add students, click on the people icon. The Cohort Assign page is displayed.
3. The left-hand side panel displays the users that are already in the cohort, if any. The right-hand side panel displays the users that can be added to the cohort.
4. Use the Search field to search for users in each panel. You can search for text that is in the user name and e-mail address fields.
5. Use the Add and Remove buttons to move users from one panel to the other.

Adding students to a cohort in bulk – upload

When you upload students to Moodle, you can add them to a cohort.
After you have all the students in a cohort, you can quickly enroll and unenroll them in courses just by synchronizing the cohort to the course. If you are going to upload students in bulk, consider putting them in a cohort; this makes it easier to manipulate them later.

Here is an example of a cohort with 1,204 students enrolled. These students were uploaded to the cohort under Administration | Site Administration | Users | Upload users. The file that was uploaded contained information about each student in the cohort. In a spreadsheet, this is how the file looks:

username,email,firstname,lastname,cohort1
moodler_1,bill@williamrice.net,Bill,Binky,open-enrollmentmoodlers
moodler_2,rose@williamrice.net,Rose,Krial,open-enrollmentmoodlers
moodler_3,jeff@williamrice.net,Jeff,Marco,open-enrollmentmoodlers
moodler_4,dave@williamrice.net,Dave,Gallo,open-enrollmentmoodlers

In this example, we have the minimum information required to create new students:

The username
The e-mail address
The first name
The last name

We also have the cohort ID (the short name of the cohort) in which we want to place each student. During the upload process, you can see a preview of the file that you will upload. Further down on the Upload users preview page, you can choose the Settings option to control how the upload is handled.

Usually, when we upload users to Moodle, we create new users. However, we can also use the upload option to quickly enroll existing users in a cohort. You saw previously (Manually adding and removing students to a cohort) how to search for and then enroll users in a cohort. However, when you want to enroll hundreds of users in a cohort, it's often faster to create a text file and upload it than to search your existing users, because when you create a text file, you can use powerful tools, such as spreadsheets and databases, to quickly build it. If you want to do this, you will find options to Update existing users under the Upload type field.

In most Moodle systems, a user's profile must include a city and country. When you upload users to a system, you can specify the city and country in the upload file, or omit them from the file and have the system assign them while the file is uploaded. This is done under Default values on the Upload users page.

Now that we have examined some of the capabilities and limitations of this process, let's list the steps to upload a cohort to Moodle:

1. Prepare a plain text file that has, at minimum, the username, email, firstname, lastname, and cohort1 fields. If you were to create this in a spreadsheet, it would look similar to the preceding example.
2. Under Administration | Site Administration | Users | Upload users, select the text file that you will upload. On this page, choose Settings to describe the text file, such as the delimiter (separator) and encoding.
3. Click on the Upload users button. You will see the first few rows of the text file displayed. Additional settings also become available on this page.
4. In the Settings section, there are settings that affect what happens when you upload information about existing users. You can choose to have the system overwrite information for existing users, ignore information that conflicts with existing users, create passwords, and so on.
5. In the Default values section, you can enter values to be entered into the user profiles. For example, you can select a city, country, and department for all the users.
6. Click on the Upload users button to begin the upload.

Cohort sync

Using the cohort sync enrolment method, you can enroll and unenroll large collections of students at once. Using cohort sync involves several steps:

1. Creating a cohort.
2. Enrolling students in the cohort.
3. Enabling the cohort sync enrollment method.
4. Adding the cohort sync enrollment method to a course.

You have already seen the first two steps: how to create a cohort and how to enroll students in it. We will now cover the last two steps: enabling the cohort sync method and adding cohort sync to a course.

Enabling the cohort sync enrollment method

To enable the cohort sync enrollment method, you will need to log in as an administrator. This cannot be done by someone who has only teacher rights:

1. Select Site administration | Plugins | Enrolments | Manage enrol plugins.
2. Click on the Enable icon located next to Cohort sync.
3. Then, click on the Settings button located next to Cohort sync.
4. On the Settings page, choose the default role for people when you enroll them in a course using Cohort sync. You can change this setting for each course.
5. You will also choose the External unenrol action. This is what happens to a student when they are removed from the cohort. If you choose Unenrol user from course, the user and all of their grades are removed from the course; the user's grades are purged from Moodle. If you were to re-add this user to the cohort, all of the user's activity in this course would be blank, as if the user had never been in the course. If you choose Disable course enrolment and remove roles, the user and all of their grades are hidden. You will not see this user in the course's grade book. However, if you were to re-add this user to the cohort or to the course, the user's course records would be restored.

After enabling the cohort sync method, it's time to actually add this method to a course.

Adding the cohort sync enrollment method to a course

To do this, you will need to log in as an administrator or as a teacher in the course:

1. Log in and enter the course to which you want to add the enrolment method.
2. Select Course administration | Users | Enrolment methods.
3. From the Add method drop-down menu, select Cohort sync.
4. In Custom instance name, enter a name for this enrolment method. This will enable you to recognize this method in a list of cohort syncs.
5. For Active, select Yes. This will enroll the users.
6. Select the Cohort option.
7. Select the role that the members of the cohort will be given.
8. Click on the Save changes button.

All the users in the cohort will be given the selected role in the course.

Un-enroll a cohort from a course

There are two ways to un-enroll a cohort from a course. First, you can go to the course's enrollment methods page and delete the enrollment method: just click on the X button located next to the Cohort sync method that you added to the course. However, this will not just remove the users from the course; it will also delete all their course records.

The second method preserves the student records. Once again, go to the course's enrollment methods page and, next to the Cohort sync method that you added, click on the Settings icon. On the Settings page, select No for Active. This removes the role that the cohort was given. The members of the cohort will still be listed as course participants, but as they no longer have a role in the course, they can no longer access it. However, their grades and activity reports are preserved.
Differences between cohort sync and enrolling a cohort

Cohort sync and enrolling a cohort are two different methods. Each has its advantages and limitations.

If you follow the preceding instructions, you can synchronize a cohort's membership to a course's enrollment. As people are added to and removed from the cohort, they are enrolled in and un-enrolled from the course. When working with a large group of users, this can be a great time saver. However, with cohort sync, you cannot un-enroll or change the role of just one person.

Consider a scenario where you have a large group of students who want to enroll in several courses, all at once. You put these students in a cohort, enable the cohort sync enrollment method, and add the cohort sync enrollment method to each of these courses. In a few minutes, you have accomplished your goal. Now, suppose you want to un-enroll some of these users from some courses, but not from all of them. If you remove them from the cohort, they are removed from all of the courses. This is how cohort sync works.

Cohort sync is everyone or no one

When a person is added to or removed from the cohort, that person is added to or removed from all the courses to which the cohort is synced. If that's what you want, great. If not, an alternative to cohort sync is to enroll a cohort. That is, you can select all the members of a cohort and enroll them in a course, all at once. However, this is a one-way journey: you cannot un-enroll them all at once; you will need to un-enroll them one at a time. If you enroll a cohort all at once, then after enrollment, the users are independent entities. You can un-enroll them and change their role (for example, from student to teacher) whenever you wish.

To enroll a cohort in a course, perform the following steps:

1. Enter the course as an administrator or teacher.
2. Select Administration | Course administration | Users | Enrolled users.
3. Click on the Enrol cohort button. A popup window appears. This window lists the cohorts on the site.
4. Click on Enrol users next to the cohort that you want to enroll. The system displays a confirmation message.
5. Now, click on the OK button. You will be taken back to the Enrolled users page.

Note that although you can enroll all the users in a cohort at once, there is no button to un-enroll them all at once. You will need to remove them from your course one at a time.

Managing students with groups

A group is a collection of students in a course. Outside of a course, a group has no meaning. Groups are useful when you want to separate students who are studying the same course. For example, if your organization is using the same course for several different classes or groups, you can use the group feature to separate the students so that each group can see only their peers in the course.

For example, you can create a new group every month for the employees hired that month. Then, you can monitor and mentor them together. After you have run a group of people through a course, you may want to reuse this course for another group. You can use the group feature to separate the groups so that the current group doesn't see the work done by the previous group. This will be like a new course for the current group. You may also want an activity or resource to be open to just one group of people, without others in the class being able to use it.

Course versus activity

You can apply the groups setting to an entire course. If you do this, every activity and resource in the course will be segregated into groups.
You can also apply the groups setting to an individual activity or resource. If you do this, it will override the groups setting for the course and segregate just that activity or resource between groups.

The three group modes

For a course or activity, there are several ways to apply groups. Here are the three group modes:

No groups: There are no groups for the course or activity. Even if students have been placed in groups, this is ignored, and everyone has the same access to the course or activity.
Separate groups: If students have been placed in groups, they can see only the students and the work of the students from their own group. Students and work from other groups are invisible.
Visible groups: If students have been placed in groups, they can see the students and the work of students from all groups. However, the work from other groups is read only.

You can use the No groups setting on an activity in your course when you want every student who has ever taken the course to be able to interact with each other. For example, you may use the No groups setting in the news forum so that all students who have ever taken the course can see the latest news.

You can use the Separate groups setting in a course when you run different groups at different times. For each group that runs through the course, it will be like a brand new course.

You can use the Visible groups setting in a course when students are part of a large, in-person class and you want them to collaborate in small groups online.

Also, be aware that some things are not affected by the groups setting. For example, no matter what the group setting is, students will never see each other's assignment submissions.

Creating a group

There are three ways to create groups in a course. You can:

Manually create and populate each group
Automatically create and populate groups based on the characteristics of students
Import groups using a text file

We'll cover these methods in the following subsections.

Manually creating and populating a group

Don't be discouraged by the idea of manually populating a group with students. It takes only a few clicks to place a student in a group. To create and populate a group, perform the following steps:

1. Select Course administration | Users | Groups. This takes you to the Groups page.
2. Click on the Create group button. The Create group page is displayed.
3. You must enter a Name for the group. This is the name that teachers and administrators see when they manage the group.
4. The Group ID number is used to match up this group with a group identifier in another system. If your organization uses a system outside Moodle to manage students and this system categorizes students into groups, you can enter the group ID from the other system in this field. It does not need to be a number. This field is optional.
5. The Group description field is optional. It's good practice to use it to explain the purpose of the group and the criteria for belonging to it.
6. The Enrolment key is a code that you can give to students who self-enroll in a course. When a student enrolls, they are prompted to enter the enrollment key. On entering this key, the student is enrolled in the course and made a member of the group.
7. If you add a picture to this group, then when members are listed (as in a forum), the group picture is shown next to each member. Here is an example of a contributor to a forum on http://www.moodle.org with her group memberships.
8. Click on the Save changes button to save the group.
9. On the Groups page, the group appears in the left-hand side column. Select this group.
10. In the right-hand side column, search for and select the students that you want to add to this group.

Note the Search fields. These enable you to search for students that meet specific criteria. You can search the first name, last name, and e-mail address. Other parts of the user's profile information are not available in this search box.

Automatically creating and populating a group

When you automatically create groups, Moodle creates the number of groups that you specify and then takes all the students enrolled in the course and allocates them to these groups. Moodle will put the currently enrolled students in these groups even if they already belong to another group in the course. To automatically create groups, use the following steps:

1. Click on the Auto-create groups button. The Auto-create groups page is displayed.
2. In the Naming scheme field, enter a name for all the groups that will be created. You can enter any characters. If you enter @, it will be converted to sequential letters. If you enter #, it will be converted to sequential numbers. For example, if you enter Group @, Moodle will create Group A, Group B, Group C, and so on.
3. In the Auto-create based on field, you tell the system to either create a specific number of groups and fill each group with as many students as needed (Number of groups), or create as many groups as needed so that each group has a specific number of students (Members per group).
4. In the Group/member count field, you tell the system either how many groups to create (if you chose Number of groups) or how many members to put in each group (if you chose Members per group).
5. Under Group members, select who will be put in these groups. You can select everyone with a specific role or everyone in a specific cohort.
6. The Prevent last small group setting is available if you choose Members per group. It prevents Moodle from creating a group with fewer than the number of students that you specify. For example, if your class has 12 students and you choose to create groups with five members per group, Moodle would normally create two groups of five and then another group for the last two members. However, with Prevent last small group selected, it will distribute the remaining two members between the first two groups.
7. Click on the Preview button to preview the results. The preview will not show you the names of the members in each group, but it will show you how many groups there will be and how many members will be in each group.

Importing groups

The term importing groups may give you the impression that you will import students into a group. The Import groups button does not import students into groups; it imports a text file that you can use to create groups. So, if you need to create a lot of groups at once, you can use this feature to do so. This needs to be done by a site administrator.

If you need to import students and put them into groups, use the upload students feature. However, instead of adding students to a cohort, you will add them to a course and group.
You do this by specifying the course and group fields in the upload file, as shown in the following example:

username,email,firstname,lastname,course1,group1,course2
moodler_1,bill@williamrice.net,Bill,Binky,history101,odds,science101
moodler_2,rose@williamrice.net,Rose,Krial,history101,even,science101
moodler_3,jeff@williamrice.net,Jeff,Marco,history101,odds,science101
moodler_4,dave@williamrice.net,Dave,Gallo,history101,even,science101

In this example, we have the minimum information needed to create new students:

The username
The e-mail address
The first name
The last name

We have also enrolled all the students in two courses: history101 and science101. In the history101 course, Bill Binky and Jeff Marco are placed in a group called odds, while Rose Krial and Dave Gallo are placed in a group called even. In the science101 course, the students are not placed in any group.

Remember that this student upload doesn't happen on the Groups page. It happens under Administration | Site Administration | Users | Upload users.

Summary

Cohorts and groups give you powerful tools to manage your students. Cohorts are a useful tool to quickly enroll and un-enroll large numbers of students. Groups enable you to separate students who are in the same course and give teachers the ability to quickly see only those students that they are responsible for.

Useful Links:

What's New in Moodle 2.0
Moodle for Online Communities
Understanding Web-based Applications and Other Multimedia Forms

Subtitles – tracking the video progression

Packt
06 Jul 2015
10 min read
In this article by Roberto Ulloa, author of the book Kivy – Interactive Applications and Games in Python, Second Edition, we will learn how to use the progression of a video to display subtitles at the right moment.

Let's add subtitles to our application. We will do this in four simple steps:

1. Create a Subtitle widget (subtitle.kv) derived from the Label class that will display the subtitles
2. Place a Subtitle instance (video.kv) on top of the video widget
3. Create a Subtitles class (subtitles.py) that will read and parse a subtitle file
4. Track the Video progression (video.py) to display the corresponding subtitle

Step 1 involves the creation of a new widget in the subtitle.kv file:

1. # File name: subtitle.kv
2. <Subtitle@Label>:
3.     halign: 'center'
4.     font_size: '20px'
5.     size: self.texture_size[0] + 20, self.texture_size[1] + 20
6.     y: 50
7.     bcolor: .1, .1, .1, 0
8.     canvas.before:
9.         Color:
10.            rgba: self.bcolor
11.         Rectangle:
12.             pos: self.pos
13.             size: self.size

There are two interesting elements in this code. The first one is the definition of the size property (line 5). We define it as 20 pixels bigger than the texture_size width and height. The texture_size property indicates the size of the text as determined by the font size and the text itself, and we use it to adjust the Subtitle widget's size to its content. The texture_size is a read-only property because its value is calculated from other parameters, such as the font size and the text to display. This means that we will read from this property but not write to it.

The second element is the creation of the bcolor property (line 7) to store a background color, and the way the rgba color of the rectangle is bound to it (line 10). The Label widget (like many other widgets) doesn't have a background color, and drawing a rectangle is the usual way to create such a feature. We add the bcolor property in order to change the color of the rectangle from outside the instance. We cannot directly modify the parameters of vertex instructions; however, we can create properties that control parameters inside the vertex instructions.

Let's move on to Step 2 mentioned earlier. We need to add a Subtitle instance to our current Video widget in the video.kv file:

14. # File name: video.kv
15. ...
16. #:set _default_surl      "http://www.ted.com/talks/subtitles/id/97/lang/en"

18. <Video>:
19.     surl: _default_surl
20.     slabel: _slabel
21.     ...

23.     Subtitle:
24.         id: _slabel
25.         x: (root.width - self.width)/2

We added another constant variable called _default_surl (line 16), which contains the URL of the subtitle file for the corresponding TED video. We set this value to the surl property (line 19), which we just created to store the subtitles' URL. We added the slabel property (line 20), which references the Subtitle instance through its ID (line 24). Then we made sure that the subtitle is centered (line 25).

In order to start Step 3 (parsing the subtitle file), we need to take a look at the format of the TED subtitles:

26. {
27.     "captions": [{
28.         "duration":1976,
29.         "content": "When you have 21 minutes to speak,",
30.         "startOfParagraph":true,
31.         "startTime":0,
32.     }, ...

TED uses a very simple JSON format (https://en.wikipedia.org/wiki/JSON) with a list of captions. Each caption contains four keys, but we will only use duration, content, and startTime.
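Before wiring this into the app, it can help to poke at the structure with plain Python. The following is only an illustrative sketch and is not part of the chapter's code: it assumes you have saved a captions response to a local file (the name captions.json is a placeholder) and uses the standard json module to print the three keys we care about.

# Quick standalone check of the captions structure (not part of the app).
# "captions.json" is a placeholder for a saved copy of the TED response.
import json

with open('captions.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

for caption in data['captions']:
    # startTime and duration are in milliseconds; content is the text to show.
    print(caption['startTime'], caption['duration'], caption['content'])

Inside the application itself, though, we will download and parse the file with Kivy's own tools, as described next.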
We need to parse this file, and luckily Kivy provides a UrlRequest class (line 34) that will do most of the work for us. Here is the code for subtitles.py that creates the Subtitles class:

33. # File name: subtitles.py
34. from kivy.network.urlrequest import UrlRequest

36. class Subtitles:

38.     def __init__(self, url):
39.         self.subtitles = []
40.         req = UrlRequest(url, self.got_subtitles)

42.     def got_subtitles(self, req, results):
43.         self.subtitles = results['captions']

45.     def next(self, secs):
46.         for sub in self.subtitles:
47.             ms = secs*1000 - 12000
48.             st = 'startTime'
49.             d = 'duration'
50.             if ms >= sub[st] and ms <= sub[st] + sub[d]:
51.                 return sub
52.         return None

The constructor of the Subtitles class receives a URL (line 38) as a parameter. Then, it makes the request by instantiating the UrlRequest class (line 40). The first parameter of the class instantiation is the URL of the request, and the second is the method that is called when the result of the request is returned (downloaded). Once the request returns the result, the got_subtitles method is called (line 42). UrlRequest extracts the JSON and places it in the second parameter of got_subtitles. All we have to do is put the captions in a class attribute, which we call subtitles (line 43).

The next method (line 45) receives the seconds (secs) as a parameter and traverses the loaded JSON dictionary in order to search for the subtitle that belongs to that time. As soon as it finds one, the method returns it. We subtract 12,000 milliseconds (line 47, ms = secs*1000 - 12000) because the TED videos have an introduction of approximately 12 seconds before the talk starts.

Everything is ready for Step 4, in which we put the pieces together in order to see the subtitles working. Here are the modifications to the header of the video.py file:

53. # File name: video.py
54. ...
55. from kivy.properties import StringProperty
56. ...
57. from kivy.lang import Builder

59. Builder.load_file('subtitle.kv')

61. class Video(KivyVideo):
62.     image = ObjectProperty(None)
63.     surl = StringProperty(None)

We imported StringProperty and added the corresponding surl property (lines 55 and 63). We will use this property at the end of this chapter, when we make it possible to switch TED talks from the GUI. For now, it will simply take the _default_surl value defined in video.kv. We also loaded the subtitle.kv file (line 59). Now, let's analyze the rest of the changes to the video.py file:

64.     ...
65.     def on_source(self, instance, value):
66.         self.color = (0,0,0,0)
67.         self.subs = Subtitles(self.surl)
68.         self.sub = None

70.     def on_position(self, instance, value):
71.         next = self.subs.next(value)
72.         if next is None:
73.             self.clear_subtitle()
74.         else:
75.             sub = self.sub
76.             st = 'startTime'
77.             if sub is None or sub[st] != next[st]:
78.                 self.display_subtitle(next)

80.     def clear_subtitle(self):
81.         if self.slabel.text != "":
82.             self.sub = None
83.             self.slabel.text = ""
84.             self.slabel.bcolor = (0.1, 0.1, 0.1, 0)

86.     def display_subtitle(self, sub):
87.         self.sub = sub
88.         self.slabel.text = sub['content']
89.         self.slabel.bcolor = (0.1, 0.1, 0.1, .8)
90. (...)
We introduced a few lines to the on_source method in order to initialize the subs attribute with a Subtitles instance (line 67), using the surl property, and to initialize the sub attribute that contains the currently displayed subtitle (line 68), if any.

Now, let's study how we keep track of the progression to display the corresponding subtitle. When the video plays inside the Video widget, the on_position event is triggered every second. Therefore, we implemented the logic to display the subtitles in the on_position method (lines 70 to 78). Each time the on_position method is called (each second), we ask the Subtitles instance for the next subtitle (line 71). If nothing is returned, we clear the subtitle with the clear_subtitle method (line 73). If there is a subtitle for the current second (line 74), we check that either no subtitle is currently being displayed or the returned subtitle is not the one that we are already displaying (line 77). If these conditions are met, we display the subtitle using the display_subtitle method (line 78).

Notice that the clear_subtitle (lines 80 to 84) and display_subtitle (lines 86 to 89) methods use the bcolor property in order to hide the subtitle. This is another trick to make a widget invisible without removing it from its parent.

Let's take a look at the current result of our videos and subtitles in the following screenshot:

Summary

In this article, we discussed how to control a video and how to associate the subtitle element of the screen with it. We saw how the Video widget synchronizes the subtitles, which we receive as a JSON file, with the progression of the video, and how to track that progression in order to display the right subtitle at the right moment.

Resources for Article:

Further resources on this subject:

Moving Further with NumPy Modules [article]
Learning Selenium Testing Tools with Python [article]
Python functions – Avoid repeating code [article]

JIRA Agile for Scrum

Packt
03 Jul 2015
24 min read
In this article by Patrick Li, author of the book JIRA Agile Essentials, we will learn about Scrum, one of the agile methodologies supported by JIRA Agile. Unlike the old days, when a project manager would use either a spreadsheet or Microsoft Project to keep track of the project's progress, with JIRA Agile and Scrum, team participation is encouraged in order to improve collaboration between the different project stakeholders.

Roles in Scrum

In any Scrum team, there are three primary roles. Although each role has its own specific functions and responsibilities, you need all three to work together as a cohesive team in order to be successful at Scrum.

Product owner

The product owner is usually the product or project manager, who is responsible for owning the overall vision and direction of the product that the team is working on. As the product owner, they are in charge of the features that will be added to the backlog, the priority of each feature, and planning the delivery of these features through sprints. Essentially, the product owner is the person who makes sure that the team delivers the most value for the stakeholders in each sprint.

The Scrum master

The Scrum master's job is to make sure that the team is running and using Scrum effectively and efficiently, so they should be very knowledgeable about and experienced with Scrum. The Scrum master has two primary responsibilities:

To coach and help everyone on the team to understand Scrum; this includes the product owner and the delivery team, as well as the external people that the project team interacts with. In the role of a coach, the Scrum master may help the product owner to understand and better manage the backlog and plan sprints, as well as explain the process to the delivery team.
To improve the team's Scrum process by removing any obstacles in the way. Obstacles, also known as impediments, are anything that may block or negatively affect the team's adoption of Scrum. These can include things such as a poorly organized product backlog or a lack of support from other teams or management. It is the responsibility of the Scrum master to either directly remove these impediments or work with the team to find a solution.

Overall, the Scrum master is the advocate for Scrum, responsible for educating, facilitating, and helping people adopt it and realize the advantages of using it.

The delivery team

The delivery team is primarily responsible for executing and delivering the final product. However, the team is also responsible for providing estimates on tasks and assisting the product owner to better plan sprints and delivery. Ideally, the team should consist of the cross-functional members required for the project, such as developers, testers, and business analysts. Since each sprint can be viewed as a mini project in itself, it is critical to have all the necessary resources available at all times as tasks are worked on and passed along the workflow.

Last but not least, the team is also responsible for retrospectively reviewing its performance at the end of each sprint, along with the product owner and Scrum master. This helps the team review what they have done and reveals how they can improve in the upcoming sprints.

Understanding the Scrum process

Now that we have gone over the various roles that Scrum prescribes, let's take a look at how a typical project is run with Scrum and some of its key activities.
First, we have the backlog, which is a one-dimensional list of the features and requirements that need to be implemented by the team. The backlog items are listed from top to bottom by priority. While the product owner is the person in charge of the backlog, defining the priorities based on their vision, everyone in the team can contribute by adding new items to the backlog, discussing priorities, and estimating the effort required for implementation.

The team will then start planning their next immediate sprint. During this sprint planning meeting, the team will decide on the scope of the sprint. Usually, the top-priority items from the backlog will be included. The key here is that by the end of the sprint, the team should have produced a fully tested, potentially shippable product containing all the committed features.

During the sprint, the team will have daily Scrum meetings, usually at the start of each day, where every member of the team gives a quick overview of what they have done, what they plan to do, and any impediments. The goal is to make sure that everyone is on the same page, so meetings should be short and sweet.

At the end of the sprint, the team will have a sprint review meeting, where the team presents what they have produced to the stakeholders. During this meeting, new changes will often emerge as the product starts to take shape, and these changes will be added to the backlog, which the team will reprioritize before the next sprint commences. Another meeting, called the sprint retrospective meeting, will also take place at the end of the sprint, where the team comes together to discuss what they have done right, what they have done wrong, and how they can improve.

Throughout this process, the Scrum master acts as the referee, making sure that all these activities are done correctly. For example, the Scrum master will guide the product owner and the team during the backlog and sprint planning meetings to make sure that the items they have are scoped and described correctly. The Scrum master will also ensure that the meetings stay focused and productive, do not run overtime, and that the team members remain respectful without trying to talk over each other.

So, now that you have seen some of the advantages of using Scrum, the different roles, and a simple Scrum process, let's see how we can use JIRA Agile to run projects with Scrum.

Creating a new Scrum board

The first step to start using JIRA Agile for Scrum is to create a Scrum board for your project. If you created your project using the Agile Scrum project template, a Scrum board is automatically created for you along with the project. However, if you want to create a board for existing projects, or if you want your board to span multiple projects, you will need to create it separately. To create a new board, perform the following steps:

1. Click on the Agile menu item from the top navigation bar and select the Manage Boards option.
2. Click on the Create board button. This will bring up the Create an Agile board dialog.
3. Select the Create a Scrum board option, as shown in the following screenshot.
4. Select the way you want to create your board and click on the Next button. There are three options to choose from, as follows:

New project and a new board: This is the same as creating a project using the Scrum Agile project template. A new project will be created along with a new Scrum board dedicated to the project.
Board from an existing project: This option allows you to create a new board from your existing projects. The board will be dedicated to only one project.
Board from an existing Saved Filter: This option allows you to create a board that can span multiple projects with the use of a filter. In order to use this option, you will first have to create a filter that includes the projects and issues you need. If you have many issues in your project, you can also use filters to limit the number of issues to be included.

5. Fill in the required information for the board. Depending on the option you have selected, you will either need to provide the project details or select a filter to use. The following screenshot shows an example of how to create a board with a filter.
6. Click on the Create board button to finish.

Understanding the Scrum board

The Scrum board is what you and your team will be using to plan and run your project. It is your backlog as well as your sprint activity board. A Scrum board has three major modes:

Backlog: The Backlog mode is where you will plan your sprints, organize your backlog, and create issues
Active sprints: The Active sprints mode is where your team will be working during a sprint
Reports: The Reports mode is where you can track the progress of your sprints

The following screenshot shows a typical Scrum board in the Backlog mode. In the center of the page, you have the backlog, listing all the issues. You can drag them up and down to reorder their priorities. On the right-hand side, you have the issue details panel, which is displayed when you click on an issue in the backlog.

During the backlog planning meetings, the product owner and the team will use this Backlog mode to add new items to the backlog as well as decide on their priorities.

Creating new issues

When a Scrum board is first created, all the issues, if any (called user stories, or stories for short), are placed in the backlog. During your sprint planning meetings, you can create more issues and add them to the backlog as you translate requirements into user stories. To create a new issue, perform the following steps:

1. Browse to your Scrum board.
2. Click on the Create button from the navigation bar at the top or press C on your keyboard. This will bring up the Create Issue dialog.
3. Select the type of issue (for example, Story) you want to create from the Issue Type field.
4. Provide additional information for the issue, such as Summary and Description.
5. Click on the Create button to create the issue, as shown in the following screenshot.

Once you have created the issue, it will be added to the backlog. You can then assign it to an epic or a version, and schedule it to be completed by adding it to a sprint. When creating and refining your user stories, you will want to break them down as much as possible, so that when it comes to deciding on the scope of a sprint, it will be much easier for the team to provide an estimate. One approach is to use the INVEST characteristics defined by Bill Wake:

Independent: It is preferable if each story can be done independently. While this is not always possible, independent tasks make implementation easier.
Negotiable: The developers and product owners need to work together so that both parties are fully aware of what the story entails.
Valuable: The story needs to provide value to the customer.
Estimable: If a story is too big or complicated for the development team to provide an estimate, then it needs to be broken down further.
Small: Each story needs to be small, often addressing a single feature that will fit into a single sprint (roughly 2 weeks).
Testable: The story needs to describe the expected end result so that, after it is implemented, it can be verified.

Creating new epics

Epics are big user stories that describe major application features. They are then broken down into smaller, more manageable user stories. In JIRA Agile, epics are a convenient way to group similar user stories together. To create a new epic from your Scrum board, perform the following steps:

1. Expand the Epics panel if it is hidden, by clicking on EPICS in the left-hand side panel.
2. Click on the Create Epic link in the Epics panel. The link appears when you hover your mouse over the panel. This will bring up the Create Epic dialog, with the Project and Issue Type fields already preselected for you. You can also open the Create issue dialog and select Epic as the Issue Type.
3. Provide a name for the epic in the Epic Name field.
4. Provide a quick summary in the Summary field.
5. Click on the Create button.

Once you have created the epic, it will be added to the Epics panel. Epics do not show up as cards in sprints or in the backlog. After you have created your epic, you can start adding issues under it. Doing this helps you organize issues that are related to the same functionality or feature. There are two ways in which you can add issues to an epic:

By creating new issues directly in the epic: expand the epic you want and click on the Create issue in epic link
By dragging existing issues into the epic, as shown in the following screenshot

Estimating your work

Estimation is an art and is a big part of Scrum. Being able to estimate well as a team will directly impact how successful your sprints will be. When it comes to Scrum, estimation means velocity, in other words, how much work your team can deliver in a sprint. This is different from the traditional idea of measuring and estimating in man hours. The point of measuring velocity is to decouple estimation from time tracking. Instead of estimating work based on how many hours it will take to complete a story, which inadvertently makes people work long hours trying to keep the estimates accurate, we use an arbitrary unit of measurement, which helps us avoid this pitfall.

A common approach is to use what are known as story points. Story points measure the complexity or level of effort required to complete a story, not how long it will take to complete it. For example, a complex story may have eight story points, while a simpler story may have only two. This does not mean that the complex story will take 8 hours to complete; it is simply a way to measure its complexity in relation to other stories.

After you have estimated all your issues with story points, you need to figure out how many story points your team can deliver in a sprint. Of course, you will not know this for your first sprint, so you will have to start with a guess. Let's say your team is able to deliver 10 story points' worth of work in a one-week sprint; you can then create sprints with any number of issues that add up to 10 story points. As your team starts working on the sprint, you will likely find that the estimate of 10 story points is too much or not enough, so you will need to adjust this for your second sprint.
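To make this adjustment concrete (the numbers below are hypothetical and the snippet is independent of JIRA Agile itself), a team's velocity is often estimated as the average of the story points actually completed in recent sprints, and that average becomes the amount of work planned for the next sprint:

# Illustrative sketch only; the sprint totals below are hypothetical.
completed_points = [8, 9, 7, 8]   # story points completed in the last four sprints

# Velocity estimate: the average of what was actually completed.
velocity = sum(completed_points) / len(completed_points)

print(velocity)  # 8.0 -> plan roughly 8 story points for the next sprint

This is the same averaging idea that the velocity chart, covered later in this article, visualizes against the work you originally committed.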
Remember that the goal here is not to get it right the first time but to continuously improve your estimates to the point where the team can consistently deliver the same amount of story points' worth of work in each sprint, that is, your team's velocity. Once you can accurately predict your team's velocity, it becomes easier to manage the workload for each sprint.

Now that you know how estimates work in Scrum, let's look at how JIRA Agile lets you estimate work. JIRA Agile provides several ways for you to estimate issues, and the most common approach is to use story points. Each story in your backlog has a field called Estimate, as shown in the following screenshot. To provide an estimate for the story, you just need to hover over the field, click on it, and enter the story point value.

You cannot set estimates once the issue is in active development, that is, once the sprint that the issue belongs to is active.

Remember that the estimate value you provide here is arbitrary, as long as it reflects the issues' complexity in relation to each other. Here are a few more points on estimation:

Be consistent in how you estimate issues.
Involve the team during estimation.
If the estimates turn out to be incorrect, that is fine. The goal here is to improve and adjust.

Ranking and prioritizing your issues

During the planning session, it is important to rank your issues so that the list reflects their importance relative to each other. For those who are familiar with JIRA, there is a priority field, but since it allows more than one issue to share the same priority value, it becomes confusing when you have two issues both marked as critical. JIRA Agile addresses this by letting you simply drag an issue up and down the list according to its importance, with the more important issues at the top and the less important issues at the bottom. This way, you end up with an easy-to-understand list.

Creating new versions

In a software development team, you will likely be using versions to plan your releases. Using versions allows you to plan and organize the issues in your backlog and schedule when they will be completed. You can create multiple versions and plan your roadmap accordingly. To create a new version, follow these steps:

1. Expand the Versions panel if it is hidden, by clicking on VERSIONS in the left-hand side panel.
2. Click on the Create Version link in the Versions panel. The link appears when you hover your mouse over the panel. This will bring up the Create Version dialog with the Project field preselected for you, as shown in the following screenshot.
3. Provide a name for the version in the Name field. You can also specify the start and release dates for the version. These fields are optional, and you can change them later.
4. Click on the Create button.

Once the version is created, it will be added to the Versions panel. Just like epics, you can add issues to a version by dragging and dropping the issues onto the target version. In Scrum, a version can span many sprints. Clicking on a version will display the issues that are part of that version. As shown in the following screenshot, Version 2.0 spans three sprints.

Planning sprints

The sprint planning meeting is where the project team comes together at the start of each sprint and decides what they should focus and work on next. With JIRA Agile, you will use the Backlog mode of your board to create and plan the new sprint's scope.
Let's look at some of the key components involved in sprint planning:

Backlog: This includes all the issues that are not in any sprint yet; in other words, issues that are not yet scheduled for completion. For a new board, all existing issues are placed in the backlog.
Sprints: These are displayed above the backlog. You can have multiple sprints and plan ahead.
Issue details: This is the panel on your right-hand side. It displays the details of the issue you click on.
Epics: This is one of the panels on your left-hand side. It displays all the epics you have.
Versions: This is the other panel on your left-hand side. It displays all the versions you have.

The highlighted area in the following screenshot is the new sprint, and the issues inside the sprint are what the team has committed to deliver by the end of the sprint.

Starting a sprint

Once all the epics and issues have been created, it is time to start preparing a sprint. The first step is to create a new sprint by clicking on the Create Sprint button. There are two ways to add issues to a sprint:

By dragging the issues you want from the backlog and dropping them into the sprint
By dragging the sprint footer down to include all the issues you want to be part of the sprint

You can create multiple sprints and plan beyond the current one by filling each sprint with issues from your backlog. Once you have all the issues you want in the sprint, click on the Start Sprint link. As shown in the following screenshot, you will be asked to set the start and end dates of the sprint. By default, JIRA Agile automatically sets the start date to the current date and the end date to one week after that. You can change these dates, of course. The general best practices include the following:

Keep your sprints short, usually 1 or 2 weeks long.
Keep the length of your sprints consistent; this way, you will be able to accurately predict your team's velocity.

Once you have started your sprint, you will be taken to the Active sprints mode of the board. Note that in order to start a sprint, you have to take the following into consideration:

There must be no sprint already active. You can only have one active sprint per board at any time.
You must have the Administer Projects permission for all the projects included in the board.

Working on a sprint

You will enter the Active sprints mode once you have started a sprint; all the issues that are part of the sprint will be displayed. In the Active sprints mode, the board is divided into two major sections. The left section contains all the issues in the current sprint. You will notice that it is divided into several columns. These columns represent the various states or statuses that an issue can be in, and they should reflect your team's workflow. By default, there are three columns:

To Do: The issue is waiting to be started
In Progress: The issue is currently being worked on
Done: The issue has been completed

If you are using epics to organize your issues, this section will also be divided into several horizontal swimlanes. Swimlanes help you group similar issues together on the board. Swimlanes group issues by criteria such as assignee, story, or epic. By default, swimlanes are grouped by stories, so subtasks for the same story will all be placed in one swimlane. So, you can see that columns group issues by status, while swimlanes group issues by similarity.
As shown in the following screenshot, we have three columns and two swimlanes. The section on the right-hand side displays the currently selected issue's details, such as its summary, description, comments, and attachments.

In a typical scenario, at the start of a sprint, all the issues will be in the left-most To Do column. During the daily Scrum meetings, team members review the current status of the board and decide what to focus on for the day. For example, each member of the team may take on an issue and move it to the In Progress column by simply dragging and dropping the issue card into that column. Once they have finished working on an issue, they drag it into the Done column. The team continues this cycle throughout the sprint until all the issues are completed.

During the sprint, the Scrum master and the product owner need to make sure not to interrupt the team unless it is urgent. The Scrum master should also assist with removing impediments that are preventing team members from completing their assigned tasks. The product owner should ensure that no additional stories are added to the sprint, and that any new feature requests are added to the backlog for future sprints instead. JIRA Agile will alert you if you try to add a new issue to the currently active sprint.

Completing a sprint

On the day the sprint ends, you will need to complete the sprint by performing the following steps:

1. Go to your Scrum board and click on Active sprints.
2. Click on the Complete Sprint link. This will bring up the Complete Sprint dialog, summarizing the current status of the sprint. As shown in the following screenshot, we have a total of six issues in this sprint; three issues are completed and three are not.
3. Click on the Complete button to complete the sprint.

When you complete a sprint, any unfinished issues are automatically moved back to the top of the backlog. Sometimes, it might be tempting to extend your sprint if you only have one or two issues outstanding, but you should not do this. Remember that the goal here is not to make your estimates appear accurate by extending sprints or to force your team to work harder in order to complete everything. You want to get to a point where the team consistently completes the same amount of work in each sprint. If you have leftovers from a sprint, it means that your team's velocity should be lowered; therefore, for the next sprint, you should plan to include less work.

Reporting a sprint's progress

As your team busily works through the issues in the sprint, you need a way to track the progress. JIRA Agile provides a number of useful reports via the Reports mode. You can access the Reports mode at any time during the sprint. These reports are also very useful during sprint retrospective meetings, as they provide detailed insights into how the sprint progressed.

The sprint report

The sprint report gives you a quick snapshot of how the sprint is tracking. It includes a burndown chart and a summary table that lists all the issues in the sprint and their statuses, as shown here. In the sprint report shown, we have completed four issues in the sprint. One issue was not completed and was placed back in the backlog.

The burndown chart

The burndown chart shows you a graphical representation of the estimated or ideal work left versus the actual progress. The gray line acts as a guideline for the projected progress of the project, and the red line is the actual progress.
In an ideal world, both lines should stay as close to each other as possible as the sprint progresses each day.

The velocity chart

The velocity chart shows you the amount of work originally committed to the sprint (the gray bar) versus the actual amount of work completed (the green bar), based on how you have decided to estimate, such as story points. The chart includes past sprints, so you can get an idea of the trend and be able to predict the team's velocity. As shown in the following screenshot, from sprints 1 to 3 we over-committed the amount of work, and in sprint 4 we completed all our committed work. So, one way to work out your team's velocity is to calculate the average based on the Completed column, and this should give you an indication of your team's true velocity. Of course, this requires:

That your sprints stay consistent in duration
That your team members stay consistent
That your estimation stays consistent

As your team starts using Scrum, you can expect to see improvements in the team's velocity as you continuously refine your process. Over time, you will get to a point where the team's velocity becomes consistent and can be used as a reliable indicator for work estimation. This will allow you to avoid over- and under-committing on work delivery, as shown in the following velocity chart.

Summary

In this article, we looked at how to use JIRA Agile for Scrum. We looked at the Scrum board and how you can use it to organize your issue backlog, plan and run your sprints, and review and track their progress with reports and charts. Remember that the keys to successfully running sprints are consistency, review, and continuous improvement. It is fine if you find that your estimates are incorrect, especially for the first few sprints; just make sure that you review, adjust, and improve.

Resources for Article:

Further resources on this subject:

Advanced JIRA 5.2 Features [article]
JIRA – an Overview [article]
Gadgets in JIRA [article]
Using a REST API with Unity, Part 2

Travis and
01 Jul 2015
5 min read
One of the main challenges we encountered when first trying to use JSON with C# was that it wasn't the simple world of JavaScript we had grown accustomed to. For anyone who is unsure of what JSON is, it is JavaScript Object Notation, a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. If you are not familiar with JSON, hopefully you have some background in XML. JSON serves a similar purpose, but uses JavaScript notation rather than a markup language as XML does. This article is not a discussion of each of these data formats; instead, we will focus primarily on JSON, as it is the general standard for REST APIs.

Extracting JSON

As I mentioned in Part 1 of this series, the main tool we use to extract JSON is a C# library called SimpleJSON. Before we can start extracting data, let's create a fake JSON object of the kind returned from a REST endpoint.

{
  "employees": [
    { "name": "Jim", "age": "41", "salary": "60000" },
    { "name": "Sally", "age": "33", "salary": "60000" }
  ]
}

Perfect. This is just some arbitrary data for our examples. Assuming we have queried and collected this data into a "result" variable, let's parse it with SimpleJSON. We think the best course of action is to show the code as a whole and then discuss what is going on in each step. Some of this code could be trimmed a bit shorter, but we're going to write it out in longer form to help demonstrate what is going on. It's not long code anyway.

using SimpleJSON;
...
private string result;
private List<Employee> employees;
...
var jsonData = JSON.Parse(result);
foreach (JSONClass employee in jsonData["employees"].AsArray)
{
    string name = employee.AsObject["name"].Value;
    string age = employee.AsObject["age"].Value;
    string salary = employee.AsObject["salary"].Value;
    employees.Add(new Employee(name, age, salary));
}

Now let's step through what we've done here to demonstrate what each piece of code is doing. First, we must import the SimpleJSON library. To get this package, see this link to download. You can import the package into Unity using the file menu, Assets > Import Package > Custom Package. Once you have imported the package, we need to include:

using SimpleJSON;

This belongs at the top of our script. Assuming we have completed the GET request earlier and now have the data in a variable called result, we can move to the next step.

var jsonData = JSON.Parse(result);

As we talked about earlier, JSON is an object made up of JavaScript Object Notation. If you come from a background in JavaScript, these sorts of objects are just part of your daily norm. However, objects like these don't exist in C# (they of course do, but are not written like this, and appear more abstract). So we know these sorts of objects are not native to C#, or to most languages for that matter, so how do we import the data? Fear not: JSON is returned from REST endpoints as a string. This allows each system to import it as it likes and come up with its own way to read the data. In our case, SimpleJSON takes the imported string and builds a JSONClass object out of it. That is what resides in jsonData.

Navigating a JSON with SimpleJSON

Now that we have the JSON parsed, our next step is to move one level inside the returned JSON and extract all the employees. The "employees" value is an array of employees. Knowing that this data is an array, we can use it in a foreach loop, extracting each employee as we go by using a cast. Let's look at the loop first.
foreach (JSONClass employee in jsonData["employees"].AsArray)
{
    ...
}

So we extract each employee from the employees array. Now, each employee is a JSONClass, but we have not yet told the system that it is an object, so we need to do that when we start digging deeper into the JSON, like so:

string name = employee.AsObject["name"].Value;
string age = employee.AsObject["age"].Value;
string salary = employee.AsObject["salary"].Value;

Once we are inside the foreach loop, we take the JSONClass employee, cast it to an object, and take the strings we need from it. The trick is that SimpleJSON still doesn't know what type of value is on the other end, so we need to tell it that we want the value of each field. Since we know the structure of the JSON, we can construct our code to handle it. Frequently you will find yourself iterating through a list of data and creating objects out of each piece of data. To handle that, we recommend you create an object and add it to a list. It's a simple way to store the data. A consolidated version of these fragments follows at the end of this article.

employees.Add(new Employee(name, age, salary));

Conclusion

We hope this walkthrough of SimpleJSON gave you an idea of how to use this library. It's a very simple tool to use. The only frustrating part is working with the AsObject and AsArray methods, as it is easy to mistake which one you need at a particular point.
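To see how these fragments fit together end to end, here is a minimal, self-contained sketch. It assumes the SimpleJSON package discussed above is already imported into the Unity project; the Employee class, the EmployeeLoader component, and the hard-coded JSON string are illustrative stand-ins, since the original walkthrough does not show the Employee class or the GET request itself.

// Minimal sketch pulling the article's fragments together.
// The Employee class and the hard-coded JSON string stand in for
// the real model class and a real REST response.
using System.Collections.Generic;
using SimpleJSON;
using UnityEngine;

public class Employee
{
    public string Name;
    public string Age;
    public string Salary;

    public Employee(string name, string age, string salary)
    {
        Name = name;
        Age = age;
        Salary = salary;
    }
}

public class EmployeeLoader : MonoBehaviour
{
    void Start()
    {
        // In a real project this string would come from your GET request.
        string result = "{\"employees\":[{\"name\":\"Jim\",\"age\":\"41\",\"salary\":\"60000\"}," +
                        "{\"name\":\"Sally\",\"age\":\"33\",\"salary\":\"60000\"}]}";

        var employees = new List<Employee>();
        var jsonData = JSON.Parse(result);

        // Each element of the "employees" array is a JSON object;
        // pull out the string value of each field.
        foreach (JSONNode employee in jsonData["employees"].AsArray)
        {
            employees.Add(new Employee(
                employee["name"].Value,
                employee["age"].Value,
                employee["salary"].Value));
        }

        Debug.Log("Parsed " + employees.Count + " employees");
    }
}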
How to Build 12 Factor Microservices on Docker - Part 2

Cody A.
29 Jun 2015
14 min read
Welcome back to our how-to on Building and Running 12 Factor Microservices on Docker. In Part 1, we introduced a very simple python flask application which displayed a list of users from a relational database. Then we walked through the first four of these factors, reworking the example application to follow these guidelines. In Part 2, we'll be introducing a multi-container Docker setup as the execution environment for our application. We’ll continue from where we left off with the next factor, number five. Build, Release, Run. A 12-factor app strictly separates the process for transforming a codebase into a deploy into distinct build, release, and run stages. The build stage creates an executable bundle from a code repo, including vendoring dependencies and compiling binaries and asset packages. The release stage combines the executable bundle created in the build with the deploy’s current config. Releases are immutable and form an append-only ledger; consequently, each release must have a unique release ID. The run stage runs the app in the execution environment by launching the app’s processes against the release. This is where your operations meet your development and where a PaaS can really shine. For now, we’re assuming that we’ll be using a Docker-based containerized deploy strategy. We’ll start by writing a simple Dockerfile. The Dockerfile starts with an ubuntu base image and then I add myself as the maintainer of this app. FROM ubuntu:14.04.2 MAINTAINER codyaray Before installing anything, let’s make sure that apt has the latest versions of all the packages. RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list RUN apt-get update Install some basic tools and the requirements for running a python webapp RUN apt-get install -y tar curl wget dialog net-tools build-essential RUN apt-get install -y python python-dev python-distribute python-pip RUN apt-get install -y libmysqlclient-dev Copy over the application to the container. ADD /. /src Install the dependencies. RUN pip install -r /src/requirements.txt Finally, set the current working directory, expose the port, and set the default command. EXPOSE 5000 WORKDIR /src CMD python app.py Now, the build phase consists of building a docker image. You can build and store locally with docker build -t codyaray/12factor:0.1.0 . If you look at your local repository, you should see the new image present. $ docker images REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE codyaray/12factor 0.1.0 bfb61d2bbb17 1 hour ago 454.8 MB The release phase really depends on details of the execution environment. You’ll notice that none of the configuration is stored in the image produced from the build stage; however, we need a way to build a versioned release with the full configuration as well. Ideally, the execution environment would be responsible for creating releases from the source code and configuration specific to that environment. However, if we’re working from first principles with Docker rather than a full-featured PaaS, one possibility is to build a new docker image using the one we just built as a base. Each environment would have its own set of configuration parameters and thus its own Dockerfile. 
It could be something as simple as FROM codyaray/12factor:0.1.0 MAINTAINER codyaray ENV DATABASE_URL mysql://sa:mypwd@mydbinstance.abcdefghijkl.us-west-2.rds.amazonaws.com/mydb This is simple enough to be programmatically generated given the environment-specific configuration and the new container version to be deployed. For the demonstration purposes, though, we’ll call the above file Dockerfile-release so it doesn’t conflict with the main application’s Dockerfile. Then we can build it with docker build -f Dockerfile-release -t codyaray/12factor-release:0.1.0.0 . The resulting built image could be stored in the environment’s registry as codyaray/12factor-release:0.1.0.0. The images in this registry would serve as the immutable ledger of releases. Notice that the version has been extended to include a fourth level which, in this instance, could represent configuration version “0” applied to source version “0.1.0”. The key here is that these configuration parameters aren’t collated into named groups (sometimes called “environments”). For example, these aren’t static files named like Dockerfile.staging or Dockerfile.dev in a centralized repo. Rather, the set of parameters is distributed so that each environment maintains its own environment mapping in some fashion. The deployment system would be setup such that a new release to the environment automatically applies the environment variables it has stored to create a new Docker image. As always, the final deploy stage depends on whether you’re using a cluster manager, scheduler, etc. If you’re using standalone Docker, then it would boil down to docker run -P -t codyaray/12factor-release:0.1.0.0 Processes. A 12-factor app is executed as one or more stateless processes which share nothing and are horizontally partitionable. All data which needs to be stored must use a stateful backing service, usually a database. This means no sticky sessions and no in-memory or local disk-based caches. These processes should never daemonize or write their own PID files; rather, they should rely on the execution environment’s process manager (such as Upstart). This factor must be considered up-front, in line with the discussions on antifragility, horizontal scaling, and overall application design. As the example app delegates all stateful persistence to a database, we’ve already succeeded on this point. However, it is good to note that a number of issues have been found using the standard ubuntu base image for Docker, one of which is its process management (or lack thereof). If you would like to use a process manager to automatically restart crashed daemons, or to notify a service registry or operations team, check out baseimage-docker. This image adds runit for process supervision and management, amongst other improvements to base ubuntu for use in Docker such as obsoleting the need for pid files. To use this new image, we have to update the Dockerfile to set the new base image and use its init system instead of running our application as the root process in the container. FROM phusion/baseimage:0.9.16 MAINTAINER codyaray RUN echo "deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -sc) main universe" >> /etc/apt/sources.list RUN apt-get update RUN apt-get install -y tar git curl nano wget dialog net-tools build-essential RUN apt-get install -y python python-dev python-distribute python-pip RUN apt-get install -y libmysqlclient-dev ADD /. 
/src RUN pip install -r /src/requirements.txt EXPOSE 5000 WORKDIR /src RUN mkdir /etc/service/12factor ADD 12factor.sh /etc/service/12factor/run # Use baseimage-docker's init system. CMD ["/sbin/my_init"]  Notice the file 12factor.sh that we’re now adding to /etc/service. This is how we instruct runit to run our application as a service. Let’s add the new 12factor.sh file. #!/bin/sh python /src/app.py Now the new containers we deploy will attempt to be a little more fault-tolerant by using an OS-level process manager. Port Binding. A 12-factor app must be self-contained and bind to a port specified as an environment variable. It can’t rely on the injection of a web container such as tomcat or unicorn; instead it must embed a server such as jetty or thin. The execution environment is responsible for routing requests from a public-facing hostname to the port-bound web process. This is trivial with most embedded web servers. If you’re currently using an external web server, this may require more effort to support an embedded server within your application. For the example python app (which uses the built-in flask web server), it boils down to port = int(os.environ.get("PORT", 5000)) app.run(host='0.0.0.0', port=port) Now the execution environment is free to instruct the application to listen on whatever port is available. This obviates the need for the application to tell the environment what ports must be exposed, as we’ve been required to do with Docker. Concurrency. Because a 12-factor exclusively uses stateless processes, it can scale out by adding processes. A 12-factor app can have multiple process types, such as web processes, background worker processes, or clock processes (for cron-like scheduled jobs). As each process type is scaled independently, each logical process would become its own Docker container as well. We’ve already seen building a web process; other processes are very similar. In most cases, scaling out simply means launching more instances of the container. (Its usually not desirable to scale out the clock processes, though, as they often generate events that you want to be scheduled singletons within your infrastructure.) Disposability. A 12-factor app’s processes can be started or stopped (with a SIGTERM) anytime. Thus, minimizing startup time and gracefully shutting down is very important. For example, when a web service receives a SIGTERM, it should stop listening on the HTTP port, allow in-flight requests to finish, and then exit. Similar, processes should be robust against sudden death; for example, worker processes should use a robust queuing backend. You want to ensure the web server you select can gracefully shutdown. The is one of the trickier parts of selecting a web server, at least for many of the common python http servers that I’ve tried.  In theory, shutting down based on receiving a SIGTERM should be as simple as follows. import signal signal.signal(signal.SIGTERM, lambda *args: server.stop(timeout=60)) But often times, you’ll find that this will immediately kill the in-flight requests as well as closing the listening socket. You’ll want to test this thoroughly if dependable graceful shutdown is critical to your application. Dev/Prod Parity. A 12-factor app is designed to keep the gap between development and production small. Continuous deployment shrinks the amount of time that code lives in development but not production. 
A self-serve platform allows developers to deploy their own code in production, just like they do in their local development environments. Using the same backing services (databases, caches, queues, etc) in development as production reduces the number of subtle bugs that arise in inconsistencies between technologies or integrations. As we’re deploying this solution using fully Dockerized containers and third-party backing services, we’ve effectively achieved dev/prod parity. For local development, I use boot2docker on my Mac which provides a Docker-compatible VM to host my containers. Using boot2docker, you can start the VM and setup all the env variables automatically with boot2docker up $(boot2docker shellinit) Once you’ve initialized this VM and set the DOCKER_HOST variable to its IP address with shellinit, the docker commands given above work exactly the same for development as they do for production. Logs. Consider logs as a stream of time-ordered events collected from all running processes and backing services. A 12-factor app doesn’t concern itself with how its output is handled. Instead, it just writes its output to its `stdout` stream. The execution environment is responsible for collecting, collating, and routing this output to its final destination(s). Most logging frameworks either support logging to stderr/stdout by default or easily switching from file-based logging to one of these streams. In a 12-factor app, the execution environment is expected to capture these streams and handle them however the platform dictates. Because our app doesn’t have specific logging yet, and the only logs are from flask and already to stderr, we don’t have any application changes to make.  However, we can show how an execution environment which could be used handle the logs. We’ll setup a Docker container which collects the logs from all the other docker containers on the same host. Ideally, this would then forward the logs to a centralized service such as Elasticsearch. Here we’ll demo using Fluentd to capture and collect the logs inside the log collection container; a simple configuration change would allow us to switch from writing these logs to disk as we demo here and instead send them from Fluentd to a local Elasticsearch cluster. We’ll create a Dockerfile for our new logcollector container type. For more detail, you can find a Docker fluent tutorial here. We can call this file Dockerfile-logcollector. FROM kiyoto/fluentd:0.10.56-2.1.1 MAINTAINER kiyoto@treasure-data.com RUN mkdir /etc/fluent ADD fluent.conf /etc/fluent/ CMD "/usr/local/bin/fluentd -c /etc/fluent/fluent.conf" We use an existing fluentd base image with a specific fluentd configuration. Notably this tails all the log files in /var/lib/docker/containers/<container-id>/<container-id>-json.log, adds the container ID to the log message, and then writes to JSON-formatted files inside /var/log/docker. <source> type tail path /var/lib/docker/containers/*/*-json.log pos_file /var/log/fluentd-docker.pos time_format %Y-%m-%dT%H:%M:%S tag docker.* format json </source> <match docker.var.lib.docker.containers.*.*.log> type record_reformer container_id ${tag_parts[5]} tag docker.all </match> <match docker.all> type file path /var/log/docker/*.log format json include_time_key true </match> As usual, we create a Docker image. Don’t forget to specify the logcollector Dockerfile. docker build -f Dockerfile-logcollector -t codyaray/docker-fluentd . We’ll need to mount two directories from the Docker host into this container when we launch it. 
Specifically, we’ll mount the directory containing the logs from all the other containers as well as the directory to which we’ll be writing the consolidated JSON logs. docker run -d -v /var/lib/docker/containers:/var/lib/docker/containers -v /var/log/docker:/var/log/docker codyaray/docker-fluentd Now if you check in the /var/log/docker directory, you’ll see the collated JSON log files. Note that this is on the docker host rather than in any container; if you’re using boot2docker, you can ssh into the docker host with boot2docker ssh and then check /var/log/docker. Admin Processes. Any admin or management tasks for a 12-factor app should be run as one-off processes within a deploy’s execution environment. This process runs against a release using the same codebase and configs as any process in that release and uses the same dependency isolation techniques as the long-running processes. This is really a feature of your app's execution environment. If you’re running a Docker-like containerized solution, this may be pretty trivial. docker run -i -t --entrypoint /bin/bash codyaray/12factor-release:0.1.0.0 The -i flag instructs docker to provide interactive session, that is, to keep the input and output ttys attached. Then we instruct docker to run the /bin/bash command instead of another 12factor app instance. This creates a new container based on the same docker image, which means we have access to all the code and configs for this release. This will drop us into a bash terminal to do whatever we want. But let’s say we want to add a new “friends” table to our database, so we wrote a migration script add_friends_table.py. We could run it as follows: docker run -i -t --entrypoint python codyaray/12factor-release:0.1.0.0 /src/add_friends_table.py As you can see, following the few simple rules specified in the 12 Factor manifesto really allows your execution environment to manage and scale your application. While this may not be the most feature-rich integration within a PaaS, it is certainly very portable with a clean separation of responsibilities between your app and its environment. Much of the tools and integration demonstrated here were a do-it-yourself container approach to the environment, which would be subsumed by an external vertically integrated PaaS such as Deis. If you’re not familiar with Deis, its one of several competitors in the open source platform-as-a-service space which allows you to run your own PaaS on a public or private cloud. Like many, Deis is inspired by Heroku. So instead of Dockerfiles, Deis uses a buildpack to transform a code repository into an executable image and a Procfile to specify an app’s processes. Finally, by default you can use a specialized git receiver to complete a deploy. Instead of having to manage separate build, release, and deploy stages yourself like we described above, deploying an app to Deis could be a simple as git push deis-prod While it can’t get much easier than this, you’re certainly trading control for simplicity. It's up to you to determine which works best for your business. Find more Docker tutorials alongside our latest releases on our dedicated Docker page. About the Author Cody A. Ray is an inquisitive, tech-savvy, entrepreneurially-spirited dude. Currently, he is a software engineer at Signal, an amazing startup in downtown Chicago, where he gets to work with a dream team that’s changing the service model underlying the Internet.
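Tying together the port-binding and logging factors discussed above, here is a minimal sketch of how the tail of the example app.py could look. The article notes that no logging changes are strictly needed, since flask already writes to stderr; the logging setup below is an illustrative addition showing where application-level logging would go if you added it, and the port-binding lines mirror the snippet already shown. Everything above this point in app.py is assumed unchanged from Part 1.

# Sketch: the end of app.py after applying the port-binding (VII) and
# logs (XI) factors. The Flask app object is defined earlier in the file,
# exactly as in Part 1 of this series.
import logging
import os
import sys

# Write all application logs to stdout; the execution environment
# (Docker, fluentd, and so on) is responsible for collecting them.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

if __name__ == "__main__":
    # The environment tells us which port to bind to.
    port = int(os.environ.get("PORT", 5000))
    logging.info("starting web process on port %d", port)
    app.run(host="0.0.0.0", port=port)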
How to Build 12 Factor Microservices on Docker - Part 1

Cody A.
26 Jun 2015
9 min read
As companies continue to reap benefits of the cloud beyond cost savings, DevOps teams are gradually transforming their infrastructure into a self-serve platform. Critical to this effort is designing applications to be cloud-native and antifragile. In this post series, we will examine the 12 factor methodology for application design, how this design approach interfaces with some of the more popular Platform-as-a-Service (PaaS) providers, and demonstrate how to run such microservices on the Deis PaaS. What began as Service Oriented Architectures in the data center are realizing their full potential as microservices in the cloud, led by innovators such as Netflix and Heroku. Netflix was arguably the first to design their applications to not only be resilient but to be antifragile; that is, by intentionally introducing chaos into their systems, their applications become more stable, scalable, and graceful in the presence of errors. Similarly, by helping thousands of clients building cloud applications, Heroku recognized a set of common patterns emerging and set forth the 12 factor methodology. ANTIFRAGILITY You may have never heard of antifragility. This concept was introduced by Nassim Taleb, the author of Fooled by Randomness and The Black Swan. Essentially, antifragility is what gains from volatility and uncertainty (up to a point). Think of the MySQL server that everyone is afraid to touch lest it crash vs the Cassandra ring which can handle the loss of multiple servers without a problem. In terms more familiar to the tech crowd, a “pet” is fragile while “cattle” are antifragile (or at least robust, that is, they neither gain nor lose from volatility). Adrian Cockroft seems to have discovered this concept with his team at Netflix. During their transition from a data center to Amazon Web Services, they claimed that “the best way to avoid failure is to fail constantly.” (http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html) To facilitate this process, one of the first tools Netflix built was Chaos Monkey, the now-infamous tool which kills your Amazon instances to see if and how well your application responds. By constantly injecting failure, their engineers were forced to design their applications to be more fault tolerant, to degrade gracefully, and to be better distributed so as to avoid any Single Points Of Failure (SPOF). As a result, Netflix has a whole suite of tools which form the Netflix PaaS. Many of these have been released as part of the Netflix OSS ecosystem. 12 FACTOR APPS Because many companies want to avoid relying too heavily on tools from any single third-party, it may be more beneficial to look at the concepts underlying such a cloud-native design. This will also help you evaluate and compare multiple options for solving the core issues at hand. Heroku, being a platform on which thousands or millions of applications are deployed, have had to isolate the core design patterns for applications which operate in the cloud and provide an environment which makes such applications easy to build and maintain. These are described as a manifesto entitled the 12-Factor App. The first part of this post walks through the first five factors and reworks a simple python webapp with them in mind. Part 2 continues with the remaining seven factors, demonstrating how this design allows easier integration with cloud-native containerization technologies like Docker and Deis. 
Let’s say we’re starting with a minimal python application which simply provides a way to view some content from a relational database. We’ll start with a single-file application, app.py. from flask import Flask import mysql.connector as db import json app = Flask(__name__) def execute(query): con = None try: con = db.connect(host='localhost', user='testdb', password='t123', database='testdb') cur = con.cursor() cur.execute(query) return cur.fetchall() except db.Error, e: print "Error %d: %s" % (e.args[0], e.args[1]) return None finally: if con: con.close() def list_users(): users = execute("SELECT id, username, email FROM users") or [] return [{"id": user_id, "username": username, "email": email} for (user_id, username, email) in users] @app.route("/users") def users_index(): return json.dumps(list_users()) if __name__ == "__main__": app.run(host='0.0.0.0', port=5000, debug=True) We can assume you have a simple mysql database setup already. CREATE DATABASE testdb; CREATE TABLE users ( id INT NOT NULL AUTO_INCREMENT, username VARCHAR(80) NOT NULL, email VARCHAR(120) NOT NULL, PRIMARY KEY (id), UNIQUE INDEX (username), UNIQUE INDEX (email) ); INSERT INTO users VALUES (1, "admin", "admin@example.com"); INSERT INTO users VALUES (2, "guest", "guest@example.com"); As you can see, the application is currently implemented as about the most naive approach possible and contained within this single file. We’ll now walk step-by-step through the 12 Factors and apply them to this simple application. THE 12 FACTORS: STEP BY STEP Codebase. A 12-factor app is always tracked in a version control system, such as Git, Mercurial, or Subversion. If there are multiple codebases, its a distributed system in which each component may be a 12-factor app. There are many deploys, or running instances, of each application, including production, staging, and developers' local environments. Since many people are familiar with git today, let’s choose that as our version control system. We can initialize a git repo for our new project. First ensure we’re in the app directory which, at this point, only contains the single app.py file. cd 12factor git init . After adding the single app.py file, we can commit to the repo. git add app.py git commit -m "Initial commit" Dependencies. All dependencies must be explicitly declared and isolated. A 12-factor app never depends on packages to be installed system-wide and uses a dependency isolation tool during execution to stop any system-wide packages from “leaking in.” Good examples are Gem Bundler for Ruby (Gemfile provides declaration and `bundle exec` provides isolation) and Pip/requirements.txt and Virtualenv for Python (where pip/requirements.txt provides declaration and `virtualenv --no-site-packages` provides isolation). We can create and use (source) a virtualenv environment which explicitly isolates the local app’s environment from the global “site-packages” installations. virtualenv env --no-site-packages source env/bin/activate A quick glance at the code we’ll show that we’re only using two dependencies currently, flask and mysql-connector-python, so we’ll add them to the requirements file. echo flask==0.10.1 >> requirements.txt echo mysql-python==1.2.5 >> requirements.txt Let’s use the requirements file to install all the dependencies into our isolated virtualenv. pip install -r requirements.txt Config. An app’s config must be stored in environment variables. This config is what may vary between deploys in developer environments, staging, and production. 
The most common example is the database credentials or resource handle. We currently have the host, user, password, and database name hardcoded. Hopefully you’ve at least already extracted this to a configuration file; either way, we’ll be moving them to environment variables instead. import os DATABASE_CREDENTIALS = { 'host': os.environ['DATABASE_HOST'], 'user': os.environ['DATABASE_USER'], 'password': os.environ['DATABASE_PASSWORD'], 'database': os.environ['DATABASE_NAME'] } Don’t forget to update the actual connection to use the new credentials object: con = db.connect(**DATABASE_CREDENTIALS) Backing Services. A 12-factor app must make no distinction between a service running locally or as a third-party. For example, a deploy should be able to swap out a local MySQL database with a third-party replacement such as Amazon RDS without any code changes, just by updating a URL or other handle/credentials inside the config. Using a database abstraction layer such as SQLAlchemy (or your own adapter) lets you treat many backing services similarly so that you can switch between them with a single configuration parameter. In this case, it has the added advantage of serving as an Object Relational Mapper to better encapsulate our database access logic. We can replace the hand-rolled execute function and SELECT query with a model object from flask.ext.sqlalchemy import SQLAlchemy app = Flask(__name__) app.config['SQLALCHEMY_DATABASE_URI'] = os.environ['DATABASE_URL'] db = SQLAlchemy(app) class User(db.Model): __tablename__ = 'users' id = db.Column(db.Integer, primary_key=True) username = db.Column(db.String(80), unique=True) email = db.Column(db.String(120), unique=True) def __init__(self, username, email): self.username = username self.email = email def __repr__(self): return '<User %r>' % self.username @app.route("/users") def users_index(): to_json = lambda user: {"id": user.id, "name": user.username, "email": user.email} return json.dumps([to_json(user) for user in User.query.all()]) Now we set the DATABASE_URL environment property to something like export DATABASE_URL=mysql://testdb:t123@localhost/testdb But its should be easy to switch to Postgres or Amazon RDS (still backed by MySQL). DATABASE_URL=postgresql://testdb:t123@localhost/testdb We’ll continue this demo using a MySQL cluster provided by Amazon RDS. DATABASE_URL=mysql://sa:mypwd@mydbinstance.abcdefghijkl.us-west-2.rds.amazonaws.com/mydb As you can see, this makes attaching and detaching from different backing services trivial from a code perspective, allowing you to focus on more challenging issues. This is important during the early stages of code because it allows you to performance test multiple databases and third-party providers against one another, and in general keeps with the notion of avoiding vendor lock-in. In Part 2, we'll continue reworking this application so that it fully conforms to the 12 Factors. The remaining eight factors concern the overall application design and how it interacts with the execution environment in which its operated. We’ll assume that we’re operating the app in a multi-container Docker environment. This container-up approach provides the most flexibility and control over your execution environment. We’ll then conclude the article by deploying our application to Deis, a vertically integrated Docker-based PaaS, to demonstrate the tradeoff of configuration vs convention in selecting your own PaaS. About the Author Cody A. Ray is an inquisitive, tech-savvy, entrepreneurially-spirited dude. 
Currently, he is a software engineer at Signal, an amazing startup in downtown Chicago, where he gets to work with a dream team that’s changing the service model underlying the Internet.
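For reference, here is the reworked app.py with the first four factors applied, stitched together from the fragments shown above into one file. Nothing new is introduced beyond the imports needed to make it self-contained; DATABASE_URL must be set in the environment as described, for example to the local MySQL or Amazon RDS URL given earlier.

# Consolidated sketch of app.py after factors I-IV (codebase, dependencies,
# config, backing services). This simply combines the fragments from the
# article; flask and flask-sqlalchemy are declared in requirements.txt.
import json
import os

from flask import Flask
from flask.ext.sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Factor III: all config comes from the environment.
app.config['SQLALCHEMY_DATABASE_URI'] = os.environ['DATABASE_URL']
# Factor IV: the database is an attached backing service behind SQLAlchemy.
db = SQLAlchemy(app)

class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True)
    email = db.Column(db.String(120), unique=True)

    def __init__(self, username, email):
        self.username = username
        self.email = email

    def __repr__(self):
        return '<User %r>' % self.username

@app.route("/users")
def users_index():
    to_json = lambda user: {"id": user.id, "name": user.username, "email": user.email}
    return json.dumps([to_json(user) for user in User.query.all()])

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000, debug=True)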
Introduction to ggplot2 and the plotting environments in R

Packt
25 Jun 2015
15 min read
In this article by Donato Teutonico, author of the book ggplot2 Essentials, we are going to explore different plotting environments in R and subsequently learn about the package, ggplot2. R provides a complete series of options available for realizing graphics, which make this software quite advanced concerning data visualization. The core of the graphics visualization in R is within the package grDevices, which provides the basic structure of data plotting, as for instance the colors and fonts used in the plots. Such graphic engine was then used as starting point in the development of more advanced and sophisticated packages for data visualization; the most commonly used being graphics and grid. (For more resources related to this topic, see here.) The graphics package is often referred to as the base or traditional graphics environment, since historically it was already available among the default packages delivered with the base installation of R and it provides functions that allow to the generation of complete plots. The grid package developed by Paul Murrell, on the other side, provides an alternative set of graphics tools. This package does not provide directly functions that generate complete plots, so it is not frequently used directly for generating graphics, but it was used in the development of advanced data visualization packages. Among the grid-based packages, the most widely used are lattice and ggplot2, although they are built implementing different visualization approaches. In fact lattice was build implementing the Trellis plots, while ggplot2 was build implementing the grammar of graphics. A diagram representing the connections between the tools just mentioned is represented in the Figure 1. Figure 1: Overview of the most widely used R packages for graphics Just keep in mind that this is not a complete overview of the packages available, but simply a small snapshot on the main packages used for data visualization in R, since many other packages are built on top of the tools just mentioned. If you would like to get a more complete overview of the graphics tools available in R, you may have a look at the web page of the R project summarizing such tools, http://cran.r-project.org/web/views/Graphics.html. ggplot2 and the Grammar of Graphics The package ggplot2 was developed by Hadley Wickham by implementing a completely different approach to statistical plots. As in the case of lattice, this package is also based on grid, providing a series of high-level functions which allow the creation of complete plots. The ggplot2 package provides an interpretation and extension of the principles of the book The Grammar of Graphics by Leland Wilkinson. Briefly, the Grammar of Graphics assumes that a statistical graphic is a mapping of data to aesthetic attributes and geometric objects used to represent the data, like points, lines, bars, and so on. Together with the aesthetic attributes, the plot can also contain statistical transformation or grouping of the data. As in lattice, also in ggplot2 we have the possibility to split data by a certain variable obtaining a representation of each subset of data in an independent sub-plot; such representation in ggplot2 is called faceting. In a more formal way, the main components of the grammar of graphics are: the data and their mapping, the aesthetic, the geometric objects, the statistical transformations, scales, coordinates and faceting. 
A more detailed description of these elements is provided along the book ggplot2 Essentials, but this is a summary of the general principles The data that must be visualized are mapped to aesthetic attributes which define how the data should be perceived The geometric objects describe what is actually represented on the plot like lines, points, or bars; the geometric objects basically define which kind of plot you are going to draw The statistical transformations are transformations which are applied to the data to group them; an example of statistical transformations would be, for instance, the smooth line or the regression lines of the previous examples or the binning of the histograms. Scales represent the connection between the aesthetic spaces with the actual values which should be represented. Scales maybe also be used to draw legends The coordinates represent the coordinate system in which the data are drawn The faceting, which we have already mentioned, is a grouping of data in subsets defined by a value of one variable In ggplot2 there are two main high-level functions, capable of creating directly creating a plot, qplot() and ggplot(); qplot() stands for quick plot and it is a simple function with serve a similar purpose to the plot() function in graphics. The function ggplot() on the other side is a much more advanced function which allow the user to have a deep control of the plot layout and details. In this article we will see some examples of qplot() in order to provide you with a taste of the typical plots which can be realized with ggplot2, but for more advanced data visualization the function ggplot(), is much more flexible. If you have a look on the different forums of R programming, there is quite some discussion about which of these two functions would be more convenient to use. My general recommendation would be that it depends on the type of graph you are drawing more frequently. For simple and standard plot, where basically only the data should be represented and some minor modification of standard layout, the qplot() function will do the job. On the other side, if you would need to apply particular transformations to the data or simply if you would like to keep the freedom of controlling and defining the different details of the plot layout, I would recommend to focus in learning the code of ggplot(). In the code below you will see an example of plot realized with ggplot2 where you can identify some of the components of the grammar of graphics. The example is realized with the function ggplot() which allow a more direct comparison with the grammar, but just below you may also find the corresponding code for the use of qplot(). Both codes generate the graph depicted on Figure 2. 
require(ggplot2) ## Load ggplot2 data(Orange) # Load the data   ggplot(data=Orange,    ## Data used aes(x=circumference,y=age, color=Tree))+  ##mapping to aesthetic geom_point()+      ##Add geometry (plot with data points) stat_smooth(method="lm",se=FALSE) ##Add statistics(linear regression)   ### Corresponding code with qplot() qplot(circumference,age,data=Orange, ## Data used color=Tree, ## Aestetic mapping geom=c("point","smooth"),method="lm",se=FALSE) This simple example can give you an idea of the role of each portion of code in a ggplot2 graph; you have seen how the main function body create the connection between the data and the aesthetic we are interested to represent and how, on top of this, you add the components of the plot like in this case the geometry element of points and the statistical element of regression. You can also notice how the components which need to be added to the main function call are included using the + sign. One more thing worth to mention at this point, is the if you run just the main body function in the ggplot() function, you will get an error message. This is because this call is not able to generate an actual plot. The step during which the plot is actually created is when you include the geometric attributes, in this case geom_point(). This is perfectly in line with the grammar of graphics, since as we have seen the geometry represent the actual connection between the data and what is represented on the plot. Is in fact at this stage that we specify we are interested in having points representing the data, before that nothing was specified yet about which plot we were interested in drawing. Figure 2: Example of plot of Orange dataset with ggplot2 The qplot() function The qplot (quick plot) function is a basic high level function of ggplot2. The general syntax that you should use with this function is the following qplot(x, y, data, colour, shape, size, facets, geom, stat) where x and y represent the variables to plot (y is optional with a default value NULL) data define the dataset containing the variables colour, shape and size are the aesthetic arguments that can be mapped on additional variables facets define the optional faceting of the plot based on one variable contained in the dataset geom allows you to select the actual visualization of the data, which basically will represent the plot which will be generated. Possible values are point, line or boxplot, but we will see several different examples in the next pages stat define the statistics to be used on the data These options represents the most important options available in qplot(). You may find a descriptions of the other function arguments in the help page of the function accessible with ?qplot, or on the ggplot2 website under the following link http://docs.ggplot2.org/0.9.3/qplot.html. Most of the options just discussed can be applied to different types of plots, since most of the concepts of the grammar of graphics, embedded in the code, may be translated from one plot to the other. For instance, you may use the argument colour to do an aesthetics mapping to one variable; these same concepts can in example be applied to scatterplots as well as histograms. Exactly the same principle would be applied to facets, which can be used for splitting plots independently on the type of plot considered. Histograms and density plots Histograms are plots used to explore how one (or more) quantitative variables are distributed. To show some examples of histograms we will use the iris data. 
This dataset contains measurements in centimetres of the variables sepal length and width and petal length and width for 50 flowers from each of three species of the flower iris: iris setosa, versicolor, and virginica. You may find more details running ?iris. The geometric attribute used to produce histograms is simply by specifying geom=”histogram” in the qplot() function. This default histogram will represent the variable specified on the x axis while the y axis will represent the number of elements in each bin. One other very useful way of representing distributions is to look at the kernel density function, which will basically produce a sort of continuous histogram instead of different bins by estimating the probability density function. For example let’s plot the petal length of all the three species of iris as histogram and density plot. data(iris)   ## Load data qplot(Petal.Length, data=iris, geom="histogram") ## Histogram qplot(Petal.Length, data=iris, geom="density")   ## Density plot The output of this code is showed in Figure 3. Figure 3: Histogram (left) and density plot (right) As you can see in both plots of Figure 3, it appears that the data are not distributed homogenously, but there are at least two distinct distribution clearly separated. This is very reasonably due to a different distribution for one of the iris species. To try to verify if the two distributions are indeed related to specie differences, we could generate the same plot using aesthetic attributes and have a different colour for each subtype of iris. To do this, we can simply map the fill to the Species column in the dataset; also in this case we can do that for the histogram and the density plot too. Below you may see the code we built, and in Figure 4 the resulting output. qplot(Petal.Length, data=iris, geom="histogram", colour=Species, fill=Species) qplot(Petal.Length, data=iris, geom="density", colour=Species, fill=Species) Figure 4: Histogram (left) and density plot (right) with aesthetic attribute for colour and fill In the distribution we can see that the lower data are coming from the Setosa species, while the two other distributions are partly overlapping. Scatterplots Scatterplots are probably the most common plot, since they are usually used to display the relationship between two quantitative variables. When two variables are provided, ggplot2 will make a scatterplot by default. For our example on how to build a scatterplot, we will use a dataset called ToothGrowth, which is available in the base R installation. In this dataset are reported measurements of teeth length of 10 guinea pig for three different doses of vitamin C (0.5, 1, and 2 mg) delivered in two different ways, as orange juice or as ascorbic acid (a compound having vitamin C activity). You can find, as usual, details on these data on the dataset help page at ?ToothGrowth. We are interested in seeing how the length of the teeth changed for the different doses. We are not able to distinguish among the different guinea pigs, since this information is not contained in the data, so for the moment we will plot just all the data we have. So let’s load the dataset and do a basic plot of the dose vs. length. require(ggplot2) data(ToothGrowth) qplot(dose, len, data=ToothGrowth, geom="point") ##Alternative coding qplot(dose, len, data=ToothGrowth) The resulting plot is reproduced in Figure 5. As you have seen, the default plot generated, also without a geom argument, is the scatter plot, which is the default bivariate plot type. 
In this plot we may have an idea of the tendency the data have, for instance we see that the teeth length increase by increasing the amount of vitamin C intake. On the other side, we know that there are two different subgroups in our data, since the vitamin C was provided in two different ways, as orange juice or as ascorbic acid, so it could be interesting to check if these two groups behave differently. Figure 5: Scatterplot of length vs. dose of ToothGrowth data The first approach could be to have the data in two different colours. To do that we simply need to assign the colour attribute to the column sup in the data, which defines the way of vitamin intake. The resulting plot is in Figure 6. qplot(dose, len,data=ToothGrowth, geom="point", col=supp) We now can distinguish from which intake route come each data in the plot and it looks like the data from orange juice shown are a little higher compared to ascorbic acid, but to differentiate between them it is not really easy. We could then try with the facets, so that the data will be completely separated in two different sub-plots. So let´s see what happens. Figure 6: Scatterplot of length vs. dose of ToothGrowth with data in different colours depending on vitamin intake. qplot(dose, len,data=ToothGrowth, geom="point", facets=.~supp) In this new plot, showed in Figure 7, we definitely have a better picture of the data, since we can see how the tooth growth differs for the different intakes. As you have seen, in this simple example, you will find that the best visualization may be different depending on the data you have. In some cases grouping a variable with colours or dividing the data with faceting may give you a different idea about the data and their tendency. For instance in our case with the plot in Figure 7 we can see that the way how the tooth growth increase with dose seems to be different for the different intake routes. Figure 7: Scatterplot of length vs. dose of ToothGrowth with faceting One approach to see the general tendency of the data could be to include a smooth line to the graph. In this case in fact we can see that the growth in the case of the orange juice does not looks really linear, so a smooth line could be a nice way to catch this. In order to do that we simply add a smooth curve to the vector of geometry components in the qplot() function. qplot(dose, len,data=ToothGrowth, geom=c("point","smooth"), facets=.~supp) As you can see from the plot obtained (Figure 8) we now see not only clearly the different data thanks to the faceting, but we can also see the tendency of the data with respect to the dose administered. As you have seen, requiring for the smooth line in ggplot2 will also include a confidence interval in the plot. If you would like to not to have the confidence interval you may simply add the argument se=FALSE. Figure 8: Scatterplot of length vs. dose of ToothGrowth with faceting and smooth line Summary In this short article we have seen some basic concept of ggplot2, ranging from the basic principles in comparison with the other R packages for graphics, up to some basic plots as for instance histograms, density plots or scatterplots. In this case we have limited our example to the use of qplot(), which enable us to obtain plots with some easy commands, but on the other side, in order to have a full control of plot appearance as well as data representation the function ggplot() will provide you with much more advanced functionalities. 
You can find a more detailed description of these functions, as well as of the other features of ggplot2, illustrated with further examples in the book ggplot2 Essentials. A short example of the equivalent ggplot() call for the last plot follows below.

Resources for Article:

Further resources on this subject:
Data Analysis Using R [article]
Data visualization [article]
Using R for Statistics, Research, and Graphics [article]
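As a bridge to the fuller ggplot() syntax mentioned in the summary, here is a sketch of the faceted ToothGrowth scatterplot with a smooth line (Figure 8) rewritten using ggplot() instead of qplot(). It should produce essentially the same graph; only the style of the function call changes.

# Figure 8 rebuilt with ggplot() rather than qplot()
library(ggplot2)
data(ToothGrowth)

ggplot(data = ToothGrowth, aes(x = dose, y = len)) +  # data and aesthetic mapping
  geom_point() +                                      # geometry: data points
  geom_smooth() +                                     # statistics: smooth line with confidence band
  facet_grid(. ~ supp)                                # faceting by intake route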
Querying and Filtering Data

Packt
25 Jun 2015
28 min read
In this article by Edwood Ng and Vineeth Mohan, authors of the book Lucene 4 Cookbook, we will cover the following recipes: Performing advanced filtering Creating a custom filter Searching with QueryParser TermQuery and TermRangeQuery BooleanQuery PrefixQuery and WildcardQuery PhraseQuery and MultiPhraseQuery FuzzyQuery (For more resources related to this topic, see here.) When it comes to search application, usability is always a key element that either makes or breaks user impression. Lucene does an excellent job of giving you the essential tools to build and search an index. In this article, we will look into some more advanced techniques to query and filter data. We will arm you with more knowledge to put into your toolbox so that you can leverage your Lucene knowledge to build a user-friendly search application. Performing advanced filtering Before we start, let us try to revisit these questions: what is a filter and what is it for? In simple terms, a filter is used to narrow the search space or, in another words, search within a search. Filter and Query may seem to provide the same functionality, but there is a significant difference between the two. Scores are calculated in querying to rank results, based on their relevancy to the search terms, while a filter has no effect on scores. It's not uncommon that users may prefer to navigate through a hierarchy of filters in order to land on the relevant results. You may often find yourselves in a situation where it is necessary to refine a result set so that users can continue to search or navigate within a subset. With the ability to apply filters, we can easily provide such search refinements. Another situation is data security where some parts of the data in the index are protected. You may need to include an additional filter behind the scene that's based on user access level so that users are restricted to only seeing items that they are permitted to access. In both of these contexts, Lucene's filtering features will provide the capability to achieve the objectives. Lucene has a few built-in filters that are designed to fit most of the real-world applications. If you do find yourself in a position where none of the built-in filters are suitable for the job, you can rest assured that Lucene's expansibility will allow you to build your own custom filters. Let us take a look at Lucene's built-in filters: TermRangeFilter: This is a filter that restricts results to a range of terms that are defined by lower bound and upper bound of a submitted range. This filter is best used on a single-valued field because on a tokenized field, any tokens within a range will return by this filter. This is for textual data only. NumericRangeFilter: Similar to TermRangeFilter, this filter restricts results to a range of numeric values. FieldCacheRangeFilter: This filter runs on top of the number of range filters, including TermRangeFilter and NumericRangeFilter. It caches filtered results using FieldCache for improved performance. FieldCache is stored in the memory, so performance boost can be upward of 100x faster than the normal range filter. Because it uses FieldCache, it's best to use this on a single-valued field only. This filter will not be applicable for multivalued field and when the available memory is limited, since it maintains FieldCache (in memory) on filtered results. QueryWrapperFilter: This filter acts as a wrapper around a Query object. 
This filter is useful when you have complex business rules that are already defined in a Query and would like to reuse for other business purposes. It constructs a Query to act like a filter so that it can be applied to other Queries. Because this is a filter, scoring results from the Query within is irrelevant. PrefixFilter: This filter restricts results that match what's defined in the prefix. This is similar to a substring match, but limited to matching results with a leading substring only. FieldCacheTermsFilter: This is a term filter that uses FieldCache to store the calculated results in memory. This filter works on a single-valued field only. One use of it is when you have a category field where results are usually shown by categories in different pages. The filter can be used as a demarcation by categories. FieldValueFilter: This filter returns a document containing one or more values on the specified field. This is useful as a preliminary filter to ensure that certain fields exist before querying. CachingWrapperFilter: This is a wrapper that adds a caching layer to a filter to boost performance. Note that this filter provides a general caching layer; it should be applied on a filter that produces a reasonably small result set, such as an exact match. Otherwise, larger results may unnecessarily drain the system's resources and can actually introduce performance issues. If none of the above filters fulfill your business requirements, you can build your own, extending the Filter class and implementing its abstract method getDocIdSet (AtomicReaderContext, Bits). How to do it... Let's set up our test case with the following code: Analyzer analyzer = new StandardAnalyzer(); Directory directory = new RAMDirectory(); IndexWriterConfig config = new   IndexWriterConfig(Version.LATEST, analyzer); IndexWriter indexWriter = new IndexWriter(directory, config); Document doc = new Document(); StringField stringField = new StringField("name", "",   Field.Store.YES); TextField textField = new TextField("content", "",   Field.Store.YES); IntField intField = new IntField("num", 0, Field.Store.YES); doc.removeField("name"); doc.removeField("content"); doc.removeField("num"); stringField.setStringValue("First"); textField.setStringValue("Humpty Dumpty sat on a wall,"); intField.setIntValue(100); doc.add(stringField); doc.add(textField); doc.add(intField); indexWriter.addDocument(doc); doc.removeField("name"); doc.removeField("content"); doc.removeField("num"); stringField.setStringValue("Second"); textField.setStringValue("Humpty Dumpty had a great fall."); intField.setIntValue(200); doc.add(stringField); doc.add(textField); doc.add(intField); indexWriter.addDocument(doc); doc.removeField("name"); doc.removeField("content"); doc.removeField("num"); stringField.setStringValue("Third"); textField.setStringValue("All the king's horses and all the king's men"); intField.setIntValue(300); doc.add(stringField); doc.add(textField); doc.add(intField); indexWriter.addDocument(doc); doc.removeField("name"); doc.removeField("content"); doc.removeField("num"); stringField.setStringValue("Fourth"); textField.setStringValue("Couldn't put Humpty together   again."); intField.setIntValue(400); doc.add(stringField); doc.add(textField); doc.add(intField); indexWriter.addDocument(doc); indexWriter.commit(); indexWriter.close(); IndexReader indexReader = DirectoryReader.open(directory); IndexSearcher indexSearcher = new IndexSearcher(indexReader); How it works… The preceding code adds four documents into an index. 
The four documents are: Document 1 Name: First Content: Humpty Dumpty sat on a wall, Num: 100 Document 2 Name: Second Content: Humpty Dumpty had a great fall. Num: 200 Document 3 Name: Third Content: All the king's horses and all the king's men Num: 300 Document 4 Name: Fourth Content: Couldn't put Humpty together again. Num: 400 Here is our standard test case: IndexReader indexReader = DirectoryReader.open(directory); IndexSearcher indexSearcher = new IndexSearcher(indexReader); Query query = new TermQuery(new Term("content", "humpty")); TopDocs topDocs = indexSearcher.search(query, FILTER, 100); System.out.println("Searching 'humpty'"); for (ScoreDoc scoreDoc : topDocs.scoreDocs) { doc = indexReader.document(scoreDoc.doc); System.out.println("name: " + doc.getField("name").stringValue() + " - content: " + doc.getField("content").stringValue() + " - num: " + doc.getField("num").stringValue()); } indexReader.close(); Running the code as it is, with the FILTER variable declared and set to null (so no filter is applied), will produce the following output: Searching 'humpty' name: First - content: Humpty Dumpty sat on a wall, - num: 100 name: Second - content: Humpty Dumpty had a great fall. - num: 200 name: Fourth - content: Couldn't put Humpty together again. - num: 400 This is a simple search on the word humpty. The search would return the first, second, and fourth sentences. Now, let's take a look at a TermRangeFilter example: TermRangeFilter termRangeFilter = TermRangeFilter.newStringRange("name", "A", "G", true, true); Applying this filter to the preceding search (by setting FILTER to termRangeFilter) will produce the following output: Searching 'humpty' name: First - content: Humpty Dumpty sat on a wall, - num: 100 name: Fourth - content: Couldn't put Humpty together again. - num: 400 Note that the second sentence is missing from the results due to this filter. This filter removes documents whose name falls outside of the range A through G. The first and fourth sentences' names (First and Fourth) both start with F, which is within the range, so they are included in the results. The second sentence's name value, Second, is outside the range, so the document is not considered by the query. Let's move on to NumericRangeFilter: NumericRangeFilter numericRangeFilter = NumericRangeFilter.newIntRange("num", 200, 400, true, true); This filter will produce the following results: Searching 'humpty' name: Second - content: Humpty Dumpty had a great fall. - num: 200 name: Fourth - content: Couldn't put Humpty together again. - num: 400 Note that the first sentence is missing from the results because its num value, 100, is outside the numeric range 200 to 400 specified in NumericRangeFilter. Next one is FieldCacheRangeFilter: FieldCacheRangeFilter fieldCacheTermRangeFilter = FieldCacheRangeFilter.newStringRange("name", "A", "G", true, true); The output of this filter is similar to the TermRangeFilter example: Searching 'humpty' name: First - content: Humpty Dumpty sat on a wall, - num: 100 name: Fourth - content: Couldn't put Humpty together again. - num: 400 This filter provides a caching layer on top of TermRangeFilter. The results are the same, but performance is a lot better because the calculated results are cached in memory for subsequent retrievals. Next is QueryWrapperFilter: QueryWrapperFilter queryWrapperFilter = new QueryWrapperFilter(new TermQuery(new Term("content", "together"))); This example will produce this result: Searching 'humpty' name: Fourth - content: Couldn't put Humpty together again. - num: 400 This filter wraps around a TermQuery on the term together in the content field.
Since the fourth sentence is the only one that contains the word "together", the search results are limited to this sentence only. Next one is PrefixFilter: PrefixFilter prefixFilter = new PrefixFilter(new Term("name", "F")); This filter produces the following: Searching 'humpty' name: First - content: Humpty Dumpty sat on a wall, - num: 100 name: Fourth - content: Couldn't put Humpty together again. - num: 400 This filter limits results to documents where the name field begins with the letter F. In this case, the first and fourth sentences both have a name field that begins with F (First and Fourth); hence, the results. Next is FieldCacheTermsFilter: FieldCacheTermsFilter fieldCacheTermsFilter = new FieldCacheTermsFilter("name", "First"); This filter produces the following: Searching 'humpty' name: First - content: Humpty Dumpty sat on a wall, - num: 100 This filter limits results to documents whose name field contains the term First. Since the first sentence is the only one with that name, only one sentence is returned in the search results. Next is FieldValueFilter: FieldValueFilter fieldValueFilter = new FieldValueFilter("name1"); This would produce the following: Searching 'humpty' Note that there are no results, because this filter limits results to documents in which there is at least one value in the field name1. Since the name1 field doesn't exist in our current example, no documents are returned by this filter; hence, zero results. Next is CachingWrapperFilter: TermRangeFilter termRangeFilter = TermRangeFilter.newStringRange("name", "A", "G", true, true); CachingWrapperFilter cachingWrapperFilter = new CachingWrapperFilter(termRangeFilter); This wrapper wraps around the same TermRangeFilter from above, so the result produced is the same: Searching 'humpty' name: First - content: Humpty Dumpty sat on a wall, - num: 100 name: Fourth - content: Couldn't put Humpty together again. - num: 400 Filters work in conjunction with Queries to refine the search results. As you may have already noticed, the benefit of a Filter is its ability to cache results, while a Query calculates in real time. When choosing between Filter and Query, you will want to ask yourself whether the search (or filtering) will be repeated. Provided you have enough memory allocation, a cached Filter will always have a positive impact on the search experience. Creating a custom filter Now that we've seen numerous examples of Lucene's built-in Filters, we are ready for a more advanced topic, custom filters. There are a few important components we need to go over before we start: FieldCache, SortedDocValues, and DocIdSet. We will be using these items in our example to help you gain practical knowledge on the subject. The FieldCache, as you already learned, is a cache that stores field values in memory in an array structure. It's a very simple data structure, as the slots in the array basically correspond to DocIds. This is also the reason why FieldCache only works for a single-valued field: a slot in an array can only hold a single value. Since this is just an array, the lookup time is constant and very fast. SortedDocValues has two internal data mappings for value lookup: a dictionary mapping an ordinal value to a field value, and a mapping from a DocId to an ordinal value (for the field value). In the dictionary data structure, the values are deduplicated, dereferenced, and sorted. There are two methods of interest in this class: getOrd(int) and lookupTerm(BytesRef).
getOrd(int) returns the ordinal for a DocId (int), and lookupTerm(BytesRef) returns the ordinal for a field value. This data structure is the opposite of the inverted index structure, as it provides a DocId to value lookup (similar to FieldCache), instead of a value to DocId lookup. DocIdSet, as the name implies, is a set of DocIds. The FieldCacheDocIdSet subclass we will be using is a combination of this set and FieldCache. It iterates through the set and calls matchDoc(int) to find all the matching documents to be returned. In our example, we will be building a simple user security Filter to determine which documents are eligible to be viewed by a user, based on the user ID and group ID. The group ID is assumed to be hierarchical, whereby a smaller ID inherits the rights of a larger ID. For example, the following will be our group ID model in our implementation: 10 – admin 20 – manager 30 – user 40 – guest A user with group ID 10 will be able to access documents whose group ID is 10 or above. How to do it... Here is our custom Filter, UserSecurityFilter: public class UserSecurityFilter extends Filter {   private String userIdField; private String groupIdField; private String userId; private String groupId;   public UserSecurityFilter(String userIdField, String groupIdField, String userId, String groupId) {    this.userIdField = userIdField;    this.groupIdField = groupIdField;    this.userId = userId;    this.groupId = groupId; }   public DocIdSet getDocIdSet(AtomicReaderContext context, Bits acceptDocs) throws IOException {    final SortedDocValues userIdDocValues = FieldCache.DEFAULT.getTermsIndex(context.reader(), userIdField);    final SortedDocValues groupIdDocValues = FieldCache.DEFAULT.getTermsIndex(context.reader(), groupIdField);      final int userIdOrd = userIdDocValues.lookupTerm(new BytesRef(userId));    final int groupIdOrd = groupIdDocValues.lookupTerm(new BytesRef(groupId));      return new FieldCacheDocIdSet(context.reader().maxDoc(), acceptDocs) {      @Override      protected final boolean matchDoc(int doc) {        final int userIdDocOrd = userIdDocValues.getOrd(doc);        final int groupIdDocOrd = groupIdDocValues.getOrd(doc);        return userIdDocOrd == userIdOrd || groupIdDocOrd >= groupIdOrd;      }    }; } } This Filter accepts four arguments in its constructor: userIdField: This is the field name for the user ID groupIdField: This is the field name for the group ID userId: This is the current session's user ID groupId: This is the current session's group ID of the user Then, we implement getDocIdSet(AtomicReaderContext, Bits) to perform our filtering by userId and groupId. We first acquire two SortedDocValues, one for the user ID and one for the group ID, based on the field names we obtained from the constructor. Then, we look up the ordinal values for the current session's user ID and group ID. The return value is a new FieldCacheDocIdSet object implementing its matchDoc(int) method. This is where we compare both the user ID and group ID to determine whether a document is viewable by the user. A match is true when the user ID matches, or when the document's group ID is greater than or equal to the user's group ID.
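One subtlety is worth a note here (this is an observation about the Lucene 4.x SortedDocValues API, not something the recipe itself covers, so treat it as an assumption to verify against your Lucene version): lookupTerm returns a negative ordinal when the supplied value does not exist in the segment, and getOrd returns -1 for documents that have no value in the field, so an unguarded ordinal comparison can occasionally produce surprising matches. A defensive sketch of the matchDoc method from the anonymous FieldCacheDocIdSet above might look like this:

// Sketch only: a drop-in variant of matchDoc from the filter above.
// Assumption (Lucene 4.x): lookupTerm returns a negative value for terms not
// present in the segment, and getOrd returns -1 for documents with no value.
@Override
protected final boolean matchDoc(int doc) {
  final int userIdDocOrd = userIdDocValues.getOrd(doc);
  final int groupIdDocOrd = groupIdDocValues.getOrd(doc);
  // Only compare against ordinals that resolved to terms that actually exist.
  final boolean userMatches = userIdOrd >= 0 && userIdDocOrd == userIdOrd;
  final boolean groupAllowed = groupIdOrd >= 0 && groupIdDocOrd >= groupIdOrd;
  return userMatches || groupAllowed;
}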
To test this Filter, we will set up our index as follows: Analyzer analyzer = new StandardAnalyzer(); Directory directory = new RAMDirectory(); IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer); IndexWriter indexWriter = new IndexWriter(directory, config); Document doc = new Document(); StringField stringFieldFile = new StringField("file", "", Field.Store.YES); StringField stringFieldUserId = new StringField("userId", "", Field.Store.YES); StringField stringFieldGroupId = new StringField("groupId", "", Field.Store.YES); doc.removeField("file"); doc.removeField("userId"); doc.removeField("groupId"); stringFieldFile.setStringValue("Z:\\shared\\finance\\2014-sales.xls"); stringFieldUserId.setStringValue("1001"); stringFieldGroupId.setStringValue("20"); doc.add(stringFieldFile); doc.add(stringFieldUserId); doc.add(stringFieldGroupId); indexWriter.addDocument(doc); doc.removeField("file"); doc.removeField("userId"); doc.removeField("groupId"); stringFieldFile.setStringValue("Z:\\shared\\company\\2014-policy.doc"); stringFieldUserId.setStringValue("1101"); stringFieldGroupId.setStringValue("30"); doc.add(stringFieldFile); doc.add(stringFieldUserId); doc.add(stringFieldGroupId); indexWriter.addDocument(doc); doc.removeField("file"); doc.removeField("userId"); doc.removeField("groupId"); stringFieldFile.setStringValue("Z:\\shared\\company\\2014-terms-and-conditions.doc"); stringFieldUserId.setStringValue("1205"); stringFieldGroupId.setStringValue("40"); doc.add(stringFieldFile); doc.add(stringFieldUserId); doc.add(stringFieldGroupId); indexWriter.addDocument(doc); indexWriter.commit(); indexWriter.close(); The setup adds three documents to our index, each with different user ID and group ID settings. We can then apply our Filter in a search, as follows: UserSecurityFilter userSecurityFilter = new UserSecurityFilter("userId", "groupId", "1001", "40"); IndexReader indexReader = DirectoryReader.open(directory); IndexSearcher indexSearcher = new IndexSearcher(indexReader); Query query = new MatchAllDocsQuery(); TopDocs topDocs = indexSearcher.search(query, userSecurityFilter, 100); for (ScoreDoc scoreDoc : topDocs.scoreDocs) { doc = indexReader.document(scoreDoc.doc); System.out.println("file: " + doc.getField("file").stringValue() + " - userId: " + doc.getField("userId").stringValue() + " - groupId: " + doc.getField("groupId").stringValue()); } indexReader.close(); We initialize UserSecurityFilter with the matching names for the user ID and group ID fields, and set it up with user ID 1001 and group ID 40. For our test search, we use MatchAllDocsQuery, which essentially searches without any constraint (it returns all the documents). Here is the output from the code: file: Z:\shared\finance\2014-sales.xls - userId: 1001 - groupId: 20 file: Z:\shared\company\2014-terms-and-conditions.doc - userId: 1205 - groupId: 40 The search specifically filters by user ID 1001, so the first document is returned because its user ID is also 1001. The third document is returned because its group ID, 40, is greater than or equal to the user's group ID, which is also 40. Searching with QueryParser QueryParser is an interpreter tool that transforms a search string into a series of Query clauses. It's not absolutely necessary to use QueryParser to perform a search, but it's a great feature that empowers users by allowing the use of search modifiers. A user can specify a phrase match by putting quotes (") around a phrase.
A user can also control whether a certain term or phrase is required by putting a plus ("+") sign in front of the term or phrase, or use a minus ("-") sign to indicate that the term or phrase must not exist in results. For Boolean searches, the user can use AND and OR to control whether all terms or phrases are required. To do a field-specific search, you can use a colon (":") to specify a field for a search (for example, content:humpty would search for the term "humpty" in the field "content"). For wildcard searches, you can use the standard wildcard character asterisk ("*") to match 0 or more characters, or a question mark ("?") for matching a single character. As you can see, the general syntax for a search query is not complicated, though the more advanced modifiers can seem daunting to new users. In this article, we will cover more advanced QueryParser features to show you what you can do to customize a search. How to do it.. Let's look at the options that we can set in QueryParser. The following is a piece of code snippet for our setup: Analyzer analyzer = new StandardAnalyzer(); Directory directory = new RAMDirectory(); IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer); IndexWriter indexWriter = new IndexWriter(directory, config); Document doc = new Document(); StringField stringField = new StringField("name", "", Field.Store.YES); TextField textField = new TextField("content", "", Field.Store.YES); IntField intField = new IntField("num", 0, Field.Store.YES);   doc.removeField("name"); doc.removeField("content"); doc.removeField("num"); stringField.setStringValue("First"); textField.setStringValue("Humpty Dumpty sat on a wall,"); intField.setIntValue(100); doc.add(stringField); doc.add(textField); doc.add(intField); indexWriter.addDocument(doc);   doc.removeField("name"); doc.removeField("content"); doc.removeField("num"); stringField.setStringValue("Second"); textField.setStringValue("Humpty Dumpty had a great fall."); intField.setIntValue(200); doc.add(stringField); doc.add(textField); doc.add(intField); indexWriter.addDocument(doc);   doc.removeField("name"); doc.removeField("content"); doc.removeField("num"); stringField.setStringValue("Third"); textField.setStringValue("All the king's horses and all the king's men"); intField.setIntValue(300); doc.add(stringField); doc.add(textField); doc.add(intField); indexWriter.addDocument(doc);   doc.removeField("name"); doc.removeField("content"); doc.removeField("num"); stringField.setStringValue("Fourth"); textField.setStringValue("Couldn't put Humpty together again."); intField.setIntValue(400); doc.add(stringField); doc.add(textField); doc.add(intField); indexWriter.addDocument(doc);   indexWriter.commit(); indexWriter.close();   IndexReader indexReader = DirectoryReader.open(directory); IndexSearcher indexSearcher = new IndexSearcher(indexReader); QueryParser queryParser = new QueryParser("content", analyzer); // configure queryParser here Query query = queryParser.parse("humpty"); TopDocs topDocs = indexSearcher.search(query, 100); We add four documents and instantiate a QueryParser object with a default field and an analyzer. We will be using the same analyzer that was used in indexing to ensure that we apply the same text treatment to maximize matching capability. Wildcard search The query syntax for a wildcard search is the asterisk ("*") or question mark ("?") character. Here is a sample query: Query query = queryParser.parse("humpty*"); This query will return the first, second, and fourth sentences. 
By default, QueryParser does not allow a leading wildcard character, because it has a significant performance impact. A leading wildcard would trigger a full scan on the index, since any term can be a potential match; in essence, even an inverted index would become rather useless for a leading wildcard search. However, it's possible to override this default setting to allow a leading wildcard character by calling setAllowLeadingWildcard(true). You can go ahead and run this example with different search strings to see how this feature works. Depending on where the wildcard character(s) are placed, QueryParser will produce either a PrefixQuery or a WildcardQuery. In this specific example, in which there is only one wildcard character and it's not the leading character, a PrefixQuery will be produced. Term range search We can produce a TermRangeQuery by using TO in a search string. The range has the following syntax: [start TO end] – inclusive {start TO end} – exclusive As indicated, the square brackets ( [ and ] ) are inclusive of the start and end terms, and the curly brackets ( { and } ) are exclusive of the start and end terms. It's also possible to mix these brackets so that the range is inclusive on one side and exclusive on the other. Here is a code snippet: Query query = queryParser.parse("[aa TO c]"); This search will return the third and fourth sentences, as their beginning words are All and Couldn't, which are within the range. You can optionally analyze the range terms with the same analyzer by setting setAnalyzeRangeTerms(true). Autogenerated phrase query QueryParser can automatically generate a PhraseQuery when there is more than one term in a search string. Here is a code snippet: queryParser.setAutoGeneratePhraseQueries(true); Query query = queryParser.parse("humpty+dumpty+sat"); This search will generate a PhraseQuery on the phrase humpty dumpty sat and will return the first sentence. Date resolution If you have a date field (created by using DateTools to convert a date to a string format) and would like to do a range search on dates, it may be necessary to set the date resolution for a specific field. Here is a code snippet on setting the date resolution: queryParser.setDateResolution("date", DateTools.Resolution.DAY); queryParser.setLocale(Locale.US); queryParser.setTimeZone(TimeZone.getTimeZone("America/New_York")); This example sets the resolution to day granularity, the locale to US, and the time zone to New York. The locale and time zone settings are specific to the date format only. Default operator The default operator on a multiterm search string is OR. You can change the default to AND so that all the terms are required. Here is a code snippet that will require all the terms in a search string: queryParser.setDefaultOperator(QueryParser.Operator.AND); Query query = queryParser.parse("humpty dumpty"); This example will return the first and second sentences, as these are the only two sentences that contain both humpty and dumpty. Enable position increments This setting is enabled by default. Its purpose is to maintain the position increment of the token that follows an omitted token, such as a token filtered by a StopFilter. This is useful in phrase queries when position increments may be important for scoring. Here is an example of how to enable this setting: queryParser.setEnablePositionIncrements(true); Query query = queryParser.parse("\"humpty dumpty\""); In our scenario, it won't change our search results. This attribute only makes position increment information available in the resulting PhraseQuery.
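Before moving on, a quick way to see what QueryParser actually builds from a search string under these settings is to print the parsed query. The following sketch assumes the queryParser and analyzer from the setup earlier in this recipe; the exact printed form can vary between Lucene versions, so the comments are indicative rather than exact:

queryParser.setDefaultOperator(QueryParser.Operator.AND);
Query andQuery = queryParser.parse("humpty dumpty");
// Typically a BooleanQuery with both clauses required, printed as: +humpty +dumpty
System.out.println(andQuery.getClass().getSimpleName() + ": " + andQuery.toString("content"));

queryParser.setDefaultOperator(QueryParser.Operator.OR);
Query orQuery = queryParser.parse("humpty dumpty");
// Typically a BooleanQuery with both clauses optional, printed as: humpty dumpty
System.out.println(orQuery.getClass().getSimpleName() + ": " + orQuery.toString("content"));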
Fuzzy query Lucene's fuzzy search implementation is based on the Levenshtein distance. It compares two strings and finds out the number of single-character changes that are needed to transform one string into another. The resulting number indicates the closeness of the two strings. In a fuzzy search, a threshold number of edits is used to determine whether the two strings are matched. To trigger a fuzzy match in QueryParser, you can use the tilde ~ character. There are a couple of configurations in QueryParser to tune this type of query. Here is a code snippet: queryParser.setFuzzyMinSim(2f); queryParser.setFuzzyPrefixLength(3); Query query = queryParser.parse("hump~"); This example will return the first, second, and fourth sentences, as the fuzzy match matches hump to humpty; the two words differ by two characters. We tuned the fuzzy query's minimum similarity to two in this example. Lowercase expanded term This configuration determines whether to automatically lowercase multiterm queries. An analyzer can do this already, so this is more like an overriding configuration that forces multiterm queries to be lowercased. Here is a code snippet: queryParser.setLowercaseExpandedTerms(true); Query query = queryParser.parse("\"Humpty Dumpty\""); This code will lowercase our search string before search execution. Phrase slop Phrase search can be tuned to allow some flexibility in phrase matching. By default, phrase matching is exact. Setting a slop value will give it some tolerance for terms that may not always be matched consecutively. Here is a code snippet that demonstrates this feature: queryParser.setPhraseSlop(3); Query query = queryParser.parse("\"Humpty Dumpty wall\""); Without setting a phrase slop, the phrase Humpty Dumpty wall will not have any matches. By setting the phrase slop to three, we allow some tolerance, so this search will now return the first sentence. Go ahead and play around with this setting in order to get more familiar with its behavior. TermQuery and TermRangeQuery A TermQuery is a very simple query that matches documents containing a specific term. The TermRangeQuery is, as its name implies, a term range with a lower and upper boundary for matching. How to do it... Here are a couple of examples of TermQuery and TermRangeQuery: query = new TermQuery(new Term("content", "humpty")); query = new TermRangeQuery("content", new BytesRef("a"), new BytesRef("c"), true, true); The first line is a simple query that matches the term humpty in the content field. The second line is a range query matching documents whose content terms sort between a and c. BooleanQuery A BooleanQuery is a combination of other queries in which you can specify whether each subquery must, must not, or should match. These options provide the foundation for building up the logical operators AND, OR, and NOT, which you can use in QueryParser. Here is a quick review of the QueryParser syntax for BooleanQuery: "+" means required; for example, the search string +humpty dumpty equates to must match humpty and should match dumpty "-" means must not match; for example, the search string -humpty dumpty equates to must not match humpty and should match dumpty AND, OR, and NOT are pseudo Boolean operators. Under the hood, Lucene uses BooleanClause.Occur to model these operators. The options for occur are MUST, MUST_NOT, and SHOULD. In an AND query, both terms must match. In an OR query, both terms should match. Lastly, in a NOT query, the term must not exist.
For example, humpty AND dumpty means must match both humpty and dumpty, humpty OR dumpty means should match either or both of humpty and dumpty, and NOT humpty means the term humpty must not exist in matching documents. As mentioned, the rudimentary clauses of BooleanQuery have three options: must match, must not match, and should match. These options allow us to programmatically create Boolean operations through an API. How to do it... Here is a code snippet that demonstrates BooleanQuery: BooleanQuery query = new BooleanQuery(); query.add(new BooleanClause(new TermQuery(new Term("content", "humpty")), BooleanClause.Occur.MUST)); query.add(new BooleanClause(new TermQuery(new Term("content", "dumpty")), BooleanClause.Occur.MUST)); query.add(new BooleanClause(new TermQuery(new Term("content", "wall")), BooleanClause.Occur.SHOULD)); query.add(new BooleanClause(new TermQuery(new Term("content", "sat")), BooleanClause.Occur.MUST_NOT)); How it works… In this demonstration, we use TermQuery to illustrate the building of BooleanClauses. It's equivalent to this logic: (humpty AND dumpty) OR wall NOT sat. This code will return the second sentence from our setup. Because of the last MUST_NOT BooleanClause on the word "sat", the first sentence is filtered from the results. Note that BooleanClause accepts two arguments: a Query and a BooleanClause.Occur. BooleanClause.Occur is where you specify the matching options: MUST, MUST_NOT, and SHOULD. PrefixQuery and WildcardQuery PrefixQuery, as the name implies, matches documents with terms starting with a specified prefix. WildcardQuery allows you to use wildcard characters for wildcard matching. A PrefixQuery is somewhat similar to a WildcardQuery in which the only wildcard character is at the end of the search string. When doing a wildcard search in QueryParser, it will return either a PrefixQuery or a WildcardQuery, depending on the wildcard character's location. PrefixQuery is simpler and more efficient than WildcardQuery, so it's preferable to use PrefixQuery whenever possible. That's exactly what QueryParser does. How to do it... Here is a code snippet to demonstrate both Query types: PrefixQuery query = new PrefixQuery(new Term("content", "hum")); WildcardQuery query2 = new WildcardQuery(new Term("content", "*um*")); How it works… Both queries would return the same results from our setup. The PrefixQuery will match anything that starts with hum and the WildcardQuery will match anything that contains um. PhraseQuery and MultiPhraseQuery A PhraseQuery matches a particular sequence of terms, while a MultiPhraseQuery gives you the option to match multiple terms in the same position. For example, MultiPhraseQuery supports a phrase such as humpty (dumpty OR together), in which it matches humpty in position 0 and dumpty or together in position 1. How to do it... Here is a code snippet to demonstrate both Query types: PhraseQuery query = new PhraseQuery(); query.add(new Term("content", "humpty")); query.add(new Term("content", "together")); MultiPhraseQuery query2 = new MultiPhraseQuery(); Term[] terms1 = new Term[1]; terms1[0] = new Term("content", "humpty"); Term[] terms2 = new Term[2]; terms2[0] = new Term("content", "dumpty"); terms2[1] = new Term("content", "together"); query2.add(terms1); query2.add(terms2); How it works… The first Query, PhraseQuery, searches for the phrase humpty together. The second Query, MultiPhraseQuery, searches for the phrase humpty (dumpty OR together).
The first Query would return sentence four from our setup, while the second Query would return sentences one, two, and four. Note that in MultiPhraseQuery, multiple terms in the same position are added as an array. FuzzyQuery A FuzzyQuery matches terms based on similarity, using the Damerau-Levenshtein algorithm. We are not going into the details of the algorithm, as it is outside of our topic. What we need to know is that a fuzzy match is measured by the number of edits between terms. FuzzyQuery allows a maximum of 2 edits. For example, humptX is one edit away from humpty, and humpXX is two edits away from humpty. There is also a requirement that the number of edits must be less than the minimum term length (of either the input term or the candidate term). As another example, ab and abcd would not match, because the number of edits between the two terms is 2, which is not less than the length of ab, which is 2. How to do it... Here is a code snippet to demonstrate FuzzyQuery: FuzzyQuery query = new FuzzyQuery(new Term("content", "humpXX")); How it works… This Query will return sentences one, two, and four from our setup, as humpXX matches humpty within two edits. In QueryParser, FuzzyQuery can be triggered by the tilde ( ~ ) sign. An equivalent search string would be humpXX~. Summary This gives you a glimpse of the various querying and filtering features that are proven building blocks of successful search engines. Resources for Article: Further resources on this subject: Extending ElasticSearch with Scripting [article] Downloading and Setting Up ElasticSearch [article] Lucene.NET: Optimizing and merging index segments [article]

JSON with JSON.Net

Packt
25 Jun 2015
16 min read
In this article by Ray Rischpater, author of the book JavaScript JSON Cookbook, we show you how you can use strong typing in your applications with JSON using C#, Java, and TypeScript. You'll find the following recipes: How to deserialize an object using Json.NET How to handle date and time objects using Json.NET How to deserialize an object using gson for Java How to use TypeScript with Node.js How to annotate simple types using TypeScript How to declare interfaces using TypeScript How to declare classes with interfaces using TypeScript Using json2ts to generate TypeScript interfaces from your JSON (For more resources related to this topic, see here.) While some say that strong types are for weak minds, the truth is that strong typing in programming languages can help you avoid whole classes of errors in which you mistakenly assume that an object of one type is really of a different type. Languages such as C# and Java provide strong types for exactly this reason. Fortunately, the JSON serializers for C# and Java support strong typing, which is especially handy once you've figured out your object representation and simply want to map JSON to instances of classes you've already defined. We use Json.NET for C# and gson for Java to convert from JSON to instances of classes you define in your application. Finally, we take a look at TypeScript, an extension of JavaScript that provides compile-time checking of types, compiling to plain JavaScript for use with Node.js and browsers. We'll look at how to install the TypeScript compiler for Node.js, how to use TypeScript to annotate types and interfaces, and how to use a web page by Timmy Kokke to automatically generate TypeScript interfaces from JSON objects. How to deserialize an object using Json.NET In this recipe, we show you how to use Newtonsoft's Json.NET to deserialize JSON to an object that's an instance of a class. We'll use Json.NET because although this works with the existing .NET JSON serializer, there are other things that I want you to know about Json.NET, which we'll discuss in the next two recipes. Getting ready To begin, you need to be sure you have a reference to Json.NET in your project. The easiest way to do this is to use NuGet; launch NuGet, search for Json.NET, and click on Install. You'll also need a reference to the Newtonsoft.Json namespace in any file that needs those classes, with a using directive at the top of your file: using Newtonsoft.Json; How to do it… Here's an example that provides the implementation of a simple class, converts a JSON string to an instance of that class, and then converts the instance back into JSON: using System; using Newtonsoft.Json;   namespace JSONExample {   public class Record {    public string call;    public double lat;    public double lng; } class Program {    static void Main(string[] args)      {        String json = @"{ 'call': 'kf6gpe-9', 'lat': 21.9749, 'lng': 159.3686 }";          var result = JsonConvert.DeserializeObject<Record>(json, new JsonSerializerSettings { MissingMemberHandling = MissingMemberHandling.Error });        Console.Write(JsonConvert.SerializeObject(result));          return;        } } } How it works… In order to deserialize the JSON in a type-safe manner, we need to have a class that has the same fields as our JSON. The Record class, defined in the first few lines, does this, defining fields for call, lat, and lng.
The Newtonsoft.Json namespace provides the JsonConvert class with the static methods SerializeObject and DeserializeObject. DeserializeObject is a generic method, taking the type of the object that should be returned as a type argument, and, as arguments, the JSON to parse and an optional settings object indicating options for the JSON parsing. We pass the MissingMemberHandling property as a setting, indicating with the value of the enumeration Error that in the event that a field is missing, the parser should throw an exception. After parsing the class, we convert it again to JSON and write the resulting JSON to the console. There's more… If you skip passing the MissingMemberHandling option or pass Ignore (the default), you can have mismatches between field names in your JSON and your class, which probably isn't what you want for type-safe conversion. You can also pass the NullValueHandling field with a value of Include or Ignore. If Include, fields with null values are included; if Ignore, fields with null values are ignored. See also The full documentation for Json.NET is at http://www.newtonsoft.com/json/help/html/Introduction.htm. Type-safe deserialization is also possible with JSON support using the .NET serializer; the syntax is similar. For an example, see the documentation for the JavaScriptSerializer class at https://msdn.microsoft.com/en-us/library/system.web.script.serialization.javascriptserializer(v=vs.110).aspx. How to handle date and time objects using Json.NET Dates in JSON are problematic because JavaScript's dates are in milliseconds from the epoch, which is generally unreadable to people. Different JSON parsers handle this differently; Json.NET has a nice IsoDateTimeConverter that formats the date and time in ISO format, making it human-readable for debugging or parsing on platforms other than JavaScript. You can extend this approach to convert any kind of formatted data in JSON attributes, too, by creating new converter objects and using a converter object to convert from one value type to another. How to do it… Simply include a new IsoDateTimeConverter object when you call JsonConvert.SerializeObject, like this: string json = JsonConvert.SerializeObject(p, new IsoDateTimeConverter()); How it works… This causes the serializer to invoke the IsoDateTimeConverter instance with any instance of date and time objects, returning ISO strings like this in your JSON: 2015-07-29T08:00:00 There's more… Note that this can be parsed by Json.NET, but not JavaScript; in JavaScript, you'll want to use a function like this: function isoDateReviver(value) { if (typeof value === 'string') { var a = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2}(?:\.\d*)?)(?:([+-])(\d{2}):(\d{2}))?Z?$/.exec(value); if (a) {      var utcMilliseconds = Date.UTC(+a[1], +a[2] - 1, +a[3], +a[4], +a[5], +a[6]);        return new Date(utcMilliseconds);    } } return value; } The rather hairy regular expression on the third line matches dates in the ISO format, extracting each of the fields. If the regular expression finds a match, it extracts each of the date fields, which are then used by the Date class's UTC method to create a new date. Note that the entire regular expression—everything between the / characters—should be on one line with no whitespace. It's a little long for this page, however! See also For more information on how Json.NET handles dates and times, see the documentation and example at http://www.newtonsoft.com/json/help/html/SerializeDateFormatHandling.htm.
How to deserialize an object using gson for Java Like Json.NET, gson provides a way to specify the destination class to which you're deserializing a JSON object. Getting ready You'll need to include the gson JAR file in your application, just as you would for any other external API. How to do it… You use the same fromJson method that you use for type-unsafe JSON parsing with gson, except that you pass the class object to gson as the second argument, like this: // Assuming we have a class Record that looks like this: /* class Record { private String call; private float lat; private float lng;    // public API would access these fields } */   Gson gson = new com.google.gson.Gson(); String json = "{ \"call\": \"kf6gpe-9\", \"lat\": 21.9749, \"lng\": 159.3686 }"; Record result = gson.fromJson(json, Record.class); How it works… The fromJson method always takes a Java class. In the example in this recipe, we convert directly to a plain old Java object that our application can use without needing to use the dereferencing and type conversion interface of JsonElement that gson provides. There's more… The gson library can also deal with nested types and arrays. You can also hide fields from being serialized or deserialized by declaring them transient, which makes sense because transient fields aren't serialized. See also The documentation for gson and its support for deserializing instances of classes is at https://sites.google.com/site/gson/gson-user-guide#TOC-Object-Examples. How to use TypeScript with Node.js Using TypeScript with Visual Studio is easy; it's just part of the installation of Visual Studio for any version after Visual Studio 2013 Update 2. Getting the TypeScript compiler for Node.js is almost as easy—it's an npm install away. How to do it… On a command line with npm in your path, run the following command: npm install -g typescript The npm option -g tells npm to install the TypeScript compiler globally, so it's available to every Node.js application you write. Once you run it, npm downloads and installs the TypeScript compiler binary for your platform. There's more… Once you run this command to install the compiler, you'll have the TypeScript compiler tsc available on the command line. Compiling a file with tsc is as easy as writing the source code, saving it in a file that ends with the .ts extension, and running tsc on it. For example, given the following TypeScript saved in the file hello.ts: function greeter(person: string) { return "Hello, " + person; }   var user: string = "Ray";   console.log(greeter(user)); Running tsc hello.ts at the command line creates the following JavaScript: function greeter(person) { return "Hello, " + person; }   var user = "Ray";   console.log(greeter(user)); Try it! As we'll see in the next section, the function declaration for greeter contains a single TypeScript annotation; it declares the argument person to be a string. Add the following line to the bottom of hello.ts: console.log(greeter(2)); Now, run the tsc hello.ts command again; you'll get an error like this one: C:\Users\rarischp\Documents\node.js\typescript\hello.ts(8,13): error TS2082: Supplied parameters do not match any signature of call target: Could not apply type 'string' to argument 1 which is of type 'number'. C:\Users\rarischp\Documents\node.js\typescript\hello.ts(8,13): error TS2087: Could not select overload for 'call' expression. This error indicates that I'm attempting to call greeter with a value of the wrong type, passing a number where greeter expects a string.
In the next recipe, we'll look at the kinds of type annotations TypeScript supports for simple types. See also The TypeScript home page, with tutorials and reference documentation, is at http://www.typescriptlang.org/. How to annotate simple types using TypeScript Type annotations with TypeScript are simple decorators appended to the variable or function after a colon. There's support for the same primitive types as in JavaScript, and for declaring interfaces and classes, which we will discuss next. How to do it… Here's a simple example of some variable declarations and two function declarations: function greeter(person: string): string { return "Hello, " + person; }   function circumference(radius: number) : number { var pi: number = 3.141592654; return 2 * pi * radius; }   var user: string = "Ray";   console.log(greeter(user)); console.log("You need " + circumference(2) + " meters of fence for your dog."); This example shows how to annotate functions and variables. How it works… Variables—either standalone or as arguments to a function—are decorated using a colon and then the type. For example, the first function, greeter, takes a single argument, person, which must be a string. The second function, circumference, takes a radius, which must be a number, and declares a single variable in its scope, pi, which must be a number and has the value 3.141592654. You declare functions in the normal way as in JavaScript, and then add the return type annotation after the argument list, again using a colon and the type. So, greeter returns a string, and circumference returns a number. There's more… TypeScript defines the following fundamental type decorators, which map to their underlying JavaScript types: array: This is a composite type. For example, you can write a list of strings as follows: var list:string[] = [ "one", "two", "three"]; boolean: This type decorator can contain the values true and false. number: This type decorator is, as in JavaScript itself, any floating-point number. string: This type decorator is a character string. enum: An enumeration, written with the enum keyword, like this: enum Color { Red = 1, Green, Blue }; var c : Color = Color.Blue; any: This type indicates that the variable may be of any type. void: This type indicates that the value has no type. You'll use void to indicate a function that returns nothing. See also For a list of the TypeScript types, see the TypeScript handbook at http://www.typescriptlang.org/Handbook. How to declare interfaces using TypeScript An interface defines how something behaves, without defining the implementation. In TypeScript, an interface names a complex type by describing the fields it has. This is known as structural subtyping. How to do it… Declaring an interface is a little like declaring a structure or class; you define the fields in the interface, each with its own type, like this: interface Record { call: string; lat: number; lng: number; }   function printLocation(r: Record) { console.log(r.call + ': ' + r.lat + ', ' + r.lng); }   var myObj = {call: 'kf6gpe-7', lat: 21.9749, lng: 159.3686};   printLocation(myObj); How it works… The interface keyword in TypeScript defines an interface; as I already noted, an interface consists of the fields it declares with their types. In this listing, I defined a plain JavaScript object, myObj, and then called the function printLocation that I previously defined, which takes a Record.
When calling printLocation with myObj, the TypeScript compiler checks the fields and the type of each field, and only permits a call to printLocation if the object matches the interface. There's more… Beware! TypeScript can only provide compile-time checking. What do you think the following code does? interface Record { call: string; lat: number; lng: number; }   function printLocation(r: Record) { console.log(r.call + ': ' + r.lat + ', ' + r.lng); }   var myObj = {call: 'kf6gpe-7', lat: 21.9749, lng: 159.3686}; printLocation(myObj);   var json = '{"call":"kf6gpe-7","lat":21.9749}'; var myOtherObj = JSON.parse(json); printLocation(myOtherObj); First, this compiles with tsc just fine. When you run it with node, you'll see the following: kf6gpe-7: 21.9749, 159.3686 kf6gpe-7: 21.9749, undefined What happened? The TypeScript compiler does not add run-time type checking to your code, so you can't impose an interface on a run-time created object that's not a literal. In this example, because the lng field is missing from the JSON, the function can't print it, and prints the value undefined instead. This doesn't mean that you shouldn't use TypeScript with JSON, however. Type annotations serve a purpose for all readers of the code, be they compilers or people. You can use type annotations to indicate your intent as a developer, and readers of the code can better understand the design and limitation of the code you write. See also For more information about interfaces, see the TypeScript documentation at http://www.typescriptlang.org/Handbook#interfaces. How to declare classes with interfaces using TypeScript Interfaces let you specify behavior without specifying implementation; classes let you encapsulate implementation details behind an interface. TypeScript classes can encapsulate fields or methods, just as classes in other languages. How to do it… Here's an example of our Record structure, this time as a class with an interface: class RecordInterface { call: string; lat: number; lng: number;   constructor(c: string, la: number, lo: number) {} printLocation() {}   }   class Record implements RecordInterface { call: string; lat: number; lng: number; constructor(c: string, la: number, lo: number) {    this.call = c;    this.lat = la;    this.lng = lo; }   printLocation() {    console.log(this.call + ': ' + this.lat + ', ' + this.lng); } }   var myObj : Record = new Record('kf6gpe-7', 21.9749, 159.3686);   myObj.printLocation(); How it works… The RecordInterface type, again, plays the role of an interface just as in the previous section (here it happens to be declared with the class keyword, which TypeScript permits when a class is used as an interface). The class keyword, which you haven't seen before, implements a class; the optional implements keyword indicates that this class implements the interface RecordInterface. Note that the class implementing the interface must have all of the same fields and methods that the interface prescribes; otherwise, it doesn't meet the requirements of the interface. As a result, our Record class includes fields for call, lat, and lng, with the same types as in the interface, as well as the methods constructor and printLocation. The constructor method is a special method called when you create a new instance of the class using new. Note that with classes, unlike regular objects, the correct way to create them is by using a constructor, rather than just building them up as a collection of fields and values. We do that on the second-to-last line of the listing, passing the constructor arguments as function arguments to the class constructor.
See also There's a lot more you can do with classes, including defining inheritance and creating public and private fields and methods. For more information about classes in TypeScript, see the documentation at http://www.typescriptlang.org/Handbook#classes. Using json2ts to generate TypeScript interfaces from your JSON This last recipe is more of a tip than a recipe; if you've got some JSON you developed using another programming language or by hand, you can easily create a TypeScript interface for objects to contain the JSON by using Timmy Kokke's json2ts website. How to do it… Simply go to http://json2ts.com and paste your JSON in the box that appears, and click on the generate TypeScript button. You'll be rewarded with a second text-box that appears and shows you the definition of the TypeScript interface, which you can save as its own file and include in your TypeScript applications. How it works… The following figure shows a simple example: You can save this typescript as its own file, a definition file, with the suffix .d.ts, and then include the module with your TypeScript using the import keyword, like this: import module = require('module'); Summary In this article we looked at how you can adapt the type-free nature of JSON with the type safety provided by languages such as C#, Java, and TypeScript to reduce programming errors in your application. Resources for Article: Further resources on this subject: Playing with Swift [article] Getting Started with JSON [article] Top two features of GSON [article]

An Introduction to Reactive Programming

Packt
24 Jun 2015
23 min read
In this article, written by Nickolay Tsvetinov, author of the book Learning Reactive Programming with Java 8, we will present RxJava (https://github.com/ReactiveX/RxJava), an open source Java implementation of the reactive programming paradigm. Writing code using RxJava requires a different kind of thinking, but it will give you the power to create complex logic using simple pieces of well-structured code. In this article, we will cover: What reactive programming is Reasons to learn and use this style of programming Setting up RxJava and comparing it with familiar patterns and structures A simple example with RxJava (For more resources related to this topic, see here.) What is reactive programming? Reactive programming is a paradigm that revolves around the propagation of change. In other words, if a program propagates all the changes that modify its data to all the interested parties (users, other programs, components, and subparts), then this program can be called reactive. A simple example of this is Microsoft Excel. If you set a number in cell A1 and another number in cell B1, and set cell C1 to SUM(A1, B1), then whenever A1 or B1 changes, C1 will be updated to be their sum. Let's call this the reactive sum. What is the difference between assigning a simple variable c to be equal to the sum of the a and b variables and the reactive sum approach? In a normal Java program, when we change a or b, we will have to update c ourselves. In other words, the change in the flow of the data represented by a and b is not propagated to c. Here is this illustrated through source code: int a = 4; int b = 5; int c = a + b; System.out.println(c); // 9   a = 6; System.out.println(c); // 9 again, but if 'c' was tracking the changes of 'a' and 'b', // it would've been 6 + 5 = 11 This is a very simple explanation of what "being reactive" means. Of course, there are various implementations of this idea and there are various problems that these implementations must solve. Why should we be reactive? The easiest way for us to answer this question is to think about the requirements we have while building applications these days. While 10-15 years ago it was normal for websites to go through maintenance or to have a slow response time, today everything should be online 24/7 and should respond with lightning speed; if it's slow or down, users will prefer an alternative service. Today, slow means unusable or broken. We are working with greater volumes of data that we need to serve and process fast. HTTP failures weren't rare in the recent past, but now we have to be fault-tolerant and give our users readable and reasonable status updates. In the past, we wrote simple desktop applications, but today we write web applications, which should be fast and responsive. In most cases, these applications communicate with a large number of remote services. These are the new requirements we have to fulfill if we want our software to be competitive. So, in other words, we have to be: Modular/dynamic: This way, we will be able to have 24/7 systems, because modules can go offline and come online without breaking or halting the entire system. Additionally, this helps us better structure our applications as they grow larger and manage their code base. Scalable: This way, we are going to be able to handle a huge amount of data or large numbers of user requests. Fault-tolerant: This way, the system will appear stable to its users. Responsive: This means fast and available.
Let's think about how to accomplish this: We can become modular if our system is event-driven. We can divide the system into multiple micro-services/components/modules that are going to communicate with each other using notifications. This way, we are going to react to the data flow of the system, represented by notifications. To be scalable means to react to the ever-growing data, to react to load without falling apart. Reacting to failures/errors will make the system more fault-tolerant. To be responsive means reacting to user activity in a timely manner. If the application is event-driven, it can be decoupled into multiple self-contained components. This helps us become more scalable, because we can always add new components or remove old ones without stopping or breaking the system. If errors and failures are passed to the right component, which can handle them as notifications, the application can become more fault-tolerant or resilient. So if we build our system to be event-driven, we can more easily achieve scalability and failure tolerance, and a scalable, decoupled, and error-proof application is fast and responsive to users. The Reactive Manifesto (http://www.reactivemanifesto.org/) is a document defining the four reactive principles that we mentioned previously. Each reactive system should be message-driven (event-driven). That way, it can become loosely coupled and therefore scalable and resilient (fault-tolerant), which means it is reliable and responsive (see the preceding diagram). Note that the Reactive Manifesto describes a reactive system and is not the same as our definition of reactive programming. You can build a message-driven, resilient, scalable, and responsive application without using a reactive library or language. Changes in the application data can be modeled with notifications, which can be propagated to the right handlers. So, writing applications using reactive programming is the easiest way to comply with the Manifesto. Introducing RxJava To write reactive programs, we need a library or a specific programming language, because building something like that ourselves is quite a difficult task. Java is not really a reactive programming language (it provides some tools like the java.util.Observable class, but they are quite limited). It is a statically typed, object-oriented language, and we write a lot of boilerplate code to accomplish simple things (POJOs, for example). But there are reactive libraries in Java that we can use. In this article, we will be using RxJava (developed by people in the Java open source community, guided by Netflix). Downloading and setting up RxJava You can download and build RxJava from Github (https://github.com/ReactiveX/RxJava). It requires zero dependencies and supports Java 8 lambdas. The documentation provided by its Javadoc and the GitHub wiki pages is well structured and some of the best out there. Here is how to check out the project and run the build: $ git clone git@github.com:ReactiveX/RxJava.git $ cd RxJava/ $ ./gradlew build Of course, you can also download the prebuilt JAR. For this article, we'll be using version 1.0.8. 
If you use Maven, you can add RxJava as a dependency to your pom.xml file: <dependency> <groupId>io.reactivex</groupId> <artifactId>rxjava</artifactId> <version>1.0.8</version> </dependency> Alternatively, for Apache Ivy, put this snippet in your Ivy file's dependencies: <dependency org="io.reactivex" name="rxjava" rev="1.0.8" /> If you use Gradle instead, update your build.gradle file's dependencies as follows: dependencies { ... compile 'io.reactivex:rxjava:1.0.8' ... } Now, let's take a peek at what RxJava is all about. We are going to begin with something well known, and gradually get into the library's secrets. Comparing the iterator pattern and the RxJava observable As a Java programmer, it is highly possible that you've heard or used the Iterator pattern. The idea is simple: an Iterator instance is used to traverse through a container (collection/data source/generator), pulling the container's elements one by one when they are required, until it reaches the container's end. Here is a little example of how it is used in Java: List<String> list = Arrays.asList("One", "Two", "Three", "Four", "Five"); // (1)   Iterator<String> iterator = list.iterator(); // (2)   while(iterator.hasNext()) { // 3 // Prints elements (4) System.out.println(iterator.next()); } Every java.util.Collection object is an Iterable instance which means that it has the method iterator(). This method creates an Iterator instance, which has as its source the collection. Let's look at what the preceding code does: We create a new List instance containing five strings. We create an Iterator instance from this List instance, using the iterator() method. The Iterator interface has two important methods: hasNext() and next(). The hasNext() method is used to check whether the Iterator instance has more elements for traversing. Here, we haven't begun going through the elements, so it will return True. When we go through the five strings, it will return False and the program will proceed after the while loop. The first five times, when we call the next() method on the Iterator instance, it will return the elements in the order they were inserted in the collection. So the strings will be printed. In this example, our program consumes the items from the List instance using the Iterator instance. It pulls the data (here, represented by strings) and the current thread blocks until the requested data is ready and received. So, for example, if the Iterator instance was firing a request to a web server on every next() method call, the main thread of our program would be blocked while waiting for each of the responses to arrive. RxJava's building blocks are the observables. The Observable class (note that this is not the java.util.Observable class that comes with the JDK) is the mathematical dual of the Iterator class, which basically means that they are like the two sides of the same coin. It has an underlying collection or computation that produces values that can be consumed by a consumer. But the difference is that the consumer doesn't "pull" these values from the producer like in the Iterator pattern. It is exactly the opposite; the producer 'pushes' the values as notifications to the consumer. 
Here is an example of the same program, but written using an Observable instance:

List<String> list = Arrays.asList("One", "Two", "Three", "Four", "Five"); // (1)

Observable<String> observable = Observable.from(list); // (2)

observable.subscribe(new Action1<String>() { // (3)
  @Override
  public void call(String element) {
    System.out.println(element); // Prints the element (4)
  }
});

Here is what is happening in the code:

We create the list of strings in the same way as in the previous example.
Then, we create an Observable instance from the list, using the from(Iterable<? extends T> iterable) method. This method is used to create instances of Observable that send all the values synchronously from an Iterable instance (the list in our case), one by one, to their subscribers (consumers).
Here, we can subscribe to the Observable instance. By subscribing, we tell RxJava that we are interested in this Observable instance and want to receive notifications from it. We subscribe using an anonymous class implementing the Action1 interface, by defining a single method—call(T). This method will be called by the Observable instance every time it has a value ready to be pushed. Always creating new Action1 instances may seem too verbose, but Java 8 solves this verbosity.
So, every string from the source list will be pushed through to the call() method, and it will be printed.

Instances of the RxJava Observable class behave somewhat like asynchronous iterators, which notify their subscribers/consumers by themselves that there is a next value. In fact, the Observable class adds to the classic Observer pattern (implemented in Java by java.util.Observable; see Design Patterns: Elements of Reusable Object-Oriented Software by the Gang of Four) two things available in the Iterable type:

The ability to signal the consumer that there is no more data available. Instead of calling the hasNext() method, we can attach a subscriber to listen for an 'OnCompleted' notification.
The ability to signal the subscriber that an error has occurred. Instead of try-catching an error, we can attach an error listener to the Observable instance.

These listeners can be attached using the subscribe(Action1<? super T>, Action1<Throwable>, Action0) method. Let's expand the Observable instance example by adding error and completed listeners:

List<String> list = Arrays.asList("One", "Two", "Three", "Four", "Five");

Observable<String> observable = Observable.from(list);
observable.subscribe(new Action1<String>() {
  @Override
  public void call(String element) {
    System.out.println(element);
  }
}, new Action1<Throwable>() {
  @Override
  public void call(Throwable t) {
    System.err.println(t); // (1)
  }
}, new Action0() {
  @Override
  public void call() {
    System.out.println("We've finished!"); // (2)
  }
});

The new things here are:

If there is an error while processing the elements, the Observable instance will send this error through the call(Throwable) method of this listener. This is analogous to the try-catch block in the Iterator instance example.
When everything finishes, this call() method will be invoked by the Observable instance. This is analogous to using the hasNext() method in order to see whether the traversal over the Iterable instance has finished, and then printing "We've finished!".

We saw how we can use Observable instances and that they are not so different from something familiar to us—the Iterator instance.
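Since the explanation above notes that Java 8 removes the verbosity of anonymous Action1/Action0 classes, here is a sketch of the same subscription rewritten with lambdas. It is not one of the book's listings; it simply assumes RxJava 1.x running on Java 8, with the three lambdas corresponding to the element, error, and completion listeners shown above.

import java.util.Arrays;
import java.util.List;
import rx.Observable;

public class LambdaSubscribeExample {
  public static void main(String[] args) {
    List<String> list = Arrays.asList("One", "Two", "Three", "Four", "Five");

    // The same three listeners as before, expressed as lambdas:
    // an element handler, an error handler, and a completion handler.
    Observable.from(list).subscribe(
      element -> System.out.println(element),
      error -> System.err.println(error),
      () -> System.out.println("We've finished!")
    );
  }
}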
These Observable instances can be used for building asynchronous streams and pushing data updates to their subscribers (they can have multiple subscribers). This is an implementation of the reactive programming paradigm. The data is being propagated to all the interested parties—the subscribers.

Coding using such streams is a more functional-like implementation of reactive programming. Of course, there are formal definitions and complex terms for it, but this is the simplest explanation. Subscribing to events should be familiar; for example, clicking on a button in a GUI application fires an event which is propagated to the subscribers—handlers. But, using RxJava, we can create data streams from anything—file input, sockets, responses, variables, caches, user inputs, and so on. On top of that, consumers can be notified that the stream is closed, or that there has been an error. So, by using these streams, our applications can react to failure.

To summarize, a stream is a sequence of ongoing messages/events, ordered as they are processed in real time. It can be looked at as a value that is changing through time, and these changes can be observed by subscribers (consumers) that depend on it. So, going back to the example from Excel, we have effectively replaced the traditional variables with "reactive variables" or RxJava's Observable instances.

Implementing the reactive sum

Now that we are familiar with the Observable class and the idea of how to use it to code in a reactive way, we are ready to implement the reactive sum, mentioned at the beginning of this article. Let's look at the requirements our program must fulfill:

It will be an application that runs in the terminal.
Once started, it will run until the user enters exit.
If the user enters a:<number>, the a collector will be updated to the <number>.
If the user enters b:<number>, the b collector will be updated to the <number>.
If the user enters anything else, it will be skipped.
When both the a and b collectors have initial values, their sum will automatically be computed and printed on the standard output in the format a + b = <sum>. On every change in a or b, the sum will be updated and printed.

The first piece of code represents the main body of the program:

ConnectableObservable<String> input = from(System.in); // (1)

Observable<Double> a = varStream("a", input); // (2)
Observable<Double> b = varStream("b", input);

ReactiveSum sum = new ReactiveSum(a, b); // (3)

input.connect(); // (4)

There are a lot of new things happening here:

The first thing we must do is to create an Observable instance representing the standard input stream (System.in). So, we use the from(InputStream) method (the implementation will be presented in the next code snippet) to create a ConnectableObservable variable from System.in. The ConnectableObservable variable is an Observable instance and starts emitting events coming from its source only after its connect() method is called.
We create two Observable instances representing the a and b values, using the varStream(String, Observable) method, which we are going to examine later. The source stream for these values is the input stream.
We create a ReactiveSum instance, dependent on the a and b values.
And now, we can start listening to the input stream.

This code is responsible for building dependencies in the program and starting it off. The a and b values are dependent on the user input and their sum is dependent on them.
Now let's look at the implementation of the from(InputStream) method, which creates an Observable instance with the java.io.InputStream source:

static ConnectableObservable<String> from(final InputStream stream) {
  return from(new BufferedReader(new InputStreamReader(stream))); // (1)
}

static ConnectableObservable<String> from(final BufferedReader reader) {
  return Observable.create(new OnSubscribe<String>() { // (2)
    @Override
    public void call(Subscriber<? super String> subscriber) {
      if (subscriber.isUnsubscribed()) { // (3)
        return;
      }
      try {
        String line;
        while (!subscriber.isUnsubscribed() &&
            (line = reader.readLine()) != null) { // (4)
          if (line.equals("exit")) { // (5)
            break;
          }
          subscriber.onNext(line); // (6)
        }
      }
      catch (IOException e) { // (7)
        subscriber.onError(e);
      }
      if (!subscriber.isUnsubscribed()) { // (8)
        subscriber.onCompleted();
      }
    }
  }).publish(); // (9)
}

This is one complex piece of code, so let's look at it step by step:

This method implementation converts its InputStream parameter to a BufferedReader object and calls the from(BufferedReader) method. We are doing that because we are going to use strings as data, and working with the Reader instance is easier. So the actual implementation is in the second method.
It returns an Observable instance, created using the Observable.create(OnSubscribe) method. This method is the one we are going to use the most in this article. It is used to create Observable instances with custom behavior. The rx.Observable.OnSubscribe interface passed to it has one method, call(Subscriber). This method is used to implement the behavior of the Observable instance, because the Subscriber instance passed to it can be used to emit messages to the Observable instance's subscriber. A subscriber is the client of an Observable instance, which consumes its notifications.
If the subscriber has already unsubscribed from this Observable instance, nothing should be done.
The main logic is to listen for user input while the subscriber is subscribed. Every line the user enters in the terminal is treated as a message. This is the main loop of the program.
If the user enters the word exit and hits Enter, the main loop stops.
Otherwise, the message the user entered is passed as a notification to the subscriber of the Observable instance, using the onNext(T) method. This way, we pass everything to the interested parties. It's their job to filter out and transform the raw messages.
If there is an IO error, the subscribers are notified with an OnError notification through the onError(Throwable) method.
If the program reaches here (by breaking out of the main loop) and the subscriber is still subscribed to the Observable instance, an OnCompleted notification is sent to the subscribers using the onCompleted() method.
With the publish() method, we turn the new Observable instance into a ConnectableObservable instance. We have to do this because, otherwise, for every subscription to this Observable instance, our logic will be executed from the beginning. In our case, we want to execute it only once and have all the subscribers receive the same notifications; this is achievable with the use of a ConnectableObservable instance.

This illustrates a simplified way to turn Java's IO streams into Observable instances.
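The publish() call in step (9) is easy to gloss over, so here is a small usage sketch (an illustration, not taken from the book's sources). It assumes the from(InputStream) helper defined above is in scope and that Java 8 lambdas are available; both subscribers are attached before connect() is called, so the reading loop runs only once and every typed line is pushed to both of them.

ConnectableObservable<String> input = from(System.in);

// Register the subscribers first; nothing is read yet.
input.subscribe(line -> System.out.println("first  : " + line));
input.subscribe(line -> System.out.println("second : " + line));

// Only now does the loop inside call(Subscriber) start; each line the
// user types is emitted once and delivered to both subscribers.
input.connect();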
Of course, with this main loop, the main thread of the program will block waiting for user input. This can be prevented using the right Scheduler instances to move the logic to another thread. Now, every line the user types into the terminal is propagated as a notification by the ConnectableObservable instance created by this method.

The time has come to look at how we connect our value Observable instances, representing the collectors of the sum, to this input Observable instance. Here is the implementation of the varStream(String, Observable) method, which takes the name of a value and a source Observable instance and returns an Observable instance representing this value:

public static Observable<Double> varStream(final String varName, Observable<String> input) {
  final Pattern pattern = Pattern.compile("^\\s*" + varName + "\\s*[:|=]\\s*(-?\\d+\\.?\\d*)$"); // (1)
  return input
    .map(new Func1<String, Matcher>() {
      public Matcher call(String str) {
        return pattern.matcher(str); // (2)
      }
    })
    .filter(new Func1<Matcher, Boolean>() {
      public Boolean call(Matcher matcher) {
        return matcher.matches() && matcher.group(1) != null; // (3)
      }
    })
    .map(new Func1<Matcher, Double>() {
      public Double call(Matcher matcher) {
        return Double.parseDouble(matcher.group(1)); // (4)
      }
    });
}

The map() and filter() methods called on the Observable instance here are part of the fluent API provided by RxJava. They can be called on an Observable instance, creating a new Observable instance that depends on these methods and that transforms or filters the incoming data. Using these methods the right way, you can express complex logic in a series of steps leading to your objective:

Our variables are interested only in messages in the format <var_name>: <value> or <var_name> = <value>, so we are going to use this regular expression to filter and process only these kinds of messages. Remember that our input Observable instance sends each line the user writes; it is our job to handle it the right way.
Using the messages we receive from the input, we create a Matcher instance using the preceding regular expression as a pattern.
We pass through only data that matches the regular expression. Everything else is discarded.
Here, the value to set is extracted as a Double number value.

This is how the values a and b are represented by streams of double values, changing in time. Now we can implement their sum. We implemented it as a class that implements the Observer interface, because I wanted to show you another way of subscribing to Observable instances—using the Observer interface. Here is the code:

public static final class ReactiveSum implements Observer<Double> { // (1)
  private double sum;

  public ReactiveSum(Observable<Double> a, Observable<Double> b) {
    this.sum = 0;
    Observable.combineLatest(a, b, new Func2<Double, Double, Double>() { // (5)
      public Double call(Double a, Double b) {
        return a + b;
      }
    }).subscribe(this); // (6)
  }

  public void onCompleted() {
    System.out.println("Exiting last sum was : " + this.sum); // (4)
  }

  public void onError(Throwable e) {
    System.err.println("Got an error!"); // (3)
    e.printStackTrace();
  }

  public void onNext(Double sum) {
    this.sum = sum;
    System.out.println("update : a + b = " + sum); // (2)
  }
}

This is the implementation of the actual sum, dependent on the two Observable instances representing its collectors:

It implements the Observer interface.
The Observer instance can be passed to the Observable instance's subscribe(Observer) method; the interface defines three methods that are named after the three types of notification: onNext(T), onError(Throwable), and onCompleted().
In our onNext(Double) method implementation, we set the sum to the incoming value and print an update to the standard output.
If we get an error, we just print it.
When everything is done, we greet the user with the final sum.
We implement the sum with the combineLatest(Observable, Observable, Func2) method. This method creates a new Observable instance. The new Observable instance is updated when either of the two Observable instances passed to combineLatest receives an update. The value emitted through the new Observable instance is computed by the third parameter—a function that has access to the latest values of the two source sequences. In our case, we sum up the values. There will be no notification until both of the Observable instances passed to the method emit at least one value. So, we will have the sum only when both a and b have notifications.
We subscribe our Observer instance to the combined Observable instance.

Here is a sample of what the output of this example would look like:

Reactive Sum. Type 'a: <number>' and 'b: <number>' to try it.
a:4
b:5
update : a + b = 9.0
a:6
update : a + b = 11.0

So this is it! We have implemented our reactive sum using streams of data.

Summary

In this article, we went through the reactive principles and the reasons we should learn and use them. It is not so hard to build a reactive application; it just requires structuring the program in little declarative steps. With RxJava, this can be accomplished by building multiple asynchronous streams connected the right way, transforming the data all the way to its consumer.

The two examples presented in this article may look a bit complex and confusing at first glance, but in reality, they are pretty simple. If you want to read more about reactive programming, take a look at Reactive Programming in the Netflix API with RxJava, a fine article on the topic, available at http://techblog.netflix.com/2013/02/rxjava-netflix-api.html. Another fine post introducing the concept can be found here: https://gist.github.com/staltz/868e7e9bc2a7b8c1f754. And these are slides about reactive programming and RX by Ben Christensen, one of the creators of RxJava: https://speakerdeck.com/benjchristensen/reactive-programming-with-rx-at-qconsf-2014.

Resources for Article:

Further resources on this subject: The Observer Pattern [article] The Five Kinds of Python Functions Python 3.4 Edition [article] Discovering Python's parallel programming tools [article]

Animation Fundamentals

Packt
24 Jun 2015
12 min read
In this article by Alan Thorn, author of the book Unity Animation Essentials, you learn the fundamentals of animation. The importance of animation cannot be overstated. Without animation, everything in-game would be statuesque, lifeless, and perhaps boring. This holds true for nearly everything in games: doors must open, characters must move, foliage should sway with the wind, sparkles and particles should explode and shine, and so on. Consequently, learning animation and how to animate properly will unquestionably empower you as a developer, no matter what your career plans are. As a subject, animation creeps unavoidably into most game fields, and it's a critical concern for all members of a team—obviously for artists and animators, but also for programmers, sound designers, and level designers. The aim is to quickly and effectively introduce the fundamental concepts and practices surrounding animation in real-time games, specifically animation in Unity. You will be capable of making effective animations that express your artistic vision, as well as gaining an understanding of how and where you can expand your knowledge to the next level. But to reach that stage, we'll begin with the most basic concepts of animation—the groundwork for any understanding of animation.

(For more resources related to this topic, see here.)

Understanding animation

At its most fundamental level, animation is about a relationship between two specific and separate properties, namely change on one hand and time on the other. Technically, animation defines change over time, that is, how a property adjusts or varies across time, such as how the position of a car changes over time, or how the color of a traffic light transitions over time from red to green. Thus, every animation occurs for a total length of time (duration), and throughout its lifetime, the properties of the objects will change at specific moments (frames), anywhere from the beginning to the end of the animation. This definition is itself technical and somewhat dry, but relevant and important. However, it fails to properly encompass the aesthetic and artistic properties of animation. Through animation and through creative changes in properties over time, moods, atmospheres, worlds, and ideas can be conveyed effectively. Even so, the emotional and artistic power that comes from animation is ultimately a product of the underlying relationship of change with time. Within this framework of change over time, we may identify further key terms, specifically in computer animation. You may already be familiar with these concepts, but let's define them more formally.

Frames

Within an animation, time must necessarily be divided into separate and discrete units where change can occur. These units are called frames. Time is essentially a continuous and unbreakable quantity, insofar as you can always subdivide time (such as a second) to get an even smaller unit of time (such as a millisecond), and so on. In theory, this process of subdivision could essentially be carried on ad infinitum, resulting in smaller and smaller fractions of time. The concept of a moment or event in time is, by contrast, a human-made, discrete, and self-contained entity. It is a discrete thing that we perceive in time to make our experience of the world more intelligible. Unlike time, a moment is what it is, and it cannot be broken down into something smaller without ceasing to exist altogether. Inside a moment, or a frame, things can happen.
A frame is an opportunity for properties to change—for doors to open, characters to move, colors to change, and more. In video game animation specifically, each second can sustain or contain a specified number of frames. The amount of frames passing within a second will vary from computer to computer, depending on the hardware capacity, the software installed, and other factors. The frame capacity per second is called FPS (frames per second). It's often used as a measure of performance for a game, since lower frame rates are typically associated with jittery and poor performance. Key frames Although a frame represents an opportunity for change, it doesn't necessarily mean change will occur. Many frames can pass by in a second, and not every frame requires a change. Moreover, even if a change needs to happen for a frame, it would be tedious if animators had to define every frame of action. One of the benefits of computer animation, contrasted with manual, or "old", animation techniques, is that it can make our lives easier. Animators can instead define key, or important, frames within an animation sequence, and then have the computer automatically generate the intervening frames. Consider a simple animation in which a standard bedroom door opens by rotating outwards on its hinges by 90 degrees. The animation begins with the door in the closed position and ends in an open position. Here, we have defined two key states for the door (open and closed), and these states mark the beginning and end of the animation sequence. These are called key frames, because they define key moments within the animation. On the basis of key frames, Unity (as we'll see) can autogenerate the in-between frames (tweens), smoothly rotating the door from its starting frame to its ending frame. The mathematical process of generating tweens is termed as interpolation. Animation types The previous section defined the core concepts underpinning animation generally. Specifically, it covered change, time, frames, key frames, tweens, and interpolation. On the basis of this, we can identify several types of animation in video games from a technical perspective, as opposed to an artistic one. All variations depend on the concepts we've seen, but they do so in different and important ways. These animation types are significant for Unity because the differences in their nature require us to handle and work with them differently, using specific workflows and techniques. The animation types are listed throughout this section, as follows. Rigid body animation Rigid body animation is used to create pre-made animation sequences that move or change the properties of objects, considering those objects as whole or complete entities, as opposed to objects with smaller and moving parts. Some examples of this type of animation are a car racing along the road, a door opening on its hinges, a spaceship flying through space on its trajectory, and a piano falling from the side of a building. Despite the differences among these examples, they all have an important common ingredient. Specifically, although the object changes across key frames, it does so as a single and complete object. In other words, although the door may rotate on its hinges from a closed state to an open state, it still ends the animation as a door, with the same internal structure and composition as before. It doesn't morph into a tiger or a lion. It doesn't explode or turn into jelly. It doesn't melt into rain drops. 
Throughout the animation, the door retains its physical structure. It changes only in terms of its position, rotation and scale. Thus, in rigid body animation, changes across key frames apply to whole objects and their highest level properties. They do not filter down to sub properties and internal components, and they don't change the essence or internal forms of objects. These kinds of animation can be defined either directly in the Unity animation editor, or inside 3D animation software (such as Maya, Max, or Blender) and then imported to Unity through mesh files. Key frame animation for rigid bodies Rigged or bone-based animation If you need to animate human characters, animals, flesh-eating goo, or exploding and deforming objects, then rigid body animation probably won't be enough. You'll need bone-based animation (also called rigged animation). This type of animation changes not the position, rotation, or scale of an object, but the movement and deformation of its internal parts across key frames. It works like this: the animation artist creates a network of special bone objects to approximate the underlying skeleton of a mesh, allowing independent and easy control of the surrounding and internal geometry. This is useful for animating arms, legs, head turns, mouth movements, tree rustling, and a lot more. Typically, bone-based animation is created as a complete animation sequence in 3D modeling software and is imported to Unity inside a mesh file, which can be processed and accessed via Mecanim, the Unity animation system. Bone-based animation is useful for character meshes Sprite animation For 2D games, graphical user interfaces, and a variety of special effects in 3D (such as water textures), you'll sometimes need a standard quad or plane mesh with a texture that animates. In this case, neither the object moves, as with rigid body animation, nor do any of its internal parts change, as with rigged animation. Rather, the texture itself animates. This animation type is called sprite animation. It takes a sequence of images or frames and plays them in order at a specified frame rate to achieve a consistent and animated look, for example, a walk cycle for a character in a 2D side-scrolling game. Physics-based animation In many cases, you can predefine your animation. That is, you can fully plan and create animation sequences for objects that will play in a predetermined way at runtime, such as walk cycles, sequences of door opening, explosions, and others. But sometimes, you need animation that appears realistic and yet responds to its world dynamically, based on decisions made by the player and other variable factors of the world that cannot be predicted ahead of time. There are different ways to handle these scenarios, but one is to use the Unity physics system, which includes components and other data that can be attached to objects to make them behave realistically. Examples of this include falling to the ground under the effects of gravity, and bending and twisting like cloth in the wind. Physics animation Morph animation Occasionally, none of the animation methods you've read so far—rigid body, physics-based, rigged, or sprite animation—give you what's needed. Maybe, you need to morph one thing into another, such as a man into a werewolf, a toad into a princess, or a chocolate bar into a castle. In some instances, you need to blend, or merge smoothly, the state of a mesh in one frame into a different state in a different frame. This is called morph animation, or blend shapes. 
Essentially, this method relies on snapshots of a mesh's vertices across key frames in an animation, and blends between the states via tweens. The downside to this method is its computational expense. It's typically performance intensive, but its results can be impressive and highly realistic. See the following screenshot for the effects of blend shapes:

Morph animation start state

BlendShapes transition a model from one state to another. See the following figure for the destination state:

Morph animation end state

Video animation

Perhaps one of Unity's lesser known animation features is its ability to play video files as animated textures on desktop platforms and as full-screen movies on mobile devices such as iOS and Android. Unity accepts OGV (Ogg Theora) videos as assets, and can replay both video and sound from these files as an animated texture on mesh objects in the scene. This allows developers to replay pre-rendered video file output from any animation package directly in their games. This feature is powerful and useful, but also performance intensive.

Video file animation

Particle animation

Most animation methods considered so far are for clearly defined, tangible things in a scene, such as sprites and meshes. These are objects with clearly marked boundaries that separate them from other things. But you'll frequently need to animate less tangible, less solid, and less physical matter, such as smoke, fire, bubbles, sparkles, smog, swarms, fireworks, clouds, and others. For these purposes, a particle system is indispensable. Particle systems are entirely configurable objects that can be used to simulate rain, snow, flocks of birds, and more. See the following screenshot for a particle system in action:

Particle system animation

Programmatic animation

Surprisingly, the most common animation type is perhaps programmatic animation, or dynamic animation. If you need a spaceship to fly across the screen, a user-controlled character to move around an environment, or a door to open when approached, you'll probably need some programmatic animation. This refers to changes made to the properties of objects over time, which arise because of programming—code that a developer has written specifically for that purpose (a short script illustrating this idea appears at the end of this article). Unlike many other forms of animation, the programmatic form is not created or built in advance by an artist or animator per se, because its permutations and combinations cannot be known upfront. So, it's coded by a programmer and has the flexibility to change and adjust according to conditions and variables at runtime. Of course, in many cases, animations are made by artists and animators and the code simply triggers or guides the animation at runtime.

Summary

This article considered animation abstractly, as a form of art and as a science. We covered the types of animation that are most common in Unity games. In addition, we examined some core tasks and ideas in programmatic animation, including the ability to animate and change objects dynamically through code without relying on pre-scripted or predefined animations—an ability that you will use throughout the rest of the book.

Resources for Article:

Further resources on this subject: Saying Hello to Unity and Android [article] Looking Back, Looking Forward [article] What's Your Input? [article]
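To make the idea of programmatic animation described above concrete, here is a minimal C# sketch. It is not taken from the book: the DoorOpener component name, the 90-degree angle, and the two-second duration are illustrative assumptions.

using UnityEngine;

// Attach to a door object; rotates it openAngle degrees around Y over openDuration seconds.
public class DoorOpener : MonoBehaviour
{
    public float openAngle = 90f;    // degrees to rotate (example value)
    public float openDuration = 2f;  // seconds to complete the rotation (example value)

    private Quaternion closedRotation;
    private Quaternion openRotation;
    private float elapsed;

    void Start()
    {
        closedRotation = transform.rotation;
        openRotation = closedRotation * Quaternion.Euler(0f, openAngle, 0f);
    }

    void Update()
    {
        if (elapsed < openDuration)
        {
            elapsed += Time.deltaTime;
            // Interpolate (tween) between the closed and open states each frame.
            transform.rotation = Quaternion.Slerp(
                closedRotation, openRotation, Mathf.Clamp01(elapsed / openDuration));
        }
    }
}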

Entering People Information

Packt
24 Jun 2015
9 min read
In this article by Pravin Ingawale, author of the book Oracle E-Business Suite R12.x HRMS – A Functionality Guide, we will learn about entering a person's information in Oracle HRMS. We will understand the hiring process in Oracle. This is actually part of the Oracle I-Recruitment module in Oracle apps. Then we will see how to create an employee in Core HR. After that, we will learn the concept of person types and how to define them. We will also learn about entering information for an employee, including additional information. Let's see how to create an employee in Core HR.

(For more resources related to this topic, see here.)

Creating an employee

An employee is the most important entity in an organization. Before creating an employee, the HR officer must know the date from which the employee will be active in the organization. In Oracle terminology, you can call it the employee's hire date. Apart from this, the HR officer must know basic details of the employee such as first name, last name, date of birth, and so on. Navigate to US HRMS Manager | People | Enter and Maintain. This is the basic form, called People in Oracle HRMS, which is used to create an employee in the application. As you can see in the form, there is a field named Last, which is marked in yellow. This indicates that it is mandatory for creating an employee record. First, you need to set the effective date on the form. You can set this by clicking on the icon, as shown in the following screenshot:

You need to enter the mandatory field data along with additional data. The following screenshot shows the data entered:

Once you enter the required data, you need to specify the action for the entered record. The action we have selected is Create Employment. The Create Employment action will create an employee in the application. There are other actions such as Create Applicant, which is used to create an applicant for I-Recruitment. The Create Placement action is used to create a contingent worker in your enterprise. Once you select this action, it will prompt you to enter the person type of this employee, as in the following screenshot. Select the Person Type as Employee and save the record. We will see the concept of person type in the next section. Once you select the employee person type and then save the record, the system will automatically generate the employee number for the person. In our case, the system has generated employee number 10160. So now, we have created an employee in the application.

Concept of person types

In any organization, you need to identify different types of people; in other words, you need to group different types of people. There are basically three types of people you capture in an HRMS system. They are as follows:

Employees: These include current employees and past employees. Past employees are those who were part of your enterprise earlier and are no longer active in the system. You can call them terminated or ex-employees.
Applicants: If you are using I-Recruitment, applicants can be created.
External people: Contact is a special category of external type. Contacts are associated with an employee or an applicant. For example, there might be a need to record the name, address, and phone number of an emergency contact for each employee in your organization. There might also be a need to keep information on dependents of an employee for medical insurance purposes or for some payments in payroll processing.

Using person types

There are predefined person types in Oracle HRMS.
You can add more person types as per your requirements. You can also change the name of existing person types when you install the system. Let's take an example for your understanding. Your organization has employees. There might be employees of different types; you might have regular employees and employees who are contractors in your organization. Hence, you can categorize employees in your organization into two types: Regular employees Consultants The reason for creating these categories is to easily identify the employee type and store different types of information for each category. Similarly, if you are using I-recruitment, then you will have candidates. Hence, you can categorize candidates into two types. One will be internal candidate and the other will be external candidate. Internal candidates will be employees within your organization who can apply for an opening within your organization. An external candidate is an applicant who does not work for your organization but is applying for a position that is open in your company. Defining person types In an earlier section, you learned the concept of person types, and now you will learn how to define person types in the system. Navigate to US HRMS Manager | Other Definitions | Person Types. In the preceding screenshot, you can see four fields, that is, User Name, System Name, Active, and Default flag. There are eight person types recognized by the system and identified by a system name. For each system name, there are predefined usernames. A username can be changed as per your needs. There must be one username that should be the default. While creating an employee, the person types that are marked by the default flag will come by default. To change a username for a person type, delete the contents of the User Name field and type the name you'd prefer to keep. To add a new username to a person type system name: Select New Record from the Edit menu. Enter a unique username and select the system name you want to use. Deactivating person types You cannot delete person types, but you can deactivate them by unchecking the Active checkbox. Entering personal and additional information Until now, you learned how to create an employee by entering basic details such as title, gender, and date of birth. In addition to this, you can enter some other information for an employee. As you can see on the people form, there are various tabs such as Employment, Office details, Background, and so on. Each tab has some fields that can store information. For example, in our case, we have stored the e-mail address of the employee in the Office Details tab. Whenever you enter any data for an employee and then click on the Save button, it will give you two options as shown in the following screenshot: You have to select one of the options to save the data. The differences between both the options are explained with an example. Let's say you have hired a new employee as of 01-Jan-2014. Hence, a new record will be created in the application with the start date as 01-Jan-2014. This is called an effective start date of the record. There is no end date for this record, so Oracle gives it a default end date, which is 31-Dec-4712. This is called the effective end date of the record. Now, in our case, Oracle has created a single record with the start date and end date as 01-Jan-2014 and 31-Dec-4712, respectively. 
When we try to enter additional data for this record (in our case, a phone number), Oracle will prompt you to select the Correction or Update option. This is called the date-tracked option. If you select the correction mode, then Oracle will update the existing record in the application. Now, if you date track to, say, 01-Aug-2014, then enter the phone number and select the update mode, Oracle will end-date the historical record at the new date minus one day and create a new record with the start date 01-Aug-2014 and the phone number that you have entered. Thus, the historical data will be preserved and a new record will be created with the start date 01-Aug-2014 and a phone number.

The following tabular representation will help you understand Correction mode better:

Employee Number | Last Name  | Effective Start Date | Effective End Date | Phone Number
10160           | Test010114 | 01-Jan-2014          | 31-Dec-4712        | +0099999999

Now, if you want to change the phone number from 01-Aug-2014 in Update mode (date 01-Aug-2014), then the records will be as follows:

Employee Number | Last Name  | Effective Start Date | Effective End Date | Phone Number
10160           | Test010114 | 01-Jan-2014          | 31-Jul-2014        | +0099999999
10160           | Test010114 | 01-Aug-2014          | 31-Dec-4712        | +0088888888

Thus, in update mode, you can see that the historical data is intact. If HR wants to view some historical data, then the HR employee can easily view this data. Everything associated with Oracle HRMS is date-tracked. Every characteristic about the organization, person, position, salary, and benefits is tightly date-tracked. This concept is very important in Oracle and is used in almost all the forms in which you store employee-related information. Thus, you have learned about the date-tracking concept in Oracle APPS.

There are some additional fields, which can be configured as per your requirements. Additional personal data can be stored in these fields. These are called descriptive flexfields (DFFs) in Oracle. We created a personal DFF to store data about Years of Industry Experience and whether an employee is Oracle Certified or not. This data can be stored in the People form DFF, as marked in the following screenshot:

When you click on the box, it will open a new form, as shown in the following screenshot. Here, you can enter the additional data. This is called the Additional Personal Details DFF. It is stored in personal data; this is normally referred to as the People form DFF. We have created a Special Information Type (SIT) to store information on languages known by an employee. This data will have two attributes, namely, the language known and the fluency. This can be entered by navigating to US HRMS Manager | People | Enter and Maintain | Special Info. Click on the Details section. This will open a new form to enter the required details. Each record in the SIT is date-tracked. You can enter the start date and the end date. Thus, we have seen the DFF, in which you stored additional person data, and we have seen the KFF, where you enter the SIT data.

Summary

In this article, you have learned about creating a new employee, entering employee data, and entering additional data using DFFs and KFFs. You also learned the concept of person type.

Resources for Article:

Further resources on this subject: Knowing the prebuilt marketing, sales, and service organizations [article] Oracle E-Business Suite with Desktop Integration [article] Oracle Integration and Consolidation Products [article]

Using a REST API with Unity Part 1

Denny and Travis
24 Jun 2015
6 min read
Introduction

While developing a game, there are a number of reasons why you would want to connect to a server. Downloading new assets, such as models, or collecting data from an external source is one reason. Downloading bundle assets can be done through your own server, which allows your game to connect to a server and download the most recent versions of those bundles. Suppose your game also allowed users to see if an item was available at Amazon, and for what price? If you had the SKU number available, you could connect to Amazon's API and check the price and availability of that item. The most common way to connect to external APIs these days is through a RESTful API.

What is a REST API

A RESTful API is a common approach to creating scalable web services. It provides users with endpoints to collect and create data using the same HTTP calls used to collect web pages (GET, POST, PUT, DELETE). For example, a URL like www.fake.com/users could return a JSON of user data. Of course, there is often more security involved with these calls, but this is a good starting point. Once you begin understanding REST APIs, it becomes second nature to query them. Before doing anything in code, you can try a query! Go to the browser and go to the URL http://jsonplaceholder.typicode.com/posts. You should be returned a JSON of some post data. You can see REST endpoints in action already. Your browser sent a GET request to the /posts endpoint, which returns all the posts. What if we want just a specific post? The standard way to do this is to add the id of the post next, like this: http://jsonplaceholder.typicode.com/posts/1. You should get just a single post this time. Great! When building Unity scripts to connect to a REST endpoint, we'll frequently use this site to test on before we move to the actual REST endpoints we want.

Setting up your own server

Setting up your own server is a bit out of the scope of this article. In previous projects, we've used a framework like Sails JS to create a Node server with REST endpoints.

Parsing JSON in Unity

Basic REST

One of the worst parts of querying REST data is the parsing in Unity. Compared to parsing JSON on the web, Unity's parsing can feel tricky. The primary tool we use to make life a little easier is called SimpleJSON. It allows us to create JSON objects in C#, which we can use to build or read JSON data, and then manipulate them to our need. We won't go into detail on how to use SimpleJSON; we'll mostly just use the data retrieved from it. For further reading, we recommend looking at their documentation. Just to note though, SimpleJSON does not allow for parsing of GameObjects and such in Unity; instead, it deals only with more primitive attributes like strings and floats. For example, let's assume we wanted to upload a list of products to a server from our Unity project, without the game running (in editor). Assuming we collected the data from our game and it's currently residing in a JSON file, let's see the code for how we can upload this data to the server from Unity.
string productList = File.ReadAllText(Application.dataPath + "/Resources/AssetBundles/" + "AssetBundleInfo.json");
UploadProductListJSON(productList);

static void UploadProductListJSON(string data) {
  Debug.Log(data);
  WWWForm form = new WWWForm();
  form.AddField("productlist", data);
  WWW www = new WWW("localhost:1337/product/addList", form);
}

So we pass the collected data to a function that creates a new form, adds the data to that form, and then uses the WWW variable to upload our form to the server. This will use a POST request to add the new data. We normally don't want to create a different endpoint to add data, such as /addList. We could have added data one item at a time and used the standard REST endpoint (/product). This would likely be the cleaner solution, but for the sake of simplicity, we've added an endpoint that accepts a list of data.

Building REST Factories for In Game REST Calls

Rather than having random scripts containing API calls, we recommend following the standard web procedure and building REST factories: scripts with the sole purpose of querying REST endpoints. When contacting a server from in game, the standard approach is to use a coroutine, so as not to block your game on the main thread. Let's take a look at the standard DB factory we use.

private string results;
public string Results {
  get { return results; }
}

public WWW GET(string url, System.Action onComplete) {
  WWW www = new WWW(url);
  StartCoroutine(WaitForRequest(www, onComplete));
  return www;
}

public WWW POST(string url, Dictionary<string,string> post, System.Action onComplete) {
  WWWForm form = new WWWForm();
  foreach (KeyValuePair<string,string> post_arg in post) {
    form.AddField(post_arg.Key, post_arg.Value);
  }
  WWW www = new WWW(url, form);
  StartCoroutine(WaitForRequest(www, onComplete));
  return www;
}

private IEnumerator WaitForRequest(WWW www, System.Action onComplete) {
  yield return www;
  // check for errors
  if (www.error == null) {
    results = www.text;
    onComplete();
  } else {
    Debug.Log(www.error);
  }
}

The url data here would be something like our example above: http://jsonplaceholder.typicode.com/posts. The System.Action onComplete parameter is a callback to be called once the action is complete. This will normally be some method that requires the downloaded data. In both our GET and POST methods, we connect to a passed URL and then pass our www objects to a coroutine. This allows our game to continue while the queries are being resolved in the WaitForRequest method. This method will either collect the result and call any callbacks, or it will log the error for us. (A short usage sketch for this factory appears at the end of this article.)

Conclusion

This just touches the basics of building a game that allows connecting to and using REST endpoints. In later editions, we can talk about building a thorough, modular system to connect to REST endpoints, extracting meaningful data from your queries using SimpleJSON, user authentication, and how to build a manager system to handle multiple REST calls.

About the Authors

Denny is a Mobile Application Developer at Canadian Tire Development Operations. While working, Denny regularly uses Unity to create in-store experiences, but also works on other technologies like Famous, Phaser.IO, LibGDX, and CreateJS when creating game-like apps. He also enjoys making non-game mobile apps, but who cares about that, am I right? Travis is a Software Engineer, living in the bitter region of Winnipeg, Canada. His work and hobbies include Game Development with Unity or Phaser.IO, as well as Mobile App Development.
He can enjoy a good video game or two, but only if he knows he'll win!
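As promised above, here is a hedged usage sketch for the REST factory described in this article. It is not part of the original post: the RestFactory and PostLoader class names and the OnPostsLoaded callback are illustrative assumptions, and it presumes the GET/POST/WaitForRequest methods shown earlier live on a MonoBehaviour called RestFactory that is attached somewhere in the scene.

using UnityEngine;

// Example consumer of the REST factory shown earlier in this article.
public class PostLoader : MonoBehaviour
{
    public RestFactory rest; // assumed wrapper class for the factory methods above, assigned in the Inspector

    void Start()
    {
        // Fire the GET request; OnPostsLoaded runs once WaitForRequest has stored the response.
        rest.GET("http://jsonplaceholder.typicode.com/posts", OnPostsLoaded);
    }

    void OnPostsLoaded()
    {
        // Results holds the raw JSON text collected by the factory;
        // it could now be handed to SimpleJSON for parsing.
        Debug.Log("Downloaded: " + rest.Results);
    }
}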

Tuning server performance with memory management and swap

Packt
24 Jun 2015
7 min read
In this article, by Jonathan Hobson, the author of Troubleshooting CentOS, we will learn about memory management, swap, and swappiness. (For more resources related to this topic, see here.) A deeper understanding of the underlying active processes in CentOS 7 is an essential skill for any troubleshooter. From high load averages to slow response times, system overloads to dead and dying processes, there comes a time when every server may start to feel sluggish, act impoverished, or fail to respond, and as a consequence, it will require your immediate attention. Regardless of how you look at it, the question of memory usage remains critical to the life cycle of a system, and whether you are maintaining system health or troubleshooting a particular service or application, you will always need to remember that the use of memory is a critical resource to your system. For this reason, we will begin by calling the free command in the following way: # free -m The main elements of the preceding command will look similar to this:          Total   used   free   shared   buffers   cached Mem:     1837     274   1563         8         0       108 -/+ buffers/cache: 164   1673 Swap:     2063       0   2063 In the example shown, I have used the -m option to ensure that the output is formatted in megabytes. This makes it easier to read, but for the sake of troubleshooting, rather than trying to understand every numeric value shown, let's reduce the scope of the original output to highlight the relevant area of concern: -/+ buffers/cache: 164   1673 The importance of this line is based on the fact that it accounts for the associated buffers and caches to illustrate what memory is currently being used and what is held in reserve. Where the first value indicates how much memory is being used, the second value tells us how much memory is available to our applications. In the example shown, this instance translates into 164 MB of used memory and 1673 MB of available memory. Bearing this in mind, let me draw your attention to the final line in order that we can examine the importance of swap: Swap:     2063       0   2063 Swapping typically occurs when memory usage is impacting performance. As we can see from the preceding example, the first value tells us that there is a total amount of system swap set at 2063 MB, with the second value indicating how much swap is being used (0 MB), while the third value shows the amount of swap that is still available to the system as a whole (2063 MB). So yes, based on the example data shown here, we can conclude that this is a healthy system, and no swap is being used, but while we are here, let's use this time to discover more about the swap space on your system. To begin, we will revisit the contents of the proc directory and reveal the total and used swap size by typing the following command: # cat /proc/swaps Assuming that you understand the output shown, you should then investigate the level of swappiness used by your system with the following command: # cat /proc/sys/vm/swappiness Having done this, you will now see a numeric value between the ranges of 0-100. The numeric value is a percentage and it implies that, if your system has a value of 30, for example, it will begin to use swap memory at 70 percent occupation of RAM. The default for all Linux systems is usually set with a notional value between 30 to 60, but you can use either of the following commands to temporarily change and modify the swappiness of your system. 
This can be achieved by replacing the value of X with a numeric value from 0-100 by typing:

# echo X > /proc/sys/vm/swappiness

Or more specifically, this can also be achieved with:

# sysctl -w vm.swappiness=X

If you change your mind at any point, then you have two options in order to ensure that no permanent changes have been made. You can either repeat one of the preceding two commands with the original value, or issue a full system reboot. On the other hand, if you want to make the change persist, then you should edit the /etc/sysctl.conf file and add your swappiness preference in the following way:

vm.swappiness=X

When complete, simply save and close the file; the setting will then persist across reboots.

The level of swappiness controls the tendency of the kernel to move a process out of physical RAM onto the swap disk. This is memory management at work, but it is important to realize that swapping will not occur immediately, as the level of swappiness is actually expressed as a percentage value. For this reason, the process of swapping should be viewed more as a measurement of preference when using the cache, and as every administrator will know, there is an option for you to clear the swap by using the commands swapoff -a and swapon -a to achieve the desired result.

The golden rule is to realize that a system displaying a level of swappiness close to the maximum value (100) will prefer to begin swapping inactive pages. This is because a value of 100 is representative of 0 percent occupation of RAM. By comparison, the closer your system is to the lowest value (0), the less likely swapping is to occur, as 0 is representative of 100 percent occupation of RAM. Generally speaking, we would all probably agree that systems with a very large pool of RAM would not benefit from aggressive swapping. However, and just to confuse things further, let's look at it in a different way. We all know that a desktop computer will benefit from a low swappiness value, but in certain situations, you may also find that a system with a large pool of RAM (running batch jobs) may also benefit from a moderate to aggressive swap in a fashion similar to a system that attempts to do a lot but only uses small amounts of RAM. So, in reality, there are no hard and fast rules; the use of swap should be based on the needs of the system in question rather than looking for a single solution that can be applied across the board.

Taking this further, special care and consideration should be taken while making changes to the swapping values, as RAM that is not used by an application is used as disk cache. In this situation, by decreasing swappiness, you are actually increasing the chance of that application not being swapped out, and you are thereby decreasing the overall size of the disk cache. This can make disk access slower. However, if you do increase the preference to swap, then because hard disks are slower than memory modules, it can lead to a slower response time across the overall system. Swapping can be confusing, but by knowing this, we can also appreciate the hidden irony of swappiness. As Newton's third law of motion states, for every action, there is an equal and opposite reaction, and finding the optimum swappiness value may require some additional experimentation.

Summary

In this article, we learned some basic yet vital commands that help us gauge and maintain server performance with the help of swappiness.
Resources for Article: Further resources on this subject: Installing CentOS [article] Managing public and private groups [article] Installing PostgreSQL [article]

Moving Further with NumPy Modules

Packt
23 Jun 2015
23 min read
NumPy has a number of modules inherited from its predecessor, Numeric. Some of these packages have a SciPy counterpart, which may have fuller functionality. In this article by Ivan Idris, author of the book NumPy: Beginner's Guide - Third Edition, we will cover the following topics:

The linalg package
The fft package
Random numbers
Continuous and discrete distributions

(For more resources related to this topic, see here.)

Linear algebra

Linear algebra is an important branch of mathematics. The numpy.linalg package contains linear algebra functions. With this module, you can invert matrices, calculate eigenvalues, solve linear equations, and determine determinants, among other things (see http://docs.scipy.org/doc/numpy/reference/routines.linalg.html).

Time for action – inverting matrices

The inverse of a matrix A in linear algebra is the matrix A^-1, which, when multiplied with the original matrix, is equal to the identity matrix I. This can be written as follows:

A A^-1 = I

The inv() function in the numpy.linalg package can invert an example matrix with the following steps:

Create the example matrix with the mat() function:

A = np.mat("0 1 2;1 0 3;4 -3 8")
print("A\n", A)

The A matrix appears as follows:

A
[[ 0  1  2]
 [ 1  0  3]
 [ 4 -3  8]]

Invert the matrix with the inv() function:

inverse = np.linalg.inv(A)
print("inverse of A\n", inverse)

The inverse matrix appears as follows:

inverse of A
[[-4.5  7.  -1.5]
 [-2.   4.  -1. ]
 [ 1.5 -2.   0.5]]

If the matrix is singular, or not square, a LinAlgError is raised. If you want, you can check the result manually with a pen and paper. This is left as an exercise for the reader.

Check the result by multiplying the original matrix with the result of the inv() function:

print("Check\n", A * inverse)

The result is the identity matrix, as expected:

Check
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

What just happened?

We calculated the inverse of a matrix with the inv() function of the numpy.linalg package. We checked, with matrix multiplication, whether this is indeed the inverse matrix (see inversion.py):

from __future__ import print_function
import numpy as np

A = np.mat("0 1 2;1 0 3;4 -3 8")
print("A\n", A)

inverse = np.linalg.inv(A)
print("inverse of A\n", inverse)

print("Check\n", A * inverse)

Pop quiz – creating a matrix

Q1. Which function can create matrices?

array
create_matrix
mat
vector

Have a go hero – inverting your own matrix

Create your own matrix and invert it. The inverse is only defined for square matrices. The matrix must be square and invertible; otherwise, a LinAlgError exception is raised.

Solving linear systems

A matrix transforms a vector into another vector in a linear way. This transformation mathematically corresponds to a system of linear equations. The numpy.linalg function solve() solves systems of linear equations of the form Ax = b, where A is a matrix, b can be a one-dimensional or two-dimensional array, and x is an unknown variable. We will see the dot() function in action. This function returns the dot product of two floating-point arrays. The dot() function calculates the dot product (see https://www.khanacademy.org/math/linear-algebra/vectors_and_spaces/dot_cross_products/v/vector-dot-product-and-vector-length).
Solving linear systems

A matrix transforms a vector into another vector in a linear way. This transformation mathematically corresponds to a system of linear equations. The numpy.linalg function solve() solves systems of linear equations of the form Ax = b, where A is a matrix, b can be a one-dimensional or two-dimensional array, and x is the unknown variable. We will also see the dot() function in action. This function returns the dot product of two floating-point arrays (see https://www.khanacademy.org/math/linear-algebra/vectors_and_spaces/dot_cross_products/v/vector-dot-product-and-vector-length). For a matrix A and a vector b, row i of the dot product is equal to the following sum:

    (A . b)[i] = sum over j of A[i, j] * b[j]

Time for action – solving a linear system

Solve an example of a linear system with the following steps:

1. Create A and b:

    A = np.mat("1 -2 1;0 2 -8;-4 5 9")
    print("A\n", A)
    b = np.array([0, 8, -9])
    print("b\n", b)

A and b appear as follows:

    A
    [[ 1 -2  1]
     [ 0  2 -8]
     [-4  5  9]]
    b
    [ 0  8 -9]

2. Solve this linear system with the solve() function:

    x = np.linalg.solve(A, b)
    print("Solution", x)

The solution of the linear system is as follows:

    Solution [ 29.  16.   3.]

3. Check whether the solution is correct with the dot() function:

    print("Check\n", np.dot(A, x))

The result is as expected:

    Check
    [[ 0.  8. -9.]]

What just happened?

We solved a linear system using the solve() function from the NumPy linalg module and checked the solution with the dot() function:

    from __future__ import print_function
    import numpy as np

    A = np.mat("1 -2 1;0 2 -8;-4 5 9")
    print("A\n", A)

    b = np.array([0, 8, -9])
    print("b\n", b)

    x = np.linalg.solve(A, b)
    print("Solution", x)

    print("Check\n", np.dot(A, x))
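A common follow-up question is how this relates to the inverse from the previous section. The sketch below (not from the original text, and assuming A is non-singular) solves the same system by explicitly inverting A; solve() is generally preferred because it avoids forming the inverse:

    from __future__ import print_function
    import numpy as np

    A = np.mat("1 -2 1;0 2 -8;-4 5 9")
    b = np.array([0, 8, -9])

    x_solve = np.linalg.solve(A, b)        # direct solver
    x_inv = np.linalg.inv(A).dot(b)        # via the explicit inverse (less efficient)

    print("solve():", x_solve)
    print("inv().dot():", np.asarray(x_inv).ravel())
    print("Same result?", np.allclose(x_solve, np.asarray(x_inv).ravel()))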
Finding eigenvalues and eigenvectors

Eigenvalues are scalar solutions to the equation Ax = ax, where A is a two-dimensional matrix and x is a one-dimensional vector. Eigenvectors are vectors corresponding to eigenvalues (see https://www.khanacademy.org/math/linear-algebra/alternate_bases/eigen_everything/v/linear-algebra-introduction-to-eigenvalues-and-eigenvectors). The eigvals() function in the numpy.linalg package calculates eigenvalues. The eig() function returns a tuple containing eigenvalues and eigenvectors.

Time for action – determining eigenvalues and eigenvectors

Let's calculate the eigenvalues of a matrix:

1. Create a matrix as shown in the following:

    A = np.mat("3 -2;1 0")
    print("A\n", A)

The matrix we created looks like the following:

    A
    [[ 3 -2]
     [ 1  0]]

2. Call the eigvals() function:

    print("Eigenvalues", np.linalg.eigvals(A))

The eigenvalues of the matrix are as follows:

    Eigenvalues [ 2.  1.]

3. Determine eigenvalues and eigenvectors with the eig() function. This function returns a tuple, where the first element contains eigenvalues and the second element contains the corresponding eigenvectors, arranged column-wise:

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print("First tuple of eig", eigenvalues)
    print("Second tuple of eig\n", eigenvectors)

The eigenvalues and eigenvectors appear as follows:

    First tuple of eig [ 2.  1.]
    Second tuple of eig
    [[ 0.89442719  0.70710678]
     [ 0.4472136   0.70710678]]

4. Check the result with the dot() function by calculating the right and left side of the eigenvalue equation Ax = ax:

    for i, eigenvalue in enumerate(eigenvalues):
        print("Left", np.dot(A, eigenvectors[:,i]))
        print("Right", eigenvalue * eigenvectors[:,i])
        print()

The output is as follows:

    Left [[ 1.78885438]
     [ 0.89442719]]
    Right [[ 1.78885438]
     [ 0.89442719]]

What just happened?

We found the eigenvalues and eigenvectors of a matrix with the eigvals() and eig() functions of the numpy.linalg module. We checked the result using the dot() function (see eigenvalues.py):

    from __future__ import print_function
    import numpy as np

    A = np.mat("3 -2;1 0")
    print("A\n", A)

    print("Eigenvalues", np.linalg.eigvals(A))

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print("First tuple of eig", eigenvalues)
    print("Second tuple of eig\n", eigenvectors)

    for i, eigenvalue in enumerate(eigenvalues):
        print("Left", np.dot(A, eigenvectors[:,i]))
        print("Right", eigenvalue * eigenvectors[:,i])
        print()

Singular value decomposition

Singular value decomposition (SVD) is a type of factorization that decomposes a matrix into a product of three matrices. The SVD is a generalization of the previously discussed eigenvalue decomposition, and it is very useful for algorithms such as the pseudo inverse, which we will discuss in the next section. The svd() function in the numpy.linalg package can perform this decomposition. This function returns three matrices U, Sigma, and V such that U and V are unitary and Sigma contains the singular values of the input matrix:

    A = U Sigma V*

The asterisk denotes the Hermitian conjugate or the conjugate transpose. The complex conjugate changes the sign of the imaginary part of a complex number and is therefore not relevant for real numbers. A complex square matrix A is unitary if A*A = AA* = I (the identity matrix). We can interpret SVD as a sequence of three operations: rotation, scaling, and another rotation. We already transposed matrices in this article. The transpose flips matrices, turning rows into columns, and columns into rows.

Time for action – decomposing a matrix

It's time to decompose a matrix with the SVD using the following steps:

1. First, create a matrix as shown in the following:

    A = np.mat("4 11 14;8 7 -2")
    print("A\n", A)

The matrix we created looks like the following:

    A
    [[ 4 11 14]
     [ 8  7 -2]]

2. Decompose the matrix with the svd() function:

    U, Sigma, V = np.linalg.svd(A, full_matrices=False)
    print("U")
    print(U)
    print("Sigma")
    print(Sigma)
    print("V")
    print(V)

Because of the full_matrices=False specification, NumPy performs a reduced SVD decomposition, which is faster to compute. The result is a tuple containing the two unitary matrices U and V on the left and right, respectively, and the singular values of the middle matrix:

    U
    [[-0.9486833  -0.31622777]
     [-0.31622777  0.9486833 ]]
    Sigma
    [ 18.97366596   9.48683298]
    V
    [[-0.33333333 -0.66666667 -0.66666667]
     [ 0.66666667  0.33333333 -0.66666667]]

3. We do not actually have the middle matrix; we only have its diagonal values. The other values are all 0. Form the middle matrix with the diag() function and multiply the three matrices as follows:

    print("Product\n", U * np.diag(Sigma) * V)

The product of the three matrices is equal to the matrix we created in the first step:

    Product
    [[  4.  11.  14.]
     [  8.   7.  -2.]]

What just happened?

We decomposed a matrix and checked the result by matrix multiplication. We used the svd() function from the NumPy linalg module (see decomposition.py):

    from __future__ import print_function
    import numpy as np

    A = np.mat("4 11 14;8 7 -2")
    print("A\n", A)

    U, Sigma, V = np.linalg.svd(A, full_matrices=False)

    print("U")
    print(U)

    print("Sigma")
    print(Sigma)

    print("V")
    print(V)

    print("Product\n", U * np.diag(Sigma) * V)
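To build some intuition for the unitarity claim, the following sketch (an addition, not part of the original example) checks that U and V from the reduced SVD of the same example matrix satisfy U U^T = I and V V^T = I, and reassembles A from the three factors:

    from __future__ import print_function
    import numpy as np

    A = np.mat("4 11 14;8 7 -2")
    U, Sigma, V = np.linalg.svd(A, full_matrices=False)

    # For the reduced SVD of this 2 x 3 real matrix, U is 2 x 2 and V is 2 x 3
    print("U * U.T\n", U * U.T)        # should be the 2 x 2 identity
    print("V * V.T\n", V * V.T)        # should also be the 2 x 2 identity
    print("Reassembled\n", U * np.diag(Sigma) * V)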
Pseudo inverse

The Moore-Penrose pseudo inverse of a matrix can be computed with the pinv() function of the numpy.linalg module (see http://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse). The pseudo inverse is calculated using the SVD (see the previous example). The inv() function only accepts square matrices; the pinv() function does not have this restriction and is therefore considered a generalization of the inverse.

Time for action – computing the pseudo inverse of a matrix

Let's compute the pseudo inverse of a matrix:

1. First, create a matrix:

    A = np.mat("4 11 14;8 7 -2")
    print("A\n", A)

The matrix we created looks like the following:

    A
    [[ 4 11 14]
     [ 8  7 -2]]

2. Calculate the pseudo inverse matrix with the pinv() function:

    pseudoinv = np.linalg.pinv(A)
    print("Pseudo inverse\n", pseudoinv)

The pseudo inverse result is as follows:

    Pseudo inverse
    [[-0.00555556  0.07222222]
     [ 0.02222222  0.04444444]
     [ 0.05555556 -0.05555556]]

3. Multiply the original and pseudo inverse matrices:

    print("Check", A * pseudoinv)

What we get is not an identity matrix, but it comes close to it:

    Check [[  1.00000000e+00   0.00000000e+00]
     [  8.32667268e-17   1.00000000e+00]]

What just happened?

We computed the pseudo inverse of a matrix with the pinv() function of the numpy.linalg module. The check by matrix multiplication resulted in a matrix that is approximately an identity matrix (see pseudoinversion.py):

    from __future__ import print_function
    import numpy as np

    A = np.mat("4 11 14;8 7 -2")
    print("A\n", A)

    pseudoinv = np.linalg.pinv(A)
    print("Pseudo inverse\n", pseudoinv)

    print("Check", A * pseudoinv)

Determinants

The determinant is a value associated with a square matrix. It is used throughout mathematics; for more details, please refer to http://en.wikipedia.org/wiki/Determinant. For an n x n real-valued matrix, the determinant corresponds to the scaling an n-dimensional volume undergoes when transformed by the matrix. A positive determinant means the volume preserves its orientation (clockwise or anticlockwise), while a negative sign means reversed orientation. The numpy.linalg module has a det() function that returns the determinant of a matrix.

Time for action – calculating the determinant of a matrix

To calculate the determinant of a matrix, follow these steps:

1. Create the matrix:

    A = np.mat("3 4;5 6")
    print("A\n", A)

The matrix we created appears as follows:

    A
    [[ 3.  4.]
     [ 5.  6.]]

2. Compute the determinant with the det() function:

    print("Determinant", np.linalg.det(A))

The determinant appears as follows:

    Determinant -2.0

What just happened?

We calculated the determinant of a matrix with the det() function from the numpy.linalg module (see determinant.py):

    from __future__ import print_function
    import numpy as np

    A = np.mat("3 4;5 6")
    print("A\n", A)

    print("Determinant", np.linalg.det(A))
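The determinant also ties back to the earlier inversion section: a matrix with determinant 0 is singular and cannot be inverted. The short sketch below (an illustration added here, using an arbitrary singular matrix) shows both facts together:

    from __future__ import print_function
    import numpy as np

    S = np.mat("1 2;2 4")   # the second row is a multiple of the first, so det is 0
    print("Determinant", np.linalg.det(S))

    try:
        np.linalg.inv(S)
    except np.linalg.LinAlgError:
        print("S is singular and cannot be inverted")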
Fast Fourier transform

The Fast Fourier transform (FFT) is an efficient algorithm to calculate the discrete Fourier transform (DFT). The Fourier series represents a signal as a sum of sine and cosine terms. The FFT improves on more naive algorithms and is of order O(N log N). The DFT has applications in signal processing, image processing, solving partial differential equations, and more. NumPy has a module called fft that offers FFT functionality. Many functions in this module are paired; for those functions, another function performs the inverse operation. For instance, the fft() and ifft() functions form such a pair.

Time for action – calculating the Fourier transform

First, we will create a signal to transform. Calculate the Fourier transform with the following steps:

1. Create a cosine wave with 30 points as follows:

    x = np.linspace(0, 2 * np.pi, 30)
    wave = np.cos(x)

2. Transform the cosine wave with the fft() function:

    transformed = np.fft.fft(wave)

3. Apply the inverse transform with the ifft() function. It should approximately return the original signal. Check with the following line:

    print(np.all(np.abs(np.fft.ifft(transformed) - wave) < 10 ** -9))

The result appears as follows:

    True

4. Plot the transformed signal with matplotlib:

    plt.plot(transformed)
    plt.title('Transformed cosine')
    plt.xlabel('Frequency')
    plt.ylabel('Amplitude')
    plt.grid()
    plt.show()

The resulting diagram shows the FFT result.

What just happened?

We applied the fft() function to a cosine wave. After applying the ifft() function, we got our signal back (see fourier.py):

    from __future__ import print_function
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 2 * np.pi, 30)
    wave = np.cos(x)
    transformed = np.fft.fft(wave)
    print(np.all(np.abs(np.fft.ifft(transformed) - wave) < 10 ** -9))

    plt.plot(transformed)
    plt.title('Transformed cosine')
    plt.xlabel('Frequency')
    plt.ylabel('Amplitude')
    plt.grid()
    plt.show()

Shifting

The fftshift() function of the numpy.fft module shifts zero-frequency components to the center of a spectrum. The zero-frequency component corresponds to the mean of the signal. The ifftshift() function reverses this operation.

Time for action – shifting frequencies

We will create a signal, transform it, and then shift the signal. Shift the frequencies with the following steps:

1. Create a cosine wave with 30 points:

    x = np.linspace(0, 2 * np.pi, 30)
    wave = np.cos(x)

2. Transform the cosine wave with the fft() function:

    transformed = np.fft.fft(wave)

3. Shift the signal with the fftshift() function:

    shifted = np.fft.fftshift(transformed)

4. Reverse the shift with the ifftshift() function. This should undo the shift. Check with the following code snippet:

    print(np.all(np.abs(np.fft.ifftshift(shifted) - transformed) < 10 ** -9))

The result appears as follows:

    True

5. Plot the transformed and shifted signals with matplotlib:

    plt.plot(transformed, lw=2, label="Transformed")
    plt.plot(shifted, '--', lw=3, label="Shifted")
    plt.title('Shifted and transformed cosine wave')
    plt.xlabel('Frequency')
    plt.ylabel('Amplitude')
    plt.grid()
    plt.legend(loc='best')
    plt.show()

The resulting diagram shows the effect of the shift and the FFT.

What just happened?

We applied the fftshift() function to a cosine wave. After applying the ifftshift() function, we got our signal back (see fouriershift.py):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 2 * np.pi, 30)
    wave = np.cos(x)
    transformed = np.fft.fft(wave)
    shifted = np.fft.fftshift(transformed)
    print(np.all(np.abs(np.fft.ifftshift(shifted) - transformed) < 10 ** -9))

    plt.plot(transformed, lw=2, label="Transformed")
    plt.plot(shifted, '--', lw=3, label="Shifted")
    plt.title('Shifted and transformed cosine wave')
    plt.xlabel('Frequency')
    plt.ylabel('Amplitude')
    plt.grid()
    plt.legend(loc='best')
    plt.show()
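In practice, fftshift() is often combined with np.fft.fftfreq(), which returns the sample frequency for each FFT bin. The sketch below is an added illustration, not part of the original text, and it assumes the same 30-point cosine wave with the sample spacing taken from the x array:

    import numpy as np

    n = 30
    x = np.linspace(0, 2 * np.pi, n)
    wave = np.cos(x)
    transformed = np.fft.fft(wave)

    # fftfreq() gives the frequency of each bin; fftshift() reorders both
    # arrays so that the zero-frequency component sits in the middle
    freqs = np.fft.fftfreq(n, d=x[1] - x[0])
    print(np.fft.fftshift(freqs))
    print(np.abs(np.fft.fftshift(transformed)))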
Random numbers

Random numbers are used in Monte Carlo methods, stochastic calculus, and more. Real random numbers are hard to generate, so in practice we use pseudo random numbers, which are random enough for most intents and purposes, except for some very special cases. These numbers appear random, but if you analyze them more closely, you will realize that they follow a certain pattern. The random number functions live in the NumPy random module. The core random number generator is based on the Mersenne Twister algorithm, a standard and well-known algorithm (see https://en.wikipedia.org/wiki/Mersenne_Twister). We can generate random numbers from discrete or continuous distributions. The distribution functions have an optional size parameter, which tells NumPy how many numbers to generate. You can specify either an integer or a tuple as size; this results in an array of random numbers with the appropriate shape. Discrete distributions include the geometric, hypergeometric, and binomial distributions.

Time for action – gambling with the binomial

The binomial distribution models the number of successes in an integer number of independent trials of an experiment, where the probability of success in each trial is a fixed number (see https://www.khanacademy.org/math/probability/random-variables-topic/binomial_distribution). Imagine a 17th century gambling house where you can bet on flipping pieces of eight. Nine coins are flipped. If fewer than five are heads, you lose one piece of eight; otherwise, you win one. Let's simulate this, starting with 1,000 coins in our possession, using the binomial() function from the random module:

1. Initialize an array, which represents the cash balance, to zeros. Call the binomial() function with a size of 10000. This represents 10,000 coin-flip rounds in our casino:

    cash = np.zeros(10000)
    cash[0] = 1000
    outcome = np.random.binomial(9, 0.5, size=len(cash))

2. Go through the outcomes of the coin flips and update the cash array. Print the minimum and maximum of the outcome, just to make sure we don't have any strange outliers:

    for i in range(1, len(cash)):
        if outcome[i] < 5:
            cash[i] = cash[i - 1] - 1
        elif outcome[i] < 10:
            cash[i] = cash[i - 1] + 1
        else:
            raise AssertionError("Unexpected outcome " + str(outcome[i]))

    print(outcome.min(), outcome.max())

As expected, the values are between 0 and 9. In the resulting diagram, you can see the cash balance performing a random walk.

What just happened?

We did a random walk experiment using the binomial() function from the NumPy random module (see headortail.py):

    from __future__ import print_function
    import numpy as np
    import matplotlib.pyplot as plt

    cash = np.zeros(10000)
    cash[0] = 1000
    np.random.seed(73)
    outcome = np.random.binomial(9, 0.5, size=len(cash))

    for i in range(1, len(cash)):
        if outcome[i] < 5:
            cash[i] = cash[i - 1] - 1
        elif outcome[i] < 10:
            cash[i] = cash[i - 1] + 1
        else:
            raise AssertionError("Unexpected outcome " + str(outcome[i]))

    print(outcome.min(), outcome.max())

    plt.plot(np.arange(len(cash)), cash)
    plt.title('Binomial simulation')
    plt.xlabel('# Bets')
    plt.ylabel('Cash')
    plt.grid()
    plt.show()
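As a quick sanity check on this game, note that with nine fair flips there are no ties, so a win (five or more heads) and a loss are equally likely. The added sketch below estimates the win probability from the same simulated outcomes; the exact value printed depends on the seed:

    import numpy as np

    np.random.seed(73)
    outcome = np.random.binomial(9, 0.5, size=10000)

    # Wins are rounds with 5 or more heads; the fraction should be close to 0.5
    print("Fraction of winning rounds:", (outcome >= 5).mean())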
Hypergeometric distribution

The hypergeometric distribution models a jar with two types of objects in it. The model tells us how many objects of one type we can get if we take a specified number of items out of the jar without replacing them (see https://en.wikipedia.org/wiki/Hypergeometric_distribution). The NumPy random module has a hypergeometric() function that simulates this situation.

Time for action – simulating a game show

Imagine a game show where every time the contestants answer a question correctly, they get to pull three balls from a jar and then put them back. Now, there is a catch: one ball in the jar is bad. Every time it is pulled out, the contestants lose six points. If, however, they manage to pull out 3 of the 25 normal balls, they get one point. So, what is going to happen if we have 100 questions in total?

1. Initialize the outcome of the game with the hypergeometric() function. The first parameter of this function is the number of ways to make a good selection, the second parameter is the number of ways to make a bad selection, and the third parameter is the number of items sampled:

    points = np.zeros(100)
    outcomes = np.random.hypergeometric(25, 1, 3, size=len(points))

2. Set the scores based on the outcomes from the previous step:

    for i in range(len(points)):
        if outcomes[i] == 3:
            points[i] = points[i - 1] + 1
        elif outcomes[i] == 2:
            points[i] = points[i - 1] - 6
        else:
            print(outcomes[i])

The resulting diagram shows how the scoring evolved.

What just happened?

We simulated a game show using the hypergeometric() function from the NumPy random module. The game scoring depends on how many good and how many bad balls the contestants pulled out of the jar in each round (see urn.py):

    from __future__ import print_function
    import numpy as np
    import matplotlib.pyplot as plt

    points = np.zeros(100)
    np.random.seed(16)
    outcomes = np.random.hypergeometric(25, 1, 3, size=len(points))

    for i in range(len(points)):
        if outcomes[i] == 3:
            points[i] = points[i - 1] + 1
        elif outcomes[i] == 2:
            points[i] = points[i - 1] - 6
        else:
            print(outcomes[i])

    plt.plot(np.arange(len(points)), points)
    plt.title('Game show simulation')
    plt.xlabel('# Rounds')
    plt.ylabel('Score')
    plt.grid()
    plt.show()
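With 1 bad ball among 26 and 3 draws per round, the chance of pulling the bad ball in any given round is 3/26, or roughly 0.115. The following added sketch (not part of the original example) compares that theoretical value with a larger simulation; a bigger sample size is used here only to make the estimate more stable:

    import numpy as np

    np.random.seed(16)
    outcomes = np.random.hypergeometric(25, 1, 3, size=100000)

    # outcomes counts the good balls drawn; 2 good balls means the bad one came out
    print("Simulated P(bad ball):", (outcomes == 2).mean())
    print("Theoretical P(bad ball):", 3.0 / 26.0)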
Continuous distributions

We usually model continuous distributions with probability density functions (PDFs). The probability that a value lies in a certain interval is determined by integrating the PDF over that interval (see https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/probability-density-functions). The NumPy random module has functions that represent continuous distributions: beta(), chisquare(), exponential(), f(), gamma(), gumbel(), laplace(), lognormal(), logistic(), multivariate_normal(), noncentral_chisquare(), noncentral_f(), normal(), and others.

Time for action – drawing a normal distribution

We can generate random numbers from a normal distribution and visualize their distribution with a histogram (see https://www.khanacademy.org/math/probability/statistics-inferential/normal_distribution/v/introduction-to-the-normal-distribution). Draw a normal distribution with the following steps:

1. Generate random numbers for a given sample size using the normal() function from the random NumPy module:

    N = 10000
    normal_values = np.random.normal(size=N)

2. Draw the histogram and theoretical PDF with a center value of 0 and a standard deviation of 1. Use matplotlib for this purpose:

    _, bins, _ = plt.hist(normal_values, np.sqrt(N), normed=True, lw=1)
    sigma = 1
    mu = 0
    plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu)**2 / (2 * sigma**2)), lw=2)
    plt.show()

In the resulting diagram, we see the familiar bell curve.

What just happened?

We visualized the normal distribution using the normal() function from the random NumPy module. We did this by drawing the bell curve and a histogram of randomly generated values (see normaldist.py):

    import numpy as np
    import matplotlib.pyplot as plt

    N = 10000

    np.random.seed(27)
    normal_values = np.random.normal(size=N)
    _, bins, _ = plt.hist(normal_values, np.sqrt(N), normed=True, lw=1, label="Histogram")
    sigma = 1
    mu = 0
    plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu)**2 / (2 * sigma**2)), '--', lw=3, label="PDF")
    plt.title('Normal distribution')
    plt.xlabel('Value')
    plt.ylabel('Normalized Frequency')
    plt.grid()
    plt.legend(loc='best')
    plt.show()

Lognormal distribution

A lognormal distribution is the distribution of a random variable whose natural logarithm is normally distributed. The lognormal() function of the random NumPy module models this distribution.

Time for action – drawing the lognormal distribution

Let's visualize the lognormal distribution and its PDF with a histogram:

1. Generate random numbers using the lognormal() function from the random NumPy module:

    N = 10000
    lognormal_values = np.random.lognormal(size=N)

2. Draw the histogram and theoretical PDF with a center value of 0 and a standard deviation of 1:

    _, bins, _ = plt.hist(lognormal_values, np.sqrt(N), normed=True, lw=1)
    sigma = 1
    mu = 0
    x = np.linspace(min(bins), max(bins), len(bins))
    pdf = np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))
    plt.plot(x, pdf, lw=3)
    plt.show()

The fit of the histogram and theoretical PDF is excellent, as you can see in the resulting diagram.

What just happened?

We visualized the lognormal distribution using the lognormal() function from the random NumPy module. We did this by drawing the curve of the theoretical PDF and a histogram of randomly generated values (see lognormaldist.py):

    import numpy as np
    import matplotlib.pyplot as plt

    N = 10000
    np.random.seed(34)
    lognormal_values = np.random.lognormal(size=N)
    _, bins, _ = plt.hist(lognormal_values, np.sqrt(N), normed=True, lw=1, label="Histogram")
    sigma = 1
    mu = 0
    x = np.linspace(min(bins), max(bins), len(bins))
    pdf = np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))
    plt.xlim([0, 15])
    plt.plot(x, pdf, '--', lw=3, label="PDF")
    plt.title('Lognormal distribution')
    plt.xlabel('Value')
    plt.ylabel('Normalized frequency')
    plt.grid()
    plt.legend(loc='best')
    plt.show()
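The defining relationship between the two distributions can also be checked numerically: exponentiating standard normal draws gives values with the same distribution as lognormal() draws. The sketch below is an added, approximate check (the two samples use different draws from the same seed, so their summary statistics only agree roughly):

    import numpy as np

    np.random.seed(34)
    N = 10000
    via_exp = np.exp(np.random.normal(size=N))   # exp of a standard normal sample
    direct = np.random.lognormal(size=N)         # drawn directly from lognormal()

    # Both samples follow the same distribution, so their medians
    # should be close to each other (and to exp(0) = 1)
    print("Median via exp(normal):", np.median(via_exp))
    print("Median via lognormal():", np.median(direct))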
Bootstrapping in statistics

Bootstrapping is a method used to estimate the variance, accuracy, and other metrics of sample estimates, such as the arithmetic mean. The simplest bootstrapping procedure consists of the following steps:

1. Generate a large number of samples from the original data sample, each having the same size N. You can think of the original data as a jar containing numbers. We create a new sample by randomly picking a number from the jar N times, returning the number to the jar each time, so a number can occur multiple times in a generated sample.
2. For each new sample, calculate the statistical estimate under investigation (for example, the arithmetic mean). This gives us a sample of possible values for the estimator.

Time for action – sampling with numpy.random.choice()

We will use the numpy.random.choice() function to perform bootstrapping:

1. Start the IPython or Python shell and import NumPy:

    $ ipython
    In [1]: import numpy as np

2. Generate a data sample following the normal distribution:

    In [2]: N = 500

    In [3]: np.random.seed(52)

    In [4]: data = np.random.normal(size=N)

3. Calculate the mean of the data:

    In [5]: data.mean()
    Out[5]: 0.07253250605445645

4. Generate 100 samples from the original data and calculate their means (of course, more samples may lead to a more accurate result):

    In [6]: bootstrapped = np.random.choice(data, size=(N, 100))

    In [7]: means = bootstrapped.mean(axis=0)

    In [8]: means.shape
    Out[8]: (100,)

5. Calculate the mean, variance, and standard deviation of the arithmetic means we obtained:

    In [9]: means.mean()
    Out[9]: 0.067866373318115278

    In [10]: means.var()
    Out[10]: 0.001762807104774598

    In [11]: means.std()
    Out[11]: 0.041985796464692651

6. If we assume a normal distribution for the means, it may be relevant to know the z-score, defined as z = (x - mu) / sigma, where x is the mean of the original data, mu is the mean of the bootstrapped means, and sigma is their standard deviation:

    In [12]: (data.mean() - means.mean())/means.std()
    Out[12]: 0.11113598238549766

From the z-score value, we get an idea of how probable the actual mean is.

What just happened?

We bootstrapped a data sample by generating samples and calculating the mean of each sample. Then we computed the mean, standard deviation, variance, and z-score of those means. We used the numpy.random.choice() function for bootstrapping.

Summary

You learned a lot in this article about NumPy modules. We covered linear algebra, the Fast Fourier transform, continuous and discrete distributions, and random numbers.

Resources for Article:

Further resources on this subject:
SciPy for Signal Processing [article]
Visualization [article]
The plot function [article]