
How-To Tutorials

7014 Articles

Prepare for our 2017 Awards with Mapt

Packt
21 Oct 2016
2 min read
At Packt, we're committed to supporting developers as they learn the skills they need to remain relevant in their field. But what exactly does relevant mean? To us, relevance is about the impact you have, and we believe that software should always have an impact, whether it's for a business or for customers. Whoever it's for, it's ultimately about making a difference.

We want to reward developers who make an impact. Whether you're a web developer creating awesome applications and websites that engage users every single day, or a data analyst who has used machine learning to uncover revealing insights about healthcare or the environment, we want to hear from you. We don't want to give too much away right now, but we're confident that you're going to be interested in our awards.

So, to prepare yourself for our awards, get started on Mapt and find your route through some of the most important skills in software today. What are you waiting for? We're sponsoring seats on Mapt at discounted prices this week. That means you'll be able to get a subscription for a special discounted price, but be quick: each discount is time limited! Subscribe here.

Resolving Deadlock in HBase

Ted Yu
20 Oct 2016
4 min read
In this post, I will walk you through how to resolve a tricky deadlock scenario when using HBase. This scenario relates to HBASE-13651, which tried to handle the case where one region server removes the compacted hfiles, leading to FileNotFoundExceptions on another machine; take a look at that JIRA for the details. Unlike the deadlocks that I have resolved in the past, this deadlock rarely happens, but when it does, one thread is trying to obtain a write lock while another thread holds a read lock of the same ReentrantReadWriteLock.

Understanding the Scenario

To fully understand this scenario, let's take a look at a concrete example.

For handler 12, HRegion#refreshStoreFiles() obtains a lock on writestate (line 4919), and then tries to get the write lock of updatesLock (a ReentrantReadWriteLock) in dropMemstoreContentsForSeqId():

```
"B.defaultRpcServer.handler=12,queue=0,port=16020" daemon prio=10 tid=0x00007f205cf8d000 nid=0x8f0b waiting on condition [0x00007f203ea85000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000006708113c8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
    at org.apache.hadoop.hbase.regionserver.HRegion.dropMemstoreContentsForSeqId(HRegion.java:4568)
    at org.apache.hadoop.hbase.regionserver.HRegion.refreshStoreFiles(HRegion.java:4919)
    - locked <0x00000006707c3500> (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6104)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5875)
```

For handler 24, HRegion$RegionScannerImpl.next() gets a read lock, and tries to obtain a lock on writestate in handleFileNotFound():

```
"B.defaultRpcServer.handler=24,queue=0,port=16020" daemon prio=10 tid=0x00007f205cfa6000 nid=0x8f17 waiting for monitor entry [0x00007f203de79000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.HRegion.refreshStoreFiles(HRegion.java:4887)
    - waiting to lock <0x00000006707c3500> (a org.apache.hadoop.hbase.regionserver.HRegion$WriteState)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.handleFileNotFound(HRegion.java:6104)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5736)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5875)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5653)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5630)
    - locked <0x00000007130162c8> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:5616)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6810)
    at org.apache.hadoop.hbase.regionserver.HRegion.getIncrementCurrentValue(HRegion.java:7673)
    at org.apache.hadoop.hbase.regionserver.HRegion.applyIncrementsToColumnFamily(HRegion.java:7583)
    at org.apache.hadoop.hbase.regionserver.HRegion.doIncrement(HRegion.java:7480)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:7440)
```

As you can see, these two handlers get into a deadlock.

Fixing the Deadlock

So, how can you fix this? The fix breaks the deadlock (on handler 12) by remembering the tuples to be passed to dropMemstoreContentsForSeqId(), releasing the read lock, and only then calling dropMemstoreContentsForSeqId(). Mingmin, the reporter of the bug, kindly deployed a patched JAR onto his production cluster and confirmed that a deadlock of this form no longer occurs.

I hope my experience with this tricky situation will be of some help to you in the event that you see such a scenario in the future.

About the author

Ted Yu is a staff engineer at Hortonworks. He has been an HBase committer/PMC member for 5 years. His work on HBase covers various components: security, backup/restore, the load balancer, MOB, and so on. He has provided support for customers at eBay, Micron, PayPal, and JPMC. He is also a Spark contributor.
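To make the locking pattern concrete, here is a simplified, hypothetical Java sketch. It is not the actual HBase code; the pending list and the plain strings stand in for the tuples that are passed to dropMemstoreContentsForSeqId():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadWriteLockUpgrade {
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    private final List<String> pending = new ArrayList<>();

    // Deadlock-prone: ReentrantReadWriteLock does not support upgrading a
    // read lock to a write lock, so this write-lock request parks forever.
    void broken() {
        updatesLock.readLock().lock();
        try {
            updatesLock.writeLock().lock(); // never returns while the read lock is held
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    // The shape of the fix: remember the work discovered under the read
    // lock, release the read lock, and only then take the write lock.
    void fixed() {
        List<String> toDrop;
        updatesLock.readLock().lock();
        try {
            toDrop = new ArrayList<>(pending); // remember the tuples
        } finally {
            updatesLock.readLock().unlock();   // release before "upgrading"
        }
        updatesLock.writeLock().lock();
        try {
            pending.removeAll(toDrop);         // stand-in for dropMemstoreContentsForSeqId()
        } finally {
            updatesLock.writeLock().unlock();
        }
    }
}
```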

An Introduction to Moodle 3 and MoodleCloud

Packt
19 Oct 2016
20 min read
In this article by Silvina Paola Hillar, the author of the book Moodle Theme Development, we will introduce e-learning and virtual learning environments such as Moodle and MoodleCloud, explaining their similarities and differences. Apart from that, we will learn about screen resolution and aspect ratio, which is the information we need in order to develop Moodle themes. In this article, we shall cover the following topics:

- Understanding what e-learning is
- Learning about virtual learning environments
- Introducing Moodle and MoodleCloud
- Learning what Moodle and MoodleCloud are
- Using Moodle on different devices
- Sizing the screen resolution
- Calculating the aspect ratio
- Learning about sharp and soft images
- Learning about crisp and sharp text
- Understanding what anti-aliasing is

(For more resources related to this topic, see here.)

Understanding what e-learning is

E-learning is electronic learning, meaning that it is not traditional learning in a classroom with a teacher, students, and a board. E-learning involves using a computer to deliver classes or a course. When delivering classes or a course, there is online interaction between the student and the teacher. There might also be some offline activities, as when a student is asked to create a piece of writing, or collaboration activities involving the interaction of several students and the teacher.

When creating course content, there is also the option of video conferencing, so there can be virtual face-to-face interaction within the e-learning process; the time and date should be set beforehand. In this way, e-learning tries to imitate traditional learning so as not to lose human contact or social interaction.

The course may be full distance or not. If the course is full distance, there is online interaction only: all the resources and activities are delivered online, and there might be some interaction through messages, chats, or emails between the student and the teacher. If the course is not full distance, and is delivered face to face but involves the use of computers, we are referring to blended learning. Blended learning means using e-learning within the classroom; it is a mixture of traditional learning and computers.

Using blended learning with little children is very important, because they get the social element, which is essential at a very young age, while also coming into contact with technology as they learn. It is advisable to use interactive whiteboards (IWBs) at an early stage; IWBs are the right tool to choose when dealing with blended learning.

IWBs are motivational gadgets, which are prominent in integrating technology into the classroom. They are considered a symbol of innovation and a key element of teaching students. IWBs offer interactive projections for class demonstrations; we can usually project resources from computer software as well as from our Moodle platform. Students can interact with them by touching or writing on them, that is to say, through blended learning. Apart from that, teachers can make presentations on different topics within a subject, and these topics become much more interesting and captivating for students, since IWBs allow changes to be made and interactive elements to be inserted into the presentation of any subject. There are several types of technology used in IWBs, such as touch technology, laser scanning, and electromagnetic writing tools.
Therefore, we have to bear in mind which to choose when we get an IWB.

On the other hand, the widespread use of mobile devices nowadays has turned e-learning into mobile learning. Smartphones and tablets allow students to learn anywhere at any time. Therefore, it is important to design course material that is usable by students on such devices.

Moodle is a learning platform through which we can design, build, and create e-learning environments. It is possible to create online interaction and have video conferencing sessions with students. Distance learning is another option if blended learning cannot be carried out. We can also choose Moodle Mobile: we can download the app from the App Store, Google Play, the Windows Store, or the Windows Phone Store, and then browse the content of courses, receive messages, contact people from the courses, upload different types of file, and view course grades, among other actions.

Learning about Virtual Learning Environments

A Virtual Learning Environment (VLE) is a type of virtual environment that supports both resources and learning activities; therefore, students can have both passive and active roles. There is also social interaction, which can take place through collaborative work as well as video conferencing. Students can also be actors, since they can help construct the VLE. VLEs can be used for both distance and blended learning, since they can enrich courses. Mobile learning is also possible, because mobile devices have access to the Internet, allowing teachers and students to log in to their courses.

VLEs are designed in such a way that they can carry out the following functions or activities:

- Design, create, store, access, and use course content
- Deliver or share course content
- Communicate, interact, and collaborate between students and teachers
- Assess and personalize the learning experience
- Modularize both activities and resources
- Customize the interface

We are going to deal with each of these functions and activities and see how useful they might be when designing the VLE for our class. When using Moodle, we can perform all the functions and activities mentioned here, because Moodle is a VLE.

Design, create, store, access, and use course content

If we use the Moodle platform to create a course, we have to deal with course content. Therefore, when we add a course, we have to add its content. We can choose the weekly outline section or the topic under which we want to add the content. We click on Add an activity or resource and two options appear, resources and activities; therefore, the content can be passive or active for the student. When we create or design activities in Moodle, the options are shown in the following screenshot:

Another option for creating course content is to reuse content that has already been created and used before in another VLE. In other words, we can import or export course materials, since most VLEs have specific tools designed for such purposes. This is very useful and saves time.

There are a variety of ways for teachers to create course materials, due to the fact that the teacher thinks about the methodology, as well as how to meet the students' needs, when creating the course. Moodle is designed in such a way that it offers a variety of combinations that can fit any course content.

Deliver or share course content

Before using VLEs, we have to log in, because all the content is protected and is not open to the general public. In this way, we can protect property rights, as well as the course itself.
All participants must be enrolled in the course unless it has been opened to the public. Teachers can gain remote access in order to create and design their courses. This is quite convenient, since they can build the content at home rather than in their workplace. They need login access, and they need to switch roles to course creator in order to create the content. Follow these steps to switch roles to course creator:

Under Administration, click on Switch role to… | Course creator, as shown in the following screenshot:

When the role has been changed, the teacher can create content that students can access. Once logged in, students have access to the already created content, either activities or resources. The content is available over the Internet or the institution's intranet connection, so students can access the content anywhere one of these connections is available. If MoodleCloud is being used, there must be an Internet connection, otherwise it is impossible for both students and teachers to log in.

Communicate, interact, and collaborate among students and teachers

Communication, interaction, and collaborative working are the key factors in social interaction and in learning through the interchange of ideas. VLEs let us create course content activities that support these actions, because they are elemental for our class. There is no need to be an isolated learner, because learners have the ability to communicate between themselves and with their teachers.

Moodle offers the possibility of video conferencing through BigBlueButton. In order to install the BigBlueButton plugin in Moodle, visit the following link: https://moodle.org/plugins/browse.php?list=set&id=2. This is shown in the following screenshot:

If you are using MoodleCloud, BigBlueButton is enabled by default, so when we click on Add an activity or resource it appears in the list of activities, as shown in the following screenshot:

Assess and personalize the learning experience

Moodle allows the teacher to follow the progress of students so that they can assess and grade their work, as long as they complete the activities. Resources cannot be graded, since they are passive content for students, but teachers can also check when a participant last accessed the site.

Badges are another element used to personalize the learning experience. We can create badges for students when they complete an activity or a course; they are homework rewards. Badges are quite good at motivating young learners.

Modularize both activities and resources

Moodle offers the ability to build personalized activities and resources. There are several ways to present both, with all the options Moodle offers. Activities can be molded according to the methodology the teacher uses. In Moodle 3, there are new question types within the Quiz activity. The question types are as follows:

- Select missing words
- Drag and drop into text
- Drag and drop onto image
- Drag and drop markers

The question types are shown after we choose Quiz in the Add an activity or resource menu, in the weekly outline section or topic that we have chosen. The types of question are shown in the following screenshot:

Customize the interface

Moodle allows us to customize the interface in order to develop the look and feel that we require; we can add a logo for the school or institution that the Moodle site belongs to. We can also add a theme relevant to the subject or course that we have created.
The main purpose of customizing the interface is to avoid all subjects and courses looking the same. Later in the article, we will learn how to customize the interface.

Learning Moodle and MoodleCloud

Modular Object-Oriented Dynamic Learning Environment (Moodle) is a learning platform designed in such a way that we can create VLEs. Moodle can be downloaded, installed, and run on any web server software that supports Hypertext Preprocessor (PHP). It can use a SQL database and can run on several operating systems. We can download Moodle 3.0.3 from the following URL: https://download.moodle.org/. This URL is shown in the following screenshot:

MoodleCloud, on the other hand, does not need to be downloaded since, as its name suggests, it is in the cloud. Therefore, we can get our own Moodle site with MoodleCloud within minutes and for free. It is Moodle's hosting platform, designed and run by the people who make Moodle. In order to get a MoodleCloud site, we need to go to the following URL: https://moodle.com/cloud/. This is shown in the following screenshot:

MoodleCloud was created in order to cater for users with fewer requirements and small budgets. In order to create an account, you need to provide your cell phone number to receive an SMS, which must be input when creating your site. As it is free, there are some limitations to MoodleCloud, unless we contact Moodle Partners and pay for an expanded version of it. The limitations are as follows:

- No more than 50 users
- 200 MB of disk space
- Core themes and plugins only
- One site per phone number
- BigBlueButton sessions limited to 6 people, with no recordings
- Advertisements

When creating a Moodle site, we want to change the look and functionality of the site or individual course. We may also need to customize themes for Moodle, in order to give the course the desired look. Therefore, this article will explain the basic concepts that we have to bear in mind when dealing with themes, due to the fact that themes are shown on different devices. In the past, Moodle ran only on desktops or laptops, but nowadays Moodle can run on many different devices, such as smartphones, tablets, iPads, and smart TVs, and the list goes on.

Using Moodle on different devices

Moodle can be used on different devices, at different times, in different places. Devices differ in many ways, not only in size but also in the way they display our Moodle course. Moodle courses can be used on anything from a tiny device that fits into the palm of a hand to a huge IWB or smart TV, and plenty of other devices in between. Such differences have to be taken into account when choosing images, text, and other components of our course. In the rest of this article, we will therefore look more deeply into several aspects and concepts: sizing the screen resolution, calculating the aspect ratio, sharp and soft images, and crisp and sharp text. Finally, but importantly, the anti-aliasing method is explained.

Sizing the screen resolution

The screen resolution is made up of the number of pixels the display of a device has, horizontally and vertically, and the color depth, which measures the number of bits representing the color of each pixel. The higher the screen resolution, the higher the productivity we get.
In the past, the screen resolution of a display was important because it determined the amount of information displayed on the screen: the lower the resolution, the fewer items would fit on the screen; the higher the resolution, the more items would fit. The resolution varies according to the hardware in each device. Nowadays, screen resolution is about a pleasant visual experience, since we would rather see more quality than more stuff on the screen. That is why screen resolution matters.

There might be different display sizes where the screen resolutions are the same, that is to say, the total number of pixels is the same. If we compare a laptop (a 13" screen with a resolution of 1280 x 800) and a desktop (a 17" monitor with the same 1280 x 800 resolution), although the monitor is larger, the number of pixels is the same; the only difference is that everything will appear bigger on the monitor. Therefore, instead of seeing more stuff, we see it at a higher quality.

Screen resolution chart:

| Code | Width | Height | Ratio | Description |
|------|-------|--------|-------|-------------|
| QVGA | 320 | 240 | 4:3 | Quarter Video Graphics Array |
| FHD | 1920 | 1080 | ~16:9 | Full High Definition |
| HVGA | 640 | 240 | 8:3 | Half Video Graphics Array |
| HD | 1360 | 768 | ~16:9 | High Definition |
| HD | 1366 | 768 | ~16:9 | High Definition |
| HD+ | 1600 | 900 | ~16:9 | High Definition plus |
| VGA | 640 | 480 | 4:3 | Video Graphics Array |
| SVGA | 800 | 600 | 4:3 | Super Video Graphics Array |
| XGA | 1024 | 768 | 4:3 | Extended Graphics Array |
| XGA+ | 1152 | 768 | 3:2 | Extended Graphics Array plus |
| XGA+ | 1152 | 864 | 4:3 | Extended Graphics Array plus |
| SXGA | 1280 | 1024 | 5:4 | Super Extended Graphics Array |
| SXGA+ | 1400 | 1050 | 4:3 | Super Extended Graphics Array plus |
| UXGA | 1600 | 1200 | 4:3 | Ultra Extended Graphics Array |
| QXGA | 2048 | 1536 | 4:3 | Quad Extended Graphics Array |
| WXGA | 1280 | 768 | 5:3 | Wide Extended Graphics Array |
| WXGA | 1280 | 720 | ~16:9 | Wide Extended Graphics Array |
| WXGA | 1280 | 800 | 16:10 | Wide Extended Graphics Array |
| WXGA | 1366 | 768 | ~16:9 | Wide Extended Graphics Array |
| WXGA+ | 1280 | 854 | 3:2 | Wide Extended Graphics Array plus |
| WXGA+ | 1440 | 900 | 16:10 | Wide Extended Graphics Array plus |
| WXGA+ | 1440 | 960 | 3:2 | Wide Extended Graphics Array plus |
| WQHD | 2560 | 1440 | ~16:9 | Wide Quad High Definition |
| WQXGA | 2560 | 1600 | 16:10 | Wide Quad Extended Graphics Array |
| WSVGA | 1024 | 600 | ~17:10 | Wide Super Video Graphics Array |
| WSXGA | 1600 | 900 | ~16:9 | Wide Super Extended Graphics Array |
| WSXGA | 1600 | 1024 | 16:10 | Wide Super Extended Graphics Array |
| WSXGA+ | 1680 | 1050 | 16:10 | Wide Super Extended Graphics Array plus |
| WUXGA | 1920 | 1200 | 16:10 | Wide Ultra Extended Graphics Array |
| WQUXGA | 3840 | 2400 | 16:10 | Wide Quad Ultra Extended Graphics Array |
| 4K UHD | 3840 | 2160 | 16:9 | Ultra High Definition |

Considering that 3840 x 2160 displays (also known as 4K, QFHD, Ultra HD, UHD, or 2160p) are already available for laptops and monitors, a pleasant visual experience with high-DPI displays can be a good long-term investment for your desktop applications.

The DPI setting for the monitor causes another common problem: a change in the effective resolution. Consider a 13.3" display that offers a 3200 x 1800 resolution and is configured with an OS DPI of 240. The high DPI setting makes the system use both larger fonts and larger UI elements; therefore, the elements consume more pixels to render than the same elements displayed with an OS DPI of 96. The effective resolution of a display that provides 3200 x 1800 pixels configured at 240 DPI is 1280 x 720.

The effective resolution can become a big problem, because an application that requires a minimum resolution of the old standard 1024 x 768 pixels at an OS DPI of 96 would have problems with a 3200 x 1800 display configured at 240 DPI: it wouldn't be possible to display all the necessary UI elements. It may sound crazy, but the effective vertical resolution is 720 pixels, lower than the 768 vertical pixels required by the application to display all the UI elements without problems.

The formula to calculate the effective resolution is simple: divide the physical pixels by the scale factor (OS DPI / 96). For example, the following formula calculates the horizontal effective resolution of my previous example: 3200 / (240 / 96) = 3200 / 2.5 = 1280; and the following formula calculates the vertical effective resolution: 1800 / (240 / 96) = 1800 / 2.5 = 720. The effective resolution would be 1600 x 900 pixels if the same physical resolution were configured at 192 DPI. Effective horizontal resolution: 3200 / (192 / 96) = 3200 / 2 = 1600; effective vertical resolution: 1800 / (192 / 96) = 1800 / 2 = 900.
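To make the arithmetic above concrete, here is a small illustrative Java sketch of the formula. It is not from the original article; it simply assumes the conventional 96 DPI baseline used for the OS scale factor:

```java
// Computes the effective resolution: physical pixels divided by the
// OS scale factor (osDpi / 96).
public class EffectiveResolution {
    static int effective(int physicalPixels, int osDpi) {
        double scale = osDpi / 96.0; // OS scale factor
        return (int) (physicalPixels / scale);
    }

    public static void main(String[] args) {
        // A 3200 x 1800 panel configured at 240 DPI -> 1280 x 720 effective
        System.out.println(effective(3200, 240) + " x " + effective(1800, 240));
        // The same panel at 192 DPI -> 1600 x 900 effective
        System.out.println(effective(3200, 192) + " x " + effective(1800, 192));
    }
}
```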
Calculating the aspect ratio

The aspect ratio is the proportional relationship between the width and the height of an image. It is used to describe the shape of a computer screen or a TV. The aspect ratio of a standard-definition (SD) screen is 4:3, that is to say, a relatively square rectangle. The aspect ratio is often expressed in W:H format, where W stands for width and H stands for height; 4:3 means four units wide to three units high. High-definition TVs (HDTVs), by contrast, have a 16:9 ratio, which is a wider rectangle.

Why do we calculate the aspect ratio? Because every frame, digital video, canvas, image, or responsive design has a rectangular shape, and the ratio has to be well defined so that these shapes fit different and distinct devices.
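As a quick illustration (not from the original article), the W:H form can be computed by dividing the width and height by their greatest common divisor:

```java
// Reduces a width x height pair to its W:H aspect ratio using the
// greatest common divisor.
public class AspectRatio {
    static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

    static String ratio(int width, int height) {
        int g = gcd(width, height);
        return (width / g) + ":" + (height / g);
    }

    public static void main(String[] args) {
        System.out.println(ratio(640, 480));   // 4:3   (VGA)
        System.out.println(ratio(1920, 1080)); // 16:9  (FHD)
        System.out.println(ratio(1280, 800));  // 16:10 (WXGA)
    }
}
```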
Learning about sharp and soft images

Images can be either sharp or soft. Sharp is the opposite of soft: a soft image has less pronounced details, while a sharp image has more contrast between pixels. The more pixels the image has, the sharper it is. We can soften an image, in which case it loses information, but we cannot truly sharpen one; in other words, we can't add more information to an image.

In order to compare sharp and soft images, we can use an online tool that converts bitmaps to vector graphics: http://vectormagic.com/home. We can convert a bitmap image such as a .png, .jpeg, or .gif into an .svg in order to get an anti-aliased image, in a single step. There are plenty of features to take into account when vectorizing. We can design a bitmap using an image editor and upload the bitmap image from the clipboard, or upload the file from our computer. Once the image is uploaded to the application, we can start working. Another possibility is to use the sample images on the website, which we are going to use in order to see the anti-aliasing effect.

We convert bitmap images, which are made up of pixels, into vector images, which are made up of shapes. The shapes are mathematical descriptions of images and do not become pixelated when scaling up. Vector graphics can handle scaling without any problems, and vector images are the preferred type to work with in graphic design for paper or clothes.

Go to http://vectormagic.com/home and click on Examples, as shown in the following screenshot:

After clicking on Examples, the bitmap appears on the left and the vectorized image on the right. The bitmap is blurred and soft; the SVG has an anti-aliasing effect, therefore the image is sharp. The result is shown in the following screenshot:

Learning about crisp and sharp text

There are sharp and soft images, and there is also crisp and sharp text, so it is now time to look at text. What is the main difference between the two? When we say that text is crisp, we mean that there is more anti-aliasing; in other words, it has more grey pixels around the black text. The difference shows when we zoom in to 400%. On the other hand, sharp mode is superior for small fonts, because it makes each letter stronger. There are four anti-aliasing options for text in Photoshop: sharp, crisp, strong, and smooth. Sharp and crisp have already been described in the previous paragraphs. Strong is notorious for adding unnecessary weight to letter forms, while smooth looks closest to untinted anti-aliasing and remains most similar to the original.

Understanding what anti-aliasing is

Anti-aliasing is the technique used to minimize distortion artifacts. It applies intermediate colors in order to eliminate jagged pixels, that is to say, the saw-tooth or pixelated lines. Therefore, we need to look for a lower resolution so that the saw-tooth effect does not appear when we make the graphic bigger.

Test your knowledge

Before we delve deeper into more content, let's test your knowledge of the information we have dealt with in this article:

1. Moodle is a learning platform with which…
   - We can design, build, and create e-learning environments.
   - We can learn.
   - We can download content for students.
2. BigBlueButtonBN…
   - Is a way to log in to Moodle.
   - Lets you create links to real-time online classrooms from within Moodle.
   - Works only in MoodleCloud.
3. MoodleCloud…
   - Is not open source.
   - Does not allow more than 50 users.
   - Works only for universities.
4. The number of pixels the display of the device has horizontally and vertically, and the color depth measuring the number of bits representing the color of each pixel, make up…
   - Screen resolution.
   - Aspect ratio.
   - Size of device.
5. Anti-aliasing can be applied to…
   - Only text.
   - Only images.
   - Both images and text.

Summary

In this article, we have covered most of what needs to be known about e-learning, VLEs, and Moodle and MoodleCloud. There is a slight difference between Moodle and MoodleCloud, which matters especially if you don't have access to a Moodle course at the institution where you are working and want to design a Moodle course. Moodle is used on different devices, and there are several aspects to take into account when designing a course and building a Moodle theme. We have dealt with screen resolution, aspect ratio, types of images and text, and anti-aliasing effects.

Resources for Article:

Further resources on this subject:
- Listening Activities in Moodle 1.9: Part 2 [article]
- Gamification with Moodle LMS [article]
- Adding Graded Activities [article]

Heart Diseases Prediction using Spark 2.0.0

Packt
18 Oct 2016
16 min read
In this article, Md. Rezaul Karim and Md. Mahedi Kaysar, the authors of the book Large Scale Machine Learning with Spark, discuss how to develop a large-scale heart disease prediction pipeline with Spark 2.0.0, covering steps such as taking input, parsing, making label points for regression, model training, model saving, and finally predictive analytics using the trained model. They develop a large-scale machine learning application using several classifiers, such as the random forest, decision tree, and linear regression classifiers. To make this happen, the following steps will be covered:

- Data collection and exploration
- Loading required packages and APIs
- Creating an active Spark session
- Data parsing and creation of an RDD of label points
- Splitting the RDD of label points into training and test sets
- Training the model
- Model saving for future use
- Predictive analysis using the test set
- Predictive analytics using a new dataset
- Performance comparison among different classifiers

(For more resources related to this topic, see here.)

Background

Machine learning combined with big data is a radical combination that has made a great impact on research, in academia and industry, as well as in the biomedical sector. In the area of biomedical data analytics, it promises better results on real datasets for diagnosis and prognosis, and thus better healthcare. Moreover, life science research is also entering the big data era, since datasets are being generated and produced in an unprecedented way. This imposes great challenges on machine learning and bioinformatics tools and algorithms: finding the VALUE in big data criteria such as volume, velocity, variety, veracity, visibility, and value. In this article, we will show how to predict the possibility of future heart disease by using the Spark machine learning APIs, including Spark MLlib, Spark ML, and Spark SQL.

Data collection and exploration

In recent times, biomedical research has advanced greatly, and more and more life sciences datasets are being generated, many of them open. However, for simplicity and ease, we decided to use the Cleveland database, because to date most researchers who have applied machine learning techniques to biomedical data analytics have used this dataset. According to the dataset description at https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart-disease.names, the heart disease dataset is one of the most used and best-studied datasets in biomedical data analytics and machine learning. The dataset is freely available at the UCI machine learning dataset repository at https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/.

The data contains a total of 76 attributes; however, most published research papers use a subset of only 14 features. The goal field indicates whether heart disease is present or absent. It has 5 possible values, ranging from 0 to 4: the value 0 signifies no presence of heart disease, values 1 and 2 signify that the disease is present but at a primary stage, and values 3 and 4 indicate a strong possibility of heart disease. Biomedical laboratory experiments with the Cleveland dataset have concentrated on simply attempting to distinguish presence (values 1, 2, 3, 4) from absence (value 0). In short, the higher the value, the stronger the evidence of the presence of the disease.
Another consideration is that privacy is an important concern in biomedical data analytics, as in all kinds of diagnosis and prognosis. Therefore, the names and social security numbers of the patients were recently removed from the dataset to avoid privacy issues, and those values have been replaced with dummy values instead.

It is to be noted that three processed files exist, containing the Cleveland, Hungarian, and Switzerland datasets; all four unprocessed files also exist in the same directory. To demonstrate the example, we will use the Cleveland dataset for training and evaluating the models; the Hungarian dataset will be used to reuse the saved model. As already said, the number of attributes is 76 (including the predicted attribute); however, like other ML/biomedical researchers, we will use only 14 attributes, with the following attribute information:

| No. | Attribute name | Explanation |
|-----|----------------|-------------|
| 1 | age | Age in years |
| 2 | sex | Sex (1 = male; 0 = female) |
| 3 | cp | Chest pain type: 1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic |
| 4 | trestbps | Resting blood pressure (in mm Hg on admission to the hospital) |
| 5 | chol | Serum cholesterol in mg/dl |
| 6 | fbs | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) |
| 7 | restecg | Resting electrocardiographic results: 0 = normal; 1 = ST-T wave abnormality; 2 = probable or definite left ventricular hypertrophy by Estes' criteria |
| 8 | thalach | Maximum heart rate achieved |
| 9 | exang | Exercise-induced angina (1 = yes; 0 = no) |
| 10 | oldpeak | ST depression induced by exercise relative to rest |
| 11 | slope | Slope of the peak exercise ST segment: 1 = upsloping; 2 = flat; 3 = downsloping |
| 12 | ca | Number of major vessels (0-3) colored by fluoroscopy |
| 13 | thal | Thallium scan result: 3 = normal; 6 = fixed defect; 7 = reversible defect |
| 14 | num | Diagnosis of heart disease (angiographic disease status): 0 = < 50% diameter narrowing; 1 = > 50% diameter narrowing |

Table 1: Dataset characteristics

Note that there are several missing attribute values, distinguished with the value -9.0. The Cleveland dataset contains the following class distribution:

| Database | 0 | 1 | 2 | 3 | 4 | Total |
|----------|-----|----|----|----|----|-------|
| Cleveland | 164 | 55 | 36 | 35 | 13 | 303 |

A sample snapshot of the dataset is given as follows:

Figure 1: Snapshot of the Cleveland heart disease dataset

Loading required packages and APIs

The following packages and APIs need to be imported for our purpose.
We believe the packages are self-explanatory if you have minimal working experience with Spark 2.0.0:

```java
import java.util.HashMap;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.mllib.classification.LogisticRegressionModel;
import org.apache.spark.mllib.classification.NaiveBayes;
import org.apache.spark.mllib.classification.NaiveBayesModel;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.regression.LinearRegressionModel;
import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
import org.apache.spark.mllib.tree.DecisionTree;
import org.apache.spark.mllib.tree.RandomForest;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;
import org.apache.spark.mllib.tree.model.RandomForestModel;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import com.example.SparkSession.UtilityForSparkSession;
import scala.Tuple2;
```

Creating an active Spark session

```java
SparkSession spark = UtilityForSparkSession.mySession();
```

Here is the UtilityForSparkSession class that creates and returns an active Spark session:

```java
import org.apache.spark.sql.SparkSession;

public class UtilityForSparkSession {
    public static SparkSession mySession() {
        SparkSession spark = SparkSession
            .builder()
            .appName("UtilityForSparkSession")
            .master("local[*]")
            .config("spark.sql.warehouse.dir", "E:/Exp/")
            .getOrCreate();
        return spark;
    }
}
```

Note that here, on the Windows 7 platform, we have set the Spark SQL warehouse to "E:/Exp/"; set your path accordingly, based on your operating system.

Data parsing and RDD of label point creation

Take the input as a simple text file, parse the lines, and create an RDD of label points that will be used for the classification and regression analysis. Also specify the input source and the number of partitions; adjust the number of partitions based on your dataset size. Here, the number of partitions has been set to 2:

```java
String input = "heart_diseases/processed_cleveland.data";
Dataset<Row> my_data = spark.read().format("com.databricks.spark.csv").load(input);
my_data.show(false);
RDD<String> linesRDD = spark.sparkContext().textFile(input, 2);
```

Since a JavaRDD cannot be created directly from a text file, we have created the simple RDD so that we can convert it to a JavaRDD when necessary. Now let's create the JavaRDD of label points.
However, we need to convert the RDD to a JavaRDD to serve our purpose, which goes as follows:

```java
JavaRDD<LabeledPoint> data = linesRDD.toJavaRDD().map(new Function<String, LabeledPoint>() {
    @Override
    public LabeledPoint call(String row) throws Exception {
        // Replace missing values (marked with "?") with a large sentinel value.
        String line = row.replaceAll("\\?", "999999.0");
        String[] tokens = line.split(",");
        Integer last = Integer.parseInt(tokens[13]);
        double[] features = new double[13];
        for (int i = 0; i < 13; i++) {
            features[i] = Double.parseDouble(tokens[i]);
        }
        Vector v = new DenseVector(features);
        Double value = 0.0;
        if (last.intValue() > 0)
            value = 1.0;
        LabeledPoint lp = new LabeledPoint(value, v);
        return lp;
    }
});
```

Using the replaceAll() method, we have handled invalid values, such as the missing values that are specified in the original file using the ? character. We have replaced them with a very large value that has no side effect on the original classification or predictive results. The reason for this is that missing or sparse data can lead to highly misleading results.

Splitting the RDD of label points into training and test sets

In the previous step, we created the RDD of label points that can be used for the regression or classification task. Now we need to split the data into training and test sets, as follows:

```java
double[] weights = {0.7, 0.3};
long split_seed = 12345L;
JavaRDD<LabeledPoint>[] split = data.randomSplit(weights, split_seed);
JavaRDD<LabeledPoint> training = split[0];
JavaRDD<LabeledPoint> test = split[1];
```

As the preceding code segment shows, we have split the RDD of label points so that 70% goes to the training set and 30% goes to the test set; the randomSplit() method does this split. Note that you can set this RDD's storage level to persist its values across operations after the first time it is computed; this can only be used to assign a new storage level if the RDD does not already have one set. The split seed is a long integer which signifies that the split is random, but the result will not change between runs or iterations during model building or training.

Training the model and predicting the heart disease possibility

In the first place, we will train the linear regression model, which is the simplest regression classifier:

```java
final double stepSize = 0.0000000009;
final int numberOfIterations = 40;
LinearRegressionModel model = LinearRegressionWithSGD.train(JavaRDD.toRDD(training),
        numberOfIterations, stepSize);
```

As you can see, the preceding code trains a linear regression model with no regularization, using stochastic gradient descent. This solves the least squares regression formulation f(weights) = (1/n) * ||A * weights - y||^2, which is the mean squared error. Here the data matrix A has n rows, and the input RDD holds the set of rows of A, each with its corresponding right-hand side label y. To train the model, it takes the training set, the number of iterations, and the step size; we provide somewhat arbitrary values for the last two parameters.

Model saving for future use

Now let's save the model that we just created for future use.
It's pretty simple; just use the following code, specifying the storage location:

```java
String model_storage_loc = "models/heartModel";
model.save(spark.sparkContext(), model_storage_loc);
```

Once the model is saved in your desired location, you will see the following output in your Eclipse console:

Figure 2: The log after the model is saved to storage

Predictive analysis using the test set

Now let's calculate the prediction score on the test dataset:

```java
JavaPairRDD<Double, Double> predictionAndLabel =
    test.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
        @Override
        public Tuple2<Double, Double> call(LabeledPoint p) {
            return new Tuple2<>(model.predict(p.features()), p.label());
        }
    });
```

Compute the accuracy of the prediction:

```java
double accuracy = predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() {
    @Override
    public Boolean call(Tuple2<Double, Double> pl) {
        return pl._1().equals(pl._2());
    }
}).count() / (double) test.count();
System.out.println("Accuracy of the classification: " + accuracy);
```

The output goes as follows:

Accuracy of the classification: 0.0

Performance comparison among different classifiers

Unfortunately, there is no prediction accuracy at all, right? There might be several reasons for that, including:

- The dataset characteristics
- Model selection
- Parameter selection, also called hyperparameter tuning

(There is also a mechanical reason: linear regression emits continuous scores, which will practically never be exactly equal to the 0.0/1.0 labels under the equals() comparison used above.)

For simplicity, we assume the dataset is okay, since, as already said, it is a widely used dataset employed for machine learning research by many researchers around the globe. Now, what next? Let's consider another classifier algorithm, for example the random forest or decision tree classifier. What about the random forest? Let's go with the random forest classifier in second place. Just use the code below to train the model using the training set:

```java
Integer numClasses = 26; // Number of classes
// HashMap is used to restrict the delicacy in the tree construction
HashMap<Integer, Integer> categoricalFeaturesInfo = new HashMap<Integer, Integer>();
Integer numTrees = 5; // Use more in practice
String featureSubsetStrategy = "auto"; // Let the algorithm choose the best
String impurity = "gini"; // Information gain and variance reduction are also available
Integer maxDepth = 20; // Set the value of the maximum depth accordingly
Integer maxBins = 40; // Set the value of the bins accordingly
Integer seed = 12345; // Setting a long seed value is recommended
final RandomForestModel model = RandomForest.trainClassifier(training, numClasses,
        categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity,
        maxDepth, maxBins, seed);
```

We believe the parameters used by the trainClassifier() method are self-explanatory, and we leave it to the reader to learn the significance of each parameter. Fantastic! We have trained the model using the random forest classifier and managed to save the model for future use too. Now, if you reuse the same code that we described in the Predictive analysis using the test set step, you should have the following output:

Accuracy of the classification: 0.7843137254901961

Much better, right? If you are still not satisfied, you can try another classifier model, such as the Naïve Bayes classifier.
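As an aside that is not part of the original article, the linear regression model can also be evaluated more fairly by thresholding its continuous score at 0.5 before comparing it with the 0/1 labels. A hypothetical sketch follows; it assumes the earlier LinearRegressionModel was kept in a final variable named lrModel, renamed here to avoid clashing with the random forest's model variable:

```java
// Hypothetical variant of the earlier evaluation: round the continuous
// linear-regression score to a 0/1 prediction before comparing with labels.
JavaPairRDD<Double, Double> thresholded =
    test.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
        @Override
        public Tuple2<Double, Double> call(LabeledPoint p) {
            double score = lrModel.predict(p.features()); // continuous output
            return new Tuple2<>(score >= 0.5 ? 1.0 : 0.0, p.label());
        }
    });
double thresholdedAccuracy = thresholded.filter(new Function<Tuple2<Double, Double>, Boolean>() {
    @Override
    public Boolean call(Tuple2<Double, Double> pl) {
        return pl._1().equals(pl._2());
    }
}).count() / (double) test.count();
System.out.println("Accuracy with thresholding: " + thresholdedAccuracy);
```

Even so, a proper classifier, such as the random forest above or logistic regression, remains the better tool for a binary label.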
Predictive analytics using the new dataset

As we already mentioned, we have saved the model for future use, so now we should take the opportunity to use the same model on a new dataset. The reason is, if you recall the steps, that we trained the model using the training set and evaluated it using the test set. Now, what if you have more data, or new data, available to be used? Will you go for retraining the model? Of course not, since you would have to iterate through several steps and sacrifice valuable time and cost too. Therefore, it would be wise to use the already-trained model and predict the performance on the new dataset.

Well, now let's reuse the stored model. Note that you will have to reuse the same type of model that was trained. For example, if you did the model training using the random forest classifier and saved the model, then while reusing it you will have to use the same classifier model to load the saved model. Therefore, we will use the random forest model while using the new dataset. Now create the RDD of label points from the new dataset (that is, the Hungarian database, with the same 14 attributes):

```java
String new_data = "heart_diseases/processed_hungarian.data";
RDD<String> linesRDD = spark.sparkContext().textFile(new_data, 2);
JavaRDD<LabeledPoint> data = linesRDD.toJavaRDD().map(new Function<String, LabeledPoint>() {
    @Override
    public LabeledPoint call(String row) throws Exception {
        String line = row.replaceAll("\\?", "999999.0");
        String[] tokens = line.split(",");
        Integer last = Integer.parseInt(tokens[13]);
        double[] features = new double[13];
        for (int i = 0; i < 13; i++) {
            features[i] = Double.parseDouble(tokens[i]);
        }
        Vector v = new DenseVector(features);
        Double value = 0.0;
        if (last.intValue() > 0)
            value = 1.0;
        LabeledPoint p = new LabeledPoint(value, v);
        return p;
    }
});
```

Now let's load the saved model using the random forest model algorithm, as follows:

```java
RandomForestModel model2 = RandomForestModel.load(spark.sparkContext(), model_storage_loc);
```

Now let's calculate the prediction on the new set:

```java
JavaPairRDD<Double, Double> predictionAndLabel =
    data.mapToPair(new PairFunction<LabeledPoint, Double, Double>() {
        @Override
        public Tuple2<Double, Double> call(LabeledPoint p) {
            return new Tuple2<>(model2.predict(p.features()), p.label());
        }
    });
```

Now calculate the accuracy of the prediction, as follows:

```java
double accuracy = predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() {
    @Override
    public Boolean call(Tuple2<Double, Double> pl) {
        return pl._1().equals(pl._2());
    }
}).count() / (double) data.count();
System.out.println("Accuracy of the classification: " + accuracy);
```

We got the following output:

Accuracy of the classification: 0.7380952380952381

The book covers more interesting machine learning applications, such as spam filtering, topic modelling for real-time streaming data, handling graph data for machine learning, market basket analysis, neighborhood clustering analysis, air flight delay analysis, making ML applications adaptable, model saving and reuse, hyperparameter tuning and model selection, breast cancer diagnosis and prognosis, heart disease prediction, optical character recognition, hypothesis testing, dimensionality reduction for high-dimensional data, large-scale text manipulation, and much more. Moreover, the book also covers how to scale ML models up to handle massive big datasets on cloud computing infrastructure, and some best practices in machine learning techniques are also discussed.
In a nutshell, many useful and exciting applications have been developed using the following machine learning algorithms:

- Linear Support Vector Machine (SVM)
- Linear Regression
- Logistic Regression
- Decision Tree classifier
- Random Forest classifier
- K-means clustering
- LDA topic modelling from static and real-time streaming data
- Naïve Bayes classifier
- Multilayer Perceptron classifier for deep classification
- Singular Value Decomposition (SVD) for dimensionality reduction
- Principal Component Analysis (PCA) for dimensionality reduction
- Generalized Linear Regression
- Chi-square test (for goodness-of-fit tests, independence tests, and feature tests)
- Kolmogorov-Smirnov test for hypothesis testing
- Spark Core for market basket analysis
- Multi-label classification
- One-vs-Rest classifier
- Gradient Boosting classifier
- The ALS algorithm for movie recommendation
- Cross-validation for model selection
- Train split for model selection
- RegexTokenizer, StringIndexer, StopWordsRemover, HashingTF, and TF-IDF for text manipulation

Summary

In this article, we saw how beneficial large-scale machine learning with Spark can be in almost any field.

Resources for Article:

Further resources on this subject:
- Spark for Beginners [article]
- Setting up Spark [article]
- Holistic View on Spark [article]

Managing Application Configuration

Packt
18 Oct 2016
14 min read
In this article by Sean McCord, author of the book CoreOS Cookbook, we will explore some of the options available to help bridge the configuration divide, with the following topics:

- Configuring by URL
- Translating etcd to configuration files
- Building EnvironmentFiles
- Building an active configuration manager
- Using fleet globals

(For more resources related to this topic, see here.)

Configuring by URL

One of the most direct ways to obtain an application configuration is by URL. You can generate a configuration and store it as a file somewhere, or construct a configuration from a web request, returning the formatted file. In this section, we will construct a dynamic redis configuration by web request and then run redis using it.

Getting ready

First, we need a configuration server. This can be S3, an object store, etcd, a NodeJS application, a Rails web server, or just about anything; the details don't matter, as long as it speaks HTTP. We will construct a simple one here using Go, just in case you don't have one ready. Make sure your GOPATH is set and create a new directory named configserver. Then, create a new file in that directory called main.go with the following contents:

```go
package main

import (
	"html/template"
	"log"
	"net/http"
)

func init() {
	redisTmpl = template.Must(template.New("rcfg").Parse(redisString))
}

func main() {
	http.HandleFunc("/config/redis", redisConfig)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

func redisConfig(w http.ResponseWriter, req *http.Request) {
	// TODO: pull configuration from database
	redisTmpl.Execute(w, redisConfigOpts{
		Save:       true,
		MasterIP:   "192.168.25.100",
		MasterPort: "6379",
	})
}

type redisConfigOpts struct {
	Save       bool   // Should redis save db to file?
	MasterIP   string // IP address of the redis master
	MasterPort string // Port of the redis master
}

var redisTmpl *template.Template

const redisString = `
{{if .Save}}
save 900 1
save 300 10
save 60 10000
{{end}}
slaveof {{.MasterIP}} {{.MasterPort}}
`
```

For our example, we simply configure the values statically, but it is easy to see how we could query etcd or another database to fill in the appropriate values on demand. Now, just build and run the config server, and we are ready to implement our configURL-based configuration.

How to do it...

By design, CoreOS is a very stripped-down OS. However, one of the tools it does come with is curl, which we can use to download our configuration. All we have to do is add it to our systemd/fleet unit file. For redis-slave.service, input the following:

```ini
[Unit]
Description=Redis slave server
After=docker.service

[Service]
ExecStartPre=/usr/bin/mkdir -p /tmp/config/redis-slave
ExecStartPre=/usr/bin/curl -s -o /tmp/config/redis-slave/redis.conf http://configserver-address:8080/config/redis
ExecStartPre=-/usr/bin/docker kill %p
ExecStartPre=-/usr/bin/docker rm %p
ExecStart=/usr/bin/docker run --rm --name %p -v /tmp/config/redis-slave/redis.conf:/tmp/redis.conf redis:alpine /tmp/redis.conf
```

We have used the placeholder configserver-address in the preceding code, so make certain you fill in the appropriate IP for the system running the config server.

How it works...

We outsource the work of generating the configuration to the web server or beyond. This is a common idiom in modern cluster-oriented systems: many small pieces work together to make the whole. The idea of using a configuration URL is very flexible.
In this case, it allows us to use a pre-packaged, official Docker image for an application that has no knowledge of the cluster, in its standard, default setup. While redis is fairly simple, the same concept can be used to generate and supply configurations for almost any legacy application.

Translating etcd to configuration files

In CoreOS, we have a database well suited to configuration, as evidenced by its name (while the name etc is an abbreviation for the Latin et cetera, in common UNIX usage, /etc is where the system configuration is stored). It presents a standard HTTP server, which is easy to access from nearly anything. This makes storing application configuration in etcd a natural choice. The only problem is devising methods of storing the configuration in ways that are sufficiently expressive, flexible, and usable.

Getting ready

A naive but simple way of using etcd is to use it as a key-oriented file store, as follows:

```sh
etcdctl set myconfig "$(cat mylocalconfig.conf | base64)"
etcdctl get myconfig | base64 -d > mylocalconfig.conf
```

However, this method stores the configuration file in the database as a static, opaque blob. Decoupling the generation from the consumption yields much more flexibility, both in adapting configuration content to multiple consumers and producers and in scaling out multiple access uses.

How to do it...

We can store and retrieve an entire configuration blob very simply, as follows:

```sh
etcdctl set /redis/config "$(cat redis.conf | base64)"
etcdctl get /redis/config | base64 -d > redis.conf
```

Or we can store more generally-structured data, as follows:

```sh
etcdctl set /redis/config/master 192.168.9.23
etcdctl set /redis/config/loglevel notice
etcdctl set /redis/config/dbfile dump.rdb
```

And use it in different ways:

```sh
REDISMASTER=$(curl -s http://localhost:2379/v2/keys/redis/config/master | jq .node.value)
cat <<ENDHERE >/etc/redis.conf
slaveof $(curl -s http://localhost:2379/v2/keys/redis/config/master | jq .node.value)
loglevel $(etcdctl get /redis/config/loglevel)
dbfile $(etcdctl get /redis/config/dbfile)
ENDHERE
```
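Since etcd's HTTP interface is language-agnostic, the same keys can be consumed without etcdctl at all. Below is a minimal, hypothetical Java sketch of the same lookup, using the v2 keys API shown above; a real client should use a proper JSON parser rather than the naive regex used here:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EtcdConfigReader {
    // Extracts node.value from an etcd v2 GET response body.
    private static final Pattern VALUE = Pattern.compile("\"value\":\"([^\"]*)\"");

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("http://localhost:2379/v2/keys/redis/config/master"))
                .build();
        HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
        Matcher m = VALUE.matcher(resp.body());
        if (m.find()) {
            // Emit a redis.conf fragment; the port is hardcoded for brevity.
            System.out.println("slaveof " + m.group(1) + " 6379");
        }
    }
}
```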
So this will work as expected:

[Service]
Environment=TESTVAR=testVal
ExecStart=/usr/bin/docker run -e TESTVAR=${TESTVAR} nginx

Whereas this will not:

[Service]
Environment=TESTVAR=unknowableVal
ExecStart=/usr/bin/docker run nginx

How to do it...

We will start by constructing an environment file generator unit. For testapp-env.service, use the following:

[Unit]
Description=EnvironmentFile generator for testapp
Before=testapp.service
BindsTo=testapp.service

[Install]
RequiredBy=testapp.service

[Service]
ExecStart=/bin/sh -c "echo NOW=$(date +'%%'s) >/run/now.env"
Type=oneshot
RemainAfterExit=yes

You may note the odd syntax for the date format. Systemd expands %s internally, so it needs to be escaped to be passed to the shell unmolested.

For testapp.service, use the following:

[Unit]
Description=My Amazing test app, configured by EnvironmentFile

[Service]
EnvironmentFile=/run/now.env
ExecStart=/usr/bin/docker run --rm -p 8080:8080 -e NOW=${NOW} ulexus/environmentfile-demo

If you are using fleet, you can submit these service files. If you are using raw systemd, you will need to install them into /etc/systemd/system. Then issue the following:

systemctl daemon-reload
systemctl enable testapp-env.service
systemctl start testapp.service

testapp output

How it works...

The first unit writes the current UNIX timestamp to the file /run/now.env, and the second unit reads that file, parsing its contents into environment variables. We then pass the desired environment variables into the docker execution.

Taking apart the first unit, there are a number of important components:

The Before statement tells systemd that the unit should be started before the main testapp. This is important so that the environment file exists before the service is started. Otherwise, the unit would fail because the file does not exist, or read the wrong data if the file were stale.

The BindsTo setting tells systemd that the unit should be stopped and started with testapp.service. This makes sure that it is restarted when testapp is restarted, refreshing the environment file.

The RequiredBy setting tells systemd that this unit is required by the other unit. Stating the relationship in this manner allows the helper unit to be separately enabled or disabled without any modification of the target unit. While that wouldn't matter in this case, in cases where the target service is a standard unit file that knows nothing about the helper unit, it allows us to use the add-on without fear of modifying the official, standard service unit.

The Type and RemainAfterExit combination of settings tells systemd to expect that the unit will exit, but to treat the unit as up even after it has exited. This allows the prerequisite to hold even though the unit has exited.

In the second unit, the main service, the main thing to note is the EnvironmentFile line. It simply takes a file as an argument. We reference the file that was created (or updated) by the first unit. Systemd reads it into the environment for any Exec* statements. Because Docker separates its containers' environments, we do still have to manually pass that variable into the container with the -e flag to docker run.

There's more...

You might be wondering why we don't combine the units and try to set the environment variable with an ExecStartPre statement. Modifications to the environment from an Exec* statement are isolated from the other Exec* statements.
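For instance, here is a sketch of the approach that does not work (a hypothetical unit, for illustration only): the variable set in ExecStartPre's shell is gone by the time ExecStart runs:

[Service]
ExecStartPre=/bin/sh -c "export NOW=$(date +%%s)"
ExecStart=/bin/sh -c "echo NOW is ${NOW}"

The exported NOW dies with the ExecStartPre shell, so ${NOW} is empty when ExecStart runs. This is exactly why the generator unit writes to a file and the consumer unit loads it with EnvironmentFile.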
You can make changes to the environment within an Exec* statement, but those changes will not be carried over to any other Exec* statement. Also, you cannot execute any commands in an Environment or EnvironmentFile statement, nor can they expand any variables themselves.

Building an active configuration manager

Dynamic systems are, well, dynamic. They will often change while a dependent service is running. In such a case, the simple runtime configuration systems we have discussed thus far are insufficient. We need the ability to tell our dependent services to use the new, changed configuration. For cases such as this, we can implement active configuration management. In an active configuration, some process monitors the state of dynamic components and notifies or restarts dependent services with the updated data.

Getting ready

Much like the active service announcer, we will be building our active configuration manager in Go, so a functional Go development environment is required. To increase readability, we have broken each subroutine into a separate file.

How to do it...

First, we construct the main routine, as follows:

main.go:

package main

import (
   "log"
   "os"

   "github.com/coreos/etcd/clientv3"
   "golang.org/x/net/context"
)

var etcdKey = "web:backends"

func main() {
   ctx := context.Background()

   log.Println("Creating etcd client")
   c, err := clientv3.NewFromURL(os.Getenv("ETCD_ENDPOINTS"))
   if err != nil {
      log.Fatal("Failed to create etcd client:", err)
   }
   defer c.Close()

   w := c.Watch(ctx, etcdKey, clientv3.WithPrefix())
   for resp := range w {
      if resp.Canceled {
         log.Fatal("etcd watcher died")
      }
      go reconfigure(ctx, c)
   }
}

Note that log.Fatal already terminates the process, so no separate os.Exit call is needed after it.

Next, our reconfigure routine, which pulls the current state from etcd, writes the configuration to file, and restarts our service, as follows:

reconfigure.go:

package main

import (
   "github.com/coreos/etcd/clientv3"
   "golang.org/x/net/context"
)

// reconfigure haproxy
func reconfigure(ctx context.Context, c *clientv3.Client) error {
   backends, err := get(ctx, c)
   if err != nil {
      return err
   }
   if err = write(backends); err != nil {
      return err
   }
   return restart()
}

The reconfigure routine just calls get, write, and restart, in sequence.
Let's create each of those as follows:

get.go:

package main

import (
   "github.com/coreos/etcd/clientv3"
   "golang.org/x/net/context"
)

// get the present list of backends
func get(ctx context.Context, c *clientv3.Client) ([]string, error) {
   resp, err := clientv3.NewKV(c).Get(ctx, etcdKey, clientv3.WithPrefix())
   if err != nil {
      return nil, err
   }

   var backends = []string{}
   for _, node := range resp.Kvs {
      if node.Value != nil {
         backends = append(backends, string(node.Value))
      }
   }
   return backends, nil
}

write.go:

package main

import (
   "os"
   "text/template"
)

var configTemplate *template.Template

func init() {
   configTemplate = template.Must(template.New("config").Parse(configTemplateString))
}

// Write the updated config file
func write(backends []string) error {
   cf, err := os.Create("/config/haproxy.conf")
   if err != nil {
      return err
   }
   defer cf.Close()

   return configTemplate.Execute(cf, backends)
}

var configTemplateString = `
frontend public
   bind 0.0.0.0:80
   default_backend servers

backend servers
{{range $index, $ip := .}}
   server srv-{{$index}} {{$ip}}
{{end}}
`

Note that we use text/template rather than html/template, since HTML escaping is unwanted in a plain configuration file, and that the server line references the loop variables as {{$index}} and {{$ip}} so that the template engine actually expands them.

restart.go:

package main

import "github.com/coreos/go-systemd/dbus"

// restart haproxy
func restart() error {
   conn, err := dbus.NewSystemdConnection()
   if err != nil {
      return err
   }
   _, err = conn.RestartUnit("haproxy.service", "ignore-dependencies", nil)
   return err
}

With our active configuration manager available, we can now create a service unit to run it, as follows:

haproxy-config-manager.service:

[Unit]
Description=Active configuration manager

[Service]
ExecStart=/usr/bin/docker run --rm --name %p -v /data/config:/config -v /var/run/dbus:/var/run/dbus -v /run/systemd:/run/systemd -e ETCD_ENDPOINTS=http://${COREOS_PUBLIC_IPV4}:2379 quay.io/ulexus/demo-active-configuration-manager
Restart=always
RestartSec=10

[X-Fleet]
MachineOf=haproxy.service

Note that the host directory /data/config is mounted at /config inside the container, matching the path to which write.go writes the generated haproxy.conf.

How it works...

First, we monitor the pertinent keys in etcd. It helps to have all of the keys under one prefix, but if that isn't the case, we can simply add more watchers. When a change occurs, we pull the present values for all the pertinent keys from etcd and then rebuild our configuration file. Next, we tell systemd to restart the dependent service. If the target service has a valid ExecReload, we could tell systemd to reload instead. In order to talk to systemd, we have passed in the dbus and systemd directories, to enable access to their respective sockets.

Using fleet globals

When you have a set of services that should be run on each of a set of machines, it can be tedious to run discrete and separate unit instances for each node. Fleet provides a reasonably flexible way to run these kinds of services, and when nodes are added, it will automatically start any declared globals on those machines.

Getting ready

In order to use fleet globals, you will need fleet running on each machine on which the globals will be executed. This is usually a simple matter of enabling fleet within the cloud-config, as follows:

#cloud-config
coreos:
  fleet:
    metadata: service=nginx,cpu=i7,disk=ssd
    public-ip: "$public_ipv4"
  units:
    - name: fleet.service
      command: start

How to do it...

To make a fleet unit a global, simply declare the Global=true parameter in the [X-Fleet] section of the unit, as follows:

[Unit]
Description=My global service

[Service]
ExecStart=/usr/bin/docker run --rm -p 8080:80 nginx

[X-Fleet]
Global=true

Globals can also be filtered with other keys.
For instance, a common filter is to run globals on all nodes that have certain metadata:

[Unit]
Description=My partial global service

[Service]
ExecStart=/usr/bin/docker run --rm -p 8080:80 nginx

[X-Fleet]
Global=true
MachineMetadata=service=nginx

Note that the metadata being referred to here is the fleet metadata, which is distinct from the instance metadata of your cloud provider or even the node tags of Kubernetes.

How it works...

Unlike most fleet units, there is not a one-to-one correspondence between the fleet unit instance and the actual running services. This has the side effect that modifications to a fleet global take immediate, global effect. In other words, there is no rolling update with a fleet global; there is only an immediate, universal replacement. Hence, do not use globals for services that cannot be wholly down during upgrades.

Summary

In this article, we addressed the configuration challenges faced by administrators who come from traditional, static deployment environments. We learned that configuration can no longer simply be built once and deployed; it needs to be managed proactively in a running environment, with any changes reloaded as they happen.

Resources for Article:

Further resources on this subject:

How to Set Up CoreOS Environment [article]
CoreOS Networking and Flannel Internals [article]
Let's start with Extending Docker [article]
Diving into Data – Search and Report

Packt
17 Oct 2016
In this article by Josh Diakun, Paul R Johnson, and Derek Mock, authors of the book Splunk Operational Intelligence Cookbook - Second Edition, we will cover the basic ways to search the data in Splunk. We will cover how to make raw event data readable.

(For more resources related to this topic, see here.)

The ability to search machine data is one of Splunk's core functions, and it should come as no surprise that many other features and functions of Splunk are heavily driven by searches. Everything from basic reports and dashboards to data models and fully featured Splunk applications are powered by Splunk searches behind the scenes.

Splunk has its own search language known as the Search Processing Language (SPL). This SPL contains hundreds of search commands, most of which also have several functions, arguments, and clauses. While a basic understanding of SPL is required in order to effectively search your data in Splunk, you are not expected to know all the commands! Even the most seasoned ninjas do not know all the commands and regularly refer to the Splunk manuals, website, or Splunk Answers (http://answers.splunk.com). To get you on your way with SPL, be sure to check out the search command cheat sheet and download the handy quick reference guide available at http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/SplunkEnterpriseQuickReferenceGuide.

Searching

Searches in Splunk usually start with a base search, followed by a number of commands that are delimited by one or more pipe (|) characters. The result of a command or search to the left of the pipe is used as the input for the next command to the right of the pipe. Multiple pipes are often found in a Splunk search to continually refine data results as needed. As we go through this article, this concept will become very familiar to you.

Splunk allows you to search for anything that might be found in your log data. For example, the most basic search in Splunk might be a search for a keyword such as error or an IP address such as 10.10.12.150. However, searching for a single word or IP over the terabytes of data that might potentially be in Splunk is not very efficient. Therefore, we can use the SPL and a number of Splunk commands to really refine our searches. The more refined and granular the search, the faster the time to run and the quicker you get to the data you are looking for!

When searching in Splunk, try to filter as much as possible before the first pipe (|) character, as this will save CPU and disk I/O. Also, pick your time range wisely. Often, it helps to run the search over a small time range when testing it and then extend the range once the search provides what you need.

Boolean operators

There are three different types of Boolean operators available in Splunk. These are AND, OR, and NOT. Case sensitivity is important here, and these operators must be in uppercase to be recognized by Splunk. The AND operator is implied by default and is not needed, but does no harm if used. For example, searching for error OR success would return all the events that contain either the word error or the word success. Searching for error success would return all the events that contain both the words error and success; another way to write this is error AND success. Searching web access logs for error OR success NOT mozilla would return all the events that contain either the word error or success, but not those events that also contain the word mozilla.
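As a quick illustration (the index and sourcetype names here are just examples), the operators can be freely combined and grouped with parentheses:

index=main sourcetype=access_combined (error OR fail*) NOT (mozilla OR curl)

This returns web access events containing error or any word beginning with fail, while excluding any events that also mention mozilla or curl.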
Common commands

There are many commands in Splunk that you will likely use on a daily basis when searching data within Splunk. These common commands are outlined in the following table:

chart/timechart | This command outputs results in a tabular and/or time-based output for use by Splunk charts.
dedup | This command de-duplicates results based upon specified fields, keeping the most recent match.
eval | This command evaluates new or existing fields and values. There are many different functions available for eval.
fields | This command specifies the fields to keep or remove in search results.
head | This command keeps the first X (as specified) rows of results.
lookup | This command looks up fields against an external source or list, to return additional field values.
rare | This command identifies the least common values of a field.
rename | This command renames fields.
replace | This command replaces the values of fields with another value.
search | This command permits subsequent searching and filtering of results.
sort | This command sorts results in either ascending or descending order.
stats | This command performs statistical operations on the results. There are many different functions available for stats.
table | This command formats the results into a tabular output.
tail | This command keeps only the last X (as specified) rows of results.
top | This command identifies the most common values of a field.
transaction | This command merges events into a single event based upon a common transaction identifier.

Time modifiers

The drop-down time range picker in the Graphical User Interface (GUI) to the right of the Splunk search bar allows users to select from a number of different preset and custom time ranges. However, in addition to using the GUI, you can also specify time ranges directly in your search string using the earliest and latest time modifiers. When a time modifier is used in this way, it automatically overrides any time range that might be set in the GUI time range picker.

The earliest and latest time modifiers can accept a number of different time units: seconds (s), minutes (m), hours (h), days (d), weeks (w), months (mon), quarters (q), and years (y). Time modifiers can also make use of the @ symbol to round down and snap to a specified time. For example, searching for sourcetype=access_combined earliest=-1d@d latest=-1h will search all the access_combined events from midnight a day ago until an hour ago from now. Note that the snap (@) will round down such that if it were 12 p.m. now, we would be searching from midnight a day and a half ago until 11 a.m. today.

Working with fields

Fields in Splunk can be thought of as keywords that have one or more values. These fields are fully searchable by Splunk. At a minimum, every data source that comes into Splunk will have the source, host, index, and sourcetype fields, but some sources might have hundreds of additional fields. If the raw log data contains key-value pairs or is in a structured format such as JSON or XML, then Splunk will automatically extract the fields and make them searchable. Splunk can also be told how to extract fields from the raw log data in the backend props.conf and transforms.conf configuration files.

Searching for specific field values is simple. For example, sourcetype=access_combined status!=200 will search for events with a sourcetype field value of access_combined that have a status field with a value other than 200.
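Putting fields and common commands together, a search such as the following (again using the example web access data) counts non-200 responses by status code and keeps only the five most frequent:

sourcetype=access_combined status!=200 | stats count by status | sort -count | head 5

Each pipe stage refines the output of the previous one, which is exactly the left-to-right flow described earlier.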
Splunk has a number of built-in, pre-trained sourcetypes that ship with Splunk Enterprise and that might work out of the box with common data sources. These are available at http://docs.splunk.com/Documentation/Splunk/latest/Data/Listofpretrainedsourcetypes. In addition, Technical Add-Ons (TAs), which contain event types and field extractions for many other common data sources such as Windows events, are available from the Splunk app store at https://splunkbase.splunk.com.

Saving searches

Once you have written a nice search in Splunk, you may wish to save the search so that you can use it again at a later date or use it for a dashboard. Saved searches in Splunk are known as Reports. To save a search in Splunk, you simply click on the Save As button on the top right-hand side of the main search bar and select Report.

Making raw event data readable

When a basic search is executed in Splunk from the search bar, the search results are displayed in a raw event format by default. To many users, this raw event information is not particularly readable, and valuable information is often clouded by other, less valuable data within the event. Additionally, if the events span several lines, only a few events can be seen on the screen at any one time. In this recipe, we will write a Splunk search to demonstrate how we can leverage Splunk commands to make raw event data readable, tabulating events and displaying only the fields we are interested in.

Getting ready

You should be familiar with the Splunk search bar and search results area.

How to do it…

Follow the given steps to search and tabulate the selected event data:

Log in to your Splunk server.

Select the Search & Reporting application from the drop-down menu located in the top left-hand side of the screen.

Set the time range picker to Last 24 hours and type the following search into the Splunk search bar:

index=main sourcetype=access_combined

Then, click on Search or hit Enter.

Splunk will return the results of the search and display the raw search events under the search bar.

Let's rerun the search, but this time we will add the table command as follows:

index=main sourcetype=access_combined | table _time, referer_domain, method, uri_path, status, JSESSIONID, useragent

Splunk will now return the same number of events, but instead of presenting the raw events to you, the data will be in a nicely formatted table, displaying only the fields we specified. This is much easier to read!

Save this search by clicking on Save As and then on Report. Give the report the name cp02_tabulated_webaccess_logs and click on Save. On the next screen, click on Continue Editing to return to the search.

How it works…

Let's break down the search piece by piece:

index=main — All the data in Splunk is held in one or more indexes. While not strictly necessary, it is a good practice to specify the index(es) to search, as this will ensure a more precise search.

sourcetype=access_combined — This tells Splunk to search only the data associated with the access_combined sourcetype, which, in our case, is the web access logs.

| table _time, referer_domain, method, uri_path, status, JSESSIONID, useragent — Using the table command, we take the result of our search to the left of the pipe and tell Splunk to return the data in a tabular format. Splunk will only display the fields specified after the table command in the table of results.

In this recipe, you used the table command. The table command can have a noticeable performance impact on large searches.
It should be used towards the end of a search, once all the other processing on the data by the other Splunk commands has been performed. The stats command is more efficient than the table command and should be used in place of table where possible. However, be aware that stats and table are two very different commands.

There's more…

The table command is very useful in situations where we wish to present data in a readable format. Additionally, tabulated data in Splunk can be downloaded as a CSV file, which many users find useful for offline processing in spreadsheet software or for sending to others. There are some other ways we can leverage the table command to make our raw event data readable.

Tabulating every field

Often, there are situations where we want to present every event within the data in a tabular format, without having to specify each field one by one. To do this, we simply use a wildcard (*) character as follows:

index=main sourcetype=access_combined | table *

Removing fields, then tabulating everything else

While tabulating every field using the wildcard (*) character is useful, you will notice that there are a number of Splunk internal fields, such as _raw, that appear in the table. We can use the fields command before the table command to remove the fields as follows:

index=main sourcetype=access_combined | fields - sourcetype, index, _raw, source, date*, linecount, punct, host, time*, eventtype | table *

If we do not include the minus (-) character after the fields command, Splunk will keep the specified fields and remove all the other fields.

Summary

In this article, along with an introduction to searching in Splunk, we covered how to make raw event data readable.

Resources for Article:

Further resources on this subject:

Splunk's Input Methods and Data Feeds [Article]
The Splunk Interface [Article]
The Splunk Web Framework [Article]
How to build a desktop app using Electron

Amit Kothari
17 Oct 2016
Desktop apps are making a comeback. Even companies whose cloud-based applications already have awesome web apps are investing in desktop apps to offer a better user experience. One example is the team collaboration tool Slack, which built a really good desktop app with web technologies using Electron.

Electron is an open source framework used to build cross-platform desktop apps using web technologies. It uses Node.js and Chromium and allows us to develop desktop GUI apps using HTML, CSS, and JavaScript. Electron is developed by GitHub, initially for the Atom editor, but is now used by many companies, including Slack, Wordpress, Microsoft, and Docker, to name a few. Electron apps are web apps running in an embedded Chromium web browser, with access to the full suite of Node.js modules and the underlying operating system. In this post we will build a simple desktop app using Electron.

Hello Electron

Let's start by creating a simple app. Before we start, we need Node.js and npm installed. Follow the instructions on the Node.js website if you do not have these installed already.

Create a new directory for your application and, inside the app directory, create a package.json file by using the npm init command. Follow the prompts and remember to set main.js as the entry point. Once the file is generated, install electron-prebuilt, which is the precompiled version of Electron, and add it as a dev dependency in package.json using the command npm install --save-dev electron-prebuilt. Also add "start": "electron ." under scripts, which we will use later to start our app. The package.json file will look something like this:

{
  "name": "electron-tutorial",
  "version": "1.0.0",
  "description": "Electron Tutorial ",
  "main": "main.js",
  "scripts": {
    "start": "electron ."
  },
  "devDependencies": {
    "electron-prebuilt": "^1.3.3"
  }
}

Create a file main.js with the following content:

const {app, BrowserWindow} = require('electron');

// Global reference of the window object.
let mainWindow;

// When Electron finishes initialization, create a window and load the app's index.html
app.on('ready', () => {
  mainWindow = new BrowserWindow({ width: 800, height: 600 });
  mainWindow.loadURL(`file://${__dirname}/index.html`);
});

We defined main.js as the entry point to our app in package.json. In main.js, the Electron app module controls the application lifecycle and BrowserWindow is used to create a native browser window. When Electron finishes initializing and our app is ready, we create a browser window to load our web page—index.html. As mentioned in the Electron documentation, remember to keep a global reference to the window object to prevent it from being closed automatically when the JavaScript object is garbage collected.

Finally, create the index.html file:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Hello Electron</title>
</head>
<body>
  <h1>Hello Electron</h1>
</body>
</html>

We can now start our app by running the npm start command.

Testing the Electron app

Let's write some integration tests for our app using Spectron. Spectron allows us to test Electron apps using ChromeDriver and WebdriverIO. It is test-framework agnostic, but for this example, we will use Mocha to write the tests.

Let's start by adding spectron and mocha as dev dependencies using the npm install --save-dev spectron and npm install --save-dev mocha commands. Then add "test": "./node_modules/mocha/bin/mocha" under scripts in the package.json file. This will be used to run our tests later.
The package.json should look something like this:

{
  "name": "electron-tutorial",
  "version": "1.0.0",
  "description": "Electron Tutorial ",
  "main": "main.js",
  "scripts": {
    "start": "electron .",
    "test": "./node_modules/mocha/bin/mocha"
  },
  "devDependencies": {
    "electron-prebuilt": "^1.3.3",
    "mocha": "^3.0.2",
    "spectron": "^3.3.0"
  }
}

Now that we have all the dependencies installed, let's write some tests. Create a directory called test and a file called test.js inside it. Copy the following content to test.js:

var Application = require('spectron').Application;
var electron = require('electron-prebuilt');
var assert = require('assert');

describe('Sample app', function () {
  var app;

  beforeEach(function () {
    app = new Application({ path: electron, args: ['.'] });
    return app.start();
  });

  afterEach(function () {
    if (app && app.isRunning()) {
      return app.stop();
    }
  });

  it('should show initial window', function () {
    return app.browserWindow.isVisible()
      .then(function (isVisible) {
        assert.equal(isVisible, true);
      });
  });

  it('should have correct app title', function () {
    return app.client.getTitle()
      .then(function (title) {
        assert.equal(title, 'Hello Electron');
      });
  });
});

Here we have a couple of simple tests. We start the app before each test and stop it after each test. The first test verifies that the app's browserWindow is visible, and the second test verifies the app's title. We can run these tests using the npm run test command. Spectron not only allows us to easily set up and tear down our app, but also gives access to various APIs, allowing us to write sophisticated tests covering various business requirements. Please have a look at their documentation for more details.

Packaging our app

Now that we have a basic app, we are ready to package and build it for distribution. We will use electron-builder for this, which offers a complete solution to distribute apps on different platforms, with the option to auto-update. It is recommended to use two separate package.json files when using electron-builder: one for the development environment and build scripts, and another with app dependencies. But for our simple app, we can just use one package.json file.

Let's start by adding electron-builder as a dev dependency using the command npm install --save-dev electron-builder. Make sure you have the name, description, version, and author defined in package.json. You also need to add electron-builder-specific options as the build property in package.json:

"build": {
  "appId": "com.amitkothari.electronsample",
  "category": "public.app-category.productivity"
}

For Mac OS, we need to specify appId and category. Look at the documentation for options for other platforms. Finally, add a script in package.json to package and build the app:

"dist": "build"

The updated package.json will look like this:

{
  "name": "electron-tutorial",
  "version": "1.0.0",
  "description": "Electron Tutorial ",
  "author": "Amit Kothari",
  "main": "main.js",
  "scripts": {
    "start": "electron .",
    "test": "./node_modules/mocha/bin/mocha",
    "dist": "build"
  },
  "devDependencies": {
    "electron-prebuilt": "^1.3.3",
    "mocha": "^3.0.2",
    "spectron": "^3.3.0",
    "electron-builder": "^5.25.1"
  },
  "build": {
    "appId": "com.amitkothari.electronsample",
    "category": "public.app-category.productivity"
  }
}

Next we need to create a build directory under our project root directory. In it, put a file background.png for the Mac OS DMG background and icon.icns for the app icon. We can now package our app by running the npm run dist command.
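As a quick recap of the whole pipeline, the three npm scripts we have defined can be run in sequence (electron-builder writes its build artifacts to a dist directory by default):

npm start        # run the app locally
npm run test     # run the Spectron/Mocha integration tests
npm run dist     # package the app for distribution

Keeping run, test, and package as separate scripts like this also makes it easy to wire the same commands into a CI job later.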
Todo App

We've built a very simple app, but Electron apps can do more than just show static text. Let's add some dynamic behavior to our app and convert it into a Todo list manager. We can use any JavaScript framework of our choice, from AngularJS to React, with Electron, but for this example we will use plain JavaScript.

To start with, let's update our index.html to display a todo list:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Hello Electron</title>
  <link rel="stylesheet" type="text/css" href="./style.css">
</head>
<body>
  <div class="container">
    <ul id="todoList"></ul>
    <textarea id="todoInput" placeholder="What needs to be done ?"></textarea>
    <button id="addTodoButton">Add to list</button>
  </div>
</body>
<script>require('./app.js')</script>
</html>

We also included style.css and app.js in index.html. All our CSS will be in style.css and our app logic will be in app.js. Create the style.css file with the following content:

body { margin: 0; }
ul { list-style-type: none; margin: 0; padding: 0; }
li { padding: 10px; border-bottom: 1px solid #ddd; }
button { background-color: black; color: #fff; margin: 5px; padding: 5px; cursor: pointer; border: none; font-size: 12px; }
.container { width: 100%; }
#todoInput { float: left; display: block; overflow: auto; margin: 15px; padding: 10px; font-size: 12px; width: 250px; }
#addTodoButton { float: left; margin: 25px 10px; }

And finally create the app.js file:

(function () {
  const addTodoButton = document.getElementById('addTodoButton');
  const todoList = document.getElementById('todoList');

  // Create delete button for todo item
  const createTodoDeleteButton = () => {
    const deleteButton = document.createElement("button");
    deleteButton.innerHTML = "X";
    deleteButton.onclick = function () {
      this.parentNode.outerHTML = "";
    };
    return deleteButton;
  }

  // Create element to show todo text
  const createTodoText = (todo) => {
    const todoText = document.createElement("span");
    todoText.innerHTML = todo;
    return todoText;
  }

  // Create a todo item with delete button and text
  const createTodoItem = (todo) => {
    const todoItem = document.createElement("li");
    todoItem.appendChild(createTodoDeleteButton());
    todoItem.appendChild(createTodoText(todo));
    return todoItem;
  }

  // Clear input field
  const clearTodoInputField = () => {
    document.getElementById("todoInput").value = "";
  }

  // Add new todo item and clear input field
  const addTodoItem = () => {
    const todo = document.getElementById('todoInput').value;
    if (todo) {
      todoList.appendChild(createTodoItem(todo));
      clearTodoInputField();
    }
  }

  addTodoButton.addEventListener("click", addTodoItem, false);
}());

Our app.js has a self-invoking function which registers a listener (addTodoItem) on the addTodoButton click event. On each click, the addTodoItem function adds a new todo item and clears the text area. Run the app again using the npm start command.

Conclusion

We built a very simple app, but it shows the potential of Electron. As stated on the Electron website, if you can build a website, you can build a desktop app. I hope you find this post interesting. If you have built an application with Electron, please share it with us.

About the author

Amit Kothari is a full-stack software developer based in Melbourne, Australia. He has 10+ years experience in designing and implementing software, mainly in Java/JEE. His recent experience is in building web applications using JavaScript frameworks such as React and AngularJS and backend microservices/REST APIs in Java.
He is passionate about lean software development and continuous delivery.
First Projects with the ESP8266

Packt
17 Oct 2016
In this article by Marco Schwartz, author of Internet of Things with ESP8266, we will focus on first projects with the ESP8266. Now that the ESP8266 chip is ready to be used and you can connect it to your Wi-Fi network, we can build some basic projects with it. This will help you understand the basics of the ESP8266.

(For more resources related to this topic, see here.)

We are going to see three projects in this article: how to control an LED, how to read data from a GPIO pin, and how to grab the contents of a web page. We will also see how to read data from a digital sensor.

Controlling an LED

First, we are going to see how to control a simple LED. Indeed, the GPIO pins of the ESP8266 can be configured to realize many functions: inputs, outputs, PWM outputs, and also SPI or I2C communications. This first project will teach you how to use the GPIO pins of the chip as outputs.

The first step is to add an LED to our project. These are the extra components you will need for this project:

5mm LED (https://www.sparkfun.com/products/9590)
330 Ohm resistor (to limit the current in the LED) (https://www.sparkfun.com/products/8377)

The next step is to connect the LED with the resistor to the ESP8266 board. To do so, the first thing to do is to place the resistor on the breadboard. Then, place the LED on the breadboard as well, connecting the longest pin of the LED (the anode) to one pin of the resistor. Then, connect the other end of the resistor to the GPIO pin 5 of the ESP8266, and the other end of the LED to the ground. This is how it should look at the end:

We are now going to light up the LED by programming the ESP8266 chip. This is the complete code for this section:

// Import required libraries
#include <ESP8266WiFi.h>

void setup() {
  // Set GPIO 5 as output
  pinMode(5, OUTPUT);
  // Set GPIO 5 on a HIGH state
  digitalWrite(5, HIGH);
}

void loop() {
}

This code simply sets the GPIO pin as an output, and then applies a HIGH state to it. The HIGH state means that the pin is active, and that a positive voltage (3.3V) is applied to the pin. A LOW state would mean that the output is at 0V.

You can now copy this code and paste it in the Arduino IDE. Then, upload the code to the board using the instructions from the previous article. You should immediately see the LED light up. You can shut it down again by using digitalWrite(5, LOW) in the code. You could also, for example, modify the code so that the ESP8266 switches the LED on and off every second.

Reading data from a GPIO pin

As a second project in this article, we are going to read the state of a GPIO pin. For this, we will use the same pin as in the previous project, so you can remove the LED and the resistor that we used previously. Now, simply connect this pin (GPIO 5) of the board to the positive power supply on your breadboard with a wire, therefore applying a 3.3V signal to this pin.

Reading data from a pin is really simple. This is the complete code for this part:

// Import required libraries
#include <ESP8266WiFi.h>

void setup(void) {
  // Start Serial (to display results on the Serial monitor)
  Serial.begin(115200);
  // Set GPIO 5 as input
  pinMode(5, INPUT);
}

void loop() {
  // Read GPIO 5 and print it on the Serial port
  Serial.print("State of GPIO 5: ");
  Serial.println(digitalRead(5));
  // Wait 1 second
  delay(1000);
}

We simply set the pin as an input, read the value of this pin, and print it out every second.
Copy and paste this code into the Arduino IDE, then upload it to the board using the instructions from the previous article. This is the result you should get in the Serial monitor:

State of GPIO 5: 1

We can see that the returned value is 1 (digital state HIGH), which is what we expected, because we connected the pin to the positive power supply. As a test, you can also connect the pin to the ground, and the state should go to 0.

Grabbing the content from a web page

As a last project in this article, we are finally going to use the Wi-Fi connection of the chip to grab the content of a page. We will simply use the www.example.com page, as it's a basic page largely used for test purposes. This is the complete code for this project:

// Import required libraries
#include <ESP8266WiFi.h>

// WiFi parameters
const char* ssid = "your_wifi_network";
const char* password = "your_wifi_password";

// Host
const char* host = "www.example.com";

void setup() {
  // Start Serial
  Serial.begin(115200);

  // We start by connecting to a WiFi network
  Serial.println();
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(ssid);

  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.println("");
  Serial.println("WiFi connected");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());
}

int value = 0;

void loop() {
  Serial.print("Connecting to ");
  Serial.println(host);

  // Use WiFiClient class to create TCP connections
  WiFiClient client;
  const int httpPort = 80;
  if (!client.connect(host, httpPort)) {
    Serial.println("connection failed");
    return;
  }

  // This will send the request to the server
  client.print(String("GET /") + " HTTP/1.1\r\n" +
               "Host: " + host + "\r\n" +
               "Connection: close\r\n\r\n");
  delay(10);

  // Read all the lines of the reply from the server and print them to Serial
  while (client.available()) {
    String line = client.readStringUntil('\r');
    Serial.print(line);
  }

  Serial.println();
  Serial.println("closing connection");
  delay(5000);
}

The code is really basic: we first open a connection to the example.com website, and then send a GET request to grab the content of the page. Using the while (client.available()) loop, we also listen for incoming data and print it all inside the Serial monitor. You can now copy this code and paste it into the Arduino IDE. This is what you should see in the Serial monitor:

This is basically the content of the page, in pure HTML code.

Reading data from a digital sensor

In this last section of this article, we are going to connect a digital sensor to our ESP8266 chip, and read data from it. As an example, we will use a DHT11 sensor that can be used to get ambient temperature and humidity. You will need to get this component for this section: the DHT11 sensor (https://www.adafruit.com/products/386).

Let's now connect this sensor to your ESP8266. First, place the sensor on the breadboard. Then, connect the first pin of the sensor to VCC, the second pin to pin #5 of the ESP8266, and the fourth pin of the sensor to GND. This is how it will look at the end:

Note that here I've used another ESP8266 board, the Adafruit ESP8266 breakout board. We will also use the aREST framework in this example, so it's easy for you to access the measurements remotely. aREST is a complete framework to control your ESP8266 boards remotely (including from the cloud), and we are going to use it several times in this article. You can find more information about it at the following URL: http://arest.io/.

Let's now configure the board.
The code is too long to be inserted here, but I will detail the most important parts of it now. It starts by including the required libraries:

#include "ESP8266WiFi.h"
#include <aREST.h>
#include "DHT.h"

To install those libraries, simply look for them inside the Arduino IDE library manager. Next, we need to set the pin to which the DHT sensor is connected:

#define DHTPIN 5
#define DHTTYPE DHT11

After that, we declare an instance of the DHT sensor:

DHT dht(DHTPIN, DHTTYPE, 15);

As earlier, you will need to insert your own Wi-Fi name and password inside the code:

const char* ssid = "wifi-name";
const char* password = "wifi-pass";

We also define two variables that will hold the measurements of the sensor:

float temperature;
float humidity;

In the setup() function of the sketch, we initialize the sensor:

dht.begin();

Still in the setup() function, we expose the variables to the aREST API, so we can access them remotely via Wi-Fi:

rest.variable("temperature", &temperature);
rest.variable("humidity", &humidity);

Finally, in the loop() function, we take the measurements from the sensor:

humidity = dht.readHumidity();
temperature = dht.readTemperature();

It's now time to test the project! Simply grab all the code and put it inside the Arduino IDE. Also make sure to install the aREST Arduino library using the Arduino library manager. Now, put the ESP8266 board in bootloader mode, and upload the code to the board. After that, reset the board, and open the Serial monitor. You should see the IP address of the board being displayed:

Now, we can access the measurements from the sensor remotely. Simply go to your favorite web browser, and type:

192.168.115.105/temperature

You should immediately get the answer from the board, with the temperature being displayed:

{
  "temperature": 25.00,
  "id": "1",
  "name": "esp8266",
  "connected": true
}

You can of course do the same with humidity. Note that we used the aREST API here. You can learn more about it at: http://arest.io/.

Congratulations, you just completed your very first projects using the ESP8266 chip! Feel free to experiment with what you learned in this article, and keep learning more about how to configure your ESP8266 chip.

Summary

In this article, we completed our first basic projects using the ESP8266 Wi-Fi chip. We first learned how to control a simple output by controlling the state of an LED. Then, we saw how to read the state of a digital pin on the chip. Finally, we learned how to read data from a digital sensor, and how to access that data using the aREST framework. Next, we will go right into the main topic of the book, and build our first Internet of Things project using the ESP8266.

Resources for Article:

Further resources on this subject:

Sending Notifications using Raspberry Pi Zero [article]
The Raspberry Pi and Raspbian [article]
Working with LED Lamps [article]
Applying Themes to Sails Applications, Part 2

Luis Lobo
14 Oct 2016
In Part 1 of this series covering themes in the Sails Framework, we bootstrapped our sample Sails app (step 1). Here in Part 2, we will complete steps 2 and 3, compiling our theme's CSS and the necessary Less files and setting up the theme Sails hook to complete our application.

Step 2 – Adding a task for compiling our theme's CSS and the necessary Less files

Let's pick things back up where we left off in Part 1. We now want to customize our page to have our burrito style, so we need to add a task that compiles our themes. Edit your /tasks/config/less.js so that it looks like this one:

module.exports = function (grunt) {
  grunt.config.set('less', {
    dev: {
      files: [{
        expand: true,
        cwd: 'assets/styles/',
        src: ['importer.less'],
        dest: '.tmp/public/styles/',
        ext: '.css'
      }, {
        expand: true,
        cwd: 'assets/themes/export',
        src: ['*.less'],
        dest: '.tmp/public/themes/',
        ext: '.css'
      }]
    }
  });
  grunt.loadNpmTasks('grunt-contrib-less');
};

Basically, we added a second object to the files section, which tells the Less compiler task to look for any Less file in assets/themes/export, compile it, and put the resulting CSS in the .tmp/public/themes folder. In case you were not aware of it, the .tmp/public folder is the one Sails uses to publish its assets.

We now create two themes: one is default.less and the other is burrito.less, which is based on default.less. We also have two other Less files, each one holding the variables for each theme. This technique allows you to have one base theme and many other themes based on the default.

/assets/themes/variables.less:

@app-navbar-background-color: red;
@app-navbar-brand-color: white;

/assets/themes/variablesBurrito.less:

@app-navbar-background-color: green;
@app-navbar-brand-color: yellow;

/assets/themes/export/default.less:

@import "../variables.less";
.navbar-inverse {
  background-color: @app-navbar-background-color;
  .navbar-brand {
    color: @app-navbar-brand-color;
  }
}

/assets/themes/export/burrito.less:

@import "default.less";
@import "../variablesBurrito.less";

So burrito.less just inherits from default.less but overrides the variables with its own, creating a new theme based on the default. If you lift Sails now, you will notice that the navigation bar has a red background with white branding.

Step 3 – Setting up the theme Sails hook

The last step involves creating a hook, a Node module that adds functionality to the Sails core, which catches the hostname and, if it has burrito in it, sets the new theme. First, let's create the folder for the hook:

mkdir -p ./api/hooks/theme

Now create a file named index.js in that folder with this content:

/**
 * theme hook - Sets the correct CSS to be displayed
 */
module.exports = function (sails) {
  return {
    routes: {
      before: {
        'all /*': function (req, res, next) {
          if (!req.isSocket) {
            // makes theme variable available in views
            res.locals.theme = sails.hooks.theme.getTheme(req);
          }
          return next();
        }
      }
    },
    /**
     * getTheme defines which css needs to be used for this request
     * In this case, we select the theme by pattern matching certain words from the hostname
     */
    getTheme: function (req) {
      var hostname = 'default';
      var theme = 'default';
      try {
        hostname = req.get('host').toLowerCase();
      } catch (e) {
        // host may not always be available (i.e., socket calls).
        // If you need that, add a Host header in your sails socket configuration.
      }
      // if burrito is found in the hostname, change the theme
      if (hostname.indexOf('burrito') > -1) {
        theme = 'burrito';
      }
      return theme;
    }
  };
};

Finally, to test our configuration, we need to add a host entry in our OS hosts file. In Linux/Unix-based operating systems, you have to edit /etc/hosts (with sudo or as root). Add the following line:

127.0.0.1 burrito.smartdelivery.local www.smartdelivery.local

Now navigate using those host names, first to www.smartdelivery.local:

And lastly, navigate to burrito.smartdelivery.local:

You now have your Burrito Smart Delivery, and a themed Sails application! I hope you have enjoyed this series. You can get the source code from here. Enjoy!

About the author

Luis Lobo Borobia is the CTO at FictionCity.NET, a mentor and advisor, independent software engineer consultant, and conference speaker. He has a background as a software analyst and designer, creating, designing, and implementing software products, solutions, frameworks, and platforms for several kinds of industries. In the last few years, he has focused on research and development for the Internet of Things, using the latest bleeding-edge software and hardware technologies available.
Bringing DevOps to Network Operations

Packt
14 Oct 2016
In this article by Steven Armstrong, author of the book DevOps for Networking, we will focus on people and process with regard to DevOps. The DevOps initiative was initially about breaking down silos between development and operations teams and changing companies' operational models. This article will highlight methods to unblock IT staff and allow them to work in a more productive fashion, but these mindsets have since been extended to quality assurance testing, security, and now network operations. It will primarily focus on the evolving role of the network engineer, which is changing like that of the operations engineer before it, and the need for network engineers to learn new skills that will allow them to remain as valuable as they are today as the industry moves towards a completely programmatically controlled operational model.

(For more resources related to this topic, see here.)

This article will look at two differing roles, that of the CTO / senior manager and that of the engineer, discussing at length some of the initiatives that can be utilized to facilitate the cultural changes required to create a successful DevOps transformation for a whole organization, or even just to allow a single department to improve its internal processes by automating everything it does. In this article, the following topics will be covered:

Initiating a change in behavior
Top-down DevOps initiatives for networking teams
Bottom-up DevOps initiatives for networking teams

Initiating a change in behavior

The networking OSI model contains seven layers, but it is widely suggested that the OSI model has an additional 8th layer named the user layer, which governs how end users integrate and interact with the network. People are undoubtedly a harder beast to master and manage than technology, so there is no one-size-fits-all solution to the vast number of people issues that exist. The seven layers of the OSI model are shown in the following image:

Initiating cultural change and changes in behavior is the most difficult task an organization will face, and it won't occur overnight. For behavior to change, there must first be obvious business benefits. It is important to first outline the benefits that these cultural changes will bring to an organization, which will enable managers or change agents to make business justifications for the required changes.

Cultural change and dealing with people and processes is notoriously hard, so divorcing the tools and dealing with people and processes is paramount to the success of any DevOps initiative or project. Cultural change is not something that happens on its own; it needs to be planned as a company initiative. In a recent study by Gartner, it was shown that selecting the wrong tooling was not the main reason that cloud projects failed; instead, the top reason was failure to change the operational model.

Reasons to implement DevOps

When implementing DevOps, some myths are often perpetuated, such as: DevOps only works for start-ups, it won't bring any value to a particular team, or it is simply a buzzword and a fad. The quantifiable benefits of DevOps initiatives are undeniable when done correctly.
Some of these benefits include improvements to the following:

The velocity of change
Mean time to resolve
Improved uptime
Increased number of deployments
Cross-skilling between teams
The removal of the bus factor of one

Any team in the IT industry would benefit from these improvements, so really teams can't afford not to adopt DevOps, as it will undoubtedly improve their business functions. Implementing a DevOps initiative promotes repeatability, measurement, and automation. Implementing automation naturally improves the velocity of change, the number of deployments a team can do in any given day, and time to market. Automation of the deployment process allows teams to push fixes through to production quickly, as well as allowing an organization to push new products and features to market.

A byproduct of automation is that the mean time to resolve also becomes quicker for infrastructure issues. If infrastructure or network changes are automated, they can be applied much more efficiently than if they were carried out manually. Manual changes depend on the velocity of the engineer implementing the change, rather than an automated script that can be measured more accurately.

Implementing DevOps also means measuring and monitoring efficiently, so having effective monitoring is crucial on all parts of infrastructure and networking, as it improves the pace at which root cause analysis can be carried out. Effective monitoring helps to reduce the mean time to resolve, so when a production issue occurs, the source of the issue can be found more quickly than by numerous engineers logging onto consoles and servers trying to debug issues. Instead, a well-implemented monitoring system can provide a quick notification to localize the source of the issue, silencing any resultant alarms that stem from the initial root cause, allowing the issue to be highlighted and fixed efficiently. The monitoring then hands over to the repeatable automation, which can push the localized fix out to production.

This process provides a highly accurate feedback loop, where processes improve daily. If alerts are missed, they will ideally be built into the monitoring system over time as part of the incident post-mortem. Effective monitoring and automation result in a quicker mean time to resolve, which leads to happier customers and improved uptime of products.

Utilizing automation and effective monitoring also means that all members of a team have access to see how processes work and how fixes and new features are pushed out. This means less reliance on key individuals, removing the bus factor of one, where a key engineer needs to do the majority of tasks in the team because he is the most highly skilled individual and has all of the system knowledge stored in his head. Using a DevOps model means that the very highly skilled engineer can instead use their talents to help cross-skill other team members and create effective monitoring that can help any team member carry out the root cause analysis they normally do manually. This builds the talented engineer's deep knowledge into the monitoring system, so the monitoring system, as opposed to the talented engineer, becomes the go-to point of reference when an issue first occurs; ideally, the monitoring system becomes the source of truth that alerts on events to prevent customer-facing issues.
To improve cross-skilling, the talented engineer should ideally help write automation too, so they are not the only member of the team that can carry out specific tasks.

Reasons to implement DevOps for networking

So how do some of those DevOps benefits apply to traditional networking teams? Some of the common complaints with siloed networking teams today are the following:

Reactive
Slow, often using ticketing systems to collaborate
Manual processes carried out using admin terminals
Lack of preproduction testing
Manual mistakes leading to network outages
Constantly in firefighting mode
Lack of automation in daily processes

Network teams, like infrastructure teams before them, are used to working in siloed teams, interacting with other teams in large organizations via ticketing systems or using suboptimal processes. This is not a streamlined or optimized way of working, which led to the DevOps initiative that sought to break down barriers between development and operations staff; its remit has since widened. Networking does not seem to have been included in the initial DevOps movement, but software delivery can only operate as fast as its slowest component. The slowest component will eventually become the bottleneck or blocker of the entire delivery process. That slowest component often becomes the star engineer in a siloed team, who can't process enough tickets in a day manually to keep up with demand, thus becoming the bus factor of one. If that engineer goes off sick, work is blocked; the company becomes too reliant on them and cannot function efficiently without them.

If a team is not operating in the same way as the rest of the business, then all other departments will be slowed down, as the siloed department is not agile enough. Put simply, the reason networking teams exist in most companies is to provide a service to development teams. Development teams require networking to be deployed so that they can deliver applications to production and the business can make money from those products. So networking changes to ACL policies, load-balancing rules, and the provisioning of new subnets for new applications can no longer be deemed acceptable if they take days, weeks, or even months. Networking has a direct impact on the velocity of change, mean time to resolve, uptime, and the number of deployments, which are four of the key performance indicators of a successful DevOps initiative. So networking needs to be included in companies' DevOps models; otherwise, all of these quantifiable benefits will become constrained.
There is zero value in building a brand new OpenStack private cloud, with its open set of extensible APIs to manage compute, networking, and storage, if a company doesn't change its operational model and allow end users to use those APIs to self-service their requests. If network engineers are still using the GUI to point, click, cut, and paste, then this doesn't bring any real business value, as the network engineer who cuts and pastes the slowest is the bottleneck. The company may as well stick with its current processes, as implementing a private cloud solution with manual processes will not speed up time to market or mean time to recover from failure.

However, cloud should not be used as an excuse to deride internal network staff, as incumbent operational models in companies are typically not designed or set up by current staff; they are normally inherited. Moving to public cloud doesn't solve the problem of the operational agility of a company's network team; it is a quick fix and bandage that disguises the deeper-rooted cultural challenges that exist. However, smarter ways of working, allied with the use of automation, measurement, and monitoring, can help network teams refine their internal processes and facilitate the developers and operations staff they work with daily. Cultural change can be initiated in two different ways: grass-roots, bottom-up initiatives coming from engineers, or top-down management initiatives.

Top-down DevOps initiatives for networking teams

Top-down DevOps initiatives are when a CTO, director, or senior manager has buy-in from the company to make changes to the operational model. These changes are required when the incumbent operational model is deemed suboptimal and not set up to deliver software at the speed of competitors, which inherently delays new products or crucial fixes from being delivered to market.

When doing a DevOps transformation at a top-down management level, it is imperative that some groundwork is done with the teams involved; if large changes are going to be made to the operational model, they can often cause unrest or stress to staff on the ground. When implementing operational changes, upper management needs the buy-in of the people on the ground, as they will operate within that model daily. Having teams buy in is a very important aspect; otherwise, the company will end up with an unhappy workforce, which means the best staff will ultimately leave. It is very important that upper management engage staff when implementing new operational processes and deal with any concerns transparently from the outset, as opposed to going to an offsite management meeting and coming back with an enforced plan, which is all too common a theme.

Management should survey the teams to understand how they operate on a daily basis, what they like about the current processes, and where their frustrations lie. The biggest impediment to changing an operational model is misunderstanding the current operational model. All initiatives should ideally be led and not enforced. So let's focus on some specific top-down initiatives that could be used to help.

Analyzing successful teams

One approach for management is to look at other teams within the organization whose processes are working well and that are delivering in an incremental, agile fashion; if no team in the organization is working in this fashion, then reach out to other companies.
Ask if it would be possible to go and look at the way another company operates for a day. Most companies will happily use successful projects as reference cases for public audiences at conferences or meet-ups, as they enjoy showing off their achievements, so it shouldn't be difficult to seek out companies that have overcome similar cultural challenges. It is good to attend some DevOps conferences and look at who is speaking; approach the speakers and they will undoubtedly be happy to help.

Management teams should initially book a meeting with the high-performing team and do a question-and-answer session; if it is an external vendor, an introductory phone call can suffice. Some important questions to ask in the initial meeting are the following:

- Which processes normally work well?
- Which tools do they actually use on a daily basis?
- How is work assigned?
- How do they track work?
- What is the team structure?
- How do other teams make requests to the team?
- How is work prioritized?
- How do they deal with interruptions?
- How are meetings structured?

It is important not to reinvent the wheel; if a team in the organization already has a proven template that works well, then that team could be invaluable in helping facilitate cultural change within the networks team. It will be slightly more challenging if focus is put on an external team as the evangelist, as it opens up excuses such as it being easier for them because of x, y, and z in their company.

A good strategy, when utilizing a local team in the organization as the evangelist, is to embed a network engineer in that team for a few weeks and have them observe how the other team operates and document their findings. This is imperative so that the network engineers on the ground understand the processes. Flexibility is also important, as only some of the successful team's processes may be applicable to a network team, so don't expect two teams to work identically. The sum of the parts and the individuals in a team really do mean that every team is different, so focus on goals rather than the implementation of strict process. If teams achieve the same outcomes in slightly different ways, then as long as work can be tracked, is visible to management, and can be easily reported on, it shouldn't be an issue.

Make sure pace is prioritized, and select specific change agents to make sure teams are comfortable with new processes. Empower change agents in the network team to choose how they want to work, engaging the team in creating new processes, and also put them in charge of the eventual tool selection. However, before selecting any tooling, it is important to start with process and agree on the new operational model, to prevent tooling driving processes; this is a common mistake in IT.

Mapping out activity diagrams

A good piece of advice is to use an activity diagram as a visual aid to understand how a team's interactions work and where they can be improved. A typical development activity diagram, with a manual hand-off to a quality assurance team, is shown here:

Utilizing activity diagrams as a visual aid is important, as it highlights suboptimal business process flows. In the example, we see a development team's activity diagram. This process is suboptimal as it doesn't include the quality assurance team in the Test locally and Peer review phases.
Instead, it has a formalized QA hand-off phase very late in the development cycle, a suboptimal way of working as it promotes a development and QA silo, which is a DevOps anti-pattern. A better approach would be to have QA engineers work on creating test tasks and automated tests while the development team works on coding tasks. This would allow QA engineers to review and test developer code earlier in the development lifecycle, as part of the development Peer review process, and make sure that every piece of code written has appropriate test coverage before the code is checked in.

Another shortcoming of the process is that it does not cater for software bugs found by the quality assurance team, or in production by customers, so mapping these streams of work into the activity diagram would also be useful to show all potential feedback loops. If a feedback loop is missed in the overall activity diagram, it can cause a breakdown in the process flow, so it is important to capture all permutations that could occur in the overarching flow before mapping tooling to facilitate the process. Each team should look at ways of shortening interactions to aid mean time to resolve and improve the velocity at which work can flow through the overall process.

Management should dedicate some time in their schedule with the development, infrastructure, networking, and test teams and map out what they believe the team processes to be in their individual teams. Keep it high level; this should be a simple activity swim-lane starting at the point where the team accepts work and covering the process the team goes through to deliver that work. Once each team has mapped out the initial approach, they should focus on optimizing it, removing the parts of the process they dislike, and discussing ways the process could be improved as a team. It may take many iterations before this is mapped out effectively, so don't rush this process; it should be used as a learning experience for each team.

The finalized activity diagram will normally include management and technical functions combined in an optimized way to show the overall process flow. Don't bother using Business Process Management (BPM) software at this stage; a simple whiteboard will suffice to keep it simple and informal.

It is good practice to utilize two layers in an activity diagram: the first layer can be a box that simply says Peer review, which then references a nested activity diagram outlining what the team's peer review process is. Both need to be refined, but the nested tier of business processes should be dictated by the individual teams, as these are specific to their needs, so it's important to leave teams the flexibility they need at this level. It is important to split the two tiers out; otherwise, the overall top layer of the activity diagram will be too complex to extract any real value from, so try to minimize the complexity at the top layer, as it will need to be integrated with other teams' processes. The activity diagram doesn't need to contain team-specific details, such as how an internal team's Peer review process operates, as this will always be subjective to that team; this should be included, but as a nested-layer activity that won't be shared. Another team should be able to look at a team's top-layer activity diagram and understand the process without explanation.
It can sometimes be useful to first map out a high-performing team's top-layer activity diagram to show how an integrated, joined-up business process should look. This will help teams that struggle a bit more with these concepts and allow them to use that team's activity diagram as a guide. It can be used as a point of reference to show how these teams have solved their cross-team interaction issues and facilitated one or more teams interacting without friction. The main aim of this exercise is to join up business processes so they are not siloed between teams, making the planning and execution of work as integrated as possible for joined-up initiatives.

Once each team has completed their individual activity diagram and optimized it the way the team wants, the second phase of the process can begin. This involves layering each team's top layer together to create a joined-up process. Teams should use this layering exercise as an excuse to talk about suboptimal processes and how the overall business process should look end to end. Utilize this session to remove perceived bottlenecks between teams, completely ignoring existing tools and their constraints; this whole exercise should focus on process, not tooling.

A good example of a suboptimal process flow constrained by tooling would be a stage on a top-layer activity diagram that says "raise ticket with ticketing system". This should be broken down so work is people-focused: what does the person requesting the change actually require? Developers' day job involves writing code and building great features and products, so if a new feature needs a network change, then networking should be treated as part of that feature change. The time taken for the network changes needs to be catered for as part of the planning and estimation for that feature, rather than as a ticketed request that hinders the velocity of change when it is done reactively as an afterthought.

This is normally a very successful exercise when engagement is good. It is good to utilize a senior engineer and a manager from each team in the combined activity diagram layering exercise, with the more junior engineers in each team involved in the team-specific activity diagram exercise.

Changing the network team's operational model

The network team's operational model at the end of the activity diagram exercise should ideally be fully integrated with the rest of the business. Once the new operational model has been agreed with all teams, it is time to implement it. It is important to note that because the teams on the ground created the operational model and joined-up activity diagram, it should be signed off by all parties as the new business process. This removes the issue of a model enforced from management, as those using it have been involved in creating it. The operational model can be iterated and improved over time, but interactions shouldn't change greatly, although new interaction points may be added that were initially missed. A master copy of the business process can then be stored and updated, so anyone new joining the company knows exactly how to interact with other teams.

Short term, it may seem that the new approach is slowing down development estimates, as automation is not yet in place for network functions, so estimates for developer features become higher when they require network changes.
This is often just a truer reflection of reality: estimates previously didn't take network changes into account, and those changes then became blockers because they were raised as tickets. Once this is visible, it can be optimized and improved over time. Once the overall activity diagram has been merged together and agreed with all the teams, it is important to remember that if the processes are properly optimized, there should not be pages and pages of high-level operations on the diagram. If the interactions are too verbose, it will take any change hours and hours to traverse each of the steps on the activity diagram.

The activity diagram below shows a joined-up business process, where work is defined from a single roadmap producing user stories for all teams. New user stories, which are units of work, are then estimated by cross-functional teams, including developers, infrastructure, quality assurance, and network engineers. Each team reviews the user story and works out which cross-functional tasks are involved in delivering the feature. The user story then becomes part of the sprint, with the cross-functional teams working on it together, making sure that it has everything it needs to work prior to check-in. After Peer review, the feature or change is handed off to the automated processes that deliver the code, infrastructure, and network changes to production.

The checked-in feature then flows through unit testing, quality assurance, integration, and performance testing quality gates, which will include any new tests written by the quality assurance team before check-in. Once every stage is passed, the automation is invoked by a button press to push the changes to production. Each environment has the same network changes applied, so network changes are made on test environments before production. This relies on treating networking as code, meaning automated network processes need to be created so the network team can be as agile as the developers.

Once the agreed operational model is mapped out, only then should the DevOps transformation begin. This will involve selecting best-of-breed tools at every stage to deliver the desired outcome, with the focus on the following benefits:

- The velocity of change
- Mean time to resolve
- Improved uptime
- Increased number of deployments
- Cross-skilling between teams
- The removal of the bus factor of one

All business processes will be different for each company, so it is important to engage each department and have the buy-in of all managers to make this activity a success.

Changing the network team's behavior

Once a new operational model has been established in the business, it is important to prevent the network team from becoming the bottleneck in a DevOps-focused continuous delivery model. Traditionally, network engineers are used to operating command lines and logging into admin consoles on network devices to make changes. Infrastructure engineers adjusted to automation because they already had scripting experience in bash and PowerShell, coupled with a firm grounding in Linux or Windows operating systems, so transitioning to configuration management tooling was not a huge step. However, it may be more difficult to persuade network engineers to make that same transition initially.
Moving network engineers towards coding against APIs and adopting configuration management tools may initially appear daunting, as it is a higher barrier to entry, but having an experienced automation engineer on hand can help network engineers make this transition. It is important to be patient, so try to change this behavior gradually by setting some automation initiatives for the network team in their objectives. This will encourage the correct behavior, so try to incentivize it too. It may be useful to start automation initiatives by offering training or purchasing particular coding books for teams.

It may also be useful to hold an initial automation hack day; this will give network engineers a day away from their day jobs and time to attempt to automate a small process that they repeat every day. If possible, make this a mandatory exercise so that it is adopted, and make other teams available to cover for the network team so they aren't distracted. This is a good way of seeing which members of the network team may be open to evangelizing DevOps and automation. If any particular individual stands out, then work with them to help push automation initiatives forward to the rest of the team by making them the champion for automation.

Establishing an internal DevOps meet-up where teams present back their automation achievements is also a good way of promoting automation in network teams and keeping the momentum going. Encourage each team across the business to present back interesting things they have achieved each quarter, and incentivize this too by allowing each team time away from their day job to attend if they participate. This leads to a sense of community and illustrates to teams that they are part of a bigger movement that is bringing real cost benefits to the business. It also helps to focus teams on the common goal of making the company better, breaking down barriers between teams in the process.

One approach that should be avoided at all costs is having other teams write all the network automation for networking teams. Ideally, it should be the networking team that evolves and adopts automation, so giving the network team a sense of ownership over the network automation is very important. This, though, requires full buy-in from networking teams and the discipline not to revert back to manual tasks at any point, even if issues occur. To ease the transition, offer to put an automation engineer from infrastructure or development into the network team, but this should only be a temporary measure. It is important to select an automation engineer who is respected by the network team and knowledgeable in networking, as no one should ever attempt to automate something that they cannot operate by hand; having someone well versed in networking to help with network automation is crucial, as they will be training the network team and so have to be respected. If an automation engineer assigned to the network team isn't knowledgeable or respected, then the initiative will likely fail, so choose wisely.

It is important to accept at an early stage that this transition towards DevOps and automation may not be for everyone, so not every network engineer will be able to make the journey. It is all about the network team seizing the opportunity and showing initiative and a willingness to pick up and learn new skills. It is important to stamp out disruptive behavior early on, as it may be a bad influence on the team.
It is fine for people to have a cynical skepticism at first, but not attempting to change or build new skills shouldn't be tolerated, as it will disrupt the team dynamic. This should be monitored so it doesn't cause automation initiatives to fail or stall just because individuals are proving to be blockers or being disruptive.

It is important to note that every organization has its own unique culture, and a company's rate of change will be subject to the cultural uptake of the new processes and ways of working. When initiating cultural change, change agents are necessary and can come from internal IT staff or external sources, depending on the aptitude and appetite of the staff for change. Every change project is different, but it is important that it has the correct individuals involved to make it a success, along with the correct management sponsorship and backing.

Bottom-up DevOps initiatives for networking teams

Bottom-up DevOps initiatives are when an engineer, team lead, or lower manager doesn't necessarily have buy-in from the company to make changes to the operational model. However, they realize that although changes can't be made to the overall incumbent operational model, they can facilitate positive changes using DevOps philosophies within their own team that help the team perform better and make its productivity more efficient.

Implementing DevOps initiatives from the bottom up is much more difficult and challenging at times, as some individuals or teams may not be willing to change the way they work and operate, because they don't have to. But it is important not to become disheartened, and to do the best possible job for the business. It is still possible to eventually convince upper management to implement a DevOps initiative by using grass-roots initiatives to prove that the process brings real business benefits.

Evangelizing DevOps in the networking team

It is important to try to stay positive at all times; working on a bottom-up initiative can be tiring, so roll with the punches and don't take things too personally. Always remain positive and try to focus on evangelizing the benefits associated with DevOps processes and positive behavior first within your own team. The first challenge is to convince your own team of the merits of adopting a DevOps approach before even attempting to convince other teams in the business. A good way of doing this is by showing the benefits that a DevOps approach has brought to other companies, such as Google, Facebook, and Etsy, focusing on what they have done in the networking space. A pushback from individuals may be that these companies are unicorns and DevOps has only worked for them for that reason, so be prepared to be challenged. Seek out initiatives that have been implemented by these companies that the networking team could adopt and that are actually applicable to your company. In order to facilitate an environment of change, work out what your colleagues' drivers are: what motivates them?
Try to tailor the sell to individuals' motivations; the sell to an engineer and to a manager may be completely different. An engineer on the ground may be motivated by the following:

- Doing more interesting work
- Developing skills and experience
- Helping automate menial daily tasks
- Learning sought-after configuration management skills
- Understanding the development lifecycle
- Learning to code

A manager, on the other hand, will probably be more motivated by offering to measure KPIs that make their team look better, such as:

- Time taken to implement changes
- Mean time to resolve failures
- Improved uptime of the network

Another way to promote engagement is to invite your networking team to DevOps meet-ups arranged by forward-thinking networking vendors. They may be amazed that most networking and load-balancing vendors are now actively promoting automation and DevOps, and may not yet be aware of this. Some of the new innovations in this space may be enough to change their opinions and make them interested in picking up some of the new approaches, so they can keep pace with the industry.

Seeking sponsorship from a respected manager or engineer

After making the network team aware of the DevOps initiatives, it is important to take this to the next stage: seek out a respected manager or senior engineer in the networking team who may be open to trying out DevOps and automation. It is important to sell this person the dream: state how you are passionate about implementing some changes to help the team, and that you are keen to utilize some proven best practices that have worked well for other successful companies. It is important to be humble; try not to rant or spew generalized DevOps jargon at your peers, which can be very off-putting. Always make reasonable arguments and justify them, while avoiding sweeping statements or generalizations.

Try not to appear to be undermining the manager or senior engineer; instead, ask for their help to achieve the goal by seeking their approval to back the initiative or idea. A charm offensive may be necessary at this stage to convince the manager or engineer that it's a good idea, but gradually building up to the request can help; it may appear insincere if the request comes out of the blue. Potentially analyze the situation over lunch or drinks and gauge whether it is something they would be interested in; there is little point trying to convince people who are stubborn, as they probably will not budge unless the initiative comes from above.

Once you have found the courage to broach the subject, it is time to put forward numerous suggestions on how the team could work differently, with the help of a mediator, who could take the form of a project manager. Ask for the opportunity to try this out on a small scale, offer to lead the initiative, and ask for their support and backing. It is likely that the manager or senior engineer will be impressed by your initiative and allow you to run with the idea, but they may choose the initiative you implement. So never suggest anything you can't achieve; you may only get one opportunity at this, so it is important to make a good impression.

Try to focus on a small task to start with, typically a pain point, and attempt to automate it. Anyone can write an automation script, but try to make the automation process easy to use; find what the team likes in the current process and try to incorporate aspects of it.
For example, if they often see the output from a command line displayed in a particular way, write the automation script so that it still displays the same output, so the process is not completely alien to them. Try not to hardcode values into scripts; extract them into configuration files to make the automation more flexible, so it can potentially be reused in different ways. Showing engineers the flexibility of automation will encourage them to use it more; show others in the team how you wrote the automation and ways they could adapt it to other activities. If this is done wisely, automation will be adopted by enthusiastic members of the team, and you will gain enough momentum to impress the sponsor enough to take it forward to more complex tasks.

Automate a complex problem with the networking team

The next stage of the process, after building confidence by automating small repeatable tasks, is to take on a more complex problem; this can be used to cement the use of automation within the networking team going forward. This part of the process is about empowering others to take charge and lead automation initiatives themselves in the future, so it will be more time-consuming. It is imperative that the more difficult-to-work-with engineers, who may have been deliberately avoided while building out the initial automation, are involved this time. These engineers have more than likely not been involved in automation at all at this stage. This probably means the most certified person in the team, the alpha of the team. Nobody said it was going to be easy, but convincing the biggest skeptics of the merits of DevOps and automation will be worth it in the long run.

At this stage, automation within the network team should have enough credibility and momentum to broach the subject citing successful use cases. It's easier to involve all difficult individuals in the process than to present ideas back to them at the end of it. Difficult senior engineers or managers are less likely to shoot down your ideas in front of your peers if they are involved in the creation of the process and have contributed in some way. Try to be respectful, even if you do not agree with their viewpoints, but don't back down if you believe that you are correct, and don't give up. Make arguments fact-based and non-emotive; write down pros and cons, and document any concerns without ignoring them. You have to be willing to compromise, but not to the point of devaluing the solution. There may actually be genuine risks involved that need to be addressed, so valid points should not be glossed over or ignored. Where possible, seek backup from your sponsor if you are not sure about some of the points or feel individuals are being unreasonable.

When implementing the complex automation task, work as a team, not as an individual; this is a learning experience for others as well as yourself. Try to teach the network team a configuration management tool; they may just be scared of trying out new things, so go with a gentle approach. Potentially stop at times to try out some online tutorials to familiarize everyone with the tool, and try out various approaches to solve problems in the easiest way possible. Try to show the network engineers how easy configuration management tools are to use, and their benefits. Don't use complicated configuration management tools, as that may put them off. The majority of network engineers can't currently code, something that will potentially change in the coming years.
As stated before, infrastructure engineers at least had a grounding in bash or PowerShell to help them get started, so pick tooling that the team likes and give them options. Try not to enforce tools they are not comfortable with. When utilizing automation, one of the key concerns for network engineers is peer review, as they have a natural distrust that the automation has worked. Try to build gated processes in to address these concerns; automation doesn't mean abandoning peer review, so create a lightweight process to help. Make the automation easy to review by utilizing source control to show diffs, and educate the network engineers on how to do this.

Coding can be a scary prospect initially, so propose some team exercises each week on a coding or configuration management task, and work on them as a team. This makes it less threatening, and it is important to listen to feedback. If the consensus is that something isn't working well or isn't of benefit, then look at alternative ways to achieve the same goal that work for the whole team.

Before releasing any new automated process, test it in a preproduction environment alongside an experienced engineer: have them peer review it and try to make it fail against numerous test cases. There is only one opportunity to make a first impression with a new process, so make sure it is a successful one. Try to set up knowledge-sharing sessions for the team to discuss the automation, and make sure everyone knows how to carry out the operations manually too, so they can easily debug any future issues and extend or amend the automation. Make sure that output and logging are clear to all users, as they will all need to support the automation when it is used in production.

Summary

In this article, we covered practical initiatives which, when combined, will allow IT staff to implement successful DevOps models in their organization. Rather than just focusing on departmental issues, it has promoted a set of practical strategies for changing the day-to-day operational models that constrain teams. It also focuses on the need for network engineers to learn new skills and techniques in order to make the most of a new operational model and not become the bottleneck for delivery. This article has provided practical real-world examples that can help senior managers and engineers improve their own companies, emphasizing collaboration between teams and showing that networking departments are now required to automate all network operations to deliver at the pace expected by businesses.
Key takeaways from this article are:

- DevOps is not just about development and operations staff; it can be applied to network teams
- Before starting a DevOps initiative, analyze successful teams or companies and what made them successful
- Senior management sponsorship is key to creating a successful DevOps model
- Your own company's model will not identically mirror other companies', so try not to copy like for like; adapt it so that it works in your own organization
- Allow teams to create their own processes and don't dictate processes
- Allow change agents to initiate changes that teams are comfortable with
- Automate all operational work; start small, and build up to larger, more complex problems once the team is comfortable with new ways of working
- Successful change will not happen overnight; it will only work through a model of continuous improvement

Useful links on DevOps are:

https://www.youtube.com/watch?v=TdAmAj3eaFI
https://www.youtube.com/watch?v=gqmuVHw-hQw

Resources for Article:

Further resources on this subject:

- Jenkins 2.0: The impetus for DevOps Movement [article]
- Introduction to DevOps [article]
- Command Line Tools for DevOps [article]
Deployment and DevOps

Packt
14 Oct 2016
16 min read
In this article by Makoto Hashimoto and Nicolas Modrzyk, the authors of the book Clojure Programming Cookbook, we will cover the recipe Clojure on Amazon Web Services. (For more resources related to this topic, see here.)

Clojure on Amazon Web Services

This recipe is a standalone dish where you can learn how to combine the elegance of Clojure with Amazon Web Services (AWS). AWS was launched in 2006 and is used by many businesses as an easy-to-use set of web services. This style of on-demand service is becoming more and more popular: you can use compute resources and software services on demand, without needing to prepare hardware or install software yourself.

You will mostly make use of the amazonica library, which is a comprehensive Clojure client for the entire Amazon AWS set of APIs. This library wraps the Amazon AWS APIs and supports most AWS services, including EC2, S3, Lambda, Kinesis, Elastic Beanstalk, Elastic MapReduce, and RedShift. This recipe has received a lot of its content and love from Robin Birtle, a leading member of the Clojure community in Japan.

Getting ready

You need an AWS account and credentials to use AWS, so this recipe starts by showing you how to do the setup and acquire the necessary keys to get started.

Signing up on AWS

You need to sign up for AWS if you don't have an account yet. In this case, go to https://aws.amazon.com, click on Sign In to the Console, and follow the instructions for creating your account. To complete the sign-up, enter the number of a valid credit card and a phone number.

Getting the access key and secret access key

To call the API, you now need your AWS access key and secret access key. Go to the AWS console, click on your name in the top-right corner of the screen, and select Security Credentials, as shown in the following screenshot:

Select Access Keys (Access Key ID and Secret Access Key), as shown in the following screenshot:

Then, the following screen appears; click on New Access Key:

You can see your access key and secret access key, as shown in the following screenshot:

Copy and save these strings for later use.

Setting up dependencies in your project.clj

Let's add the amazonica library to your project.clj and restart your REPL:

```clojure
:dependencies [[org.clojure/clojure "1.8.0"]
               [amazonica "0.3.67"]]
```

How to do it…

From here on, we will go through some sample usage of the core Amazon services, accessed with Clojure and the amazonica library. The three main ones we will review are as follows:

- EC2, Amazon's Elastic Compute Cloud, which allows you to run virtual machines on Amazon's cloud
- S3, Simple Storage Service, which gives you cloud-based storage
- SQS, Simple Queue Service, which gives you cloud-based data streaming and processing

Let's go through each of these one by one.

Using EC2

Let's assume you have an EC2 micro instance in the Tokyo region. First of all, we will declare the core and ec2 namespaces of amazonica:

```clojure
(ns aws-examples.ec2-example
  (:require [amazonica.aws.ec2 :as ec2]
            [amazonica.core :as core]))
```

We will set the access key and secret access key to enable AWS client API access. core/defcredential does this as follows:

```clojure
(core/defcredential "Your Access Key" "Your Secret Access Key" "your region")
;;=> {:access-key "Your Access Key", :secret-key "Your Secret Access Key", :endpoint "your region"}
```

The region you need to specify is, for example, ap-northeast-1, ap-south-1, or us-west-2.
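Before moving on, a note on credentials: hardcoding them in source files risks leaking them into version control. A minimal alternative sketch reads them from environment variables instead; the variable names used here are conventional shell exports, not something mandated by amazonica:

```clojure
;; Read credentials from the environment rather than hardcoding them.
;; Assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION are exported.
(core/defcredential (System/getenv "AWS_ACCESS_KEY_ID")
                    (System/getenv "AWS_SECRET_ACCESS_KEY")
                    (System/getenv "AWS_REGION"))
```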
To get the full list of regions, use ec2/describe-regions:

```clojure
(ec2/describe-regions)
;;=> {:regions [{:region-name "ap-south-1", :endpoint "ec2.ap-south-1.amazonaws.com"}
;;=>  .....
;;=>  {:region-name "ap-northeast-2", :endpoint "ec2.ap-northeast-2.amazonaws.com"}
;;=>  {:region-name "ap-northeast-1", :endpoint "ec2.ap-northeast-1.amazonaws.com"}
;;=>  .....
;;=>  {:region-name "us-west-2", :endpoint "ec2.us-west-2.amazonaws.com"}]}
```

ec2/describe-instances returns very long output, as follows:

```clojure
(ec2/describe-instances)
;;=> {:reservations [{:reservation-id "r-8efe3c2b", :requester-id "226008221399",
;;=>  :owner-id "182672843130", :group-names [], :groups [], ....
```

To get only the necessary information about instances, we define the following get-instances-info:

```clojure
(defn get-instances-info []
  (let [inst (ec2/describe-instances)]
    (->> (mapcat :instances (inst :reservations))
         (map #(vector [:node-name (->> (filter (fn [x] (= (:key x) "Name")) (:tags %))
                                        first
                                        :value)]
                       [:status (get-in % [:state :name])]
                       [:instance-id (:instance-id %)]
                       [:private-dns-name (:private-dns-name %)]
                       [:global-ip (-> % :network-interfaces first
                                       :private-ip-addresses first
                                       :association :public-ip)]
                       [:private-ip (-> % :network-interfaces first
                                        :private-ip-addresses first
                                        :private-ip-address)]))
         (map #(into {} %))
         (sort-by :node-name))))
;;=> #'aws-examples.ec2-example/get-instances-info
```

Let's try the function:

```clojure
(get-instances-info)
;;=> ({:node-name "ECS Instance - amazon-ecs-cli-setup-my-cluster",
;;=>   :status "running",
;;=>   :instance-id "i-a1257a3e",
;;=>   :private-dns-name "ip-10-0-0-212.ap-northeast-1.compute.internal",
;;=>   :global-ip "54.199.234.18",
;;=>   :private-ip "10.0.0.212"}
;;=>  {:node-name "EcsInstanceAsg",
;;=>   :status "terminated",
;;=>   :instance-id "i-c5bbef5a",
;;=>   :private-dns-name "",
;;=>   :global-ip nil,
;;=>   :private-ip nil})
```

As the preceding function shows, we can obtain the list of instance IDs, so we can start and stop instances using ec2/start-instances and ec2/stop-instances accordingly:

```clojure
(ec2/start-instances :instance-ids '("i-c5bbef5a"))
;;=> {:starting-instances
;;=>  [{:previous-state {:code 80, :name "stopped"},
;;=>    :current-state {:code 0, :name "pending"},
;;=>    :instance-id "i-c5bbef5a"}]}

(ec2/stop-instances :instance-ids '("i-c5bbef5a"))
;;=> {:stopping-instances
;;=>  [{:previous-state {:code 16, :name "running"},
;;=>    :current-state {:code 64, :name "stopping"},
;;=>    :instance-id "i-c5bbef5a"}]}
```
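Starting and stopping are asynchronous, so it can be handy to block until an instance actually reaches the state you want. Here is a minimal polling sketch built on the get-instances-info helper defined above; wait-for-state is our own hypothetical name, not part of amazonica:

```clojure
;; Poll until the given instance reaches the expected state.
;; Checks every five seconds; real code should also add a timeout.
(defn wait-for-state [instance-id expected]
  (loop []
    (let [info (->> (get-instances-info)
                    (filter #(= instance-id (:instance-id %)))
                    first)]
      (when-not (= expected (:status info))
        (Thread/sleep 5000)
        (recur)))))

(wait-for-state "i-c5bbef5a" "stopped")
;;=> nil (returns once the instance reports the "stopped" state)
```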
Using S3

Amazon S3 is secure, durable, and scalable storage in the AWS cloud. It is easy for developers and other users to use, and it provides high durability and availability at low cost: the durability is 99.999999999% and the availability is 99.99%. First, require the s3 namespace:

```clojure
(require '[amazonica.aws.s3 :as s3])
```

Let's create S3 buckets named makoto-bucket-1, makoto-bucket-2, and makoto-bucket-3 as follows:

```clojure
(s3/create-bucket "makoto-bucket-1")
;;=> {:name "makoto-bucket-1"}
(s3/create-bucket "makoto-bucket-2")
;;=> {:name "makoto-bucket-2"}
(s3/create-bucket "makoto-bucket-3")
;;=> {:name "makoto-bucket-3"}
```

s3/list-buckets returns bucket information:

```clojure
(s3/list-buckets)
;;=> [{:creation-date #object[org.joda.time.DateTime 0x6a09e119 "2016-08-01T07:01:05.000+09:00"],
;;=>   :owner {:id "3d6e87f691897059c23bcfb88b17da55f0c9aa02cc2a44e461f1594337059d27",
;;=>           :display-name "tokoma1"},
;;=>   :name "makoto-bucket-1"}
;;=>  {:creation-date #object[org.joda.time.DateTime 0x7392252c "2016-08-01T17:35:30.000+09:00"],
;;=>   :owner {:id "3d6e87f691897059c23bcfb88b17da55f0c9aa02cc2a44e461f1594337059d27",
;;=>           :display-name "tokoma1"},
;;=>   :name "makoto-bucket-2"}
;;=>  {:creation-date #object[org.joda.time.DateTime 0x4d59b4cb "2016-08-01T17:38:59.000+09:00"],
;;=>   :owner {:id "3d6e87f691897059c23bcfb88b17da55f0c9aa02cc2a44e461f1594337059d27",
;;=>           :display-name "tokoma1"},
;;=>   :name "makoto-bucket-3"}]
```

We can see that there are three buckets in the AWS console, as shown in the following screenshot:

Let's delete two of the three buckets as follows:

```clojure
(s3/delete-bucket "makoto-bucket-2")
(s3/delete-bucket "makoto-bucket-3")
(s3/list-buckets)
;;=> [{:creation-date #object[org.joda.time.DateTime 0x56387509 "2016-08-01T07:01:05.000+09:00"],
;;=>   :owner {:id "3d6e87f691897059c23bcfb88b17da55f0c9aa02cc2a44e461f1594337059d27",
;;=>           :display-name "tokoma1"},
;;=>   :name "makoto-bucket-1"}]
```

We can see only one bucket now, as shown in the following screenshot:

Now we will demonstrate how to send local data to S3. s3/put-object uploads file content to the specified bucket and key. The following code uploads /etc/hosts to makoto-bucket-1:

```clojure
(s3/put-object :bucket-name "makoto-bucket-1"
               :key "test/hosts"
               :file (java.io.File. "/etc/hosts"))
;;=> {:requester-charged? false, :content-md5 "HkBljfktNTl06yScnMRsjA==",
;;=>  :etag "1e40658df92d353974eb249c9cc46c8c",
;;=>  :metadata {:content-disposition nil, :expiration-time-rule-id nil, :user-metadata nil,
;;=>             :instance-length 0, :version-id nil, :server-side-encryption nil,
;;=>             :etag "1e40658df92d353974eb249c9cc46c8c", :last-modified nil,
;;=>             :cache-control nil, :http-expires-date nil, :content-length 0,
;;=>             :content-type nil, :restore-expiration-time nil, :content-encoding nil,
;;=>             :expiration-time nil, :content-md5 nil, :ongoing-restore nil}}
```

s3/list-objects lists the objects in a bucket as follows:

```clojure
(s3/list-objects :bucket-name "makoto-bucket-1")
;;=> {:truncated? false, :bucket-name "makoto-bucket-1", :max-keys 1000, :common-prefixes [],
;;=>  :object-summaries [{:storage-class "STANDARD", :bucket-name "makoto-bucket-1",
;;=>    :etag "1e40658df92d353974eb249c9cc46c8c",
;;=>    :last-modified #object[org.joda.time.DateTime 0x1b76029c "2016-08-01T07:01:16.000+09:00"],
;;=>    :owner {:id "3d6e87f691897059c23bcfb88b17da55f0c9aa02cc2a44e461f1594337059d27",
;;=>            :display-name "tokoma1"},
;;=>    :key "test/hosts", :size 380}]}
```

To obtain the contents of objects in buckets, use s3/get-object:

```clojure
(s3/get-object :bucket-name "makoto-bucket-1" :key "test/hosts")
;;=> {:bucket-name "makoto-bucket-1", :key "test/hosts",
;;=>  :input-stream #object[com.amazonaws.services.s3.model.S3ObjectInputStream 0x24f810e9
;;=>  ......
;;=>  :last-modified #object[org.joda.time.DateTime 0x79ad1ca9 "2016-08-01T07:01:16.000+09:00"],
;;=>  :cache-control nil, :http-expires-date nil, :content-length 380,
;;=>  :content-type "application/octet-stream", :restore-expiration-time nil,
;;=>  :content-encoding nil, :expiration-time nil, :content-md5 nil, :ongoing-restore nil}}
```

The result is a map; the content is stream data, the value of :object-content. To get the result as a string, we use slurp as follows:

```clojure
(slurp (:object-content (s3/get-object :bucket-name "makoto-bucket-1" :key "test/hosts")))
;;=> "127.0.0.1\tlocalhost\n127.0.1.1\tphenix\n\n# The following lines are desirable for IPv6 capable hosts\n::1 ip6-localhost ip6-loopback\nfe00::0 ip6-localnet\nff00::0 ip6-mcastprefix\nff02::1 ip6-allnodes\nff02::2 ip6-allrouters\n\n52.8.30.189 my-cluster01-proxy1 \n52.8.169.10 my-cluster01-master1 \n52.8.198.115 my-cluster01-slave01 \n52.9.12.12 my-cluster01-slave02\n\n52.8.197.100 my-node01\n"
```
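Files are not the only way in; you can also stream in-memory data. Here is a minimal round-trip sketch, assuming amazonica's keyword mapping of the underlying SDK accepts :input-stream and :metadata parameters on put-object (treat the exact keys as illustrative and check against your amazonica version):

```clojure
;; Round-trip a string through S3 without touching the local filesystem.
(defn put-string [bucket key s]
  (let [bytes (.getBytes s "UTF-8")]
    (s3/put-object :bucket-name bucket
                   :key key
                   :input-stream (java.io.ByteArrayInputStream. bytes)
                   ;; content-length must be supplied when uploading from a stream
                   :metadata {:content-length (count bytes)})))

(put-string "makoto-bucket-1" "test/greeting" "hello s3 from Clojure")
(slurp (:object-content (s3/get-object :bucket-name "makoto-bucket-1" :key "test/greeting")))
;;=> "hello s3 from Clojure"
```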
Using Amazon SQS

Amazon SQS is a high-performance, high-availability, scalable queue service. We will demonstrate how easy it is to handle messages on SQS queues using Clojure:

```clojure
(ns aws-examples.sqs-example
  (:require [amazonica.core :as core]
            [amazonica.aws.sqs :as sqs]))
```

To create a queue, you can use sqs/create-queue as follows:

```clojure
(sqs/create-queue :queue-name "makoto-queue"
                  :attributes {:VisibilityTimeout 3000
                               :MaximumMessageSize 65536
                               :MessageRetentionPeriod 1209600
                               :ReceiveMessageWaitTimeSeconds 15})
;;=> {:queue-url "https://sqs.ap-northeast-1.amazonaws.com/864062283993/makoto-queue"}
```

To get information about the queue, use sqs/get-queue-attributes as follows:

```clojure
(sqs/get-queue-attributes "makoto-queue")
;;=> {:QueueArn "arn:aws:sqs:ap-northeast-1:864062283993:makoto-queue", ...
```

You can configure a dead letter queue using sqs/assign-dead-letter-queue as follows:

```clojure
(sqs/create-queue "DLQ")
;;=> {:queue-url "https://sqs.ap-northeast-1.amazonaws.com/864062283993/DLQ"}

(sqs/assign-dead-letter-queue (sqs/find-queue "makoto-queue")
                              (sqs/find-queue "DLQ")
                              10)
;;=> nil
```

Let's list the queues we have defined:

```clojure
(sqs/list-queues)
;;=> {:queue-urls
;;=>  ["https://sqs.ap-northeast-1.amazonaws.com/864062283993/DLQ"
;;=>   "https://sqs.ap-northeast-1.amazonaws.com/864062283993/makoto-queue"]}
```

The following image shows the SQS console:

Let's examine the URLs of the queues:

```clojure
(sqs/find-queue "makoto-queue")
;;=> "https://sqs.ap-northeast-1.amazonaws.com/864062283993/makoto-queue"
(sqs/find-queue "DLQ")
;;=> "https://sqs.ap-northeast-1.amazonaws.com/864062283993/DLQ"
```

To send messages, we use sqs/send-message:

```clojure
(sqs/send-message (sqs/find-queue "makoto-queue") "hello sqs from Clojure")
;;=> {:md5of-message-body "00129c8cc3c7081893765352a2f71f97",
;;=>  :message-id "690ddd68-a2f6-45de-b6f1-164eb3c9370d"}
```

To receive messages, we use sqs/receive-message:

```clojure
(sqs/receive-message "makoto-queue")
;;=> {:messages [
;;=>  {:md5of-body "00129c8cc3c7081893765352a2f71f97",
;;=>   :receipt-handle "AQEB.....", :message-id "bd56fea8-4c9f-4946-9521-1d97057f1a06",
;;=>   :body "hello sqs from Clojure"}]}
```

To remove all messages in a queue, we use sqs/purge-queue:

```clojure
(sqs/purge-queue :queue-url (sqs/find-queue "makoto-queue"))
;;=> nil
```

To delete queues, we use sqs/delete-queue:

```clojure
(sqs/delete-queue "makoto-queue")
;;=> nil
(sqs/delete-queue "DLQ")
;;=> nil
```
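In a real consumer you typically receive a batch, process each message, and then delete it so it is not redelivered. A minimal sketch, assuming sqs/delete-message accepts :queue-url and :receipt-handle in amazonica's keyword style (check the exact arguments against your amazonica version):

```clojure
;; Receive up to ten messages, hand each body to handle-fn, then delete it.
(defn drain-queue [queue-name handle-fn]
  (let [url (sqs/find-queue queue-name)]
    (doseq [msg (:messages (sqs/receive-message :queue-url url
                                                :max-number-of-messages 10))]
      (handle-fn (:body msg))
      ;; Deleting acknowledges the message; otherwise it reappears once
      ;; the visibility timeout expires.
      (sqs/delete-message :queue-url url
                          :receipt-handle (:receipt-handle msg)))))

(drain-queue "makoto-queue" println)
```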
Serverless Clojure with AWS Lambda

Lambda is an AWS product that allows you to run Clojure code without the hassle and expense of setting up and maintaining a server environment. Behind the scenes, there are still servers involved, but as far as you are concerned, it is a serverless environment: upload a JAR and you are good to go. Code running on Lambda is invoked in response to an event, such as a file being uploaded to S3, or according to a specified schedule.

In production environments, Lambda is normally used in a wider AWS deployment that includes standard server environments, to handle discrete computational tasks, particularly those that benefit from Lambda's horizontal scaling, which just happens without server configuration. For Clojurians working on a personal project, Lambda is a wonderful combination of power and limitation. Just how far can you hack Lambda given the constraints imposed by AWS?

Clojure namespace helloworld

Start off with a clean, empty project generated using lein new. From there, in your IDE of choice, configure a package and a new Clojure source file. In the following example, the package is com.sakkam and the source file uses the Clojure namespace helloworld. The entry point to your Lambda code is a Clojure function that is exposed as a method of a Java class using Clojure's gen-class. Similar to use and require, the gen-class function can be included in the Clojure ns definition, as follows, or specified separately. You can use any name you want for the handler function, but it must be prefixed with a hyphen unless an alternate prefix is specified as part of the :methods definition:

```clojure
(ns com.sakkam.lambda.helloworld
  (:gen-class :methods [^:static [handler [String] String]]))

(defn -handler [s]
  (println (str "Hello," s)))
```

From the command line, use lein uberjar to create a JAR that can be uploaded to AWS Lambda.

Hello World – the AWS part

Getting your Hello World to work is now a matter of creating a new Lambda within AWS, uploading your JAR, and configuring your handler.

Hello Stream

The handler method we used in our Hello World Lambda function was coded directly and could be extended to accept custom Java classes as part of the method signature. However, for more complex Java integrations, implementing one of AWS's standard interfaces for Lambda is both straightforward and feels more like idiomatic Clojure. The following example replaces our own definition of a handler method with an implementation of a standard interface that is provided as part of the aws-lambda-java-core library.

First of all, add the dependency [com.amazonaws/aws-lambda-java-core "1.0.0"] to your project.clj. While you are modifying your project.clj, also add the dependency [org.clojure/data.json "0.2.6"], since we will be manipulating JSON-formatted objects as part of this exercise. Then, either create a new Clojure namespace or modify your existing one so that it looks like the following (the handler function must be named -handleRequest, since handleRequest is specified as part of the interface):

```clojure
(ns aws-examples.lambda-example
  (:gen-class
   :implements [com.amazonaws.services.lambda.runtime.RequestStreamHandler])
  (:require [clojure.java.io :as io]
            [clojure.data.json :as json]))

(defn -handleRequest [this is os context]
  (let [w (io/writer os)
        parameters (json/read (io/reader is) :key-fn keyword)]
    (println "Lambda Hello Stream Output")
    (println "this class: " (class this))
    (println "is class:" (class is))
    (println "os class:" (class os))
    (println "context class:" (class context))
    (println "Parameters are " parameters)
    (.flush w)))
```

Use lein uberjar again to create a JAR file.
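Before re-uploading, you can smoke-test the stream handler locally from the REPL by feeding it in-memory streams; no AWS round trip is needed. A minimal sketch, passing nil for the this and context arguments, which this handler only prints:

```clojure
;; Drive the handler with an in-memory JSON payload.
(let [in  (java.io.ByteArrayInputStream. (.getBytes "{\"name\":\"test\"}" "UTF-8"))
      out (java.io.ByteArrayOutputStream.)]
  (-handleRequest nil in out nil))
;; Prints the class and parameter lines to stdout.
```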
Since we have an existing Lambda function in AWS, we can overwrite the JAR used in the Hello World example. Since the handler function name has changed, we must modify our Lambda configuration to match. This time, the default test that provides parameters in JSON format should work as is, and the result will look something like the following:

We can very easily get a more interesting test of Hello Stream by configuring this Lambda to run whenever a file is uploaded to S3. At the Lambda management page, choose the Event Sources tab, click on Add Event, and choose an S3 bucket to which you can easily add a file. Now, upload a file to the specified S3 bucket and then navigate to the logs of the Hello World Lambda function. You will find that Hello World has been automatically invoked, and a fairly complicated object that represents the uploaded file is supplied as a parameter to our Lambda function.

Real-world Lambdas

To graduate from a Hello World Lambda to real-world Lambdas, the chances are you are going to need richer integration with other AWS facilities. As a minimum, you will probably want to write a file to an S3 bucket or insert a notification into an SNS queue. Amazon provides an SDK that makes this integration straightforward for developers using standard Java. For Clojurians, using the Amazonica wrapper is a very fast and easy way to achieve the same.

How it works…

Here, we will explain how AWS works.

What is Amazon EC2?

Using EC2, we don't need to buy hardware or install an operating system. Amazon provides various types of instances for customers' use cases. Each instance type has varying combinations of CPU, memory, storage, and networking capacity. Some instance types are given in the following table; you can select appropriate instances according to the characteristics of your application.

Instance type   Description
M4              Designed for general-purpose computing; this family provides balanced CPU, memory, and network bandwidth
C4              Designed for applications that consume CPU resources; the highest CPU performance at the lowest cost
R3              For memory-intensive applications
G2              Has NVIDIA GPUs; used for graphics applications and GPU computing applications such as deep learning

The following table shows the model variations of the M4 instance type; you can choose the best one among these models.

Model        vCPU   RAM (GiB)   EBS bandwidth (Mbps)
m4.large     2      8           450
m4.xlarge    4      16          750
m4.2xlarge   8      32          1,000
m4.4xlarge   16     64          2,000
m4.10xlarge  40     160         4,000

Amazon S3

Amazon S3 is storage for the cloud. It provides a simple web interface that allows you to store and retrieve data. The S3 API is easy to use while still ensuring security. S3 provides cloud storage services that are scalable, reliable, fast, and inexpensive.

Buckets and keys

Buckets are containers for objects stored in Amazon S3. Objects are stored in buckets. A bucket name is unique across all regions in the world, so bucket names are the top-level identifiers of S3 and the units of charging and access control. Keys are the unique identifiers for an object within a bucket; every object in a bucket has exactly one key. Keys are the second-level identifiers and must be unique within a bucket. To identify an object, you use the combination of bucket name and key name.

Objects

Objects are accessed by a bucket name and key. Objects consist of data and metadata. Metadata is a set of name-value pairs that describe the characteristics of the object.
Examples of metadata are the date last modified and the content type. Objects can have multiple versions of data.

There's more…

It is clearly impossible to review all the different APIs for all the different services exposed via the Amazonica library, but you have probably got a feeling of the tremendous powers in your hands right now. (Don't forget to give that credit card back to your boss now…)

Some other examples of Amazon services are as follows:

- Amazon IoT: This proposes a way for connected devices to easily and securely interact with cloud applications and other devices.
- Amazon Kinesis: This gives you ways of easily loading massive volumes of streaming data into AWS and analyzing them through streaming techniques.

Summary

We hope you enjoyed this appetizer to the book Clojure Programming Cookbook, which will present you with a set of progressive readings to improve your Clojure skills and make Clojure your de facto everyday language for professional and efficient work. The book presents different topics of generic programming, which are always to the point, with some fun, so that each recipe feels not like a classroom but more like a fun read, with challenging exercises left to the reader to gradually build up skills. See you in the book!

Resources for Article:

Further resources on this subject:

- Customizing Xtext Components [article]
- Reactive Programming and the Flux Architecture [article]
- Setup Routine for an Enterprise Spring Application [article]
Features of Dynamics GP

Packt
14 Oct 2016
10 min read
In this article by Ian Grieve and Mark Polino, authors of the book Microsoft Dynamics GP 2016 Cookbook, we will look at a few features of Dynamics GP. Dynamics GP provides a number of features to better organize the overall system and improve its usefulness for all users; these recipes are designed for administrators rather than typical users. This article demonstrates how to implement and fine-tune these features to provide the most benefit. In this article, we will look at the following topics:

- Speeding account entry with account aliases
- Cleaning account lookups by removing accounts from lookups
- Streamlining payables processing by prioritizing vendors
- Getting clarity with user-defined fields

(For more resources related to this topic, see here.)

Speeding account entry with account aliases

As organizations grow up, the chart of accounts tends to grow larger and more complex as well. Companies want to segment their business by departments, locations, or divisions; all of this means that more and more accounts get added to the chart, and as the chart of accounts grows, it gets more difficult to select the right account. Dynamics GP provides the Account Alias feature as a way to quickly select the right account. Account aliases provide a way to create shortcuts to specific accounts, which can dramatically speed up the process of selecting the correct account. We'll look at how that works in this recipe.

Getting ready

Setting up account aliases requires a user with access to the Account Maintenance window. To get to this window, perform the following steps:

1. Select Financial from the Navigation pane on the left.
2. Click Accounts on the Financial area page under Cards. This will open the Account Maintenance window.
3. Click the lookup button (magnifying glass) next to the account number, or use the keyboard shortcut Ctrl + Q.
4. Find and select account 000-2100-00.
5. In the middle of the Account Maintenance window is the Account Alias field. Enter AP in the Alias field.

This associates the letters AP with the selected accounts payable account, meaning the user now only has to enter AP instead of the full account number to use the accounts payable account:

How to do it…

Once aliases have been set up, let's see how the user can quickly select an account using the alias:

1. To demonstrate how this works, click Financial on the Navigation pane on the left.
2. Select General from the Financial area page under Transactions.
3. On the Transaction Entry window, select the top line in the grid area on the lower half of the window.
4. Click the expansion button (represented by a blue arrow) next to the Account heading to open the Account Entry window.
5. In the Alias field, type AP and press Enter:

The Account Alias window will close and the account represented by the alias will appear in the Transaction Entry window:

How it works…

Account aliases provide quick shortcuts for account entry. Keeping them short and obvious makes them easy to use; aliases are less useful if users have to think about them, and limiting them to the most commonly used accounts makes them more useful. Most users don't mind occasionally looking up the odd account, but they shouldn't have to memorize long account strings for regularly used account numbers. It's counter-productive to put an alias on every account, since that would make finding the right alias as difficult as finding the right account number. The setup process should be performed on the most commonly used accounts to provide easy access.
Cleaning account lookups by removing accounts from lookups

A consequence of company growth is that the chart of accounts grows, and the account lookups can get clogged up by the number of accounts on the system. While the general ledger will stop showing an account in a lookup when the account is made inactive, other modules will continue to show these inactive codes. However, Dynamics GP does contain a feature which can be used to remove inactive accounts from lookups; this same feature can also be used to remove accounts from lookups in series where the account should not be used, such as a sales account in the purchasing or inventory series.

How to do it…

Here we will see how to remove inactive accounts from lookups:

1. Open Financial from the Navigation pane on the left.
2. In the main Area page, under Cards, select Account.
3. Enter, or do a lookup for, the account to be made inactive and removed from the lookups.
4. Mark the Inactive checkbox.
5. Press and hold the Ctrl key and click on each of the lines in the Include in Lookup list.
6. Click Save to commit the changes.

The next time a lookup is done in any of the now deselected modules, the account will not be included in the list. If the account is to be included in lookups in some modules but not others, simply leave the modules in which the account should be included selected.

How it works…

Accounts will only show in lookups when the series is selected in the Include in Lookup list. For series other than General Ledger, simply marking an account as Inactive is not enough to remove it from the lookup, although the code can't be used while the account is inactive.

Streamlining payables processing by prioritizing vendors

Management of vendor payments is a critical activity for any firm; it's even more critical in difficult economic times. Companies need to understand and control payments, and a key component of this is prioritizing vendors. Every firm has both critical and expendable vendors. Paying critical vendors on time is a key business driver. For example, a newspaper that doesn't pay its newsprint supplier won't be in business long; however, it can safely delay payments to its janitorial vendor without worrying about going under. Dynamics GP provides a mechanism to prioritize vendors and apply those priorities when selecting which checks to print. That is the focus of this recipe.

Getting ready

Setting this up first requires that the company figure out who the priority vendors are; that part is beyond the scope of this book. The Vendor Priority field in Dynamics GP is a three-character field, but users shouldn't be seduced by the possibilities of three characters. A best practice is to keep the priorities simple by using 1, 2, 3 or A, B, C. Anything more complicated than that tends to confuse users and actually makes it harder to prioritize vendors. Once the vendor priorities have been determined, the priority needs to be set in Dynamics GP. Attaching a priority to a vendor is the first step. To do that, follow these steps:

1. Select Purchasing from the Navigation pane.
2. In the Purchasing area page under Cards, click Vendor Maintenance.
3. Once the Vendor Maintenance window opens, select the lookup button (magnifying glass) next to Vendor ID.
4. Select a vendor and click OK.
5. Once the vendor information is populated, click the Options button. This opens the Vendor Maintenance Options screen. In the center left is the Payment Priority field.
6. Enter 1 in Payment Priority and click Save.

How to do it…

Now that a vendor has been set up with a priority, let's see how to apply that information when selecting checks to print:

1. To use vendor priorities to select invoices for payment, click Select Checks under Transactions on the Purchasing area page.
2. In the Select Payables Checks window, enter CHECKS to name the check batch.
3. Press Tab to move off the Batch ID field and click Add to add the batch.
4. Pick a checkbook ID and click Save to save the batch.
5. In the Select By field, click the drop-down box and select Payment Priority.
6. Enter 1 in both the From and To boxes.
7. Click the Insert >> button to lock in Payment Priority as an option.
8. Click Build Batch at the top. If there are any transactions where the vendor is set to a priority of 1, this will populate a batch of checks based on the vendor priority.

How it works…

Since priority is one of the built-in options for selecting checks, it's easy to ensure that high-priority vendors get selected to be paid first. All of this is easily accomplished with basic Dynamics GP functionality that most people miss.

Getting clarity with user-defined fields

Throughout Dynamics GP, maintenance cards typically include at least two user-defined fields. User-defined fields can be renamed in the setup screen for the related module. This provides a great mechanism to add in special information. We'll take a look at a typical use of a user-defined field in this recipe.

How to do it…

For our example, we'll rename the User-Defined 1 field to Region on the customer master record. To do so, use the following steps:

1. From the Navigation pane, select Sales.
2. In the Sales area page, click Setup, then Receivables, and finally Options.
3. In the User-Defined 1 field, type Region and click OK to close each window.
4. Back on the Sales area page, click Customer under the Cards area. On the bottom left, above User-Defined 2, is the newly named Region field, ready to be filled in.

How it works…

Changing the field name only changes the display field; it doesn't change the underlying field name in the database. SmartLists are smart enough to show the new name: in our example, the description Region would appear in a SmartList, not User-Defined 1. User-defined fields like this are present for customers, vendors, accounts, sales orders, fixed assets, inventory items, and purchase receipts, among others. They can each be renamed in their respective setup screens.

There's more...

Not all user-defined fields are the same; some have special features.

Special User-Defined 1 features

User-Defined 1 has special features inside of Dynamics GP. Most of the built-in reports inside of Dynamics GP allow sorting and selection with the User-Defined 1 field. These options aren't provided for User-Defined 2. Consequently, administrators should carefully consider what information belongs in User-Defined 1 before changing its name, since the effects of this selection will be felt throughout the system.

Company setup user-defined fields

On the Company Setup window, there are two user-defined fields at the top right, and there is no option in Dynamics GP to rename these fields. The Company Setup window is accessed by clicking Administration on the Navigation pane, then clicking on Company under the Setup and Company headers.

Expanded user-defined fields

Certain areas such as Fixed Assets, Inventory Items, and Purchase Receipts have more complex types of user-defined fields that can include dates, list selections, and currency.
Summary

In this article, we covered a few features of Dynamics GP, such as speeding account entry with aliases and cleaning account lookups. For more information on Dynamics GP, you can check out these other books by Packt:

- Microsoft Dynamics GP 2010 Cookbook: https://www.packtpub.com/application-development/microsoft-dynamics-gp-2010-cookbook
- Microsoft Dynamics GP 2013 Cookbook: https://www.packtpub.com/application-development/microsoft-dynamics-gp-2013-cookbook
- Microsoft Dynamics GP 2010 Implementation: https://www.packtpub.com/application-development/microsoft-dynamics-gp-2010-implementation

Resources for Article:

Further resources on this subject:
- Code Analysis and Debugging Tools in Microsoft Dynamics NAV 2009 [article]
- Connecting to Microsoft SQL Server Compact 3.5 with Visual Studio [article]
- MySQL Data Transfer using Sql Server Integration Services (SSIS) [article]

Loops, Conditions, and Recursion

Packt
14 Oct 2016
14 min read
In this article from Paul Johnson, author of the book Learning Rust, we take a look at loops and conditions, which are a fundamental aspect of operation in any programming language. You may be looping around a list attempting to find when something matches, and when a match occurs, branching out to perform some other task; or you may just want to check a value to see if it meets a condition. In any case, Rust allows you to do this.

(For more resources related to this topic, see here.)

In this article, we will cover the following topics:

- Types of loop available
- Different types of branching within loops
- Recursive methods
- When the semicolon (;) can be omitted and what it means

Loops

Rust has essentially three types of loop—for, loop, and while.

The for loop

This type of loop is very simple to understand, yet rather powerful in operation. It is simple in that we have a start value, an end condition, and some form of value change—although the power comes in those last two points. Let's take a simple example to start with—a loop that goes from 0 to 10 and outputs the value:

for x in 0..10 {
    println!("{},", x);
}

We create a variable x that takes the expression (0..10) and does something with it. In Rust terminology, x is not only a variable but also an iterator, as it gives back a value from a series of elements. This is obviously a very simple example. We can also go downwards, but the syntax is slightly different. In C, you would expect something akin to for (i = 10; i > 0; --i). This is not available in Rust, at least not in the stable branches. Instead, we will use the rev() method, which is as follows:

for x in (0..10).rev() {
    println!("{},", x);
}

It is worth noting that, as with the C family, the last number is excluded. So, the first example outputs the values 0 to 9; the rev() version effectively generates the same range and outputs it in reverse, from 9 down to 0. Notice also that the range is enclosed in parentheses; this is needed so that rev() applies to the whole range, which acts as the loop's condition. In C#, this would be the equivalent of a foreach. In Rust, it is as follows:

for var in condition {
    // do something
}

The C# equivalent for the preceding code is:

foreach(var t in condition)
    // do something

Using enumerate

A loop condition can also be more complex, using multiple conditions and variables. For example, the for loop can be tracked using enumerate. This will keep track of how many times the loop has executed, as shown here:

for (i, j) in (10..20).enumerate() {
    println!("loop has executed {} times. j = {}", i, j);
}

The enumeration is given in the first variable, with the value from the condition in the second. This example is not of that much use, but where enumerate comes into its own is when looping over an iterator. Say we have an array that we need to iterate over to obtain the values. Here, enumerate can be used to obtain the value of the array members.
However, the value returned in the condition will be a reference, so code such as the one shown in the following example will fail to execute (line is a & reference, whereas an i32 is expected):

fn main() {
    let my_array: [i32; 7] = [1i32, 3, 5, 7, 9, 11, 13];
    let mut value = 0i32;
    for (_, line) in my_array.iter().enumerate() {
        value += line;
    }
    println!("{}", value);
}

This can be fixed simply by dereferencing the value, as follows:

for (_, line) in my_array.iter().enumerate() {
    value += *line;
}

The iter().enumerate() method can equally be used with the Vec type, as shown in the following code:

fn main() {
    let my_array = vec![1i32, 3, 5, 7, 9, 11, 13];
    let mut value = 0i32;
    for (_, line) in my_array.iter().enumerate() {
        value += *line;
    }
    println!("{}", value);
}

In both cases, the value given at the end will be 49.

The _ parameter

You may be wondering what the _ parameter is. In Rust, this means that there is an argument, but we'll never do anything with it; it's a parameter that is only there to ensure that the code compiles—a throw-away. The _ parameter cannot be referred to either: whereas we can do something with linenumber in for (linenumber, line), we can't do anything with _ in for (_, line).

The simple loop

A simple form of the loop is called loop:

loop {
    println!("Hello");
}

The preceding code will output Hello until the application is terminated or the loop reaches a terminating statement.

While…

The while condition is of slightly more use, as you will see in the following code snippet:

while condition {
    // do something
}

Let's take a look at the following example:

fn main() {
    let mut done = 0u32;
    while done != 32 {
        println!("done = {}", done);
        done += 1;
    }
}

The preceding code will output done = 0 to done = 31. The loop terminates when done equals 32.

Prematurely terminating a loop

Depending on the size of the data being iterated over within a loop, the loop can be costly in processor time. For example, say a server is receiving data from a data-logging application, such as measured values from a gas chromatograph; over an entire scan, it may record roughly half a million data points with an associated time position. For our purposes, we want to add all of the recorded values until the value is over 1.5, and once that is reached, we can stop the loop. Sounds easy? There is one thing not mentioned: there is no guarantee that the recorded value will ever go over 1.5, so how can we terminate the loop in that case? We can do this in one of two ways. The first is to use a while loop and introduce a Boolean to act as the test condition. In the following example, my_array represents a very small subsection of the data sent to the server:

fn main() {
    let my_array = vec![0.6f32, 0.4, 0.2, 0.8, 1.3, 1.1, 1.7, 1.9];
    let mut counter: usize = 0;
    let mut result = 0f32;
    let mut test = false;
    while test != true {
        if my_array[counter] > 1.5 {
            test = true;
        } else {
            result += my_array[counter];
            counter += 1;
        }
    }
    println!("{}", result);
}

The result here is 4.4. This code is perfectly acceptable, if slightly long-winded. Rust also allows the use of the break and continue keywords (if you're familiar with C, they work in the same way).
Our code using break will be as follows:

fn main() {
    let my_array = vec![0.6f32, 0.4, 0.2, 0.8, 1.3, 1.1, 1.7, 1.9];
    let mut result = 0f32;
    for (_, value) in my_array.iter().enumerate() {
        if *value > 1.5 {
            break;
        } else {
            result += *value;
        }
    }
    println!("{}", result);
}

Again, this gives an answer of 4.4, indicating that the two methods are equivalent. If we replace break with continue in the preceding example, we will also get the same result (4.4). The difference between break and continue is that continue jumps to the next value in the iteration rather than jumping out, so if the final value of my_array were 1.3, the output at the end would be 5.7. When using break and continue, always keep this difference in mind: while mistaking one for the other may not crash the code, it may lead to results that you do not expect or want.

Using loop labels

Rust allows us to label our loops. This can be very useful, for example, with nested loops. These labels act as symbolic names for the loops, and as a loop has a name, we can instruct the application to perform a task on that name. Consider the following simple example:

fn main() {
    'outer_loop: for x in 0..10 {
        'inner_loop: for y in 0..10 {
            if x % 2 == 0 { continue 'outer_loop; }
            if y % 2 == 0 { continue 'inner_loop; }
            println!("x: {}, y: {}", x, y);
        }
    }
}

What will this code do? Here, x % 2 == 0 (or y % 2 == 0) means that if the variable divided by two leaves no remainder, then the condition is met and the code in the braces is executed. When x % 2 == 0, that is, when the value of x is even, we tell the application to skip to the next iteration of outer_loop, which has an odd value. However, we also have the inner loop; again, when y % 2 is even, we tell the application to skip to the next iteration of inner_loop. In this case, the application prints only the combinations where both x and y are odd.

While this example may seem very simple, it does allow for a great deal of speed when checking data. Let's go back to our previous example of data being sent to the web service. Recall that we have two values—the recorded data and some other value; for ease, it will be a data point. Each data point is recorded 0.2 seconds apart; therefore, every 5th data point is 1 second. This time, we want all of the values where the data is greater than 1.5 and the associated time of that data point, but only at times that fall dead on a second. As we want the code to be understandable and human readable, we can use a loop label on each loop. The following code is not quite correct—can you spot why? The code compiles, as follows:

fn main() {
    let my_array = vec![0.6f32, 0.4, 0.2, 0.8, 1.3, 1.1, 1.7, 1.9, 1.3, 0.1,
                        1.6, 0.6, 0.9, 1.1, 1.31, 1.49, 1.5, 0.7];
    let my_time = vec![0.2f32, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0,
                       2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8];
    'time_loop: for (_, time_value) in my_time.iter().enumerate() {
        'data_loop: for (_, value) in my_array.iter().enumerate() {
            if *value < 1.5 {
                continue 'data_loop;
            }
            if *time_value % 5f32 == 0f32 {
                continue 'time_loop;
            }
            println!("Data point = {} at time {}s", *value, *time_value);
        }
    }
}

This example is a very good demonstration of choosing the correct operator. The issue is the if *time_value % 5f32 == 0f32 line: we are taking a float value and using the modulus of another float to see if we end up with 0 as a float.
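To see why this kind of test is fragile, here is a small standalone sketch (not from the original book; the values are purely illustrative). Accumulated floating-point representation error means an exact-zero comparison can fail even when the arithmetic looks like it should pass:

fn main() {
    // 0.1 has no exact representation in binary floating point,
    // so ten steps of 0.1 do not sum to exactly 1.0.
    let mut t = 0.0f32;
    for _ in 0..10 {
        t += 0.1;
    }
    println!("t = {}", t);                                   // prints 1.0000001, not 1
    println!("exact test: {}", t % 1f32 == 0f32);            // false, despite expectations
    println!("tolerance test: {}", (t - 1.0).abs() < 1e-5);  // true
}

A tolerance-based comparison works, but as the next paragraph explains, the cleaner fix here is to avoid float comparison entirely.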
Comparing any value that is not a string, integer, or Boolean type to another is never a good plan, especially if the value is returned by some form of calculation. We also cannot simply use continue on the time loop, so how can we solve this problem? If you recall, we're using _ instead of a named parameter for the enumeration of the loop. The enumeration value is always an integer, so if we replace _ with a variable name, we can use % 5 to perform the calculation, and the code becomes:

'time_loop: for (time_enum, time_value) in my_time.iter().enumerate() {
    'data_loop: for (_, value) in my_array.iter().enumerate() {
        if *value < 1.5 {
            continue 'data_loop;
        }
        if time_enum % 5 == 0 {
            continue 'time_loop;
        }
        println!("Data point = {} at time {}s", *value, *time_value);
    }
}

The next problem is that the output isn't correct. The code gives the following:

Data point = 1.7 at time 0.4s
Data point = 1.9 at time 0.4s
Data point = 1.6 at time 0.4s
Data point = 1.5 at time 0.4s
Data point = 1.7 at time 0.6s
Data point = 1.9 at time 0.6s
Data point = 1.6 at time 0.6s
Data point = 1.5 at time 0.6s

The data points are correct, but the time is way out and continually repeats. We still need the continue statement for the data-point step, but the time step is incorrect. There are a couple of solutions, but possibly the simplest is to store the data and the time in new vectors and then display that data at the end. The following code gets closer to what is required:

fn main() {
    let my_array = vec![0.6f32, 0.4, 0.2, 0.8, 1.3, 1.1, 1.7, 1.9, 1.3, 0.1,
                        1.6, 0.6, 0.9, 1.1, 1.31, 1.49, 1.5, 0.7];
    let my_time = vec![0.2f32, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0,
                       2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8];
    let mut my_new_array = vec![];
    let mut my_new_time = vec![];
    'time_loop: for (t, _) in my_time.iter().enumerate() {
        'data_loop: for (v, value) in my_array.iter().enumerate() {
            if *value < 1.5 {
                continue 'data_loop;
            } else {
                if t % 5 != 0 {
                    my_new_array.push(*value);
                    my_new_time.push(my_time[v]);
                }
            }
            if v == my_array.len() {
                break;
            }
        }
    }
    for (m, my_data) in my_new_array.iter().enumerate() {
        println!("Data = {} at time {}", *my_data, my_new_time[m]);
    }
}

We now get the following output:

Data = 1.7 at time 1.4
Data = 1.9 at time 1.6
Data = 1.6 at time 2.2
Data = 1.5 at time 3.4
Data = 1.7 at time 1.4

Yes, we now have the correct data, but the time starts again. We're close, but it's not right yet. We aren't continuing the time_loop loop, and we also need to introduce a break statement. To trigger the break, we will create a new variable called done. When v, the enumerator for my_array, reaches the index of the vector's last element, we will change done from false to true. This is then tested outside of data_loop: if done == true, we break out of the loop.
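One detail worth spelling out before the final listing (an editorial note, not from the book): enumerate() yields indices from 0 to len() - 1, so a test of v == my_array.len() can never be true—which is why the check in the intermediate version above never fires, and why the final version compares against my_array.len() - 1. A minimal sketch:

fn main() {
    let my_array = vec![10, 20, 30];
    for (v, value) in my_array.iter().enumerate() {
        // v runs 0, 1, 2 for a three-element vector...
        println!("index {} holds {}", v, value);
        // ...so this branch is unreachable:
        if v == my_array.len() {
            println!("never printed");
        }
        // the last element is seen when v == my_array.len() - 1
        if v == my_array.len() - 1 {
            println!("reached the final element");
        }
    }
}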
The final version of the code is as follows:

fn main() {
    let my_array = vec![0.6f32, 0.4, 0.2, 0.8, 1.3, 1.1, 1.7, 1.9, 1.3, 0.1,
                        1.6, 0.6, 0.9, 1.1, 1.31, 1.49, 1.5, 0.7];
    let my_time = vec![0.2f32, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0,
                       2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6];
    let mut my_new_array = vec![];
    let mut my_new_time = vec![];
    let mut done = false;
    'time_loop: for (t, _) in my_time.iter().enumerate() {
        'data_loop: for (v, value) in my_array.iter().enumerate() {
            if v == my_array.len() - 1 {
                done = true;
            }
            if *value < 1.5 {
                continue 'data_loop;
            } else {
                if t % 5 != 0 {
                    my_new_array.push(*value);
                    my_new_time.push(my_time[v]);
                } else {
                    continue 'time_loop;
                }
            }
        }
        if done { break; }
    }
    for (m, my_data) in my_new_array.iter().enumerate() {
        println!("Data = {} at time {}", *my_data, my_new_time[m]);
    }
}

This time, the code reports each qualifying data point once, with its correct associated time.

Recursive functions

The final form of loop to consider is known as a recursive function—a function that calls itself until a condition is met. In pseudocode, the function looks like this:

float my_function(i32:a) {
    // do something with a
    if (a != 32) {
        my_function(a);
    } else {
        return a;
    }
}

An actual implementation of a recursive function would look like this:

fn recurse(n: i32) {
    let v = match n % 2 {
        0 => n / 2,
        _ => 3 * n + 1
    };
    println!("{}", v);
    if v != 1 {
        recurse(v)
    }
}

fn main() {
    recurse(25)
}

The idea of a recursive function is very simple, but we need to consider two parts of this code. The first is the let line in the recurse function and what it means:

let v = match n % 2 {
    0 => n / 2,
    _ => 3 * n + 1
};

Another way of writing this is as follows:

let mut v = 0i32;
if n % 2 == 0 {
    v = n / 2;
} else {
    v = 3 * n + 1;
}

In C#, this equates to the following:

var v = n % 2 == 0 ? n / 2 : 3 * n + 1;

The second part is that the semicolon is not being used everywhere. Consider the following example:

fn main() {
    recurse(25)
}

What is the difference between having and not having a semicolon? In Rust, a block of code between braces is itself an expression, and a semicolon closes a statement and suppresses its value. Let's see what that means. Consider the following code as an example:

fn main() {
    let x = 5u32;
    let y = {
        let x_squared = x * x;
        let x_cube = x_squared * x;
        x_cube + x_squared + x
    };
    let z = {
        2 * x;
    };
    println!("x is {:?}", x);
    println!("y is {:?}", y);
    println!("z is {:?}", z);
}

We have two different uses of the semicolon. Let's look at the let y line first:

let y = {
    let x_squared = x * x;
    let x_cube = x_squared * x;
    x_cube + x_squared + x // no semicolon
};

This code does the following: the code within the braces is processed, and the final line, without the semicolon, is assigned to y. Essentially, the block is treated as an inline function that returns the value of the expression without the semicolon into the variable. The second line to consider is the one for z:

let z = {
    2 * x;
};

Again, the code within the braces is evaluated. In this case, however, the line ends with a semicolon, so the result is suppressed and the empty unit value () is assigned to z. When executed, x is 5, y is 155, and z is (). In the code example, the line within fn main calling recurse gives the same result with or without the semicolon.

Summary

In this article, we covered the different types of loop that are available within Rust, gained an understanding of when to use a semicolon and what it means to omit it, and considered enumeration and iteration over vectors and arrays and how to handle the data held within them.
Resources for Article:

Further resources on this subject:
- Extra, Extra Collection, and Closure Changes that Rock! [article]
- Create a User Profile System and use the Null Coalesce Operator [article]
- Fine Tune Your Web Application by Profiling and Automation [article]

Fast Data Manipulation with R

Packt
14 Oct 2016
28 min read
Data analysis is a combination of art and science. The art part consists of data exploration and visualization, which is usually done best with good intuition and an understanding of the data. The science part consists of statistical analysis, which relies on concrete knowledge of statistics and analytic skills. However, both parts of serious research require proper tools and good skills to work with them. R is exactly the proper tool to do data analysis with. In this article by Kun Ren, author of the book Learning R Programming, we will discuss how R and the data.table package make it easy to transform data and, thus, greatly unleash our productivity.

(For more resources related to this topic, see here.)

Loading data as data frames

The most basic data structures in R are atomic vectors—such as numeric, logical, character, and complex vectors—and lists. An atomic vector stores elements of the same type, while a list is allowed to store elements of different types. The most commonly used data structure in R to store real-world data is the data frame. A data frame stores data in tabular form; in essence, a data frame is a list of vectors with equal length but possibly different types. Most of the code in this article is based on a group of fictitious data about some products (you can download the data at https://gist.github.com/renkun-ken/ba2d33f21efded23db66a68240c20c92). We will use the readr package to load the data for better handling of column types. If you don't have this package installed, please run install.packages("readr").

library(readr)
product_info <- read_csv("data/product-info.csv")
product_info
##    id      name  type   class released
## 1 T01    SupCar   toy vehicle      yes
## 2 T02  SupPlane   toy vehicle       no
## 3 M01     JeepX model vehicle      yes
## 4 M02 AircraftX model vehicle      yes
## 5 M03    Runner model  people      yes
## 6 M04    Dancer model  people       no

Once the data is loaded into memory as a data frame, we can take a look at its column types, shown as follows:

sapply(product_info, class)
##          id        name        type       class    released
## "character" "character" "character" "character" "character"

Using built-in functions to manipulate data frames

Although a data frame is essentially a list of vectors, we can access it like a matrix because all column vectors have the same length. To select rows that meet certain conditions, we supply a logical vector as the first argument of [] while the second is left empty. For example, we can take out all rows of toy type, shown as follows:

product_info[product_info$type == "toy", ]
##    id     name type   class released
## 1 T01   SupCar  toy vehicle      yes
## 2 T02 SupPlane  toy vehicle       no

Or, we can take out all rows that are not released:

product_info[product_info$released == "no", ]
##    id     name  type   class released
## 2 T02 SupPlane   toy vehicle       no
## 6 M04   Dancer model  people       no

To filter columns, we can supply a character vector as the second argument while the first is left empty, which is exactly how we subset a matrix:

product_info[1:3, c("id", "name", "type")]
##    id     name  type
## 1 T01   SupCar   toy
## 2 T02 SupPlane   toy
## 3 M01    JeepX model

Alternatively, we can filter the data frame by regarding it as a list: we can supply only one character vector of column names in [].
product_info[c("id", "name", "class")]
##    id      name   class
## 1 T01    SupCar vehicle
## 2 T02  SupPlane vehicle
## 3 M01     JeepX vehicle
## 4 M02 AircraftX vehicle
## 5 M03    Runner  people
## 6 M04    Dancer  people

To filter a data frame by both row and column, we can supply a vector as the first argument to select rows and a vector as the second to select columns:

product_info[product_info$type == "toy", c("name", "class", "released")]
##       name   class released
## 1   SupCar vehicle      yes
## 2 SupPlane vehicle       no

If the row-filtering condition is based on the values of certain columns, the preceding code can be very redundant, especially when the condition gets more complicated. A built-in function that simplifies such code is subset, as introduced previously:

subset(product_info,
  subset = type == "model" & released == "yes",
  select = name:class)
##        name  type   class
## 3     JeepX model vehicle
## 4 AircraftX model vehicle
## 5    Runner model  people

The subset function uses nonstandard evaluation so that we can directly use the columns of the data frame without typing product_info many times, because the expressions are meant to be evaluated in the context of the data frame. Similarly, we can use with to evaluate an expression in the context of the data frame; that is, the columns of the data frame can be used as symbols in the expression without repeatedly specifying the data frame:

with(product_info, name[released == "no"])
## [1] "SupPlane" "Dancer"

The expression can be more than simple subsetting. We can summarize the data by counting the occurrences of each possible value of a vector. For example, we can create a table of occurrences of the types of released records:

with(product_info, table(type[released == "yes"]))
##
## model   toy
##     3     1

In addition to the table of product information, we also have a table of product statistics that describes some properties of each product:

product_stats <- read_csv("data/product-stats.csv")
product_stats
##    id material size weight
## 1 T01    Metal  120   10.0
## 2 T02    Metal  350   45.0
## 3 M01 Plastics   50     NA
## 4 M02 Plastics   85    3.0
## 5 M03     Wood   15     NA
## 6 M04     Wood   16    0.6

Now, how can we get the names of the products with the top three largest sizes? One way is to sort the records in product_stats by size in descending order, select the id values of the top three records, and use these values to filter the rows of product_info by id:

top_3_id <- product_stats[order(product_stats$size, decreasing = TRUE), "id"][1:3]
product_info[product_info$id %in% top_3_id, ]
##    id      name  type   class released
## 1 T01    SupCar   toy vehicle      yes
## 2 T02  SupPlane   toy vehicle       no
## 4 M02 AircraftX model vehicle      yes

This approach looks quite redundant. Note that product_info and product_stats actually describe the same set of products from different perspectives. The connection between these two tables is the id column: each id is unique and refers to the same product in both tables. To access both sets of information, we can put the two tables together into one data frame.
The simplest way to do this is to use merge:

product_table <- merge(product_info, product_stats, by = "id")
product_table
##    id      name  type   class released material size weight
## 1 M01     JeepX model vehicle      yes Plastics   50     NA
## 2 M02 AircraftX model vehicle      yes Plastics   85    3.0
## 3 M03    Runner model  people      yes     Wood   15     NA
## 4 M04    Dancer model  people       no     Wood   16    0.6
## 5 T01    SupCar   toy vehicle      yes    Metal  120   10.0
## 6 T02  SupPlane   toy vehicle       no    Metal  350   45.0

Now we have a new data frame that is a combined version of product_info and product_stats, joined on the shared id column. In fact, even if you reorder the records in the second table, the two tables will still be merged correctly. With the combined version, we can do things more easily. For example, we can sort the data frame by a column from either source table in one step, without having to manually work with the other:

product_table[order(product_table$size), ]
##    id      name  type   class released material size weight
## 3 M03    Runner model  people      yes     Wood   15     NA
## 4 M04    Dancer model  people       no     Wood   16    0.6
## 1 M01     JeepX model vehicle      yes Plastics   50     NA
## 2 M02 AircraftX model vehicle      yes Plastics   85    3.0
## 5 T01    SupCar   toy vehicle      yes    Metal  120   10.0
## 6 T02  SupPlane   toy vehicle       no    Metal  350   45.0

To solve the earlier problem, we can directly use the merged table and get the same answer:

product_table[order(product_table$size, decreasing = TRUE), "name"][1:3]
## [1] "SupPlane"  "SupCar"    "AircraftX"

The merged data frame allows us to sort the records by a column in one of the original data frames and filter the records by a column in the other. For example, we can first sort the product records by weight in descending order and select all records of model type:

product_table[order(product_table$weight, decreasing = TRUE), ][
  product_table$type == "model", ]
##    id      name  type   class released material size weight
## 6 T02  SupPlane   toy vehicle       no    Metal  350   45.0
## 5 T01    SupCar   toy vehicle      yes    Metal  120   10.0
## 2 M02 AircraftX model vehicle      yes Plastics   85    3.0
## 4 M04    Dancer model  people       no     Wood   16    0.6

Sometimes, column values are literal but can be converted to standard R data structures that better represent the data. For example, the released column in product_info only takes yes and no, which can be better represented by a logical vector. We can use <- to modify the column values, as we learned previously. However, it is usually better to create a new data frame with the existing columns properly adjusted and new columns added, without polluting the original data.
To do this, we can use transform:

transform(product_table,
  released = ifelse(released == "yes", TRUE, FALSE),
  density = weight / size)
##    id      name  type   class released material size weight
## 1 M01     JeepX model vehicle     TRUE Plastics   50     NA
## 2 M02 AircraftX model vehicle     TRUE Plastics   85    3.0
## 3 M03    Runner model  people     TRUE     Wood   15     NA
## 4 M04    Dancer model  people    FALSE     Wood   16    0.6
## 5 T01    SupCar   toy vehicle     TRUE    Metal  120   10.0
## 6 T02  SupPlane   toy vehicle    FALSE    Metal  350   45.0
##      density
## 1         NA
## 2 0.03529412
## 3         NA
## 4 0.03750000
## 5 0.08333333
## 6 0.12857143

The result is a new data frame with released converted to a logical vector and a new density column added. You can easily verify that product_table is not modified at all. Additionally, note that transform is like subset: both functions use nonstandard evaluation to allow direct use of data frame columns as symbols in the arguments, so that we don't have to type product_table$ all the time.

Now, we will load another table into R: the test results for the quality and durability of each product. We store the data in product_tests:

product_tests <- read_csv("data/product-tests.csv")
product_tests
##    id quality durability waterproof
## 1 T01      NA         10         no
## 2 T02      10          9         no
## 3 M01       6          4        yes
## 4 M02       6          5        yes
## 5 M03       5         NA        yes
## 6 M04       6          6        yes

Note that the values in both quality and durability contain missing values (NA). To exclude all rows with missing values, we can use na.omit():

na.omit(product_tests)
##    id quality durability waterproof
## 2 T02      10          9         no
## 3 M01       6          4        yes
## 4 M02       6          5        yes
## 6 M04       6          6        yes

Another way is to use complete.cases() to get a logical vector indicating all complete rows, that is, rows without any missing value:

complete.cases(product_tests)
## [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE

Then, we can use this logical vector to filter the data frame. For example, we can get the id column of all complete rows as follows:

product_tests[complete.cases(product_tests), "id"]
## [1] "T02" "M01" "M02" "M04"

Or, we can get the id column of all incomplete rows:

product_tests[!complete.cases(product_tests), "id"]
## [1] "T01" "M03"

Note that product_info, product_stats, and product_tests all share an id column, so we would like to merge them all together. Unfortunately, there's no built-in function to merge an arbitrary number of data frames; we can only merge two existing data frames at a time, or we have to merge them recursively.
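One way to express that recursive merge compactly—a sketch of my own, not from the book—is to fold merge over a list of data frames with base R's Reduce, which applies merge pairwise: merge(merge(product_info, product_stats), product_tests), and so on. The merge_all helper below is a hypothetical name introduced only for illustration:

# Sketch: fold merge over a list of data frames sharing an "id" column.
merge_all <- function(tables, key = "id") {
  Reduce(function(x, y) merge(x, y, by = key), tables)
}

full_table <- merge_all(list(product_info, product_stats, product_tests))
full_table

Reduce handles the pairwise recursion for us, so the same one-liner works no matter how many tables share the key.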
Merging two tables at a time looks like this:

merge(product_table, product_tests, by = "id")
##    id      name  type   class released material size weight
## 1 M01     JeepX model vehicle      yes Plastics   50     NA
## 2 M02 AircraftX model vehicle      yes Plastics   85    3.0
## 3 M03    Runner model  people      yes     Wood   15     NA
## 4 M04    Dancer model  people       no     Wood   16    0.6
## 5 T01    SupCar   toy vehicle      yes    Metal  120   10.0
## 6 T02  SupPlane   toy vehicle       no    Metal  350   45.0
##   quality durability waterproof
## 1       6          4        yes
## 2       6          5        yes
## 3       5         NA        yes
## 4       6          6        yes
## 5      NA         10         no
## 6      10          9         no

Data wrangling with data.table

In the previous section, we had an overview of how we can use built-in functions to work with data frames. Built-in functions work, but they are usually verbose. In this section, let's use data.table, an enhanced version of data.frame, and see how it makes data manipulation much easier. Run install.packages("data.table") to install the package. Once the package is ready, we can load it and use fread() to read the data files as data.table objects:

library(data.table)
product_info <- fread("data/product-info.csv")
product_stats <- fread("data/product-stats.csv")
product_tests <- fread("data/product-tests.csv")
toy_tests <- fread("data/product-toy-tests.csv")

It is extremely easy to filter data in data.table. To select the first two rows, just use [1:2], which would instead select the first two columns for a data.frame:

product_info[1:2]
##     id     name type   class released
## 1: T01   SupCar  toy vehicle      yes
## 2: T02 SupPlane  toy vehicle       no

To filter by logical conditions, just type column names directly as variables, without quotation, as the expression is evaluated within the context of product_info:

product_info[type == "model" & class == "people"]
##     id   name  type  class released
## 1: M03 Runner model people      yes
## 2: M04 Dancer model people       no

It is just as easy to select or transform columns:

product_stats[, .(id, material, density = size / weight)]
##     id material   density
## 1: T01    Metal 12.000000
## 2: T02    Metal  7.777778
## 3: M01 Plastics        NA
## 4: M02 Plastics 28.333333
## 5: M03     Wood        NA
## 6: M04     Wood 26.666667

The data.table object also supports subsetting by key, which can be much faster than using ==. We can set a column as the key for each data.table:

setkey(product_info, id)
setkey(product_stats, id)
setkey(product_tests, id)

Then, we can use a value to directly select rows:

product_info["M02"]
##     id      name  type   class released
## 1: M02 AircraftX model vehicle      yes

We can also set multiple columns as the key, so as to use multiple values to subset the table.
setkey(toy_tests, id, date)
toy_tests[.("T02", 20160303)]
##     id     date sample quality durability
## 1: T02 20160303     75       8          8

If two data.table objects share the same key, we can join them easily:

product_info[product_tests]
##     id      name  type   class released quality durability
## 1: M01     JeepX model vehicle      yes       6          4
## 2: M02 AircraftX model vehicle      yes       6          5
## 3: M03    Runner model  people      yes       5         NA
## 4: M04    Dancer model  people       no       6          6
## 5: T01    SupCar   toy vehicle      yes      NA         10
## 6: T02  SupPlane   toy vehicle       no      10          9
##    waterproof
## 1:        yes
## 2:        yes
## 3:        yes
## 4:        yes
## 5:         no
## 6:         no

Instead of creating a new data.table, in-place modification is also supported. The := operator sets the values of a column in place, without the overhead of making copies, and is thus much faster than using <-:

product_info[, released := (released == "yes")]
product_info
##     id      name  type   class released
## 1: M01     JeepX model vehicle     TRUE
## 2: M02 AircraftX model vehicle     TRUE
## 3: M03    Runner model  people     TRUE
## 4: M04    Dancer model  people    FALSE
## 5: T01    SupCar   toy vehicle     TRUE
## 6: T02  SupPlane   toy vehicle    FALSE

Another important argument for subsetting a data.table is by, which is used to split the data into multiple parts, with the second argument (j) evaluated for each part. For example, the simplest usage of by is counting the records in each group. In the following code, we count the number of both released and unreleased products:

product_info[, .N, by = released]
##    released N
## 1:     TRUE 4
## 2:    FALSE 2

The group can be defined by more than one variable. For example, a tuple of type and class can be a group, and for each group we can count the number of records, as follows:

product_info[, .N, by = .(type, class)]
##     type   class N
## 1: model vehicle 2
## 2: model  people 2
## 3:   toy vehicle 2

We can also perform statistical calculations for each group:

product_tests[, .(mean_quality = mean(quality, na.rm = TRUE)),
  by = .(waterproof)]
##    waterproof mean_quality
## 1:        yes         5.75
## 2:         no        10.00

We can chain multiple [] in turn. In the following example, we first join product_info and product_tests by the shared key id, and then calculate the mean value of quality and durability for each group of type and class among the released products:

product_info[product_tests][released == TRUE,
  .(mean_quality = mean(quality, na.rm = TRUE),
    mean_durability = mean(durability, na.rm = TRUE)),
  by = .(type, class)]
##     type   class mean_quality mean_durability
## 1: model vehicle            6             4.5
## 2: model  people            5             NaN
## 3:   toy vehicle          NaN            10.0

Note that the values of the by columns will be unique in the resulting data.table; we can use keyby instead of by to ensure that they are automatically used as the key of the result.
product_info[product_tests][released == TRUE,
  .(mean_quality = mean(quality, na.rm = TRUE),
    mean_durability = mean(durability, na.rm = TRUE)),
  keyby = .(type, class)]
##     type   class mean_quality mean_durability
## 1: model  people            5             NaN
## 2: model vehicle            6             4.5
## 3:   toy vehicle          NaN            10.0

The data.table package also provides functions to perform superfast reshaping of data. For example, we can use dcast() to spread id values along the x-axis as columns and align quality values to all possible date values along the y-axis:

toy_quality <- dcast(toy_tests, date ~ id, value.var = "quality")
toy_quality
##        date T01 T02
## 1: 20160201   9   7
## 2: 20160302  10  NA
## 3: 20160303  NA   8
## 4: 20160403  NA   9
## 5: 20160405   9  NA
## 6: 20160502   9  10

Although a test is conducted for each product every month, the dates may not exactly match each other. This results in missing values where one product has a value on a day but the other has no corresponding value on exactly the same day. One way to fix this is to use year-month data instead of the exact date. In the following code, we create a new ym column from the first six characters of the date column. For example, substr(20160101, 1, 6) results in 201601:

toy_tests[, ym := substr(toy_tests$date, 1, 6)]
##     id     date sample quality durability     ym
## 1: T01 20160201    100       9          9 201602
## 2: T01 20160302    150      10          9 201603
## 3: T01 20160405    180       9         10 201604
## 4: T01 20160502    140       9          9 201605
## 5: T02 20160201     70       7          9 201602
## 6: T02 20160303     75       8          8 201603
## 7: T02 20160403     90       9          8 201604
## 8: T02 20160502     85      10          9 201605

toy_tests$ym
## [1] "201602" "201603" "201604" "201605" "201602" "201603"
## [7] "201604" "201605"

This time, we will use ym for alignment instead of date:

toy_quality <- dcast(toy_tests, ym ~ id, value.var = "quality")
toy_quality
##        ym T01 T02
## 1: 201602   9   7
## 2: 201603  10   8
## 3: 201604   9   9
## 4: 201605   9  10

Now that the missing values are gone, the quality scores of both products in each month are presented naturally. Sometimes, we need to combine a number of columns into one column that indicates the measure and another that stores the value. For example, the following code uses melt() to combine the two measures (quality and durability) of the original data into a measure column and a column of measured values:

toy_tests2 <- melt(toy_tests, id.vars = c("id", "ym"),
  measure.vars = c("quality", "durability"),
  variable.name = "measure")
toy_tests2
##      id     ym    measure value
##  1: T01 201602    quality     9
##  2: T01 201603    quality    10
##  3: T01 201604    quality     9
##  4: T01 201605    quality     9
##  5: T02 201602    quality     7
##  6: T02 201603    quality     8
##  7: T02 201604    quality     9
##  8: T02 201605    quality    10
##  9: T01 201602 durability     9
## 10: T01 201603 durability     9
## 11: T01 201604 durability    10
## 12: T01 201605 durability     9
## 13: T02 201602 durability     9
## 14: T02 201603 durability     8
## 15: T02 201604 durability     8
## 16: T02 201605 durability     9

The variable names are now contained in the data, which can be used directly by some packages. For example, we can use ggplot2 to plot data in this format.
The following code is an example of a scatter plot with a facet grid over the different combinations of factors:

library(ggplot2)
ggplot(toy_tests2, aes(x = ym, y = value)) +
  geom_point() +
  facet_grid(id ~ measure)

The plot can be easily manipulated because the grouping factor (measure) is contained as data rather than as columns, which is easier to represent from the perspective of the ggplot2 package:

ggplot(toy_tests2, aes(x = ym, y = value, color = id)) +
  geom_point() +
  facet_grid(. ~ measure)

Summary

In this article, we used both built-in functions and the data.table package to perform simple data manipulation tasks. Using built-in functions can be verbose, while using data.table can be much easier and faster. However, the tasks in real-world data analysis can be much more complex than the examples demonstrated here, which also requires better R programming skills. It is helpful to have a good understanding of how nonstandard evaluation makes data.table so easy to work with, and how environments and scoping rules work to make your code predictable. A universal and consistent understanding of how R basically works will give you great confidence to write R code to work with data and enable you to learn packages very quickly.

Resources for Article:

Further resources on this subject:
- Supervised Machine Learning [article]
- Getting Started with Bootstrap [article]
- Basics of Classes and Objects [article]

Learning How to Manage Records in Visualforce

Packt
14 Oct 2016
7 min read
In this article by Keir Bowden, author of the book Visualforce Development Cookbook - Second Edition, we will cover styling fields and table columns as per requirements. One of the common use cases for Visualforce pages is to simplify, streamline, or enhance the management of sObject records. In this article, we will use Visualforce to carry out some more advanced customization of the user interface—redrawing the form to change available picklist options, or capturing different information based on the user's selections.

(For more resources related to this topic, see here.)

Styling fields as required

Standard Visualforce input components, such as <apex:inputText />, can take an optional required attribute. If set to true, the component will be decorated with a red bar to indicate that it is required, and form submission will fail if a value has not been supplied. In the scenario where one or more inputs are required and there are additional validation rules—for example, when one of either the Email or Phone fields must be defined for a contact—this can lead to a drip feed of error messages to the user. This is because the user makes repeated unsuccessful attempts to submit the form, each time getting slightly further in the process.

Now, we will create a Visualforce page that allows a user to create a contact record. The Last Name field is captured through a non-required input decorated with a red bar identical to that created for required inputs. When the user submits the form, the controller validates that the Last Name field is populated and that one of the Email or Phone fields is populated. If any of the validations fail, details of all errors are returned to the user.

Getting ready

This recipe makes use of a controller extension, so this must be created before the Visualforce page.

How to do it…

1. Navigate to the Apex Classes setup page by clicking on Your Name | Setup | Develop | Apex Classes.
2. Click on the New button.
3. Paste the contents of the RequiredStylingExt.cls Apex class from the code download into the Apex Class area.
4. Click on the Save button.
5. Navigate to the Visualforce setup page by clicking on Your Name | Setup | Develop | Visualforce Pages.
6. Click on the New button.
7. Enter RequiredStyling in the Label field.
8. Accept the default RequiredStyling that is automatically generated for the Name field.
9. Paste the contents of the RequiredStyling.page file from the code download into the Visualforce Markup area and click on the Save button.
10. Navigate to the Visualforce setup page by clicking on Your Name | Setup | Develop | Visualforce Pages.
11. Locate the entry for the RequiredStyling page and click on the Security link.
12. On the resulting page, select which profiles should have access and click on the Save button.

How it works…

Opening the following URL in your browser displays the RequiredStyling page to create a new contact record: https://<instance>/apex/RequiredStyling. Here, <instance> is the Salesforce instance specific to your organization, for example, na6.salesforce.com.
Clicking on the Save button without populating any of the fields results in the save failing with a number of errors. The Last Name field is constructed from a label and a text input component, rather than a standard input field, as an input field would enforce the required nature of the field and stop the submission of the form:

<apex:pageBlockSectionItem >
  <apex:outputLabel value="Last Name"/>
  <apex:outputPanel id="detailrequiredpanel" layout="block" styleClass="requiredInput">
    <apex:outputPanel layout="block" styleClass="requiredBlock" />
    <apex:inputText value="{!Contact.LastName}"/>
  </apex:outputPanel>
</apex:pageBlockSectionItem>

The required styles are defined in the Visualforce page, rather than relying on any existing Salesforce style classes, to ensure that if Salesforce changes the names of its style classes, this does not break the page. The controller extension's save action method carries out validation of all fields and attaches error messages to the page for all validation failures:

if (String.IsBlank(cont.name)) {
  ApexPages.addMessage(new ApexPages.Message(
    ApexPages.Severity.ERROR, 'Please enter the contact name'));
  error=true;
}
if ( (String.IsBlank(cont.Email)) && (String.IsBlank(cont.Phone)) ) {
  ApexPages.addMessage(new ApexPages.Message(
    ApexPages.Severity.ERROR,
    'Please supply the email address or phone number'));
  error=true;
}

Styling table columns as required

When maintaining records that have required fields through a table, using regular input fields can end up with an unsightly collection of red bars striped across the table. Now, we will create a Visualforce page that allows a user to create a number of contact records via a table. The contact Last Name column header will be marked as required, rather than the individual inputs.

Getting ready

This recipe makes use of a custom controller, so this will need to be created before the Visualforce page.

How to do it…

1. First, create the custom controller by navigating to the Apex Classes setup page by clicking on Your Name | Setup | Develop | Apex Classes.
2. Click on the New button.
3. Paste the contents of the RequiredColumnController.cls Apex class from the code download into the Apex Class area.
4. Click on the Save button.
5. Next, create a Visualforce page by navigating to the Visualforce setup page by clicking on Your Name | Setup | Develop | Visualforce Pages.
6. Click on the New button.
7. Enter RequiredColumn in the Label field.
8. Accept the default RequiredColumn that is automatically generated for the Name field.
9. Paste the contents of the RequiredColumn.page file from the code download into the Visualforce Markup area and click on the Save button.
10. Navigate to the Visualforce setup page by clicking on Your Name | Setup | Develop | Visualforce Pages.
11. Locate the entry for the RequiredColumn page and click on the Security link.
12. On the resulting page, select which profiles should have access and click on the Save button.

How it works…

Opening the following URL in your browser displays the RequiredColumn page: https://<instance>/apex/RequiredColumn. Here, <instance> is the Salesforce instance specific to your organization, for example, na6.salesforce.com. The Last Name column header is styled in red, indicating that this is a required field.
Attempting to create a record where only First Name is specified results in an error message being displayed against the Last Name input for that particular row. The Visualforce page sets the required attribute on the inputField components in the Last Name column to false, which removes the red bar from the component:

<apex:column >
  <apex:facet name="header">
    <apex:outputText styleclass="requiredHeader"
      value="{!$ObjectType.Contact.fields.LastName.label}" />
  </apex:facet>
  <apex:inputField value="{!contact.LastName}" required="false"/>
</apex:column>

The Visualforce page's custom controller Save method checks whether any of the fields in a row are populated, and if this is the case, it checks that the last name is present. If the last name is missing from any such record, an error is added. If an error is added to any record, the save does not complete:

if ( (!String.IsBlank(cont.FirstName)) ||
     (!String.IsBlank(cont.LastName)) ) {
  // a field is defined - check for last name
  if (String.IsBlank(cont.LastName)) {
    error=true;
    cont.LastName.addError('Please enter a value');
  }
}

String.IsBlank() is used as it carries out three checks at once: that the supplied string is not null, that it is not empty, and that it does not contain only whitespace.

Summary

In this article, we successfully mastered the techniques to style fields and table columns to suit custom needs.

Resources for Article:

Further resources on this subject:
- Custom Components in Visualforce [Article]
- Visualforce Development with Apex [Article]
- Learning How to Manage Records in Visualforce [Article]