Audio Processing and Generation in Max/MSP

Packt
25 Nov 2014
19 min read
In this article by Patrik Lechner, the author of Multimedia Programming Using Max/MSP and TouchDesigner, we focus on audio-specific examples. We will take a look at the following audio processing and generation techniques:

- Additive synthesis
- Subtractive synthesis
- Sampling
- Wave shaping

Nearly every example provided here can either be understood intuitively or taken apart in hours of math and calculation; it is up to you how deep you want to go. To develop some intuition, though, we will need a certain amount of Digital Signal Processing (DSP) theory. We cover the essentials briefly, but if you are not already familiar with DSP, it is highly recommended that you study its fundamentals in more depth.

Basic audio principles

It is important to know, see, and hear what is happening along a signal path. In the realm of audio, there are four especially important ways to measure a signal. They are conceptually quite different, and together they offer a very broad perspective on audio signals if we keep all of them in the back of our heads:

- Numbers (actual sample values)
- Levels (such as RMS, LUFS, and dB FS)
- Transversal waves (waveform displays, that is, oscilloscopes)
- Spectra (an analysis of frequency components)

There are many more ways to think about audio signals in general, but these are the most common and important ones. Let's use them inside Max right away to observe their different behavior. We will feed them some very basic signals: a DC offset, a sinusoid, and noise. The one that might surprise you the most and get you thinking is the constant signal, or DC offset (once it is digital-to-analog converted). In the following screenshot, you can see how the different displays react. In general, one might think we don't want any constant signals at all, that is, no DC offset.
However, we will use audio signals a lot to control things later, say, an LFO or sequencers that should run with great timing accuracy; also, sometimes we simply add a DC offset to our audio streams by accident. As you can see in the preceding screenshot, a constant or very slowly moving signal is best observed by looking at its value directly, for example, using the [number~] object. A level display such as [meter~] or [levelmeter~] will seem to imply that the incoming signal is very loud; in fact, it sits at -6 dB Full Scale (FS). Although it is that loud, we can't hear anything, since the frequency is infinitely low. The spectrum display reflects this too: we see a very low frequency at -6 dB. In theory, we should see an infinitely thin spike at 0 Hz, so everything else can be considered an (inevitable but reducible) measurement error.

Audio synthesis

Being aware of these ways of viewing a signal and their constraints, and knowing how they actually work, will greatly increase our productivity. So let's get to actually synthesizing some waveforms. A good example of different views of a signal operation is Amplitude Modulation (AM); we will also use AM to formulate some other general principles.

Amplitude modulation

Amplitude modulation means multiplying a signal by an oscillator. It provides a very easy, intuitive, and CPU-efficient method of generating sidebands, that is, partials. Amplitude modulation sounds like a term with a very broad meaning, applicable as soon as one signal changes another signal's amplitude. While this may be true, in the context of audio synthesis it very specifically means the multiplication of two (most often sine) oscillators. Moreover, there is a distinction between AM and ring modulation.
But before we get to this distinction, let's look at a simple multiplication of two sine waves, first viewing the result in an oscilloscope, as a wave. In the preceding screenshot, we can see the two sine waves and their product. If we imagine every pair of samples being multiplied, the operation seems quite intuitive, and the result is what we would expect. But what does the resulting wave really mean, besides looking like the product of two sine waves? What does it sound like? The waves certainly seem to have stayed in there, right? Well, viewing the product as a wave, and thus looking at the whole process in the time domain rather than the frequency domain, is helpful but slightly misleading. So let's jump over to the frequency domain and see what happens to the spectrum. We can observe that if we multiply a sine wave a with a sine wave b, a having a frequency of 1000 Hz and b a frequency of 100 Hz, we end up with two sine waves, one at 900 Hz and another at 1100 Hz; the original frequencies have disappeared. In general, we can say that the result of multiplying a and b contains the sum and the difference of their frequencies. This is shown in the Equivalence to Sum and difference subpatcher (in the corresponding screenshot, the two inlets to the spectrum display overlap completely, which might be hard to see). So in the preceding screenshot, you see a basic AM patcher that produces sidebands we can predict quite easily. Multiplication is commutative, you will say: 1000 + 100 = 1100 and 1000 - 100 = 900, that's alright, but what about 100 - 1000 and 100 + 1000? We get -900 and 1100 once again? It still works out, and the fact that it does has to do with negative frequencies, or the symmetry of a real signal's spectrum around 0 Hz. So you can see that the two ways of looking at our signal and thinking about AM offer different opportunities and pitfalls.
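The sum-and-difference rule is easy to check numerically. Here is a short NumPy sketch (standing in for the Max patcher, not part of it) that multiplies a 1000 Hz and a 100 Hz sine and inspects the spectrum of the product:

```python
import numpy as np

sr = 44100                        # sample rate in Hz
t = np.arange(sr) / sr            # one second of time
a = np.sin(2 * np.pi * 1000 * t)  # sine a, 1000 Hz
b = np.sin(2 * np.pi * 100 * t)   # sine b, 100 Hz

# With exactly one second of signal, FFT bin k corresponds to k Hz.
spectrum = np.abs(np.fft.rfft(a * b))
peaks = np.argsort(spectrum)[-2:]  # the two strongest bins
print(sorted(peaks.tolist()))      # -> [900, 1100]
```

The original 1000 Hz and 100 Hz components are entirely absent from the product; only the sum and difference frequencies remain.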
Here is another way to think about AM: it is the convolution of the two spectra. We haven't talked about convolution yet; we will at a later point, but keep it in mind or do a little research on your own, since this aspect of AM is yet another interesting one.

Ring modulation versus amplitude modulation

The difference between ring modulation and what we call AM in this context is that the former uses a bipolar modulator and the latter a unipolar one. In practice, this is just a matter of scaling and offsetting one of the factors, but the difference in outcome is a big one: if we keep one oscillator unipolar, the other one will still be present in the output. Once we do so, it starts making sense to call one oscillator the carrier and the other (unipolar) one the modulator. This also introduces a modulation depth, which controls the amplitude of the sidebands. In the following screenshot, you can see the resulting spectrum: we have the original signal, the carrier, plus two sidebands, which are the original spectrum shifted up and down. AM therefore gives us the possibility of roughening up a spectrum: we can use it to let an original spectrum through and add sidebands to it.

Tremolo

Tremolo (from the Latin tremare, to shake or tremble) is a musical term that means changing a sound's amplitude at regular, short intervals. Many people confuse it with vibrato, which is modulating the pitch at regular intervals. AM yields tremolo and FM yields vibrato; as a simple reminder, think that the V of vibrato is closer to the F of FM than to the A of AM. So multiplying two oscillators results in a different spectrum. But of course, we can also use multiplication simply to scale a signal, that is, to change its amplitude.
If we want a sine wave with a tremolo, that is, an oscillating variation in amplitude with, say, a frequency of 1 Hz, we again multiply two sine waves, one at, for example, 1000 Hz and one with a frequency of 0.5 Hz. Why 0.5 Hz? Think about a sine wave: it has two peaks per cycle, a positive one and a negative one. We can visualize all this very well in the time domain, looking at the result in an oscilloscope. But what about the frequency-domain view? Let's go through it: when we multiply a sine at 1000 Hz and one at 0.5 Hz, we actually get two sine waves, one at 999.5 Hz and one at 1000.5 Hz. Frequencies that close together create beating, since every so often their positive and negative peaks overlap, canceling each other out. In general, the frequency of the beating is given by the difference between the two frequencies, which is 1 Hz in this case. So if we look at it this way, we arrive at the same result again, of course, but this time we think of two frequencies instead of one frequency being attenuated. Lastly, we could have looked up the trigonometric identities to anticipate what happens when we multiply two sine waves. We find the following product-to-sum identity:

sin(φ) · sin(θ) = 1/2 [cos(φ - θ) - cos(φ + θ)]

Here, φ and θ are the two angular frequencies multiplied by time in seconds, for example:

φ = 2π · 1000 · t

This is the term for the 1000 Hz sine wave.

Feedback

Feedback always brings the complexity of a system to the next level. It can be used to stabilize a system, but it can also easily make a given system unstable. In the context of DSP, stability in the strict sense means that for a finite input to a system, we get a finite output. Obviously, feedback can give us infinite output for a finite input. We can use attenuated feedback, for example, not only to make our AM patches recursive, adding more and more sidebands, but also to achieve some surprising results, as we will see in a minute. Before we look at this application, let's quickly talk about feedback in general.
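As a numerical check of the product-to-sum identity sin(φ)·sin(θ) = 1/2[cos(φ - θ) - cos(φ + θ)], the following NumPy sketch confirms that the product of the two sines equals the sum-and-difference form sample for sample:

```python
import numpy as np

t = np.linspace(0, 2, 88200, endpoint=False)  # two seconds at 44.1 kHz
phi = 2 * np.pi * 1000 * t                    # phase of the 1000 Hz sine
theta = 2 * np.pi * 0.5 * t                   # phase of the 0.5 Hz sine

product = np.sin(phi) * np.sin(theta)
identity = 0.5 * (np.cos(phi - theta) - np.cos(phi + theta))

print(np.allclose(product, identity))         # -> True
```

The two shifted cosines at 999.5 Hz and 1000.5 Hz are exactly the pair whose 1 Hz difference produces the beating we hear as tremolo.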
In the digital domain, feedback always demands some amount of delay. This is because evaluating the chain of operations would otherwise amount to an infinite number of operations on a single sample. This is true both in the Max message domain (we get a stack overflow error if we use feedback without delaying or breaking the chain of events) and in the MSP domain (audio will simply stop working if we try it). So the minimum network for a feedback chain, drawn as a block diagram, looks something like this: In the preceding screenshot, X is the input signal and x[n] is the current input sample; Y is the output signal and y[n] is the current output sample. The block marked Z^-m is a delay of m samples (m being a constant); denoting a delay by Z^-m comes from a mathematical construct named the Z-transform. The term a is also a constant, used to attenuate the feedback circle. If no feedback is involved, it is sometimes helpful to think of block diagrams as processing whole signals. For example, if a block diagram consists only of a multiplication by a constant, it makes a lot of sense to think of its output signal as a scaled version of the input signal, rather than thinking about the network's processing sample by sample. As soon as feedback is involved, however, sample by sample is the way we should think about the network. Before we look at the Max version of things, let's look at the difference equation of the network to get a better feeling for the notation (try to find it yourself before looking at it too closely!):

y[n] = x[n] + a · y[n - m]

In Max, or rather in MSP, we can introduce feedback as soon as we use a [tapin~] [tapout~] pair, which introduces a delay. The minimum delay possible is the signal vector size. Another way is to simply use a [send~] and [receive~] pair in our loop. The [send~] and [receive~] pair will automatically introduce this minimum amount of delay if needed, so the delay will be introduced only if there is a feedback loop.
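The difference equation y[n] = x[n] + a·y[n - m] can be rendered sample by sample in a few lines of Python; this is a toy sketch of the block diagram, not MSP code:

```python
def feedback_comb(x, a=0.5, m=4):
    """y[n] = x[n] + a * y[n - m], the block diagram evaluated sample by sample."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (a * y[n - m] if n >= m else 0.0)
    return y

# A unit impulse comes back attenuated by a at every multiple of the delay m:
print(feedback_comb([1.0] + [0.0] * 11))
# -> 1.0 at n = 0, then 0.5 at n = 4, 0.25 at n = 8, zeros elsewhere
```

With |a| < 1 the echoes decay geometrically and the system is stable; with |a| >= 1 the finite impulse produces unbounded output, which is exactly the instability discussed above.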
If we need shorter delays with feedback, we have to go into the wonderful world of gen~. Here, the shortest delay time is one sample, which can be introduced via the [history] object. In the Fbdiagram.maxpat patcher, you can find a Max version, an MSP version, and a [gen~] version of our diagram. For the time being, let's just pretend that the gen domain is another subpatcher/abstraction system, one that allows shorter delays in feedback loops and has a more limited set of objects that more or less work like the MSP ones. In the following screenshot, you can see the difference between the output of the MSP version and the [gen~] version. Obviously, the length of the delay time has quite an impact on the output. Also, don't forget that the MSP version's output will vary greatly depending on our vector size settings. Let's return to AM now. Feedback can, for example, be used to duplicate and shift our spectrum again and again. In the following screenshot, you can see a 1000 Hz sine wave that has been processed by recursive AM to be duplicated and shifted up and down with a 100 Hz spacing: A maybe surprising result we can achieve with this technique is this: if the modulating oscillator and the carrier have the same frequency, we end up with something that almost sounds like a sawtooth wave.

Frequency modulation

Frequency modulation, or FM, is a technique that allows us to create a lot of frequency components out of just two oscillators, which is why it was used a lot back in the days when oscillators were a rare, expensive good and CPU performance was low. Still, especially when dealing with real-time synthesis, efficiency is a crucial factor, and the huge variety of sounds that can be achieved with just two oscillators and very few parameters can be very useful for live performance and so on. The idea of FM is, of course, to modulate an oscillator's frequency.
The basic, admittedly not very useful, form is depicted in the following screenshot: While trying to visualize what happens with the output in the time domain, we can imagine it as shown in the following screenshot. There you can see the signal that is controlling the frequency: a sine wave with a frequency of 50 Hz, scaled and offset to range from -1000 to 5000 Hz, so the center or carrier frequency is 2000 Hz, modulated by an amount of 3000 Hz. You can see the output of the modulated oscillator in the following screenshot: If we extend the upper patch slightly, we end up with this: Although you can't see it in the screenshot, the sidebands appear with a 100 Hz spacing here, that is, with a spacing equal to the modulator's frequency. Pretty similar to AM, right? But depending on the modulation amount, we get more and more sidebands.

Controlling FM

If the ratio between the carrier frequency fc and the modulator frequency fm is an integer, we end up with a harmonic spectrum. It may therefore be more useful to control fm indirectly via a ratio parameter, as is done inside the SimpleRatioAndIndex subpatcher. Also, an index parameter is typically introduced to make an FM patch even more controllable. The modulation index is defined as follows:

I = Am / fm

Here, I is the index, Am is the amplitude of the modulation (what we called the amount before), and fm is the modulator's frequency. So finally, after adding these two controls, we might arrive here: FM offers a wide range of possibilities. For example, the fact that we have a simple control over how harmonic or inharmonic our spectrum is can be used to synthesize the mostly noisy attack phase of many instruments, driving the ratio and index with an envelope as is done in the SimpleEnvelopeDriven subpatcher. However, it's also very easy to synthesize very artificial, strange sounds.
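To see the sidebands and the modulation index at work outside Max, here is a hedged NumPy sketch (the names fc, fm, and amount are chosen for this illustration; FM is realized in its phase form, which the Phase modulation section below motivates):

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr
fc, fm, amount = 1000, 100, 400   # carrier, modulator, modulation amount in Hz

# The frequency deviation is integrated into the phase, giving the
# modulation index I = amount / fm = 4.
y = np.sin(2 * np.pi * fc * t + (amount / fm) * np.sin(2 * np.pi * fm * t))

spectrum = np.abs(np.fft.rfft(y)) / (sr / 2)  # normalize to component amplitude
strong = np.nonzero(spectrum > 0.05)[0]       # bins with non-negligible energy
print(strong.tolist())  # sidebands spaced fm = 100 Hz apart around fc = 1000 Hz
```

Raising the amount (and hence the index) makes more sidebands cross the threshold, which is the "more and more sidebands" behavior described above.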
This basically has the following two reasons: Firstly, the partials that appear have amplitudes governed by Bessel functions, which may seem quite unpredictable; the partials sometimes seem to have random amplitudes. Secondly, negative frequencies and foldback. If we generate a partial with a frequency below 0 Hz, it is equivalent to generating the same positive frequency. Frequencies greater than half the sample rate (half the sample rate is what is called the Nyquist frequency) reflect back into the spectrum that can be described at our sampling rate (an effect also called aliasing). So at a sampling rate of 44,100 Hz, a partial with a frequency of -100 Hz will appear at 100 Hz, and a partial with a frequency of 43,100 Hz will appear at 1000 Hz, as shown in the following screenshot. For frequencies between the Nyquist frequency and the sampling frequency, what we hear is described by this:

f0 = fs - fi

Here, fs is the sampling rate, f0 is the frequency we hear, and fi is the frequency we are trying to synthesize. Since FM produces many partials, this effect can easily come up, and it can be used in an artistically interesting manner or appear as an unwanted error. In theory, an FM signal's partials extend to infinity, but their amplitudes become negligibly small. If we want to reduce this behavior, the [poly~] object can be used to oversample the process, generating a bit more headroom for high frequencies. The phenomenon of aliasing can be understood by thinking of a real (in contrast to complex) digital signal as having a symmetrical and periodic spectrum; let's not go into too much detail here and instead look at it in the time domain: In the previous screenshot, we again tried to synthesize a sine wave at 43,100 Hz (the dotted line) at a sampling rate of 44,100 Hz. What we actually get is the solid black line, a sine at 1000 Hz.
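The folding rule can be wrapped in a small helper function; this is an illustrative sketch (the function name is an assumption), not part of the Max patch:

```python
def alias(f, sr=44100):
    """Fold a synthesized frequency f back into the audible 0..sr/2 band."""
    f = f % sr                      # the spectrum repeats every sr Hz
    return sr - f if f > sr / 2 else f

print(alias(-100))    # -> 100   (negative frequencies mirror around 0 Hz)
print(alias(43100))   # -> 1000  (folds down from above the Nyquist frequency)
print(alias(1000))    # -> 1000  (below Nyquist, unchanged)
```

The second case is exactly f0 = fs - fi: 44100 - 43100 = 1000 Hz.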
Each big black dot represents an actual sample, and there is only one band-limited signal connecting them: the 1000 Hz wave, of which about half a wavelength is visible here.

Feedback

It is very common to use feedback with FM. We can even frequency modulate an oscillator with itself, making the algorithm even cheaper, since we have only one table lookup. The idea of feedback FM quickly leads us to the idea of building networks of oscillators that modulate each other, including feedback paths, but let's keep it simple for now. One might think that modulating an oscillator with itself should produce chaos, and that, FM being a technique that is not the easiest to control, one shouldn't care for playing around with single-operator feedback FM. But the opposite is the case: single-operator feedback FM yields very predictable partials, as shown in the following screenshot and in the Single OP FBFM subpatcher. Again, we are using a gen~ patch, since we want to create a feedback loop and are heading for a short delay in the loop. Note that we are using the [param] object to pass messages into the gen~ object. What should catch your attention is that although the carrier frequency has been adjusted to 1000 Hz, the fundamental frequency in the spectrum is around 600 Hz. What can help us here is switching to phase modulation.

Phase modulation

If you look at the gen~ patch in the previous screenshot, you will see that we are driving our sine oscillator with a phasor. The cycle object's phase inlet expects an input ranging from 0 to 1 instead of from 0 to 2π, as one might think. To drive a sine wave through one full cycle in math, we use a variable ranging from 0 to 2π, so in the following formula, you can imagine t being provided by a phasor, which is the running phase. The 2π multiplication isn't necessary in Max, since [cycle~] actually reads out a wavetable instead of really computing the sine or cosine of its input.
The most common form of denoting a running sinusoid with frequency f0 and phase φ is:

y(t) = sin(2π f0 t + φ)

Try to come up with a formula that describes frequency modulation! Simplifying by setting the phase to zero, we can denote FM as follows (the frequency deviation A · sin(2π fm t) is integrated into the phase):

y(t) = sin(2π f0 t - (A / fm) · cos(2π fm t))

This can be shown to be nearly identical to the following formula:

y(t) = sin(2π f0 t + (A / fm) · sin(2π fm t))

Here, f0 is the frequency of the carrier, fm is the frequency of the modulator, and A is the modulation amount. Welcome to phase modulation. If you compare the formulas, the last one actually just inserts a scaled sine wave where the phase φ used to be. So phase modulation is nearly identical to frequency modulation. Phase modulation has some advantages, though, such as providing us with an easy method of synchronizing multiple oscillators. But let's go back to the Max side of things and look at a feedback phase modulation patch right away (ignoring simple phase modulation, since it really is so similar to FM): The gen~ patcher inside the One OP FBPM subpatcher implements phase modulation using one oscillator and feedback. Interestingly, the spectrum is very similar to that of a sawtooth wave, with the feedback amount having an effect similar to a low-pass filter, controlling the number of partials. If you take a look at the subpatcher, you will find the following three sound sources:

- Our feedback phase modulation gen~ patcher
- A [saw~] object for comparison
- A [poly~] object

We have already mentioned the problem of aliasing, and the [poly~] object has already been proposed as a way to treat it. However, it also allows us to define the quality of parts of patches in general, so let's talk about the object a bit before moving on, since we will make great use of it. You can double-click on the [poly~] object to see what is loaded inside, and you will see that the subpatcher we just discussed contains a [poly~] object that contains yet another version of our gen~ patcher.

Summary

In this article, we've finally come to talking about audio.
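A sketch of the single-oscillator feedback phase modulation idea in Python (the name beta for the feedback amount is an assumption of this sketch; in Max, this lives in the gen~ patcher with a one-sample [history] delay):

```python
import numpy as np

sr = 44100
f0 = 1000.0       # carrier frequency in Hz
beta = 0.8        # feedback amount (name assumed for this sketch)
inc = 2 * np.pi * f0 / sr

y = np.zeros(sr)
phase, prev = 0.0, 0.0
for n in range(sr):
    y[n] = np.sin(phase + beta * prev)  # previous output fed back into the phase
    prev = y[n]
    phase += inc

spectrum = np.abs(np.fft.rfft(y))
# The fundamental stays exactly at f0 (unlike feedback FM), and the partials
# at multiples of f0 roll off smoothly, much like a low-passed sawtooth.
```

Raising beta brings up the higher partials, which is the low-pass-like behavior of the feedback amount described above.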
We've introduced some very common techniques and thought about refining them and getting things done properly and efficiently (think of poly~). By now, you should feel quite comfortable building synths that mix techniques such as FM, subtractive synthesis, and feedback modulation, as well as using matrices for routing both audio and modulation signals where you need them.

Further resources on this subject:

- Moodle for Online Communities [Article]
- Techniques for Creating a Multimedia Database [Article]
- Moodle 2.0 Multimedia: Working with 2D and 3D Maps [Article]

Understanding the HBase Ecosystem

Packt
24 Nov 2014
11 min read
This article by Shashwat Shriparv, author of the book Learning HBase, will introduce you to the world of HBase. HBase is a horizontally scalable, distributed, open source, sorted map database. It runs on top of the Hadoop file system, that is, the Hadoop Distributed File System (HDFS). HBase is a NoSQL, nonrelational database that doesn't always require a predefined schema. It can be seen as a flexibly scaling, multidimensional spreadsheet into which any structure of data fits, with on-the-fly addition of new column fields and no fixed column structure needed before data can be inserted or queried. In other words, HBase is a column-oriented database that runs on top of the Hadoop distributed file system and supports features such as linear scalability (scale out), automatic failover, automatic sharding, and a more flexible schema. HBase is modeled on Google BigTable, a compressed, high-performance, proprietary data store built on the Google filesystem. HBase was developed as a Hadoop subproject to support the storage of structured data, which can take advantage of most distributed file systems (typically, the Hadoop Distributed File System, known as HDFS).
The following table contains key information about HBase and its features:

- Developed by: Apache
- Written in: Java
- Type: Column oriented
- License: Apache License
- Features lacking compared to relational databases: SQL support; relations; primary, foreign, and unique key constraints; normalization
- Website: http://hbase.apache.org
- Distributions: Apache, Cloudera
- Download link: http://mirrors.advancedhosters.com/apache/hbase/
- Mailing lists: the user list (user-subscribe@hbase.apache.org) and the developer list (dev-subscribe@hbase.apache.org)
- Blog: http://blogs.apache.org/hbase/

HBase layout on top of Hadoop

The following figure represents the layout of HBase on top of Hadoop: There is more than one ZooKeeper in the setup, which provides high availability of master status, and a RegionServer may contain multiple regions. The RegionServers run on the machines where DataNodes run; there can be as many RegionServers as DataNodes. A RegionServer can host multiple HRegions; one HRegion has one HLog and multiple HFiles with their associated MemStores. HBase can be seen as a master-slave database. The master, called HMaster, is responsible for coordination between client applications and HRegionServers, and also for monitoring and recording metadata changes and management. The slaves, called HRegionServers, serve the actual tables in the form of regions. These regions are the basic building blocks of HBase tables, which contain the distribution of tables. So, HMaster and the RegionServers work in coordination to serve the HBase tables and the HBase cluster. Usually, HMaster is co-hosted with the Hadoop NameNode daemon process on a server and communicates with the DataNode daemons for reading and writing data on HDFS. The RegionServers run on, or are co-hosted with, the Hadoop DataNodes.
Comparing architectural differences between RDBMSs and HBase

Let's list the major differences between relational databases and HBase:

- Relational databases use tables as databases; HBase uses regions as databases.
- Relational databases support filesystems such as FAT, NTFS, and EXT; HBase runs on HDFS.
- Relational databases store logs as commit logs; HBase uses Write-Ahead Logs (WAL).
- Relational databases use a coordinate system as the reference system; HBase uses ZooKeeper.
- Relational databases use the primary key; HBase uses the row key.
- Relational databases support partitioning; HBase supports sharding.
- Relational databases use rows, columns, and cells; HBase uses rows, column families, columns, and cells.

HBase features

Let's see the major features of HBase that make it one of the most useful databases for the current and future industry:

- Automatic failover and load balancing: HBase runs on top of HDFS, which is internally distributed and automatically recovered using multiple block allocations and replications. It works with multiple HMasters and RegionServers. Failover is also facilitated by HBase and RegionServer replication.
- Automatic sharding: An HBase table is made up of regions that are hosted by RegionServers, and these regions are distributed across the RegionServers on different DataNodes. HBase provides automatic and manual splitting of these regions into smaller subregions once they reach a threshold size, to reduce I/O time and overhead.
- Hadoop/HDFS integration: It's important to note that HBase can run on top of other filesystems as well. HDFS is the most common choice, though, as it supports data distribution and high availability using distributed Hadoop; we just need to set some configuration parameters to enable HBase to communicate with Hadoop, and an out-of-the-box underlying distribution is provided by HDFS.
- Real-time, random big data access: HBase uses a log-structured merge-tree (LSM-tree) as its internal data storage architecture, which periodically merges smaller files into larger files to reduce disk seeks.
- MapReduce: HBase has built-in support for the Hadoop MapReduce framework for fast, parallel processing of data stored in HBase. You can look at the package org.apache.hadoop.hbase.mapreduce for more details.
- Java API for client access: HBase has solid Java API support (client/server) for easy development and programming.
- Thrift and a RESTful web service: HBase provides not only Thrift and RESTful gateways but also web service gateways for integrating with and accessing HBase from languages other than Java, besides the HBase Java APIs.
- Support for exporting metrics via the Hadoop metrics subsystem: HBase provides Java Management Extensions (JMX) and metrics export for monitoring purposes with tools such as Ganglia and Nagios.
- Distributed: When used with HDFS, HBase coordinates with Hadoop so that the distribution of tables, high availability, and consistency are supported.
- Linear scalability (scale out): Scaling HBase is not scale up but scale out, which means that we don't need to make servers more powerful; we just add more machines to the cluster, on the fly. As soon as a new RegionServer node is up, the cluster can begin rebalancing and start the RegionServer on the new node; it is as simple as that.
- Column oriented: HBase stores each column separately, in contrast with most relational databases, which use row-based storage. So in HBase, columns are stored contiguously, not rows. More about row- and column-oriented databases will follow.
- HBase shell support: HBase provides a command-line tool to interact with HBase and perform simple operations such as creating tables, adding data, and scanning data.
This command-line tool is full-fledged: using it, we can also perform operations such as removing data and a few other administrative commands.
- Sparse, multidimensional, sorted map database: HBase is a sparse, multidimensional, sorted map-based database, which supports multiple versions of the same record.
- Snapshot support: HBase supports taking snapshots of metadata for recovering a previous, correct state of the data.

HBase in the Hadoop ecosystem

Let's see where HBase sits in the Hadoop ecosystem. In the Hadoop ecosystem, HBase provides a persistent, structured, schema-based data store. The following figure illustrates the Hadoop ecosystem: HBase can work as a separate entity on the local filesystem (which is not really effective, as no distribution is provided) as well as in coordination with Hadoop as a separate but connected entity. As we know, Hadoop provides two services: a distributed file system (HDFS) for storage and a MapReduce framework for processing in parallel. When there was a need to store structured data (data in the form of tables, rows, and columns), which most programmers are already familiar with, programmers found it difficult to process data stored on HDFS as unstructured flat files. This led to the evolution of HBase, which provides a way to store data in a structured fashion. Consider that we have a CSV file stored on HDFS and we need to query it. We would need to write Java code for this, which wouldn't be a good option. It would be better if we could specify a key and fetch the data for it. So, what we can do here is create a schema or table with the same structure as the CSV file, store the data of the CSV file in the HBase table, and query it by key using the HBase APIs or the HBase shell.
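The idea of keyed access just described can be illustrated in a few lines of plain Python; this is a stand-in for the HBase table, not real HBase client code:

```python
import csv, io

# A CSV file as it might sit on HDFS (inlined here for illustration).
data = io.StringIO("id,name,city\nu1,Ada,London\nu2,Alan,Wilmslow\n")

# Index the rows by key once, mimicking an HBase table keyed by row key,
# instead of scanning the flat file for every query.
table = {row["id"]: row for row in csv.DictReader(data)}

print(table["u2"]["city"])   # -> Wilmslow
```

Fetching by key avoids rereading the whole flat file, which is essentially the advantage HBase adds on top of raw HDFS storage.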
Data representation in HBase

Let's look into the representation of rows and columns in an HBase table. An HBase table is divided into rows, column families, columns, and cells: row keys are unique keys that identify a row, column families are groups of columns, columns are fields of the table, and cells contain the actual values, that is, the data. So, we have been through the introduction to HBase; now let's see, in brief, what Hadoop and its components are. It is assumed here that you are already familiar with Hadoop; if not, the following brief introduction will help you understand it.

Hadoop

Hadoop is the underlying technology of HBase, providing high availability, fault tolerance, and distribution. It is an Apache-sponsored, free, open source, Java-based programming framework that supports the storage of large datasets. It provides a distributed file system and MapReduce, a distributed programming framework, offering a scalable, reliable, distributed storage and development environment. Hadoop makes it possible to run applications on systems with tens to tens of thousands of nodes. Its underlying distributed file system provides large-scale storage and rapid data access. It has the following submodules:

- Hadoop Common: This is the core component that supports the other Hadoop modules. It is like a master component facilitating communication and coordination between the different Hadoop modules.
- Hadoop Distributed File System (HDFS): This is the underlying distributed file system, abstracted on top of the local filesystem, which provides high-throughput read and write operations on data in Hadoop.
- Hadoop YARN: This is the new framework shipped with newer releases of Hadoop. It provides job scheduling and job and resource management.
- Hadoop MapReduce: This is the Hadoop-based processing system that provides parallel processing of large data and datasets.
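Before moving on, the HBase data representation described above (row key, column family, column, timestamped cell) can be mimicked with plain Python dictionaries; this toy class is purely an illustration of the data model, not the HBase client API:

```python
from collections import defaultdict

# Toy model: row key -> "family:qualifier" -> {timestamp: value}.
class ToyHBaseTable:
    def __init__(self, families):
        self.families = set(families)  # column families are fixed at table creation
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Columns appear on the fly; each cell keeps multiple timestamped versions.
        self.rows[row_key][f"{family}:{qualifier}"][ts] = value

    def get(self, row_key, column):
        versions = self.rows[row_key][column]
        return versions[max(versions)]  # the latest version wins

t = ToyHBaseTable(families=["info"])
t.put("user1", "info", "name", "Ada", ts=1)
t.put("user1", "info", "name", "Ada L.", ts=2)
print(t.get("user1", "info:name"))   # -> Ada L.
```

Note how the schema fixes only the column families, while columns and versions accumulate freely per row, which is the "doesn't always require a predefined schema" property from the introduction.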
Other projects commonly grouped with Hadoop are HBase, Hive, Ambari, Avro, Cassandra (strictly speaking, Cassandra isn't a Hadoop subproject but a related project; they solve similar problems in different ways), Mahout, Pig, Spark, ZooKeeper (likewise not a Hadoop subproject, but a coordination service that many distributed systems depend on), and so on. All of these serve different purposes, and the combination of these projects forms the Hadoop ecosystem.
Core daemons of Hadoop
The following are the core daemons of Hadoop:
NameNode: This stores and manages all the metadata about the data present on the cluster, so it is the single point of contact for Hadoop. Newer releases of Hadoop offer the option of more than one NameNode for high availability.
JobTracker: This typically runs on the master node (often alongside the NameNode) and schedules the MapReduce jobs submitted to the cluster.
SecondaryNameNode: This periodically checkpoints the NameNode's metadata by merging the recorded filesystem changes (the edit log) into the metadata image; note that it is not a hot standby for the NameNode.
DataNode: This stores the actual data blocks.
TaskTracker: This performs the tasks assigned by the JobTracker, preferably on node-local data.
The preceding are the daemons in Hadoop v1 and earlier. In newer versions of Hadoop, we have the ResourceManager instead of the JobTracker, NodeManagers instead of TaskTrackers, and the YARN framework instead of the plain MapReduce framework.
The following is a comparison of the daemons in Hadoop 1 and Hadoop 2:
HDFS:
- Hadoop 1: NameNode, Secondary NameNode, DataNode
- Hadoop 2: NameNode (more than one, in active/standby configuration), Checkpoint node, DataNode
Processing:
- Hadoop 1 (MapReduce v1): JobTracker, TaskTracker
- Hadoop 2 (YARN/MRv2): ResourceManager, NodeManager, ApplicationMaster
Comparing HBase with Hadoop
As we now know what HBase and Hadoop are, let's compare HDFS and HBase for a better understanding:
- Hadoop/HDFS provides a filesystem for distributed storage; HBase provides tabular, column-oriented data storage.
- HDFS is optimized for storing huge files, with no random read/write of these files; HBase is optimized for tabular data, with random read/write access.
- HDFS uses flat files; HBase uses key-value pairs of data.
- The HDFS data model is not flexible; HBase provides a flexible data model.
- HDFS pairs a filesystem with a processing framework; HBase provides tabular storage with built-in Hadoop MapReduce support.
- HDFS is mostly optimized for write-once, read-many workloads; HBase is optimized for workloads with many reads and writes.
Summary
So in this article, we discussed the introductory aspects of HBase and its features. We also discussed HBase's components and their place in the Hadoop ecosystem.
Resources for Article:
Further resources on this subject:
The HBase's Data Storage [Article]
HBase Administration, Performance Tuning [Article]
Comparative Study of NoSQL Products [Article]

Packt
24 Nov 2014
10 min read

Configuring vShield App

In this article by Mike Greer, the author of vSphere Security Cookbook, we will cover the following topics:
- Installing vShield App
- Configuring vShield App
- Configuring vShield App Flow Monitoring
(For more resources related to this topic, see here.)
Introduction
Most modern operating systems include the capability to run a firewall on the host itself. The rules configured in a host-based firewall manage traffic at the host level, providing an additional layer of defense alongside network firewalls and intrusion detection systems. Multiple layers of security provide a complete defense-in-depth architecture; the concept of defense-in-depth builds layers of security so that protection remains should another layer fail or be compromised.
The second component of the vShield family to be configured is vShield App. vShield App is a host-based layer 2 firewall implemented at the vNIC level of the hypervisor, and it presents itself as a virtual appliance in the vCenter management tool. For each protected ESXi host, there is an associated vShield App virtual machine that runs on that host. To protect the entire virtualization environment managed by vCenter, it is important to install vShield App on each ESXi host in the datacenter. Failing to protect every host leaves open the possibility that virtual machines are moved to an unprotected host, either by vMotion or manually. Where DRS is in use, it is very likely that virtual machines will be moved to an unprotected host, assuming it has resources available and there is a high load on adjacent hosts.
Installing vShield App
vShield App is required to provide host-level security and firewall services to each individual ESXi host. This process must be completed on each ESXi host individually.
Getting ready
A Core Infrastructure Suite (CIS) or vCloud Networking and Security (vCNS) license must be installed prior to installing vShield App and vShield Edge.
vShield App is installed per ESXi host, and vShield Manager must have been installed previously as a prerequisite. In order to proceed, we require access to the vShield Web Console. The client can be run on any modern Windows or Mac desktop or server operating system. The vShield Web Console requires Adobe Flash, which is not supported on Linux operating systems at this time. Ensure the account used for login has administrative rights to vShield Manager.
How to do it…
Perform the following steps:
1. Open the vShield Manager Web Console and log in with an administrative account.
2. Navigate to Datacenter | Lab Cluster | esx5501.training.lab within vShield Manager. Ensure the ESXi host targeted for installation is not hosting the VM running vCenter: a connection to vCenter is required, and the installation of vShield App will disrupt the network connection to the ESXi host.
3. Locate vShield App from the Summary tab.
4. Click on Install next to vShield App.
5. Select the Datastore to hold the vShield App service information. In our example, we'll use datastore1.
6. Select an available Management Port Group that can communicate with the vShield Manager installed previously. In our example, we'll use Internal Network.
7. Enter the vShield App IP address. In our example, we'll use 192.168.10.30. The IP address of the vShield App must be unique and not previously assigned.
8. Enter the Netmask. In our example, we'll use 255.255.255.0.
9. Enter the Default Gateway. In our example, we'll use 192.168.10.1.
10. Ensure the vShield Endpoint checkbox is cleared.
11. Click on the Install button. The status will be shown during the installation, as shown in the following screenshot:
12. Verify that the setup completed with no errors. If an error occurs, the details regarding the error will be highlighted in yellow and begin with vShield App installation encountered error while installing service VM <error details>.
13. Repeat this process for additional ESXi hosts.
If vCenter is running on an ESXi host, use vMotion to migrate that VM to another host prior to installation.
How it works…
vShield App provides firewall functionality to an ESXi host by installing a virtual appliance that is tied to the local host. The virtual appliance is stored on a datastore local to the host. Each firewall appliance is named with the host name included, for clarity. In our example, we installed on the ESXi host named esx5501.training.lab, and the corresponding firewall appliance is named vShield-FW-esx5501.training.lab.
One important point to consider: if the vShield App fails on a particular ESXi host and the default rule is set to deny, then all traffic to the host will be denied, which can make troubleshooting difficult. If a vShield App installation fails and requires manual removal, the removal will require the ESXi host to be rebooted. As a result, all virtual machines running on the host will need to be migrated to other nodes of the cluster or powered down.
Configuring vShield App
There are several important configurations to set in the vShield App management Web Console. Configuring Fail Safe Mode sets the action that will be taken if the vShield App fails or is down; Fail Safe Mode can be set to either allow or block. Excluding virtual machines such as vCenter is key to proper functionality, since exclusion removes a virtual machine from all firewall rules.
Getting ready
In order to proceed, we require access to the vShield Web Console. The client can be run on any modern Windows or Mac desktop or server operating system. The vShield Web Console requires Adobe Flash, which is not supported on Linux operating systems at this time. Ensure the account used for login has administrative rights to vShield Manager.
How to do it…
Viewing the current status is the first step in assessing the state of the vShield App.
Once the app is verified to be in a healthy state, additional configuration can be done by performing the following steps:
1. Launch vSphere Client using an account with administrative rights.
2. Choose Home | Inventory | Hosts and Clusters from the menu bar.
3. Navigate to Datacenter | Lab Cluster | esx5501.training.lab.
4. Select the vShield tab.
5. Expand vShield-FW-esx5501.training.lab (192.168.10.30).
6. Note the Status: In Sync status; there are two options to either Force Sync or Restart.
7. Current Management Port Information is displayed, including the packet, byte, and error information.
8. Syslog Servers can be added by IP address.
Configuring the Fail Safe Policy allows traffic to flow or be blocked should the vShield App firewall be down or offline for any reason:
1. Navigate to Settings & Reports | vShield App within vShield Manager.
2. Click on Change under Fail Safe. This step will lead to the following screen:
3. Click on Yes. Note that the Default Fail Safe Configuration set to Block is now changed to Allow.
In a production situation, there are few times the fail safe setting will be changed to Allow. In a small test environment, should the vShield App be unavailable, all connectivity to the ESXi host will be blocked by default.
Configuring the Exclusion List allows certain virtual machines to function without host-based firewall rules being applied to them. This can be done by performing the following steps:
1. Navigate to Settings & Reports | vShield App within vShield Manager.
2. Click on Add under Exclusion List.
3. Select a virtual machine to exclude from vShield App (in our case, it is a linked vCenter server).
4. Click on Add.
5. Click on OK.
6. Click on OK in the next dialog box to confirm. The selected virtual machine is now excluded from protection.
How it works…
The vShield App host firewall is installed per ESXi host, and is automatically named by the installation program to include the name of the host.
It is also important to note that when a host is put into maintenance mode, the vShield firewall must be shut down in order to let the host successfully enter maintenance mode.
Current Status provides a single view of the firewall associated with the host, including the traffic status displayed by packet count, link status, and admin status. One important check is whether the firewall is in sync with vShield Manager. Should the firewall fall out of sync, it can be forced to sync; if that fails, the option to restart is also present on the status page.
The Fail Safe Policy is an important consideration should the vShield App virtual appliance fail for any reason. The default setting is to block all traffic, and this might seem like a good idea at the outset. Careful consideration should be given to this setting, depending on what type of virtual machines are running on a specific host or cluster. Where mission-critical applications are running on virtual machines within an internal cluster or host, it can make sense to allow traffic should the vShield App go down: the time it takes to identify and remediate the failure could otherwise have a significant business impact.
The Exclusion List, as the name implies, allows certain virtual machines to remain outside the protection of the vShield App firewall. Critical infrastructure such as DNS servers or Domain Controllers are good candidates for the exclusion list, and vCenter servers should always be added to it.
Configuring vShield App Flow Monitoring
vShield App Flow Monitoring is a traffic analysis tool that provides statistics and graphs of the traffic on the virtual network as it passes through a host running vShield App. The information collected and displayed by Flow Monitoring is detailed down to the protocol level and is very useful in spotting unwanted traffic flows.
Getting ready
In order to proceed, we require access to the vShield App through the vSphere Client plugin. The plugin can be enabled through the Plug-ins menu in the vSphere Client. This client can be run on any modern Windows desktop or server operating system. The vShield vSphere Client plugin requires Adobe Flash, which is not supported on Linux operating systems at this time. Ensure the vCenter account used for login has administrative rights to vShield Manager.
How to do it…
To view the current traffic flow, launch vSphere Client using an account with administrative rights, and then perform the following steps:
1. Navigate to Home | Inventory | Hosts and Clusters from the menu bar.
2. Navigate to Datacenter and click on the vShield tab.
3. Select Flow Monitoring. Note that the Summary information is displayed by default.
4. Click on Details to view detailed information. Allowed Flows and Blocked Flows are available to view.
5. Select DNS-UDP to identify the host lookup traffic on port 53.
6. Click on Add Rule for Rule Id 1002. By changing Action from Allow to Block, a firewall rule can be added to block the DNS traffic from the web server to the DNS server.
7. Click on Cancel.
How it works…
The Flow Monitoring component of vShield App, in addition to providing great detail, is able to create vShield App firewall rules on the fly. As shown in the preceding example, we were able to identify the DNS traffic from our web server accessing an internal DNS server. Due to our governance rules, servers in the DMZ are not allowed to request DNS from internal servers; implementing a firewall rule adds a control to enforce this policy, and it is easily put in place once the administrator notices the request. The ability to view traffic by Top Flows, Top Destinations, and Top Sources is very valuable when troubleshooting a problem, or when tracking down a virus or trojan that is attempting to send valuable information outside the organization during a breach.
Summary
This article has covered how to set up and configure vShield App, including vShield App Flow Monitoring.
Resources for Article:
Further resources on this subject:
Introduction to Veeam® Backup & Replication for VMware [article]
VMware vCenter Operations Manager Essentials - Introduction to vCenter Operations Manager [article]
Introduction to vSphere Distributed switches [article]

Packt
24 Nov 2014
27 min read

Decoupling Units with unittest.mock

In this article by Daniel Arbuckle, author of the book Learning Python Testing, you'll learn how, by using the unittest.mock package, you can easily do the following:
- Replace functions and objects in your own code or in external packages
- Control how replacement objects behave: what return values they provide, whether they raise an exception, and even whether they make any calls to other functions or create instances of other objects
- Check whether the replacement objects were used as you expected: whether functions or methods were called the correct number of times, whether the calls occurred in the correct order, and whether the passed parameters were correct
(For more resources related to this topic, see here.)
Mock objects in general
All right, before we get down to the nuts and bolts of unittest.mock, let's spend a few moments talking about mock objects overall. Broadly speaking, mock objects are any objects that you can use as substitutes in your test code, to keep your tests from overlapping and your tested code from infiltrating the wrong tests. However, like most things in programming, the idea works better when it has been formalized into a well-designed library that you can call on when you need it. There are many such libraries available for most programming languages.
Over time, the authors of mock object libraries have developed two major design patterns for mock objects. In the first pattern, you create a mock object and perform all of the expected operations on it; the object records these operations, and then you put the object into playback mode and pass it to your code. If your code fails to duplicate the expected operations, the mock object reports a failure. In the second pattern, you create a mock object, do the minimal configuration necessary to allow it to mimic the real object it replaces, and pass it to your code.
It records how the code uses it, and then you can perform assertions after the fact to check whether your code used the object as expected. The second pattern is slightly more capable in terms of the tests that you can write using it but, overall, either pattern works well.
Mock objects according to unittest.mock
Python has several mock object libraries; as of Python 3.3, however, one of them has been crowned as a member of the standard library. Naturally, that's the one we're going to focus on. That library is, of course, unittest.mock. The unittest.mock library is of the second sort, a record-actual-use-and-then-assert library. The library contains several different kinds of mock objects that, between them, let you mock almost anything that exists in Python. Additionally, the library contains several useful helpers that simplify assorted tasks related to mock objects, such as temporarily replacing real objects with mocks.
Standard mock objects
The basic element of unittest.mock is the unittest.mock.Mock class. Even without being configured at all, Mock instances can do a pretty good job of pretending to be some other object, method, or function. (There are many mock object libraries for Python; so, strictly speaking, the phrase "mock object" could mean any object that was created by any of these libraries.) Mock objects can pull off this impersonation because of a clever, somewhat recursive trick. When you access an unknown attribute of a mock object, instead of raising an AttributeError exception, the mock object creates a child mock object and returns that. Since mock objects are pretty good at impersonating other objects, returning a mock object instead of the real value works, at least in the common case. Similarly, mock objects are callable; when you call a mock object as a function or method, it records the parameters of the call and then, by default, returns a child mock object.
A child mock object is a mock object in its own right, but it knows that it's connected to the mock object it came from—its parent. Anything you do to the child is also recorded in the parent's memory. When the time comes to check whether the mock objects were used correctly, you can use the parent object to check on all of its descendants.
Example: Playing with mock objects in the interactive shell (try it for yourself!):

$ python3.4
Python 3.4.0 (default, Apr 2 2014, 08:10:08)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from unittest.mock import Mock, call
>>> mock = Mock()
>>> mock.x
<Mock name='mock.x' id='140145643647832'>
>>> mock.x
<Mock name='mock.x' id='140145643647832'>
>>> mock.x('Foo', 3, 14)
<Mock name='mock.x()' id='140145643690640'>
>>> mock.x('Foo', 3, 14)
<Mock name='mock.x()' id='140145643690640'>
>>> mock.x('Foo', 99, 12)
<Mock name='mock.x()' id='140145643690640'>
>>> mock.y(mock.x('Foo', 1, 1))
<Mock name='mock.y()' id='140145643534320'>
>>> mock.method_calls
[call.x('Foo', 3, 14), call.x('Foo', 3, 14), call.x('Foo', 99, 12),
 call.x('Foo', 1, 1), call.y(<Mock name='mock.x()' id='140145643690640'>)]
>>> mock.assert_has_calls([call.x('Foo', 1, 1)])
>>> mock.assert_has_calls([call.x('Foo', 1, 1), call.x('Foo', 99, 12)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/unittest/mock.py", line 792, in assert_has_calls
    ) from cause
AssertionError: Calls not found.
Expected: [call.x('Foo', 1, 1), call.x('Foo', 99, 12)]
Actual: [call.x('Foo', 3, 14), call.x('Foo', 3, 14), call.x('Foo', 99, 12),
 call.x('Foo', 1, 1), call.y(<Mock name='mock.x()' id='140145643690640'>)]
>>> mock.assert_has_calls([call.x('Foo', 1, 1),
...                        call.x('Foo', 99, 12)], any_order = True)
>>> mock.assert_has_calls([call.y(mock.x.return_value)])

There are several important things demonstrated in this interactive session.
First, notice that the same mock object was returned each time that we accessed mock.x. This always holds true: if you access the same attribute of a mock object, you'll get the same mock object back as the result.
The next thing to notice might seem more surprising. Whenever you call a mock object, you get the same mock object back as the return value. The returned mock isn't made new for every call, nor is it unique for each combination of parameters. We'll see how to override the return value shortly but, by default, you get the same mock object back every time you call a mock object. This mock object can be accessed using the return_value attribute name, as you might have noticed from the last statement of the example.
The unittest.mock package contains a call object that makes it easier to check whether the correct calls have been made. The call object is callable, and takes note of its parameters in a way similar to mock objects, making it easy to compare it to a mock object's call history. However, the call object really shines when you have to check for calls to descendant mock objects. As you can see in the previous example, while call('Foo', 1, 1) would match a call to the parent mock object made with these parameters, call.x('Foo', 1, 1) matches a call to the child mock object named x.
You can build up a long chain of lookups and invocations. For example:

>>> mock.z.hello(23).stuff.howdy('a', 'b', 'c')
<Mock name='mock.z.hello().stuff.howdy()' id='140145643535328'>
>>> mock.assert_has_calls([
...     call.z.hello().stuff.howdy('a', 'b', 'c')
... ])
>>>

Notice that the original invocation included hello(23), but the call specification wrote it simply as hello(). Each call specification is only concerned with the parameters of the object that was finally called after all of the lookups. The parameters of intermediate calls are not considered.
That's okay, because intermediate calls always produce the same return value anyway, unless you've overridden that behavior, in which case they probably don't produce a mock object at all.
You might not have encountered an assertion before. Assertions have one job, and one job only: they raise an exception if something is not as expected. The assert_has_calls method, in particular, raises an exception if the mock object's history does not include the specified calls. In our example, the call history matches, so the assertion method doesn't do anything visible.
You can check whether the intermediate calls were made with the correct parameters, though, because the mock object recorded a call to mock.z.hello(23) immediately before it recorded a call to mock.z.hello().stuff.howdy('a', 'b', 'c'):

>>> mock.mock_calls.index(call.z.hello(23))
6
>>> mock.mock_calls.index(call.z.hello().stuff.howdy('a', 'b', 'c'))
7

This also points out the mock_calls attribute that all mock objects carry. If the various assertion functions don't quite do the trick for you, you can always write your own functions that inspect the mock_calls list and check whether things are or are not as they should be. We'll discuss the mock object assertion methods shortly.
Non-mock attributes
What if you want a mock object to give back something other than a child mock object when you look up an attribute? It's easy; just assign a value to that attribute:

>>> mock.q = 5
>>> mock.q
5

There's one other common case where mock objects' default behavior is wrong: what if accessing a particular attribute is supposed to raise an AttributeError?
Fortunately, that's easy too:

>>> del mock.w
>>> mock.w
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/unittest/mock.py", line 563, in __getattr__
    raise AttributeError(name)
AttributeError: w

Non-mock return values and raising exceptions
Sometimes, actually fairly often, you'll want mock objects posing as functions or methods to return a specific value, or a series of specific values, rather than returning another mock object. To make a mock object always return the same value, just change the return_value attribute:

>>> mock.o.return_value = 'Hi'
>>> mock.o()
'Hi'
>>> mock.o('Howdy')
'Hi'

If you want the mock object to return a different value each time it's called, you need to assign an iterable of return values to the side_effect attribute instead, as follows:

>>> mock.p.side_effect = [1, 2, 3]
>>> mock.p()
1
>>> mock.p()
2
>>> mock.p()
3
>>> mock.p()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/unittest/mock.py", line 885, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib64/python3.4/unittest/mock.py", line 944, in _mock_call
    result = next(effect)
StopIteration

If you don't want your mock object to raise a StopIteration exception, you need to make sure to give it enough return values for all of the invocations in your test. If you don't know how many times it will be invoked, an infinite iterator such as itertools.count might be what you need.
This is easily done:

>>> mock.p.side_effect = itertools.count()

If you want your mock to raise an exception instead of returning a value, just assign the exception object to side_effect, or put it into the iterable that you assign to side_effect:

>>> mock.e.side_effect = [1, ValueError('x')]
>>> mock.e()
1
>>> mock.e()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/unittest/mock.py", line 885, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib64/python3.4/unittest/mock.py", line 946, in _mock_call
    raise result
ValueError: x

The side_effect attribute has another use as well, which we'll talk about shortly.
Mocking class or function details
Sometimes, the generic behavior of mock objects isn't a close enough emulation of the object being replaced. This is particularly the case when it's important that they raise exceptions when used improperly, since mock objects are usually happy to accept any usage. The unittest.mock package addresses this problem using a technique called speccing. If you pass an object into unittest.mock.create_autospec, the returned value will be a mock object, but it will do its best to pretend that it's the same object you passed into create_autospec. This means that it will:
- Raise an AttributeError if you attempt to access an attribute that the original object doesn't have, unless you first explicitly assign a value to that attribute
- Raise a TypeError if you attempt to call the mock object when the original object wasn't callable
- Raise a TypeError if you pass the wrong number of parameters, or pass a keyword parameter that isn't viable, if the original object was callable
- Trick isinstance into thinking that the mock object is of the original object's type
Mock objects made by create_autospec share this trait with all of their children as well, which is usually what you want.
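The extra strictness that speccing adds is easiest to see with a plain function. The greet function below is invented for this example, but the create_autospec behavior it demonstrates is the real one described above:

```python
from unittest.mock import create_autospec

# A hypothetical function to spec against.
def greet(name, punctuation="!"):
    return "Hello, " + name + punctuation

mock_greet = create_autospec(greet, return_value="Hello, stub!")

print(mock_greet("Ada"))           # the configured return value
mock_greet.assert_called_once_with("Ada")

try:
    mock_greet("Ada", "!", "oops")  # three arguments: too many for the spec
except TypeError as error:
    print("rejected:", error)
```

A plain Mock would have accepted the badly formed call silently; the specced mock rejects it with a TypeError, just as the real greet would.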
If you really just want a specific mock to be specced, while its children are not, you can pass the template object into the Mock constructor using the spec keyword. Here's a short demonstration of using create_autospec:

>>> from unittest.mock import create_autospec
>>> x = Exception('Bad', 'Wolf')
>>> y = create_autospec(x)
>>> isinstance(y, Exception)
True
>>> y
<NonCallableMagicMock spec='Exception' id='140440961099088'>

Mocking function or method side effects
Sometimes, for a mock object to successfully take the place of a function or method, the mock object has to actually perform calls to other functions, or set variable values, or generally do whatever a function can do. This need is less common than you might think, and it's also somewhat dangerous for testing purposes: when your mock objects can execute arbitrary code, there's a possibility that they stop being a simplifying tool for enforcing test isolation and become a complex part of the problem instead. Having said that, there are still times when you need a mocked function to do something more complex than simply returning a value, and we can use the side_effect attribute of mock objects to achieve this.
We've seen side_effect before, when we assigned an iterable of return values to it. If you assign a callable to side_effect, this callable will be called when the mock object is called, and passed the same parameters. If the side_effect function raises an exception, this is what the mock object does as well; otherwise, the side_effect return value is returned by the mock object. In other words, if you assign a function to a mock object's side_effect attribute, this mock object in effect becomes that function, with the only important difference being that the mock object still records the details of how it's used. The code in a side_effect function should be minimal, and should not try to actually do the job of the code the mock object is replacing.
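As a small sketch of the callable-side_effect pattern (the send_alert function and its arguments are invented for this example), note how the mock both delegates to the side_effect function and still records the call:

```python
from unittest.mock import Mock

calls_seen = []

# The side_effect callable receives exactly the arguments the mock was
# called with; its return value becomes the mock's return value.
def fake_send(message, channel="ops"):
    calls_seen.append(channel)
    if not message:
        raise ValueError("empty message")  # the mock re-raises this
    return "queued"

send_alert = Mock(side_effect=fake_send)

print(send_alert("disk full"))                 # queued
print(send_alert("db down", channel="db"))     # queued
send_alert.assert_called_with("db down", channel="db")
print(calls_seen)                              # ['ops', 'db']
```

The side_effect function here only records an externally visible effect and returns a canned result, which keeps it within the "minimal" guideline stated above.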
All it should do is perform any expected externally visible operations and then return the expected result.
Mock object assertion methods
As we saw in the Standard mock objects section, you can always write code that checks the mock_calls attribute of mock objects to see whether or not things are behaving as they should. However, there are some particularly common checks that have already been written for you, and these are available as assertion methods on the mock objects themselves. As is normal for assertions, these methods return None if they pass, and raise an AssertionError if they fail.
The assert_called_with method accepts an arbitrary collection of arguments and keyword arguments, and raises an AssertionError unless these parameters were passed to the mock the last time it was called. The assert_called_once_with method behaves like assert_called_with, except that it also checks whether the mock was only called once, and raises an AssertionError if that is not true. The assert_any_call method accepts arbitrary arguments and keyword arguments, and raises an AssertionError if the mock object has never been called with these parameters.
We've already seen the assert_has_calls method. This method accepts a list of call objects, checks whether they appear in the history in the same order, and raises an exception if they do not. Note that "in the same order" does not necessarily mean "next to each other": there can be other calls in between the listed calls, as long as all of the listed calls appear in the proper sequence. This behavior changes if you assign a true value to the any_order argument; in that case, assert_has_calls doesn't care about the order of the calls, and only checks whether they all appear in the history.
The assert_not_called method raises an exception if the mock has ever been called.
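A quick, self-contained demonstration of these assertion methods (the notify mock and its arguments are invented for illustration):

```python
from unittest.mock import Mock

notify = Mock()

notify("alice@example.com", subject="hi")
# Passes: exactly one call so far, with exactly these parameters.
notify.assert_called_once_with("alice@example.com", subject="hi")

notify("bob@example.com", subject="hello")
# Still passes: assert_any_call searches the whole history.
notify.assert_any_call("alice@example.com", subject="hi")

try:
    # Fails: the parameters match the most recent call,
    # but the mock has now been called twice.
    notify.assert_called_once_with("bob@example.com", subject="hello")
except AssertionError:
    print("assert_called_once_with failed: the mock was called twice")
```

Choosing among these helpers is mostly about how strict you want the test to be: assert_called_once_with pins down the whole history, while assert_any_call only demands that one matching call happened at some point.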
Mocking containers and objects with a special behavior
One thing the Mock class does not handle is the so-called magic methods that underlie Python's special syntactic constructions: __getitem__, __add__, and so on. If you need your mock objects to record and respond to magic methods—in other words, if you want them to pretend to be container objects such as dictionaries or lists, respond to mathematical operators, act as context managers, or do any of the other things where syntactic sugar translates into a method call underneath—you're going to use unittest.mock.MagicMock to create your mock objects.
There are a few magic methods that are not supported even by MagicMock, due to details of how they (and mock objects) work: __getattr__, __setattr__, __init__, __new__, __prepare__, __instancecheck__, __subclasscheck__, and __del__.
Here's a simple example, in which we use MagicMock to create a mock object supporting the in operator:

>>> from unittest.mock import MagicMock
>>> mock = MagicMock()
>>> 7 in mock
False
>>> mock.mock_calls
[call.__contains__(7)]
>>> mock.__contains__.return_value = True
>>> 8 in mock
True
>>> mock.mock_calls
[call.__contains__(7), call.__contains__(8)]

Things work similarly with the other magic methods. For example, addition:

>>> mock + 5
<MagicMock name='mock.__add__()' id='140017311217816'>
>>> mock.mock_calls
[call.__contains__(7), call.__contains__(8), call.__add__(5)]

Notice that the return value of the addition is a mock object, a child of the original mock object, but the in operator returned a Boolean value. Python ensures that some magic methods return a value of a particular type, and will raise an exception if that requirement is not fulfilled. In these cases, MagicMock's implementations of the methods return a best-guess value of the proper type, instead of a child mock object.
There's something you need to be careful of when it comes to the in-place mathematical operators, such as += (__iadd__) and |= (__ior__), and that is the fact that MagicMock handles them somewhat strangely. What it does is still useful, but it might well catch you by surprise:

>>> mock += 10
>>> mock.mock_calls
[]

What was that? Did it erase our call history? Fortunately, no, it didn't. What it did was assign the child mock created by the addition operation to the variable called mock. That is entirely in accordance with how the in-place math operators are supposed to work. Unfortunately, it has still cost us our ability to access the call history, since we no longer have a variable pointing at the parent mock object.

Make sure that you have the parent mock object set aside in a variable that won't be reassigned, if you're going to be checking in-place math operators. Also, you should make sure that your mocked in-place operators return the result of the operation, even if that just means return self.return_value, because otherwise Python will assign None to the left-hand variable.

There's another detail of how in-place operators work that you should keep in mind:

>>> mock = MagicMock()
>>> x = mock
>>> x += 5
>>> x
<MagicMock name='mock.__iadd__()' id='139845830142216'>
>>> x += 10
>>> x
<MagicMock name='mock.__iadd__().__iadd__()' id='139845830154168'>
>>> mock.mock_calls
[call.__iadd__(5), call.__iadd__().__iadd__(10)]

Because the result of the operation is assigned to the original variable, a series of in-place math operations builds up a chain of child mock objects. If you think about it, that's the right thing to do, but it is rarely what people expect at first.

Mock objects for properties and descriptors

There's another category of things that basic Mock objects don't do a good job of emulating: descriptors. Descriptors are objects that allow you to interfere with the normal variable access mechanism.
The most commonly used descriptors are created by Python's property built-in function, which simply allows you to write functions to control getting, setting, and deleting a variable. To mock a property (or other descriptor), create a unittest.mock.PropertyMock instance and assign it to the property name. The only complication is that you can't assign a descriptor to an object instance; you have to assign it to the object's type, because descriptors are looked up in the type without first checking the instance. That's not hard to do with mock objects, fortunately:

>>> from unittest.mock import PropertyMock
>>> mock = Mock()
>>> prop = PropertyMock()
>>> type(mock).p = prop
>>> mock.p
<MagicMock name='mock()' id='139845830215328'>
>>> mock.mock_calls
[]
>>> prop.mock_calls
[call()]
>>> mock.p = 6
>>> prop.mock_calls
[call(), call(6)]

The thing to be mindful of here is that the property is not a child of the object named mock. Because of this, we have to keep it around in its own variable, because otherwise we'd have no way of accessing its history.

The PropertyMock objects record variable lookup as a call with no parameters, and variable assignment as a call with the new value as a parameter. You can use a PropertyMock object if you actually need to record variable accesses in your mock object history. Usually you don't need to do that, but the option exists.

Even though you set a property by assigning it to an attribute of a type, you don't have to worry about having your PropertyMock objects bleed over into other tests. Each Mock you create has its own type object, even though they all claim to be of the same class:

>>> type(Mock()) is type(Mock())
False

Thanks to this feature, any changes that you make to a mock object's type object are unique to that specific mock object.

Mocking file objects

It's likely that you'll occasionally need to replace a file object with a mock object.
The unittest.mock library helps you with this by providing mock_open, which is a factory for fake open functions. These functions have the same interface as the real open function, but they return a mock object that's been configured to pretend that it's an open file object. This sounds more complicated than it is. See for yourself:

>>> from unittest.mock import mock_open
>>> open = mock_open(read_data = 'moose')
>>> with open('/fake/file/path.txt', 'r') as f:
...     print(f.read())
...
moose

If you pass a string value to the read_data parameter, the mock file object that eventually gets created will use that value as the data source when its read methods get called. As of Python 3.4.0, read_data only supports string objects, not bytes. If you don't pass read_data, read method calls will return an empty string.

The problem with the previous code is that it makes the real open function inaccessible, and leaves a mock object lying around where other tests might stumble over it. Read on to see how to fix these problems.

Replacing real code with mock objects

The unittest.mock library gives us a very nice tool for temporarily replacing objects with mock objects, and then undoing the change when our test is done. This tool is unittest.mock.patch. There are a lot of different ways in which patch can be used: it works as a context manager, a function decorator, and a class decorator; additionally, it can create a mock object to use for the replacement or it can use the replacement object that you specify. There are a number of other optional parameters that can further adjust the behavior of patch. Basic usage is easy:

>>> from unittest.mock import patch, mock_open
>>> with patch('builtins.open', mock_open(read_data = 'moose')) as mock:
...     with open('/fake/file.txt', 'r') as f:
...         print(f.read())
...
moose
>>> open
<built-in function open>

As you can see, patch dropped the mock open function created by mock_open over the top of the real open function; then, when we left the context, it restored the original for us automatically.

The first parameter of patch is the only one that is required. It is a string describing the absolute path to the object to be replaced. The path can have any number of package and subpackage names, but it must include the module name and the name of the object inside the module that is being replaced. If the path is incorrect, patch will raise an ImportError, TypeError, or AttributeError, depending on what exactly is wrong with the path.

If you don't want to worry about making a mock object to be the replacement, you can just leave that parameter off:

>>> import io
>>> with patch('io.BytesIO'):
...     x = io.BytesIO(b'ascii data')
...     io.BytesIO.mock_calls
[call(b'ascii data')]

The patch function creates a new MagicMock for you if you don't tell it what to use for the replacement object. This usually works pretty well, but you can pass the new parameter (also the second parameter, as we used it in the first example of this section) to specify that the replacement should be a particular object; or you can pass the new_callable parameter to make patch use the value of that parameter to create the replacement object.

We can also force patch to use create_autospec to make the replacement object, by passing autospec=True:

>>> with patch('io.BytesIO', autospec = True):
...     io.BytesIO.melvin
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib64/python3.4/unittest/mock.py", line 557, in __getattr__
    raise AttributeError("Mock object has no attribute %r" % name)
AttributeError: Mock object has no attribute 'melvin'

The patch function will normally refuse to replace an object that does not exist; however, if you pass it create=True, it will happily drop a mock object wherever you like.
Naturally, this is not compatible with autospec=True.

The patch function covers the most common cases. There are a few related functions that handle less common but still useful cases.

The patch.object function does the same thing as patch, except that, instead of taking the path string, it accepts an object and an attribute name as its first two parameters. Sometimes this is more convenient than figuring out the path to an object. Many objects don't even have valid paths (for example, objects that exist only in a function local scope), although the need to patch them is rarer than you might think.

The patch.dict function temporarily drops one or more objects into a dictionary under specific keys. The first parameter is the target dictionary; the second is a dictionary from which to get the key and value pairs to put into the target. If you pass clear=True, the target will be emptied before the new values are inserted. Notice that patch.dict doesn't create the replacement values for you. You'll need to make your own mock objects, if you want them.

Mock objects in action

That was a lot of theory interspersed with unrealistic examples. Let's take a look at what we've learned and apply it for a more realistic view of how these tools can help us.

Better PID tests

The PID tests suffered mostly from having to do a lot of extra work to patch and unpatch time.time, and had some difficulty breaking the dependence on the constructor.

Patching time.time

Using patch, we can remove a lot of the repetitiveness of dealing with time.time; this means that it's less likely that we'll make a mistake somewhere, and saves us from spending time on something that's kind of boring and annoying. All of the tests can benefit from similar changes:

>>> from unittest.mock import Mock, patch
>>> with patch('time.time', Mock(side_effect = [1.0, 2.0, 3.0, 4.0, 5.0])):
...     import pid
...     controller = pid.PID(P = 0.5, I = 0.5, D = 0.5, setpoint = 0,
...                          initial = 12)
...     assert controller.gains == (0.5, 0.5, 0.5)
...     assert controller.setpoint == [0.0]
...     assert controller.previous_time == 1.0
...     assert controller.previous_error == -12.0
...     assert controller.integrated_error == 0.0

Apart from using patch to handle time.time, this test has been changed. We can now use assert to check whether things are correct, instead of having doctest compare the values directly. There's hardly any difference between the two approaches, except that we can place the assert statements inside the context managed by patch.

Decoupling from the constructor

Using mock objects, we can finally separate the tests for the PID methods from the constructor, so that mistakes in the constructor cannot affect the outcome:

>>> with patch('time.time', Mock(side_effect = [2.0, 3.0, 4.0, 5.0])):
...     pid = imp.reload(pid)
...     mock = Mock()
...     mock.gains = (0.5, 0.5, 0.5)
...     mock.setpoint = [0.0]
...     mock.previous_time = 1.0
...     mock.previous_error = -12.0
...     mock.integrated_error = 0.0
...     assert pid.PID.calculate_response(mock, 6) == -3.0
...     assert pid.PID.calculate_response(mock, 3) == -4.5
...     assert pid.PID.calculate_response(mock, -1.5) == -0.75
...     assert pid.PID.calculate_response(mock, -2.25) == -1.125

What we've done here is set up a mock object with the proper attributes, and pass it into calculate_response as the self parameter. We could do this because we didn't create a PID instance at all. Instead, we looked up the method's function inside the class and called it directly, allowing us to pass whatever we wanted as the self parameter instead of having Python's automatic mechanisms handle it. Never invoking the constructor means that we're immune to any errors it might contain, and guarantees that the object state is exactly what we expect here in our calculate_response test.

Summary

In this article, we've learned about a family of objects that specialize in impersonating other classes, objects, methods, and functions.
We've seen how to configure these objects to handle corner cases where their default behavior isn't sufficient, and we've learned how to examine the activity logs that these mock objects keep, so that we can decide whether the objects are being used properly or not.

Resources for Article:

Further resources on this subject:

Installing NumPy, SciPy, matplotlib, and IPython [Article]
Machine Learning in IPython with scikit-learn [Article]
Python 3: Designing a Tasklist Application [Article]

Packt
20 Nov 2014
16 min read

Amazon Web Services

In this article, by Prabhakaran Kuppusamy and Uchit Vyas, authors of AWS Development Essentials, you will learn about the different tools and methods available to perform the same operation with varying complexity. Various options are available, depending on the user's level of experience. In this article, we will start with an overview of each service, learn about the various tools available for programmer interaction, and finally see the troubleshooting and best practices to be followed while using these services. AWS provides a handful of services in every area. In this article, we will cover the following topics:

Navigate through the AWS Management Console
Describe the security measures that AWS provides
AWS interaction through the SDK and IDE tools

(For more resources related to this topic, see here.)

Background of AWS and its needs

AWS is based on an idea presented by Chris Pinkham and Benjamin Black with a vision towards Amazon's retail computing infrastructure. The first Amazon offering was SQS, in the year 2004. Officially, AWS was launched and made available online in 2006, and within a year, 200,000 developers had signed up for these services. Later, due to natural disasters (a June 29, 2012 storm in North Virginia brought down most of the servers residing at that location) and technical events, AWS faced a lot of challenges. A similar event happened in December 2012, and AWS has continued providing services since. AWS learned from these events and made sure that the same kind of outage wouldn't occur even if the same event happened again. AWS is an idea born in a single room, but the idea is now available to and used by almost all cloud developers and IT giants. AWS is greatly loved by all kinds of technology admirers. Irrespective of the user's expertise, AWS has something for various types of users. For an expert programmer, AWS has SDKs for each service.
Using these SDKs, the programmer can perform operations by entering commands in the command-line interface. However, an end user with limited knowledge of programming can still perform similar operations using the graphical user interface of the AWS Management Console, which is accessible through a web browser. If a programmer needs something between the low-level (SDK) and high-level (Management Console) interfaces, they can go for the integrated development environment (IDE) tools, for which AWS provides plugins and add-ons. One commonly used IDE for which AWS has provided add-ons is the Eclipse IDE. As of now, we will start with the AWS Management Console.

The AWS Management Console

The most popular method of accessing AWS is via the Management Console because of its simplicity of usage and power. Another reason why the end user prefers the Management Console is that it doesn't require any software to start with; having an Internet connection and a browser is sufficient. As the name suggests, the Management Console is a place where administrative and advanced operations can be performed on your AWS account details or AWS services. The Management Console mainly focuses on the following features:

One-click access to AWS's services
AWS account administration
AWS management using handheld devices
AWS infrastructure management across the globe

One-click access to the AWS services

To access the Management Console, all you need to do is first sign up with AWS. Once done, the Management Console will be available at https://console.aws.amazon.com/. Once you have signed up, you will be directed to the following page:

Each and every icon on this page is an Amazon Web Service. Two or more services will be grouped under a category. For example, in the Analytics category, you can see three services, namely, Data Pipeline, Elastic MapReduce, and Kinesis. Starting with any of these services is very easy.
Have a look at the description of the service at the bottom of the service icon. As soon as you click on the service icon, it will take you to the Getting started page of the corresponding service, where brief as well as detailed guidelines are available. In order to start with any of the services, only two things are required. The first one is an AWS account and the second one is a supported browser. The Getting started section usually has a video, which explains the specialty and use cases of the service that you selected. Once you finish reading the Getting started section, you can optionally go through the DOC files specific to the service to learn more about the syntax and usage of the service operations.

AWS account administration

Account administration is one of the most important things to make note of. To do this, click on your displayed name (in this case, Prabhakar) at the top of the page, and then click on the My Account option, as shown in the preceding screenshot. At the beginning of every month, you don't want AWS to deduct all your salary by stating that you have used this many services costing this much money; hence, all this management information is available in the Management Console. Using the Management Console, you can infer the following information:

The monthly billing in brief as well as in a detailed manner (cost split-up of each service), along with a provision to view VAT and tax exemption
Account details, such as the display name and contact information
Provision to close the AWS account

All the preceding operations and much more are possible.

AWS management using handheld devices

Managing and accessing AWS services is not limited to the PC. AWS provides a handful of applications for almost all or most of the mobile platforms, such as Android, iOS, and so on. Using these applications, you can perform all the AWS operations on the move.
You won't believe that having a 7-inch Android tablet with the AWS Console application installed from Google Play will enable you to request any Elastic Compute Cloud (EC2) instance from Amazon and control it (start, stop, and terminate) very easily. You can install an SSH client on the tablet and connect to the Linux terminal. However, if you wish to make use of a Windows instance from EC2, you might use the graphical user interface (GUI) more frequently than a command line. A few more sophisticated pieces of software and hardware might be needed; for example, you should have a VNC viewer or remote desktop connection software to get the GUI of the borrowed EC2 instance. As you are making use of the GUI in addition to the keyboard, you will need a pointing device, such as a mouse. As a result, you will almost get addicted to the concept of cloud computing going mobile.

AWS infrastructure management across the globe

At this point, you might be aware that you can get all of these AWS services from servers residing at any of the following locations. To control the services you use in different regions, you don't have to go anywhere else. You can control it right here in the same Management Console. Using the same Management Console, just by clicking on N.Virginia and choosing the location (at the top of the Management Console), you can make the service available in that region, as shown in the following screenshot:

You can choose the server location at which you want the service (data and machine) to be made available based on the following two factors:

The first factor is the distance between the server's location and the client's location. For example, if you have deployed a web application for a client from North California at a Tokyo location, obviously the latency will be high while accessing the application. Therefore, choosing the optimum service location is the primary factor.

The second factor is the charge for the service in a specific location.
AWS charges more for certain crowded servers. Just for illustration, assume that the server for North California is used by many critical companies, so it might cost you twice as much to create your servers in North California compared to the other locations. Hence, you should always consider the tradeoff between location and cost and then decide on the server location. Whenever you click on any of the services, AWS will always select the location that costs you less money as the default.

AWS security measures

Whenever you think of moving your data center to a public cloud, the first question that arises in your mind is about data security. In a public cloud, through virtualization technology, multiple users might be using the same hardware (server) in which your data is available. You will learn in detail about how AWS ensures data security.

Instance isolation

Before learning about instance isolation, you must know how AWS EC2 provisions instances to the user. This service allows you to rent virtual machines (AWS calls them instances) with whatever configuration you ask for. Let's assume that you requested AWS to provision an instance with a 2 GB RAM, a 100 GB HDD, and Ubuntu. Within a minute, you will be given the instance's connection details (public DNS, private IP, and so on), and the instance starts running. Does this mean that AWS assembled a 2*1 GB RAM and a 100 GB HDD into a CPU cabinet, installed Ubuntu OS on it, and gave you access? The answer is no. The provisioned instance is not a single PC (or bare metal) with an OS installed on it. The instance is the outcome of a virtual machine provisioned by Amazon's private cloud. The following diagram shows how a virtual machine can be provisioned by a private cloud:

Let's examine the diagram from bottom to top. First, we will start with the underlying Hardware/Host. Hardware is the server, which usually has a very high specification.
Here, assume that your hardware has the configuration of a 99 GB RAM, a 450 TB HDD, and a few other elements, such as NICs, which you need not consider now. The next component in your sights is the Hypervisor. A hypervisor or virtual machine monitor (VMM) is used to create and run virtual machines on the hardware. In private cloud terms, whichever machine runs a hypervisor on it is called the host machine. Three users can request instances, each needing a 33 GB RAM and 150 TB of HDD space. This request goes to the hypervisor, which then starts creating those VMs. After creating the VMs, a notification about the connection parameters will be sent to each user. In the preceding diagram, you can see the three virtual machines (VMs) created by the hypervisor. All three VMs are running different operating systems. Even though all three virtual machines are used by different users, each will feel that only he/she has access to the single piece of hardware; user 1 might not know that the same hardware is also being used by user 2, and so on. The process of creating a virtual version of a machine, storage, or network is called virtualization. The funny part is that none of the virtual machines knows that it is being virtualized (that is, all the VMs are created on the same host). After getting this information about your instances, some users may feel deceived, and some will even be disappointed and cry out loud: has your instance been created on a shared disc or resource? Even though the disc (or hardware) is shared, one instance (or the owner of the instance) is isolated from the other instances on the same disc through a firewall. This concept is termed instance isolation. The following diagram demonstrates instance isolation in AWS:

The preceding diagram clearly demonstrates how EC2 provides instances to every user. Even though all the instances are lying on the same disc, they are isolated by the hypervisor.
The hypervisor has a firewall that does this isolation. So, the physical interface will not interact with the underlying hardware (the machine or disc where instances are available) or the virtual interface directly. All these interactions will be through the hypervisor's firewall. This way, AWS ensures that no user can directly access the disc, and no instance can directly interact with another instance, even if both instances are running on the same hardware. In addition to the firewall, during the creation of the EC2 instance, the user can specify the permitted and denied security groups of the instance. These two ideologies provide instance isolation. In the preceding diagram, Customer 1, Customer 2, and so on are virtualized discs, since the customer instances have no access to raw or actual disc devices. As an added security measure, the user can encrypt his/her disc so that other users cannot access the disc content (even if someone gets in contact with the disc).

Isolated GovCloud

Similar to North California or Asia Pacific, GovCloud is also a location where you can get your AWS services. This location is specifically designed only for government and agencies whose data is very confidential and valuable, and disclosing this data might result in disaster. By default, this location will not be available to the user. If you want access to this location, then you need to raise a compliance request at http://aws.amazon.com/compliance/contact/ and submit the FedRAMP Package Request Form downloadable at http://cloud.cio.gov/document/fedramp-package-request-form. From these two URLs, you can understand how secure the cloud location really is.

CloudTrail

CloudTrail is an AWS service that tracks user activity and changes. Enabling CloudTrail will log all the API request information into your S3 bucket, which you have created solely for this purpose. CloudTrail also allows you to create an SNS topic as soon as a new logfile is created by CloudTrail.
CloudTrail, hand in hand with SNS, provides real-time user activity as messages to the user.

Password

This might sound funny. After looking at CloudTrail, if you feel that someone else is accessing your account, the best option is to change the password. Never let anyone look at your password, as this could easily compromise the entire account. Sharing the password is like leaving your treasury door open.

Multi-Factor Authentication

Until now, to access AWS through a browser, you had to log in at http://aws.amazon.com and enter your username and password. However, enabling Multi-Factor Authentication (MFA) will add another layer of security and ask you to provide an authentication code sent to the device configured with this account. On the security credentials page at https://console.aws.amazon.com/iam/home?#security_credential, there is a provision to enable MFA. Clicking on Enable will display the following window:

Selecting the first option, A virtual MFA device, will not cost you money, but this requires a smartphone (with an Android OS), and you need to download an app from the App Store. After this, during every login, you need to look at your smartphone and enter the authentication token. More information is available at https://youtu.be/MWJtuthUs0w.

Access Keys (Access Key ID and Secret Access Key)

On the same security credentials page, next to MFA, these access keys will be made available. AWS will not allow you to have more than two access keys. However, you can delete and create as many access keys as you like, as shown in the following screenshot:

This access key ID is used while accessing the service via the API and SDK. During this time, you must provide this ID; otherwise, you won't be able to perform any operation. To put it in other words, if someone else gets or knows this ID, they could pretend to be you through the SDK and API. In the preceding screenshot, the first key is inactive and the second key is active.
The Create New Access Key button is disabled because I already have the maximum number of allowed access keys. As an added measure, I masked my actual IDs. It is a very good practice to delete a key and create a new key every month using the Delete command link, and to toggle the active keys every week (by making them active and inactive) by clicking on the Make Active or Make Inactive command links. Never let anyone see these IDs. If you are ever in doubt, delete the ID and create a new one. Clicking on the Create New Access Key button (assuming that you have less than two IDs) will display the following window, asking you to download the new access key ID as a CSV file:

The CloudFront key pairs

The CloudFront key pairs are very similar to the access key IDs. Without these keys, you will not be able to perform any operation on CloudFront. Unlike the access key ID (which has only an access key ID and a secret access key), here you will have a private key and a public key along with the access key ID, as shown in the following screenshot:

If you lose these keys, then you need to delete the key pair and create a new key pair. This is also an added security measure.

X.509 certificates

X.509 certificates are mandatory if you wish to make any SOAP requests to any AWS service. Clicking on Create new certificate will display the following window, which performs exactly the same function:

Account identifiers

There are two IDs that are used to identify ourselves when accessing a service via the API or SDK: the AWS account ID and the canonical user ID. These two IDs are unique. Just as with the preceding parameters, never share these IDs or let anyone see them. If someone has your access ID or key pair, the best option is to generate a new one. But it is not possible to generate a new account ID or canonical user ID.

Summary

In this article, you learned about the AWS Management Console and its commonly used SDKs and IDEs. You also learned how AWS secures your data.
Then, you looked at the AWS plugin configuration in the Eclipse IDE. The first part made the user familiar with the AWS Management Console. After that, you explored a few of the important security aspects of AWS and learned how AWS handles them. Finally, you learned about the different AWS tools available to the programmer to make development work easier. In the end, you examined the common SDKs and IDE tools of AWS.

Resources for Article:

Further resources on this subject:

Amazon DynamoDB - Modelling relationships, Error handling [article]
A New Way to Scale [article]
Deployment and Post Deployment [article]

Packt
20 Nov 2014
27 min read

Managing Heroku from the Command Line

In this article by Mike Coutermarsh, author of Heroku Cookbook, we will cover the following topics:

Viewing application logs
Searching logs
Installing add-ons
Managing environment variables
Enabling the maintenance page
Managing releases and rolling back
Running one-off tasks and dynos
Managing SSH keys
Sharing and collaboration
Monitoring load average and memory usage

(For more resources related to this topic, see here.)

Heroku was built to be managed from its command-line interface. The better we learn it, the faster and more effective we will be in administering our application. The goal of this article is to get comfortable with using the CLI. We'll see that each Heroku command follows a common pattern. Once we learn a few of these commands, the rest will be relatively simple to master. In this article, we won't cover every command available in the CLI, but we will focus on the ones that we'll be using the most. As we learn each command, we will also learn a little more about what is happening behind the scenes so that we get a better understanding of how Heroku works. The more we understand, the more we'll be able to take advantage of the platform. Before we start, let's note that if we ever need to get a list of the available commands, we can run the following command:

$ heroku help

We can also quickly display the documentation for a single command:

$ heroku help command_name

Viewing application logs

Logging gets a little more complex for any application that is running multiple servers and several different types of processes. Having visibility into everything that is happening within our application is critical to maintaining it. Heroku handles this by combining and sending all of our logs to one place, the Logplex. The Logplex provides us with a single location to view a stream of our logs across our entire application. In this recipe, we'll learn how to view logs via the CLI.
We'll learn how to quickly get visibility into what's happening within our application.

How to do it…

To start, let's open up a terminal, navigate to an existing Heroku application, and perform the following steps:

1. First, to view our application's logs, we can use the logs command:

$ heroku logs
2014-03-31T23:35:51.195150+00:00 app[web.1]:   Rendered pages/about.html.slim within layouts/application (25.0ms)
2014-03-31T23:35:51.215591+00:00 app[web.1]:   Rendered layouts/_navigation_links.html.erb (2.6ms)
2014-03-31T23:35:51.230010+00:00 app[web.1]:   Rendered layouts/_messages.html.slim (13.0ms)
2014-03-31T23:35:51.215967+00:00 app[web.1]:   Rendered layouts/_navigation.html.slim (10.3ms)
2014-03-31T23:35:51.231104+00:00 app[web.1]: Completed 200 OK in 109ms (Views: 65.4ms | ActiveRecord: 0.0ms)
2014-03-31T23:35:51.242960+00:00 heroku[router]: at=info method=GET path=

Heroku logs anything that our application sends to STDOUT or STDERR. If we're not seeing logs, it's very likely that our application is not configured correctly.

2. We can also watch our logs in real time. This is known as tailing:

$ heroku logs --tail

Instead of --tail, we can also use -t. We'll need to press Ctrl + C to end the command and stop tailing the logs.

3. If we want to see the 100 most recent lines, we can use -n:

$ heroku logs -n 100

The Logplex stores a maximum of 1500 lines. To view more lines, we'll have to set up log storage.

4. We can filter the logs to show only a specific process type. Here, we will only see logs from our web dynos:

$ heroku logs -p web

5. If we want, we can be as granular as showing the logs from an individual dyno. This will show only the logs from the second web dyno:

$ heroku logs -p web.2

We can use this for any process type; we can try it for our workers if we'd like:

$ heroku logs -p worker

The Logplex contains more than just logs from our application. We can also view logs generated by Heroku or the API.
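The -p and --source flags are simply filters over one combined stream. To build some intuition for what they keep and drop, here is a local sketch that runs grep over a few invented Logplex-style lines; the log content is made up for illustration, and only the source[dyno] labels matter:

```shell
# A handful of invented Logplex-style lines; the source[dyno] label on each
# line is what the -p and --source flags filter on.
logs='2014-03-31T23:35:51+00:00 app[web.1]: Completed 200 OK in 109ms
2014-03-31T23:35:52+00:00 app[worker.2]: Background job finished
2014-03-31T23:35:53+00:00 heroku[router]: at=info method=GET path=/
2014-03-31T23:35:54+00:00 app[web.2]: Started GET "/about"'

# Roughly what `heroku logs -p web` keeps: lines from any web dyno.
web_lines=$(printf '%s\n' "$logs" | grep 'app\[web\.')

# Roughly what `heroku logs --source heroku` keeps: Heroku's own lines.
heroku_lines=$(printf '%s\n' "$logs" | grep 'heroku\[')

printf '%s\n' "$web_lines"
```

Running the sketch prints the two web-dyno lines; swapping the pattern for `app\[worker\.` would isolate the worker instead.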
6. Let's try changing the source to heroku to see only the logs generated by Heroku. This will show us only the logs related to the router and resource usage:

$ heroku logs --source heroku

7. To view logs for only our application, we can set the source to app:

$ heroku logs --source app

8. We can also view logs from the API. These logs will show any administrative actions we've taken, such as scaling dynos or changing configuration variables. This can be useful when multiple developers are working on an application:

$ heroku logs --source api

9. We can even combine the different flags. Let's try tailing the logs for only our web dynos:

$ heroku logs -p web --tail

That's it! Remember that if we ever need more information on how to view logs via the CLI, we can always use the help command:

$ heroku help logs

How it works…

Under the covers, the Heroku CLI simply passes our request to Heroku's API and then uses Ruby to parse and display our logs. If you're interested in exactly how it works, the code is open source on GitHub at https://github.com/heroku/heroku/blob/master/lib/heroku/command/logs.rb.

Viewing logs via the CLI is most useful in situations where we need to see exactly what our application is doing right now. We'll find that we use it a lot around deploys and when debugging issues. Since the Logplex has a limit of 1500 lines, it's not meant for viewing historical data. For this, we'll need to set up log drains and enable a logging add-on.

Searching logs

Heroku does not have the built-in capability to search our logs from the command line. We can get around this limitation easily by making use of some other command-line tools. In this recipe, we will learn how to combine Heroku's logs with Grep, a command-line tool for searching text. This will allow us to search our recent logs for keywords, helping us track down errors more quickly.

Getting ready

For this recipe, we'll need to have Grep installed. For OS X and Linux machines, it should already be installed.
We can check for (and, if necessary, install) Grep using the following steps:

1. To check whether we have Grep installed, let's open up a terminal and type the following:

$ grep
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
       [-e pattern] [-f file] [--binary-files=value] [--color=when]
       [--context[=num]] [--directories=action] [--label] [--line-buffered]
       [--null] [pattern] [file ...]

2. If we do not see usage instructions, we can visit http://www.gnu.org/software/grep/ for the download and installation instructions.

How to do it…

Let's start searching our logs by opening a terminal and navigating to one of our Heroku applications, using the following steps:

1. To search for a keyword in our logs, we need to pipe our logs into Grep. This simply means that we will be passing our logs to Grep and having Grep search them for us. Let's try this now. The following command will search the output of heroku logs for the word error:

$ heroku logs | grep error

2. Sometimes, we might want to search for a longer string that includes special characters. We can do this by surrounding it with quotes:

$ heroku logs | grep "path=/pages/about host"

3. It can be useful to also see the lines surrounding the line that matched our search. The next command will show us the line that contains an error, as well as the three lines above and below it:

$ heroku logs | grep error -C 3

4. We can even search with regular expressions. The next command will show us every line that matches a number ending with MB. So, for example, lines with 100 MB, 25 MB, or 3 MB will all appear:

$ heroku logs | grep '[0-9][0-9]*MB'

To learn more about regular expressions, visit http://regex.learncodethehardway.org/.

How it works…

Like most Unix-based tools, Grep was built to accomplish a single task and to do it well. Global regular expression print (Grep) is built to search a set of files for a pattern and then print all of the matches.
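To see what the pipe is doing without touching a live app, we can feed Grep a few invented lines instead of the real log stream. In this minimal sketch, printf stands in for the Heroku CLI; the grep invocations match the recipe:

```shell
# Invented stand-ins for `heroku logs` output.
sample='2014-03-31T23:35:50 app[web.1]: Started GET "/"
2014-03-31T23:35:51 app[web.1]: NoMethodError (error in PagesController)
2014-03-31T23:35:52 heroku[web.1]: sample#memory_total=105MB
2014-03-31T23:35:53 app[web.1]: Completed 200 OK in 109ms'

# Equivalent of `heroku logs | grep error`.
matches=$(printf '%s\n' "$sample" | grep error)

# The regex example: any line containing a number followed by MB.
mb_lines=$(printf '%s\n' "$sample" | grep '[0-9][0-9]*MB')

printf '%s\n' "$matches"
```

Only the NoMethodError line survives the first filter, and only the memory line survives the second; anything Grep can do on files, it can do on this piped input.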
Grep can also search anything it receives through standard input; this is exactly how we used it in this recipe. By piping the output of our Heroku logs into Grep, we are passing our logs to Grep as standard input.

See also

To learn more about Grep, visit http://www.tutorialspoint.com/unix_commands/grep.htm

Installing add-ons

Our application needs some additional functionality provided by an outside service. What should we do? In the past, this would have involved creating accounts, managing credentials, and maybe even bringing up servers and installing software. This whole process has been simplified by the Heroku add-on marketplace. For any additional functionality that our application needs, our first stop should always be Heroku add-ons. Heroku has made attaching additional resources to our application a plug-and-play process. If we need an additional database, caching, or error logging, they can be set up with a single command. In this recipe, we will learn the ins and outs of using the Heroku CLI to install and manage our application's add-ons.

How to do it...

To begin, let's open a terminal and navigate to one of our Heroku applications, using the following steps:

1. Let's start by taking a look at all of the available Heroku add-ons. We can do this with the addons:list command:

$ heroku addons:list

There are so many add-ons that viewing them through the CLI is pretty difficult. For easier navigation and search, we should take a look at https://addons.heroku.com/.

2. If we want to see the currently installed add-ons for our application, we can simply type the following:

$ heroku addons
=== load-tester-rails Configured Add-ons
heroku-postgresql:dev       HEROKU_POSTGRESQL_MAROON
heroku-postgresql:hobby-dev HEROKU_POSTGRESQL_ONYX
librato:development
newrelic:stark

Remember that for any command, we can always add --app app_name to specify the application.
Alternatively, our application's add-ons are also listed through the Heroku Dashboard, available at https://dashboard.heroku.com.

3. The installation of a new add-on is done with addons:add. Here, we are going to install the error logging service, Rollbar:

$ heroku addons:add rollbar
Adding rollbar on load-tester-rails... done, v22 (free)
Use `heroku addons:docs rollbar` to view documentation.

4. We can quickly open up the documentation for an add-on with addons:docs:

$ heroku addons:docs rollbar

5. Removing an add-on is just as simple. We'll need to type our application name to confirm. For this example, our application is called load-tester-rails:

$ heroku addons:remove rollbar
!   WARNING: Destructive Action
!   This command will affect the app: load-tester-rails
!   To proceed, type "load-tester-rails" or re-run this command with --confirm load-tester-rails
> load-tester-rails
Removing rollbar on load-tester-rails... done, v23 (free)

6. Each add-on comes with different tiers of service. Let's try upgrading our rollbar add-on to the starter tier:

$ heroku addons:upgrade rollbar:starter
Upgrading to rollbar:starter on load-tester-rails... done, v26 ($12/mo)
Plan changed to starter
Use `heroku addons:docs rollbar` to view documentation.

7. Now, if we want, we can downgrade back to its original level with addons:downgrade:

$ heroku addons:downgrade rollbar
Downgrading to rollbar on load-tester-rails... done, v27 (free)
Plan changed to free
Use `heroku addons:docs rollbar` to view documentation.

8. If we ever forget any of the commands, we can always use help to quickly see the documentation:

$ heroku help addons

Some add-ons might charge you money. Before continuing, let's double-check that we only have the correct ones enabled, using the $ heroku addons command.

How it works…

Heroku has created a standardized process for all add-on providers to follow. This ensures a consistent experience when provisioning any add-on for our application.
It starts when we request the creation of an add-on. Heroku sends an HTTP request to the provider, asking them to provision an instance of their service. The provider must then respond to Heroku with the connection details for their service in the form of environment variables. For example, if we were to provision Redis To Go, we will get back our connection details in a REDISTOGO_URL variable:

REDISTOGO_URL: redis://user:pass@server.redistogo.com:9652

Heroku adds these variables to our application and restarts it. On restart, the variables are available to our application, and we can connect to the service using them. The specifics of how to connect using the variables will be in the add-on's documentation. Installation will depend on the specific language or framework we're using.

See also

For details on creating our own add-ons, the process is well documented on Heroku's website at https://addons.heroku.com/provider

Check out Kensa, the CLI to create Heroku add-ons, at https://github.com/heroku/kensa

Managing environment variables

Our applications will often need access to various credentials in the form of API tokens, usernames, and passwords for integrations with third-party services. We could store this information in our Git repository, but then anyone with access to our code would also have a copy of our production credentials. We should instead use environment variables to store any configuration information for our application. Configuration information should be separate from our application's code and instead be tied to the specific deployment of the application. Changing our application to use environment variables is simple.
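Since configuration values are ordinary environment variables, plain shell is enough to read them and to pick a connection string apart. Here is a sketch using the REDISTOGO_URL value from the add-ons recipe; the "missing" fallback and the variable names below are purely illustrative, not Heroku conventions:

```shell
# Reading a config value the way any process on the dyno would; the
# "missing" fallback is illustrative only.
REDISTOGO_URL='redis://user:pass@server.redistogo.com:9652'
url=${REDISTOGO_URL:-missing}

# Splitting the connection string into its parts with parameter expansion.
rest=${url#redis://}           # user:pass@server.redistogo.com:9652
creds=${rest%%@*}              # user:pass
hostport=${rest#*@}            # server.redistogo.com:9652
host=${hostport%%:*}           # server.redistogo.com
port=${hostport##*:}           # 9652
user=${creds%%:*}              # user

echo "connecting to $host on port $port as $user"
```

In practice, a language's own URL or Redis client library would do this parsing for us; the point is only that nothing about the value is special — it is just a string in the environment.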
Let's look at an example in Ruby; let's assume that we currently have secret_api_token defined in our application's code:

secret_api_token = '123abc'

We can remove the token and replace it with an environment variable:

secret_api_token = ENV['SECRET_TOKEN']

In addition to protecting our credentials, using environment variables makes our application more configurable. We'll be able to quickly make configuration changes without having to change code and redeploy.

The terms "configuration variable" and "environment variable" are interchangeable. Heroku usually uses "configuration" due to how tightly the variables are coupled with the state of the application.

How to do it...

Heroku makes it easy to set our application's environment variables through the config command. Let's launch a terminal and navigate to an existing Heroku project to try it out, using the following steps:

1. We can use the config command to see a list of all our existing environment variables:

$ heroku config

2. To view only the value of a specific variable, we can use get:

$ heroku config:get DATABASE_URL

3. To set a new variable, we can use set:

$ heroku config:set VAR_NAME=var_value
Setting config vars and restarting load-tester-rails... done, v28
VAR_NAME: var_value

Each time we set a config variable, Heroku will restart our application. We can set multiple values at once to avoid multiple restarts:

$ heroku config:set SECRET=value SECRET2=value
Setting config vars and restarting load-tester-rails... done, v29
SECRET:  value
SECRET2: value

4. To delete a variable, we use unset:

$ heroku config:unset SECRET
Unsetting SECRET and restarting load-tester-rails... done, v30

If we want, we can delete multiple variables with a single command:

$ heroku config:unset VAR_NAME SECRET2
Unsetting VAR_NAME and restarting load-tester-rails... done, v31
Unsetting SECRET2 and restarting load-tester-rails... done, v32

Heroku tracks each configuration change as a release.
This makes it easy for us to roll back changes if we make a mistake.

How it works…

Environment variables are used on Unix-based operating systems to manage and share configuration information between applications. As they are so common, changing our application to use them does not lock us into deploying only to Heroku.

Heroku stores all of our configuration variables in one central location. Each change to these variables is tracked, and we can view the history by looking through our past releases. When Heroku spins up a new dyno, part of the process is taking all of our configuration settings and setting them as environment variables on the dyno. This is why Heroku restarts our dynos whenever we make a configuration change. As configuration variables are such a key part of our Heroku application, any change to them will also be included in our Heroku logs.

See also

Read about the Twelve-Factor App's rule on configuration at http://12factor.net/config

Enabling the maintenance page

Occasionally, we will need to make changes to our application that require downtime. The proper way to do this is to put up a maintenance page that displays a friendly message and responds to all incoming HTTP requests with a 503 Service Unavailable status. Doing this will keep our users informed and also avoid any negative SEO effects. Search engines understand that when they receive a 503 response, they should come back later to recrawl the site. If we didn't use a maintenance page and our application returned 404 or 500 errors instead, it's possible that a search engine crawler might remove the page from its index.

How to do it...

Let's open up a terminal and navigate to one of our Heroku projects to begin, using the following steps:

1. We can view whether our application's maintenance page is currently enabled with the maintenance command:

$ heroku maintenance
off

2. Let's try turning it on.
This will stop traffic from being routed to our dynos and show the maintenance page, as follows:

$ heroku maintenance:on
Enabling maintenance mode for load-tester-rails... done

Now, if we visit our application, we'll see the default Heroku maintenance page.

3. To disable the maintenance page and resume sending users to our application, we can use the maintenance:off command:

$ heroku maintenance:off
Disabling maintenance mode for load-tester-rails... done

Managing releases and rolling back

What do we do if disaster strikes and our newly released code breaks our application? Luckily for us, Heroku keeps a copy of every deploy and configuration change to our application. This enables us to roll back to a previous version while we work to correct the errors in our latest release.

Heads up! Rolling back only affects application code and configuration variables. Add-ons and our database will not be affected by a rollback.

In this recipe, we will learn how to manage our releases and roll back code from the CLI.

How to do it...

In this recipe, we'll view and manage our releases from the Heroku CLI, using the releases command. Let's open up a terminal now and navigate to one of our Heroku projects by performing the following steps:

1. Heroku tracks every deploy and configuration change as a release. We can view all of our releases from both the CLI and the web dashboard with the releases command:

$ heroku releases
=== load-tester-rails Releases
v33 Add WEB_CON config vars  coutermarsh.mike@gmail.com 2014/03/30 11:18:49 (~ 5h ago)
v32 Remove SEC config vars   coutermarsh.mike@gmail.com 2014/03/29 19:38:06 (~ 21h ago)
v31 Remove VAR config vars   coutermarsh.mike@gmail.com 2014/03/29 19:38:05 (~ 21h ago)
v30 Remove config vars       coutermarsh.mike@gmail.com 2014/03/29 19:27:05 (~ 21h ago)
v29 Deploy 9218c1c           coutermarsh.mike@gmail.com 2014/03/29 19:24:29 (~ 21h ago)

2. Alternatively, we can view our releases through the Heroku dashboard.
Visit https://dashboard.heroku.com, select one of our applications, and click on Activity.

3. We can view detailed information about each release using the info command. This shows us everything about the change and state of the application during this release:

$ heroku releases:info v33
=== Release v33
Addons: librato:development
        newrelic:stark
        rollbar:free
        sendgrid:starter
By:     coutermarsh.mike@gmail.com
Change: Add WEB_CONCURRENCY config vars
When:   2014/03/30 11:18:49 (~ 6h ago)
=== v33 Config Vars
WEB_CONCURRENCY: 3

4. We can revert to the previous version of our application with the rollback command:

$ heroku rollback
Rolling back load-tester-rails... done, v32
!   Warning: rollback affects code and config vars; it doesn't add or remove addons. To undo, run: heroku rollback v33

Rolling back creates a new version of our application in the release history.

5. We can also specify a specific version to roll back to:

$ heroku rollback v30
Rolling back load-tester-rails... done, v30

The version we roll back to does not have to be an older version. Although it sounds contradictory, we can also roll back to newer versions of our application.

How it works…

Behind the scenes, each Heroku release is tied to a specific slug and set of configuration variables. As Heroku keeps a copy of each slug that we deploy, we're able to quickly roll back to previous versions of our code without having to rebuild our application.

Each deploy release will include a reference to the Git SHA that was pushed to master. The Git SHA is a reference to the last commit made to our repository before it was deployed. This is useful if we want to know exactly what code was pushed out in that release. On our local machine, we can run the $ git checkout git-sha-here command to view our application's code in the exact state it was in when deployed.
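The bookkeeping behind a bare `heroku rollback` is easy to picture: the release list is an ordered history, and rolling back simply targets the entry just before the current one. Here's a toy sketch of that selection logic with an invented history; nothing here calls Heroku:

```shell
# An invented release history, newest first, mirroring `heroku releases`.
releases='v33 Add WEB_CONCURRENCY config vars
v32 Remove SEC config vars
v31 Remove VAR config vars'

# A bare `heroku rollback` reverts to the release just before the current
# one: the second entry in the list.
current=$(printf '%s\n' "$releases" | sed -n '1p')
previous=$(printf '%s\n' "$releases" | sed -n '2p')

echo "rolling back from ${current%% *} to ${previous%% *}"
```

And because the rollback itself is recorded as a new release, the history only ever grows; nothing is rewritten or lost.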
Running one-off tasks and dynos

In more traditional hosting environments, developers will often log in to servers to perform basic administrative tasks or debug an issue. With Heroku, we can do this by launching one-off dynos. These are dynos that contain our application code but do not serve web requests. For a Ruby on Rails application, one-off dynos are often used to run database migrations or launch a Rails console.

How to do it...

In this recipe, we will learn how to execute commands on our Heroku applications with the heroku run command. Let's launch a terminal now to get started with the following steps:

1. To have Heroku start a one-off dyno and execute any single command, we will use heroku run. Here, we can try it out by running a simple command to print some text to the screen:

$ heroku run echo "hello heroku"
Running `echo "hello heroku"` attached to terminal... up, run.7702
"hello heroku"

One-off dynos are automatically shut down after the command has finished running.

2. We can see that Heroku is running this command on a dyno with our application's code. Let's run ls to see a listing of the files on the dyno. They should look familiar:

$ heroku run ls
Running `ls` attached to terminal... up, run.5518
app bin config config.ru db Gemfile Gemfile.lock lib log Procfile public Rakefile README README.md tmp

3. If we want to run multiple commands, we can start up a bash session. Type exit to close the session:

$ heroku run bash
Running `bash` attached to terminal... up, run.2331
~ $ ls
app bin config config.ru db Gemfile Gemfile.lock lib log Procfile public Rakefile README README.md tmp
~ $ echo "hello"
hello
~ $ exit
exit

4. We can run tasks in the background using the detached mode. The output of the command goes to our logs rather than the screen:

$ heroku run:detached echo "hello heroku"
Running `echo hello heroku` detached... up, run.4534
Use `heroku logs -p run.4534` to view the output.

5. If we need more power, we can adjust the size of the one-off dynos.
This command will launch a bash session in a 2X dyno:

$ heroku run --size=2X bash

6. If we are running one-off dynos in the detached mode, we can view their status and stop them in the same way we would stop any other dyno:

$ heroku ps
=== run: one-off processes
run.5927 (1X): starting 2014/03/29 16:18:59 (~ 6s ago)

$ heroku ps:stop run.5927

How it works…

When we issue the heroku run command, Heroku spins up a new dyno with our latest slug and runs the command. Heroku does not start our application; the only command that runs is the one that we explicitly pass to it.

One-off dynos act a little differently than standard dynos. If we create one dyno in the detached mode, it will run until we stop it manually, or it will shut down automatically after 24 hours. It will not restart like a normal dyno will. If we run bash from a one-off dyno, it will run until we close the connection or reach an hour of inactivity.

Managing SSH keys

Heroku manages access to our application's Git repository with SSH keys. When we first set up the Heroku Toolbelt, we had to upload either a new or an existing public key to Heroku's servers. This key allows us to access our Heroku Git repositories without entering our password each time. If we ever want to deploy our Heroku applications from another computer, we'll either need to have the same key on that computer or provide Heroku with an additional one. It's easy enough to do this via the CLI, as we'll learn in this recipe.

How to do it…

To get started, let's fire up a terminal. We'll be using the keys command in this recipe, performing the following steps:

1. First, let's view all of the existing keys in our Heroku account:

$ heroku keys
=== coutermarsh.mike@gmail.com Keys
ssh-rsa AAAAB3NzaC...46hEzt1Q== coutermarsh.mike@gmail.com
ssh-rsa AAAAB3NzaC...6EU7Qr3S/v coutermarsh.mike@gmail.com
ssh-rsa AAAAB3NzaC...bqCJkM4w== coutermarsh.mike@gmail.com

2. To remove an existing key, we can use keys:remove.
To the command, we need to pass a string that matches one of the keys:

$ heroku keys:remove "7Qr3S/v coutermarsh.mike@gmail.com"
Removing 7Qr3S/v coutermarsh.mike@gmail.com SSH key... done

3. To add our current user's public key, we can use keys:add. This will look on our machine for a public key (~/.ssh/id_rsa.pub) and upload it:

$ heroku keys:add
Found existing public key: /Users/mike/.ssh/id_rsa.pub
Uploading SSH public key /Users/mike/.ssh/id_rsa.pub… done

To create a new SSH key, we can run $ ssh-keygen -t rsa.

4. If we'd like, we can also specify where the key is located if it is not in the default ~/.ssh/ directory:

$ heroku keys:add /path/to/key.pub

How it works…

SSH keys are the standard method for password-less authentication. There are two parts to each SSH key: a private key, which stays on our machine and should never be shared, and a public key, which we can freely upload and share.

Each key has its purpose. The public key is used to encrypt messages. The private key is used to decrypt messages. When we try to connect to our Git repositories, Heroku's server uses our public key to create an encrypted message that can only be decrypted by our private key. The server then sends the message to our machine; our machine's SSH client decrypts it and sends the response to the server. Sending the correct response successfully authenticates us.

SSH keys are not used for authentication to the Heroku CLI. The CLI uses an authentication token that is stored in our ~/.netrc file.

Sharing and collaboration

We can invite collaborators through both the web dashboard and the CLI. In this recipe, we'll learn how to quickly invite collaborators through the CLI.
How to do it…

To start, let's open a terminal and navigate to the Heroku application that we would like to share, using the following steps:

1. To see the current users who have access to our application, we can use the sharing command:

$ heroku sharing
=== load-tester-rails Access List
coutermarsh.mike@gmail.com owner
mike@form26.com            collaborator

2. To invite a collaborator, we can use sharing:add:

$ heroku sharing:add coutermarshmike@gmail.com
Adding coutermarshmike@gmail.com to load-tester-rails as collaborator... done

Heroku will send an e-mail to the user we're inviting, even if they do not already have a Heroku account.

3. If we'd like to revoke access to our application, we can do so with sharing:remove:

$ heroku sharing:remove coutermarshmike@gmail.com
Removing coutermarshmike@gmail.com from load-tester-rails collaborators... done

How it works…

When we add another collaborator to our Heroku application, they are granted the same abilities as us, except that they cannot manage paid add-ons or delete the application. Otherwise, they have full control to administer the application. If they have an existing Heroku account, their SSH key will be immediately added to the application's Git repository.

See also

Interested in using multiple Heroku accounts on a single machine? Take a look at the heroku-accounts plugin at https://github.com/ddollar/heroku-accounts.

Monitoring load average and memory usage

We can monitor the resource usage of our dynos from the command line using the log-runtime-metrics plugin. This will give us visibility into the CPU and memory usage of our dynos. With this data, we'll be able to determine whether our dynos are correctly sized, detect problems earlier, and decide whether we need to scale our application.

How to do it…

Let's open up a terminal; we'll be completing this recipe with the CLI, performing the following steps:

1. First, we'll need to install the log-runtime-metrics plugin via the CLI.
We can do this easily through heroku labs:

$ heroku labs:enable log-runtime-metrics

2. Now that the runtime metrics plugin is installed, we'll need to restart our dynos for it to take effect:

$ heroku restart

3. Now that the plugin is installed and running, our dynos' resource usage will be printed to our logs. Let's view them now:

$ heroku logs
heroku[web.1]: source=web.1 dyno=heroku.21 sample#load_avg_1m=0.00 sample#load_avg_5m=0.00
heroku[web.1]: source=web.1 dyno=heroku.21 sample#memory_total=105.28MB sample#memory_rss=105.28MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=31927pages sample#memory_pgpgout=4975pages

From the logs, we can see that for this application, our load average is 0, and this dyno is using a total of 105 MB of RAM.

How it works…

Now that we have some insight into how our dynos are using resources, we need to learn how to interpret these numbers. Understanding the utilization of our dynos will be key if we ever need to diagnose a performance-related issue.

In our logs, we will now see load_avg_1m and load_avg_5m. These are our dynos' load averages over 1-minute and 5-minute periods. The two timeframes are helpful in determining whether we're experiencing a brief spike in activity or something more sustained. Load average is the amount of total computational work that the CPU has to complete. The 1X and 2X dynos have access to four virtual cores. A load average of four means that the dyno's CPU is fully utilized. Any value above four is a warning sign that the dyno might be overloaded, and response times could begin to suffer. Web applications are typically not CPU-intensive, so seeing low load averages for web dynos should be expected. If we start seeing high load averages, we should consider either adding more dynos or using larger dynos to handle the load.

Our memory usage is also shown in the logs.
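Those metric entries are plain key=value pairs, so they are easy to pick apart with standard tools. A sketch that pulls memory_rss out of a sample line (modeled on the output above) and expresses it as a whole-number percentage of a 1X dyno's 512 MB; the field names come from the plugin's output, while the threshold arithmetic is ours:

```shell
# A sample log-runtime-metrics line, as seen in the logs above.
line='sample#memory_total=105.28MB sample#memory_rss=105.28MB sample#memory_swap=0.00MB'

# Isolate the memory_rss field and strip the unit.
rss=$(printf '%s' "$line" | tr ' ' '\n' | grep 'memory_rss' | cut -d= -f2)
rss_mb=${rss%MB}

# Express it as a whole-number percentage of 512 MB (a 1X dyno).
pct=$(awk -v rss="$rss_mb" 'BEGIN { printf "%d", rss / 512 * 100 }')

echo "memory_rss is ${rss_mb}MB, ${pct}% of a 1X dyno"
```

For this sample, that works out to roughly 20 percent of the dyno's RAM, comfortably inside the guideline discussed next.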
The key value that we want to keep track of is memory_rss, which is the total amount of RAM being utilized by our application. It's best to keep this value no higher than 50 to 70 percent of the total RAM available on the dyno. For a 1X dyno with 512 MB of memory, this would mean keeping our memory usage no greater than 250 to 350 MB. This gives our application room to grow under load and helps us avoid any memory swapping. Seeing values above 70 percent is an indication that we need to either adjust our application's memory usage or scale up.

Memory swap occurs when our dyno runs out of RAM. To compensate, our dyno will begin using its hard drive to store data that would normally be stored in RAM. For any web application, swap should be considered evil; this value should always be zero. If our dyno starts swapping, we can expect it to significantly slow down our application's response times. Seeing any swap is an immediate indication that we must either reduce our application's memory consumption or start scaling.

See also

Load average and memory usage are particularly useful when performing application load tests.

Summary

In this article, we learned how to view and search application logs, install add-ons, manage environment variables, enable the maintenance page, manage releases and roll back, run one-off tasks, manage SSH keys, share and collaborate, and monitor load average and memory usage.

Resources for Article:

Further resources on this subject:

Securing vCloud Using the vCloud Networking and Security App Firewall [article]
vCloud Networks [article]
Apache CloudStack Architecture [article]
Using the Leap Motion Controller with Arduino

Packt
19 Nov 2014
18 min read
This article by Brandon Sanders, the author of the book Mastering Leap Motion, focuses on what he specializes in—hardware. While normal applications are all fine and good, he finds it much more gratifying if a program he writes has an impact in the physical world. (For more resources related to this topic, see here.)

One of the most popular hobbyist hardware solutions, as I'm sure you know, is the Arduino. This cute little blue board from Italy brought the power of microcontrollers to the masses. Throughout this article, we're going to work on integrating a Leap Motion Controller with an Arduino board via a simplistic program; the end goal is to make the built-in LED on an Arduino board blink either slower or faster depending on how far a user's hand is from the Leap. While this is a relatively simple task, it's a great way to demonstrate how you can connect something like the Leap to an external piece of hardware. From there, it's only a hop, skip, and jump to controlling robots and other cool things with the Leap!

This project will follow the client-server model of programming: we'll be writing a simple Java server, which will be run from a computer, and a C++ client, which will run on an Arduino board connected to the computer. The server will be responsible for retrieving Leap Motion input and sending it to the client, while the client will be responsible for making an LED blink based on data received from the server.

Before we begin, I'd like to note that you can download the completed (and working) project from GitHub at https://github.com/Mizumi/Mastering-Leap-Motion-Chapter-9-Project-Leapduino.

A few things you'll need

Before you begin working on this tutorial, there are a few things you're going to need:

- A computer (for obvious reasons).
- A Leap Motion Controller.
- An Arduino of some kind. This tutorial is based around the Uno model, but other similar models like the Mega should work just as well.
- A USB cable to connect your Arduino to your computer.
Optionally, the Eclipse IDE (this tutorial will assume you're using Eclipse for the sake of readability and instruction). Setting up the environment First off, you're going to need a copy of the Leap Motion SDK so that you can add the requisite library jar files and DLLs to the project. If you don't already have it, you can get a copy of the SDK from https://www.developer.leapmotion.com/downloads/. Next, you're going to need the Java Simple Serial Connector (JSSC) library and the Arduino IDE. You can download the library JAR file for JSSC from GitHub at https://github.com/scream3r/java-simple-serial-connector/releases. Once the download completes, extract the JAR file from the downloaded ZIP folder and store it somewhere safe; you'll need it later on in this tutorial. You can then proceed to download the Arduino IDE from their official website at http://arduino.cc/en/Main/Software. If you're on Windows, you will be able to download a Windows installer file which will automagically install the entire IDE on to your computer. On the other hand, Mac and Linux users will need to instead download .zip or .tgz files and then extract them manually, running the executable binary from the extracted folder contents. Setting up the project To set up our project, perform the following steps: The first thing we're going to do is create a new Java project. This can be easily achieved by opening up Eclipse (to reiterate for the third time, this tutorial will assume you're using Eclipse) and heading over to File -> New -> Java Project. You will then be greeted by a project creation wizard, where you'll be prompted to choose a name for the project (I used Leapduino). Click on the Finish button when you're done. My current development environment is based around the Eclipse IDE for Java Developers, which can be found at http://www.eclipse.org/downloads. 
The instructions that follow will use Eclipse nomenclature and jargon, but they will still be usable if you're using something else (like NetBeans). Once the project is created, navigate to it in the Package Explorer window. You'll want to go ahead and perform the following actions: Create a new package for the project by right-clicking on the src folder for your project in the Package Explorer and then navigating to New | Package in the resulting tooltip. You can name it whatever you like; I personally called mine com.mechakana.tutorials. You'll now want to add three files to our newly-created package: Leapduino.java, LeapduinoListener.java, and RS232Protocol.java. To create a new file, simply right-click on the package and then navigate to New | Class. Create a new folder in your project by right-clicking on the project name in the Package Explorer and then navigating to New | Folder in the resulting tooltip. For the purposes of this tutorial, please name it Leapduino. Now add one file to your newly created folder: Leapduino.ino. This file will contain all of the code that we're going to upload to the Arduino. With all of our files created, we need to add the libraries to the project. Go ahead and create a new folder at the root directory of your project, called lib. Within the lib folder, you'll want to place the jssc.jar file that you downloaded earlier, along with the LeapJava.jar file from the Leap Motion SDK. Then, you will want to add the appropriate Leap.dll and LeapJava.dll files for your platform to the root of your project. Finally, you'll need to modify your Java build path to link the LeapJava.jar and jssc.jar files to your project. This can be achieved by right-clicking on your project in the Package Explorer (within Eclipse) and navigating to Build Path… | Configure Build Path…. From there, go to the Libraries tab and click on Add JARs…, selecting the two aforementioned JAR files (LeapJava.jar and jssc.jar). 
When you're done, your project should look similar to the following screenshot:

And you're done; now to write some code!

Writing the Java side of things

With everything set up and ready to go, we can start writing some code. First off, we're going to write the RS232Protocol class, which will allow our application to communicate with any Arduino board connected to the computer via a serial (RS-232) connection. This is where the JSSC library will come into play, allowing us to quickly and easily write code that would otherwise be quite lengthy (and not fun).

Fun fact
RS-232 is a standard for serial communications and transmission of data. There was a time when it was a common feature on a personal computer, used for modems, printers, mice, hard drives, and so on. With time, though, the Universal Serial Bus (USB) technology replaced RS-232 for many of those roles. Despite this, today's industrial machines, scientific equipment and (of course) robots still make heavy use of this protocol due to its light weight and ease of use; the Arduino is no exception!

Go ahead and open up the RS232Protocol.java file which we created earlier, and enter the following:

package com.mechakana.tutorials;

import jssc.SerialPort;
import jssc.SerialPortEvent;
import jssc.SerialPortEventListener;
import jssc.SerialPortException;

public class RS232Protocol
{
  //Serial port we're manipulating.
  private SerialPort port;

  //Class: RS232Listener
  public class RS232Listener implements SerialPortEventListener
  {
    public void serialEvent(SerialPortEvent event)
    {
      //Check if data is available.
      if (event.isRXCHAR() && event.getEventValue() > 0)
      {
        try
        {
          int bytesCount = event.getEventValue();
          System.out.print(port.readString(bytesCount));
        }
        catch (SerialPortException e) { e.printStackTrace(); }
      }
    }
  }

  //Member Function: connect
  public void connect(String newAddress)
  {
    try
    {
      //Set up a connection.
      port = new SerialPort(newAddress);

      //Open the new port and set its parameters.
      port.openPort();
      port.setParams(38400, 8, 1, 0);

      //Attach our event listener.
      port.addEventListener(new RS232Listener());
    }
    catch (SerialPortException e) { e.printStackTrace(); }
  }

  //Member Function: disconnect
  public void disconnect()
  {
    try { port.closePort(); }
    catch (SerialPortException e) { e.printStackTrace(); }
  }

  //Member Function: write
  public void write(String text)
  {
    try { port.writeBytes(text.getBytes()); }
    catch (SerialPortException e) { e.printStackTrace(); }
  }
}

All in all, RS232Protocol is a simple class—there really isn't a whole lot to talk about here! However, I'd like to draw your attention to one interesting part of the class:

public class RS232Listener implements SerialPortEventListener
{
  public void serialEvent(SerialPortEvent event) { /*code*/ }
}

You might have found it rather odd that we didn't create a function for reading from the serial port—we only created a function for writing to it. This is because we've opted to utilize an event listener, the nested RS232Listener class. Under normal operating conditions, this class's serialEvent function will be called and executed every single time new information is received from the port. When this happens, the function will print all of the incoming data out to the user's screen. Isn't that nifty?

Moving on, our next class is a familiar one—LeapduinoListener, a simple Listener implementation. This class represents the meat of our program, receiving Leap Motion tracking data and then sending it over our serial port to the connected Arduino. Go ahead and open up LeapduinoListener.java and enter the following code:

package com.mechakana.tutorials;

import com.leapmotion.leap.*;

public class LeapduinoListener extends Listener
{
  //Serial port that we'll be using to communicate with the Arduino.
  private RS232Protocol serial;

  //Constructor
  public LeapduinoListener(RS232Protocol serial)
  {
    this.serial = serial;
  }

  //Member Function: onInit
  public void onInit(Controller controller)
  {
    System.out.println("Initialized");
  }

  //Member Function: onConnect
  public void onConnect(Controller controller)
  {
    System.out.println("Connected");
  }

  //Member Function: onDisconnect
  public void onDisconnect(Controller controller)
  {
    System.out.println("Disconnected");
  }

  //Member Function: onExit
  public void onExit(Controller controller)
  {
    System.out.println("Exited");
  }

  //Member Function: onFrame
  public void onFrame(Controller controller)
  {
    //Get the most recent frame.
    Frame frame = controller.frame();

    //Verify a hand is in view.
    if (frame.hands().count() > 0)
    {
      //Get some hand tracking data.
      int hand = (int) (frame.hands().frontmost().palmPosition().getY());

      //Send the hand height to the Arduino.
      serial.write(String.valueOf(hand));

      //Give the Arduino some time to process our data.
      try { Thread.sleep(30); }
      catch (InterruptedException e) { e.printStackTrace(); }
    }
  }
}

In this class, we've got the basic Leap Motion API onInit, onConnect, onDisconnect, onExit, and onFrame functions. Our onFrame function is fairly straightforward: we get the most recent frame, verify a hand is within view, retrieve its y axis coordinate (height above the Leap Motion Controller), and then send it off to the Arduino via our instance of the RS232Protocol class (which gets assigned during initialization). The remaining functions simply print text out to the console telling us when the Leap has initialized, connected, disconnected, and exited (respectively).

And now, for our final class on the Java side of things: Leapduino! This class is a super basic main class that simply initializes the RS232Protocol class and the LeapduinoListener—that's it!
Without further ado, go on ahead and open up Leapduino.java and enter the following code:

package com.mechakana.tutorials;

import com.leapmotion.leap.Controller;

public class Leapduino
{
  //Main
  public static final void main(String args[])
  {
    //Initialize serial communications.
    RS232Protocol serial = new RS232Protocol();
    serial.connect("COM4");

    //Initialize the Leapduino listener.
    LeapduinoListener leap = new LeapduinoListener(serial);
    Controller controller = new Controller();
    controller.addListener(leap);
  }
}

Like all of the classes so far, there isn't a whole lot to say here. That said, there is one line that you must absolutely be aware of, since it can change depending on how your Arduino is connected:

serial.connect("COM4");

Depending on which port Windows chose for your Arduino when it connected to your computer (more on that next), you will need to modify the COM4 value in the above line to match the port your Arduino is on. Examples of values you'll probably use are COM3, COM4, and COM5.

And with that, the Java side of things is complete. If you run this project right now, most likely all you'll see will be two lines of output: Initialized and Connected. If you want to see anything else happen, you'll need to move on to the next section and get the Arduino side of things working.

Writing the Arduino side of things

With our Java coding done, it's time to write some good-old C++ for the Arduino. If you were able to use the Windows installer for Arduino, simply navigate to the Leapduino.ino file in your Eclipse project explorer and double click on it. If you had to extract the entire Arduino IDE and store it somewhere instead of running a simple Windows installer, navigate to it and launch the Arduino.exe file. From there, select File | Open, navigate to the Leapduino.ino file on your computer and double click on it.
You will now be presented with a screen similar to the one here:

This is the wonderful Arduino IDE—a minimalistic and straightforward text editor and compiler for the Arduino microcontrollers. On the top left of the IDE, you'll find two circular buttons: the check mark verifies (compiles) your code to make sure it works, and the arrow deploys your code to the Arduino board connected to your computer. On the bottom of the IDE, you'll find the compiler output console (the black box), and on the very bottom right you'll see a line of text telling you which Arduino model is connected to your computer, and on what port (I have an Arduino Uno on COM4 in the preceding screenshot). As is typical for many IDEs and text editors, the big white area in the middle is where your code will go. So without further ado, let's get started with writing some code! Input all of the text shown here into the Arduino IDE:

//Most Arduino boards have an LED pre-wired to pin 13.
int led = 13;

//Current LED state. LOW is off and HIGH is on.
int ledState = LOW;

//Blink rate in milliseconds.
long blinkRate = 500;

//Last time the LED was updated.
long previousTime = 0;

//Function: setup
void setup()
{
  //Initialize the built-in LED (assuming the Arduino board has one)
  pinMode(led, OUTPUT);

  //Start a serial connection at a baud rate of 38,400.
  Serial.begin(38400);
}

//Function: loop
void loop()
{
  //Get the current system time in milliseconds.
  unsigned long currentTime = millis();

  //Check if it's time to toggle the LED on or off.
  if (currentTime - previousTime >= blinkRate)
  {
    previousTime = currentTime;

    if (ledState == LOW) ledState = HIGH;
    else ledState = LOW;

    digitalWrite(led, ledState);
  }

  //Check if there is serial data available.
  if (Serial.available())
  {
    //Wait for all data to arrive.
    delay(20);

    //Our data.
    String data = "";

    //Iterate over all of the available data and compound it into a string.
    while (Serial.available())
      data += (char) (Serial.read());

    //Set the blink rate based on our newly-read data.
    blinkRate = abs(data.toInt() * 2);

    //A blink rate lower than 30 milliseconds won't really be perceptible by a human.
    if (blinkRate < 30) blinkRate = 30;

    //Echo the data.
    Serial.println("Leapduino Client Received:");
    Serial.println("Raw Leap Data: " + data + " | Blink Rate (MS): " + blinkRate);
  }
}

Now, let's go over the contents. The first few lines are basic global variables, which we'll be using throughout the program (the comments do a good job of describing them, so we won't go into much detail here).

The first function, setup, is an Arduino's equivalent of a constructor; it's called only once, when the Arduino is first turned on. Within the setup function, we initialize the built-in LED (most Arduino boards have an LED pre-wired to pin 13) on the board. We then initialize serial communications at a baud rate of 38,400 bits per second—this will allow our board to communicate with the computer later on.

Fun fact
The baud rate (abbreviated as Bd in some diagrams) is the unit for symbol rate or modulation rate in symbols or pulses per second. Simply put, on serial ports, the baud rate controls how many bits a serial port can send per second—the higher the number, the faster a serial port can communicate. The question is, why don't we set a ridiculously high rate? Well, the higher you go with the baud rate, the more likely it is for there to be data loss—and we all know data loss just isn't good. For many applications, though, a baud rate of 9,600 to 38,400 bits per second is sufficient.

Moving on to the second function, loop is the main function in any Arduino program, which is repeatedly called while the Arduino is turned on. Due to this functionality, many programs will treat any code within this function as if it were inside a while (true) loop.
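To put the baud-rate fun fact above into concrete numbers, here is a quick back-of-envelope calculation in standalone JavaScript (our own illustration—it is not part of the Leapduino code). It assumes the common 8N1 framing, which matches the 8 data bits and 1 stop bit we configured in RS232Protocol with setParams(38400, 8, 1, 0):

```javascript
// Effective throughput of a serial link, assuming 8N1 framing:
// each byte on the wire costs 1 start bit + 8 data bits + 1 stop bit = 10 bits.
function serialThroughput(baudRate) {
  var bitsPerByte = 10; // 8N1 framing assumption
  return baudRate / bitsPerByte; // bytes per second
}

console.log(serialThroughput(38400)); // 3840 bytes/s
console.log(serialThroughput(9600));  // 960 bytes/s
```

So at our chosen 38,400 baud, the link moves at most a few kilobytes per second—more than enough for the short numeric strings this project sends.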
In loop, we start off by getting the current system time (in milliseconds) and then comparing it to our ideal blink rate for the LED. If the time elapsed since our last blink exceeds the ideal blink rate, we'll go ahead and toggle the LED on or off accordingly. We then proceed to check if any data has been received over the serial port. If it has, we'll proceed to wait for a brief period of time, 20 milliseconds, to make sure all data has been received. At that point, our code will proceed to read in all of the data, parse it for an integer (which will be our new blink rate), and then echo the data back out to the serial port for diagnostics purposes. As you can see, an Arduino program (or sketch, as they are formally known) is quite simple. Why don't we test it out? Deploying and testing the application With all of the code written, it's time to deploy the Arduino side of things to the, well, Arduino. The first step is to simply open up your Leapduino.ino file in the Arduino IDE. Once that's done, navigate to Tools | Board and select the appropriate option for your Arduino board. In my case, it's an Arduino Uno. At this point, you'll want to verify that you have an Arduino connected to your computer via a USB cable—after all, we can't deploy to thin air! At this point, once everything is ready, simply hit the Deploy button in the top-left of the IDE, as seen here: If all goes well, you'll see the following output in the console after 15 or so seconds: And with that, your Arduino is ready to go! How about we test it out? Keeping your Arduino plugged into your computer, go on over to Eclipse and run the project we just made. Once it's running, try moving your hand up and down over your Leap Motion controller; if all goes well, you'll see the following output from within the console in Eclipse: All of that data is coming directly from the Arduino, not your Java program; isn't that cool? 
Now, take a look at your Arduino while you're doing this; you should notice that the built-in LED (circled in the following image, labelled L on the board itself) will begin to blink slower or faster depending on how close your hand gets to the Leap. Circled in red: the built-in L LED on an Arduino Uno, wired to pin 13 by default. With this, you've created a simple Leap Motion application for use with an Arduino. From here, you could go on to make an Arduino-controlled robotic arm driven by coordinates from the Leap, or maybe an interactive light show. The possibilities are endless, and this is just the (albeit extremely, extremely simple) tip of the iceberg. Summary In this article, you had a lengthy look at some things you can do with the Leap Motion Controller and hardware such as Arduino. If you have any questions, I encourage you to contact me directly at brandon@mechakana.com. You can also visit my website, http://www.mechakana.com, for more technological goodies and tutorials. Resources for Article: Further resources on this subject: Major SDK components [Article] 2D Twin-stick Shooter [Article] What's Your Input? [Article]
Function passing

Packt
19 Nov 2014
6 min read
In this article by Simon Timms, the author of the book, Mastering JavaScript Design Patterns, we will cover function passing. In functional programming languages, functions are first-class citizens. Functions can be assigned to variables and passed around just like you would with any other variable. This is not entirely a foreign concept. Even languages such as C had function pointers that could be treated just like other variables. C# has delegates and, in more recent versions, lambdas. The latest release of Java has also added support for lambdas, as they have proven to be so useful. (For more resources related to this topic, see here.) JavaScript allows for functions to be treated as variables and even as objects and strings. In this way, JavaScript is functional in nature. Because of JavaScript's single-threaded nature, callbacks are a common convention and you can find them pretty much everywhere. Consider calling a function at a later date on a web page. This is done by setting a timeout on the window object as follows: setTimeout(function(){alert("Hello from the past")}, 5 * 1000); The arguments for the set timeout function are a function to call and a time to delay in milliseconds. No matter the JavaScript environment in which you're working, it is almost impossible to avoid functions in the shape of callbacks. The asynchronous processing model of Node.js is highly dependent on being able to call a function and pass in something to be completed at a later date. Making calls to external resources in a browser is also dependent on a callback to notify the caller that some asynchronous operation has completed. 
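Before turning to browser code, here is a minimal, self-contained illustration of functions as first-class values (our own example, not taken from any particular library): functions stored in variables and handed to a higher-order function that accepts them as arguments.

```javascript
// Functions can be assigned to variables just like any other value.
var greet = function (name) { return "Hello, " + name; };
var shout = function (name) { return name.toUpperCase() + "!"; };

// A higher-order function: it receives another function and applies it twice.
function applyTwice(fn, value) {
  return fn(fn(value));
}

console.log(greet("Westeros"));       // "Hello, Westeros"
console.log(applyTwice(shout, "hi")); // "HI!!"
```

With that in mind, we can return to the callback conventions used in the browser.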
In basic JavaScript, this looks like the following code:

var xmlhttp = new XMLHttpRequest();
xmlhttp.onreadystatechange = function() {
  if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
    //process returned data
  }
};
xmlhttp.open("GET", "http://some.external.resource", true);
xmlhttp.send();

You may notice that we assign onreadystatechange before we even send the request. This is because assigning it later may result in a race condition in which the server responds before the function is attached to the ready state change. In this case, we've used an inline function to process the returned data. Because functions are first class citizens, we can change this to look like the following code:

var xmlhttp;
function requestData() {
  xmlhttp = new XMLHttpRequest();
  xmlhttp.onreadystatechange = processData;
  xmlhttp.open("GET", "http://some.external.resource", true);
  xmlhttp.send();
}
function processData() {
  if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
    //process returned data
  }
}

This is typically a cleaner approach and avoids performing complex processing in line with another function. However, you might be more familiar with the jQuery version of this, which looks something like this:

$.getJSON('http://some.external.resource', function(json) {
  //process returned data
});

In this case, the boilerplate of dealing with ready state changes is handled for you. There is even convenience provided for you should the request for data fail, with the following code:

$.ajax('http://some.external.resource', {
  success: function(json) {
    //process returned data
  },
  error: function() {
    //process failure
  },
  dataType: "json"
});

In this case, we've passed an object into the ajax call, which defines a number of properties. Amongst these properties are function callbacks for success and failure. This method of passing numerous functions into another suggests a great way of providing expansion points for classes. Likely, you've seen this pattern in use before without even realizing it.
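The success/error shape just shown can be sketched generically. The following is a hedged illustration of the pattern, not jQuery's actual implementation—fakeFetch and its synchronous "async" behavior are our own stand-ins for demonstration purposes:

```javascript
// A simplified sketch of the options-object callback pattern: the caller
// passes callbacks in, and the helper decides which one to invoke.
function fakeFetch(resource, options) {
  // Simulate an asynchronous result synchronously, purely for illustration.
  if (resource.indexOf("http") === 0) {
    options.success({ data: "payload from " + resource });
  } else {
    options.error("bad resource: " + resource);
  }
}

var log = [];
fakeFetch("http://some.external.resource", {
  success: function (json) { log.push("ok: " + json.data); },
  error: function (message) { log.push("fail: " + message); }
});
fakeFetch("ftp-only", {
  success: function (json) { log.push("ok: " + json.data); },
  error: function (message) { log.push("fail: " + message); }
});

console.log(log.join("\n"));
// ok: payload from http://some.external.resource
// fail: bad resource: ftp-only
```

The caller never needs to know how the result was obtained; it only supplies the hooks. That inversion is exactly what makes this pattern such a good extension mechanism.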
Passing functions into constructors as part of an options object is a commonly used approach to providing extension hooks in JavaScript libraries.

Implementation

In Westeros, the tourism industry is almost nonexistent. There are great difficulties with bandits killing tourists and tourists becoming entangled in regional conflicts. Nonetheless, some enterprising folks have started to advertise a grand tour of Westeros in which they will take those with the means on a tour of all the major attractions. From King's Landing to Eyrie, to the great mountains of Dorne, the tour will cover it all. In fact, a rather mathematically inclined member of the tourism board has taken to calling it a Hamiltonian tour, as it visits everywhere once.

The HamiltonianTour class takes an options object that allows the definition of its extension hooks. This object contains the various places to which a callback can be attached. In our case, the interface for it would look something like the following code:

export class HamiltonianTourOptions {
  onTourStart: Function;
  onEntryToAttraction: Function;
  onExitFromAttraction: Function;
  onTourCompletion: Function;
}

The full HamiltonianTour class looks like the following code:

var HamiltonianTour = (function () {
  function HamiltonianTour(options) {
    this.options = options;
  }
  HamiltonianTour.prototype.StartTour = function () {
    if (this.options.onTourStart && typeof (this.options.onTourStart) === "function")
      this.options.onTourStart();
    this.VisitAttraction("King's Landing");
    this.VisitAttraction("Winterfell");
    this.VisitAttraction("Mountains of Dorne");
    this.VisitAttraction("Eyrie");
    if (this.options.onTourCompletion && typeof (this.options.onTourCompletion) === "function")
      this.options.onTourCompletion();
  };
  HamiltonianTour.prototype.VisitAttraction = function (AttractionName) {
    if (this.options.onEntryToAttraction && typeof (this.options.onEntryToAttraction) === "function")
      this.options.onEntryToAttraction(AttractionName);
    //do whatever one does in an attraction
    if (this.options.onExitFromAttraction && typeof (this.options.onExitFromAttraction) === "function")
      this.options.onExitFromAttraction(AttractionName);
  };
  return HamiltonianTour;
})();

You can see in the highlighted code how we check the options and then execute the callback as needed. This can be done by simply using the following code:

var tour = new HamiltonianTour({
  onEntryToAttraction: function(cityname) {
    console.log("I'm delighted to be in " + cityname);
  }
});
tour.StartTour();

The output of the preceding code will be:

I'm delighted to be in King's Landing
I'm delighted to be in Winterfell
I'm delighted to be in Mountains of Dorne
I'm delighted to be in Eyrie

Summary

In this article, we have learned about function passing. Passing functions is a great approach to solving a number of problems in JavaScript and tends to be used extensively by libraries such as jQuery and frameworks such as Express. It is so commonly adopted that using it adds no barriers to your code's readability.

Resources for Article:
Further resources on this subject:
Creating Java EE Applications [article]
Meteor.js JavaScript Framework: Why Meteor Rocks! [article]
Dart with JavaScript [article]
A Peek Under the Hood – Facts, Types, and Providers

Packt
19 Nov 2014
25 min read
This article is written by Felix Frank, the author of Puppet Essentials. Before you are introduced to the missing language concepts that you will need to use Puppet effectively for bigger projects, there is some background that we should cover first. Don't worry, it won't be all dry theory—most of the important parts of Puppet are relevant to your daily business. (For more resources related to this topic, see here.)

These elementary topics will be thoroughly explored in the following sections:

Summarizing systems with Facter
Understanding the type system
Substantiating the model with providers
Putting it all together

Summarizing systems with Facter

Configuration management is quite a dynamic problem. In other words, the systems that need configuration are mostly moving targets. In some situations, system administrators or operators get lucky and work with large quantities of 100 percent uniform hardware and software. In most cases, however, the landscape of servers and other computing nodes is rather heterogeneous, at least in subtle ways. Even in unified networks, there are likely multiple generations of machines, with small or larger differences required for their respective configurations.

For example, a common task for Puppet is to handle the configuration of system monitoring. Your business logic will likely dictate warning thresholds for gauges such as the system load value. However, those thresholds can rarely be static. On a two-processor virtual machine, a system load of 10 represents a crippling overload, while the same value can be absolutely acceptable for a busy DBMS server that has cutting-edge hardware of the largest dimensions.

Another important factor can be software platforms. Your infrastructure might span multiple distributions of Linux, or alternate operating systems such as BSD, Solaris, or Windows, each with different ways of handling certain scenarios.
Imagine, for example, that you want Puppet to manage some content of the fstab file. On your rare Solaris system, you would have to make sure that Puppet targets the /etc/vfstab file instead of /etc/fstab.

It is usually not a good idea to interact with the fstab file in your manifest directly. This example will be rounded off in the section concerning providers.

Puppet strives to present you with a unified way of managing all of your infrastructure. It obviously needs a means to allow your manifests to adapt to different kinds of circumstances on the agent machines. This includes their operating system, hardware layout, and many other details. Keep in mind that generally, the manifests have to be compiled on the master machine.

There are several conceivable ways to implement a solution for this particular problem. A direct approach would be a language construct that allows the master to send a piece of shell script (or other code) to the agent and receive its output in return. The following is pseudocode however; there are no backtick expressions in the Puppet DSL:

if `grep -c ^processor /proc/cpuinfo` > 2 {
   $load_warning = 4
}
else {
   $load_warning = 2
}

This solution would be powerful but expensive. The master would need to call back to the agent whenever the compilation process encounters such an expression. Writing manifests that were able to cope if such a command had returned an error code would be strenuous, and Puppet would likely end up resembling a quirky scripting engine.

When using puppet apply instead of the master, such a feature would pose less of a problem, and it is indeed available in the form of the generate function, which works just like the backticks in the pseudocode mentioned previously. The commands are always run on the compiling node though, so this is less useful with an agent/master setup than with apply.

Puppet uses a different approach.
It relies on a secondary system called Facter, which has the sole purpose of examining the machine on which it is run. It serves a list of well-known variable names and values, all according to the system on which it runs. For example, an actual Puppet manifest that needs to form a condition upon the number of processors on the agent will use this expression:

if $processorcount > 4 { … }

Facter's variables are called facts, and processorcount is such a fact. The fact values are gathered by the agent just before it requests its catalog from the master. All fact names and values are transferred to the master as part of the request. They are available in the manifest as variables.

Facts are available to manifests that are used with puppet apply too, of course. You can test this very simply:

puppet apply -e 'notify { "I am $fqdn and have $processorcount CPUs": }'

Accessing and using fact values

You have already seen an example use of the processorcount fact. In the manifest, each fact value is available as a global variable value. That is why you can just use the $processorcount expression where you need it.

You will often see conventional uses such as $::processorcount or $::ipaddress. Prefixing the fact name with double colons was a good idea in older Puppet versions before 3.0. The official Style Guide at https://docs.puppetlabs.com/guides/style_guide.html#namespacing-variables is outdated in this regard and still recommends this. The prefix is no longer necessary.

Some helpful facts have already been mentioned. The processorcount fact might play a role for your configuration. When configuring some services, you will want to use the machine's ipaddress value in a configuration file or as an argument value:

file { '/etc/mysql/conf.d/bind-address':
   ensure  => 'file',
   mode    => '644',
   content => "[mysqld]\nbind-address=$ipaddress\n",
}

Besides the hostname, your manifest can also make use of the fully qualified domain name (FQDN) of the agent machine.
The agent will use the value of its fqdn fact as the name of its certificate (clientcert) by default. The master receives both these values. Note that the agent can override the fqdn value to any name, whereas the clientcert value is tied to the signed certificate that the agent uses.

Sometimes, you will want the master to pass sensitive information to individual nodes. The manifest must identify the agent by its clientcert fact and never use fqdn or hostname instead, for the reason mentioned. An example is shown in the following code:

file { '/etc/my-secret':
   ensure => 'file',
   mode   => '600',
   owner  => 'root',
   source => "puppet:///modules/secrets/$clientcert/key",
}

There is a whole group of facts to describe the operating system. Each fact is useful in different situations. The operatingsystem fact takes values such as Debian or CentOS:

if $operatingsystem != 'Ubuntu' {
   package { 'avahi-daemon':
      ensure => absent,
   }
}

If your manifest will behave identically for RHEL, CentOS, and Fedora (but not on Debian and Ubuntu), you will make use of the osfamily fact instead:

if $osfamily == 'RedHat' {
   $kernel_package = 'kernel'
}

The operatingsystemrelease fact allows you to tailor your manifests to differences between the versions of your OS:

if $operatingsystem == 'Debian' {
   if versioncmp($operatingsystemrelease, '7.0') >= 0 {
      $ssh_ecdsa_support = true
   }
}

Facts such as macaddress, the different SSH host keys, fingerprints, and others make it easy to use Puppet for keeping inventory of your hardware. There is a slew of other useful facts. Of course, the collection will not suit every possible need of every user out there. That is why Facter comes readily extendible.
Extending Facter with custom facts

Technically, nothing is stopping you from adding your own fact code right next to the core facts, either by maintaining your own Facter package, or even deploying the Ruby code files to your agents directly through Puppet management. However, Puppet offers a much more convenient alternative in the form of custom facts. For now, just create a Ruby file at /etc/puppet/modules/hello_world/lib/facter/hello.rb on the master machine. Puppet will recognize this as a custom fact of the name hello. The inner workings of Facter are very straightforward and goal oriented. There is one block of Ruby code for each fact, and the return value of the block becomes the fact value. Many facts are self-sufficient, but others will rely on the values of one or more basic facts. For example, the method for determining the IP address(es) of the local machine is highly dependent upon the operating system. The hello fact is very simple though:

Facter.add(:hello) do
    setcode { "Hello, world!" }
end

The return value of the setcode block is the string, Hello, world!, and you can use this fact as $hello in a Puppet manifest. Before Facter Version 2.0, each fact had a string value. If a code block returns another value, such as an array or hash, Facter 1.x will convert it to a string. The result is not useful in many cases. For this historic reason, there are facts such as ipaddress_eth0 and ipaddress_lo, instead of (or in addition to) a proper hash structure with interface names and addresses. It is important for the pluginsync option to be enabled on the agent side. This has been the default for a long time and should not require any customization. The agent will synchronize all custom facts whenever checking in to the master. They are permanently available on the agent machine after that. You can then retrieve the hello fact from the command line using facter -p hello.
By just invoking facter without an argument, you request a list of all fact names and values. When testing your custom facts from the command line, you need to invoke facter with the -p or --puppet option. Puppet itself will always include the custom facts. This article will not cover all aspects of Facter's API, but there is one facility that is quite essential. Many of your custom facts will only be useful on Unix-like systems, and others will only be useful on your Windows boxen. You can retrieve such values using a construct like the following:

if Facter.value(:kernel) != "windows"
    nil
else
    # actual fact code here
end

This would be quite tedious and repetitive though. Instead, you can invoke the confine method within the Facter.add(name) { … } block:

Facter.add(:msvs_version) do
    confine :kernel => :windows
    setcode do
        # …
    end
end

You can confine a fact to several alternative values as well:

confine :kernel => [ :linux, :sunos ]

Finally, if a fact does make sense in different circumstances, but requires drastically different code in each respective case, you can add the same fact several times, each with a different set of confine values. Core facts such as ipaddress use this often:

Facter.add(:ipaddress) do
    confine :kernel => :linux
    …
end
Facter.add(:ipaddress) do
    confine :kernel => %w{FreeBSD OpenBSD Darwin DragonFly}
    …
end
…

You can confine facts based on any combination of other facts, not just kernel. It is a very popular choice, though. The operatingsystem or osfamily fact can be more appropriate in certain situations. Technically, you can even confine some of your facts to certain processorcount values and so forth.

Simplifying things using external facts

If writing and maintaining Ruby code is not desirable in your team for any reason, you might prefer to use an alternative that allows shell scripts, or really any kind of programming language, or even static data with no programming involved at all. Facter allows this in the form of external facts.
Creating an external fact is similar to the process for regular custom facts, with the following distinctions:

Facts are produced by standalone executables or files with static data, which the agent must find in /etc/facter/facts.d
The data is not just a string value, but an arbitrary number of key=value pairs instead
The data need not use the ini file notation style—the key/value pairs can also be in the YAML or JSON format

The following external facts hold the same data:

# site-facts.txt
workgroup=CT4Site2
domain_psk=nm56DxLp%

The facts can be written in the YAML format in the following way:

# site-facts.yaml
workgroup: CT4Site2
domain_psk: nm56DxLp%

In the JSON format, facts can be written as follows:

# site-facts.json
{ 'workgroup': 'CT4Site2', 'domain_psk': 'nm56DxLp%' }

The deployment of the external facts works simply through file resources in your Puppet manifest:

file {
    '/etc/facter/facts.d/site-facts.yaml':
        ensure => 'file',
        source => 'puppet:///…',
}

With newer versions of Puppet and Facter, external facts will be automatically synchronized just like custom facts, if they are found in facts.d/* in any module (for example, /etc/puppet/modules/hello_world/facts.d/hello.sh). This is not only more convenient, but has a large benefit: when Puppet must fetch an external fact through a file resource instead, its fact value(s) are not yet available while the catalog is being compiled. The pluginsync mechanism, on the other hand, makes sure that all synced facts are available before manifest compilation starts. When facts are not static and cannot be placed in a txt or YAML file, you can make the file executable instead. It will usually be a shell script, but the implementation is of no consequence; it is just important that properly formatted data is written to the standard output. You can simplify the hello fact this way, in /etc/puppet/modules/hello_world/facts.d/hello:

#!/bin/sh
echo hello=Hello, world\!
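An executable external fact is not limited to static output; the following sketch emits values computed at runtime (the fact names site_kernel and site_user are invented for illustration):

```shell
#!/bin/sh
# Emit dynamic facts in the key=value format expected by Facter.
# The fact names (site_kernel, site_user) are illustrative assumptions.
echo "site_kernel=$(uname -s)"
echo "site_user=$(id -un)"
```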
For executable facts, the ini styled key=value format is the only supported one. YAML or JSON are not eligible in this context.

Goals of Facter

The whole structure and philosophy of Facter serves the goal of allowing for platform-agnostic usage and development. The same collection of facts (roughly) is available on all supported platforms. This allows Puppet users to keep a coherent development style through manifests for all those different systems. Facter forms a layer of abstraction over the characteristics of both hardware and software. It is an important piece of Puppet's platform-independent architecture. Another piece that was mentioned before is the type and provider subsystem. Types and providers are explored in greater detail in the following sections.

Understanding the type system

Each resource represents a piece of state on the agent system. It has a resource type, a name (or a title), and a list of attributes. An attribute can either be a property or a parameter. Between the two of them, properties represent distinct pieces of state, and parameters merely influence Puppet's actions upon the property values. Let's examine resource types in more detail and understand their inner workings. This is not only important when extending Puppet with resource types of your own. It also helps you anticipate the action that Puppet will take, given your manifest, and get a better understanding of both the master and the agent. First, we take a closer look at the operational structure of Puppet, with its pieces and phases. The agent performs all its work in discrete transactions. A transaction is started every time under any of the following circumstances:

The background agent process activates and checks in to the master
An agent process is started with the --onetime or --test option
A local manifest is compiled using puppet apply

The transaction always passes several stages:

Gathering fact values to form the actual catalog request.
Receiving the compiled catalog from the master.
Prefetching of current resource states.
Validation of the catalog's content.
Synchronization of the system with the property values from the catalog.

Facter was explained in the previous section. The resource types become important during compilation and then throughout the rest of the agent transaction. The master loads all resource types to perform some basic checking—it basically makes sure that the types of resources it finds in the manifests do exist and that the attribute names fit the respective type.

The resource type's life cycle on the agent side

Once the compilation has succeeded, the master hands out the catalog and the agent enters the catalog validation phase. Each resource type can define some Ruby methods that ensure that the passed values make sense. This happens on two levels of granularity: each attribute can validate its input value, and then the resource as a whole can be checked for consistency. One example of attribute value validation can be found in the ssh_authorized_key resource type. A resource of this type fails if its key value contains a whitespace character, because SSH keys cannot comprise multiple strings. Validation of whole resources happens with the cron type, for example. It makes sure that the time fields make sense together. The following resource would not pass, because special times such as @midnight cannot be combined with numeric fields:

cron {
    'invalid-resource':
        command => 'rm -rf /',
        special => 'midnight',
        weekday => [ '2', '5' ],
}

Another task during this phase is the transformation of input values to more suitable internal representations. The resource type code refers to this as a munge action.
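To picture what munging does, here is a standalone Ruby sketch; it is not Puppet's actual API, just an illustration of normalizing a search-path attribute by stripping whitespace and joining array values with a colon:

```ruby
# Standalone illustration of value munging; not actual Puppet source code.
# Strings are stripped of surrounding whitespace, and array values are
# joined into a colon-separated search path.
def munge_path(value)
  if value.is_a?(Array)
    value.map { |v| v.to_s.strip }.join(':')
  else
    value.to_s.strip
  end
end

munge_path(['/usr/bin ', ' /usr/sbin'])  # => "/usr/bin:/usr/sbin"
munge_path('  /opt/puppet/bin ')         # => "/opt/puppet/bin"
```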
Typical examples of munging are the removal of leading and trailing whitespace from string values, or the conversion of array values to an appropriate string format—this can be a comma-separated list, but for search paths, the separator should be a colon instead. Other kinds of values will use different representations. Next up is the prefetching phase. Some resource types allow the agent to create an internal list of resource instances that are present on the system. For example, this is possible (and makes sense) for installed packages—Puppet can just invoke the package manager to produce the list. For other types, such as file, this would not be prudent. Creating a list of all reachable paths in the whole filesystem can be arbitrarily expensive, depending on the system on which the agent is running. Finally, the agent starts walking its internal graph of interdependent resources. Each resource is brought in sync if necessary. This happens separately for each individual property, for the most part. The ensure property, for types that support it, is a notable exception. It is expected to manage all other properties on its own—when a resource is changed from absent to present through its ensure property (in other words, the resource is getting newly created), this action should bring all other properties in sync as well. There are some notable aspects of the whole agent process. For one, attributes are handled independently. Each can define its own methods for the different phases. There are quite a number of hooks, which allow a resource type author to add a lot of flexibility to the model. For aspiring type authors, skimming through the core types can be quite inspirational. You will be familiar with many attributes, through using them in your manifests and studying their hooks will offer quite some insight. It is also worth noting that the whole validation process is performed by the agent, not the master. This is beneficial in terms of performance. 
The master saves a lot of work, which gets distributed to the network of agents (which scales with your needs automatically).

Substantiating the model with providers

At the start of this article, you learned about Facter and how it works like a layer of abstraction over the supported platforms. This unified information base is one of Puppet's most important means to achieve its goal of operating system independence. Another one is the DSL, of course. Finally, Puppet also needs a method to transparently adapt its behavior to the respective platform on which each agent runs. In other words, depending on the characteristics of the computing environment, the agent needs to switch between different implementations for its resources. This is not unlike object-oriented programming—the type system provides a unified interface, like an abstract base class. The programmer need not worry what specific class is being referenced, as long as it correctly implements all required methods. In this analogy, Puppet's providers are the concrete classes that implement the abstract interface. For a practical example, look at package management. Different flavors of UNIX-like operating systems have their own implementation. The most prevalent Puppet platforms use apt and yum, respectively, but can (and sometimes must) also manage their packages through dpkg and rpm. Other platforms use tools such as emerge, zypper, fink, and a slew of other things. There are even packages that exist apart from the operating system software base, handled through gem, pip, and other language-specific package management tools. For each of these management tools, there is a provider for the package type. Many of these tools allow the same set of operations—install and uninstall a package and update a package to a specific version. The latter is not universally possible though. For example, dpkg can only ever install the local package that is specified on the command line, with no other version to choose.
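The object-oriented analogy above can be sketched in a few lines of plain Ruby; this only illustrates the pattern and is not Puppet's actual implementation:

```ruby
# Plain-Ruby illustration of the type/provider analogy;
# this is not Puppet's actual implementation.
class PackageProvider            # the abstract interface (the type's contract)
  def install(name)
    raise NotImplementedError
  end
end

class AptProvider < PackageProvider   # one concrete implementation
  def install(name)
    "apt-get install #{name}"
  end
end

class YumProvider < PackageProvider   # another concrete implementation
  def install(name)
    "yum install #{name}"
  end
end

# Calling code relies only on the shared interface:
[AptProvider.new, YumProvider.new].each do |provider|
  puts provider.install('haproxy')
end
```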
There are also some distinct features that are unique to specific tools, or supported by only a few. Some management systems can hold packages at specific versions. Some use different states for uninstalled versus purged packages. Some have a notion of virtual packages. There are some more examples. Because of this potential diversity (which is not limited to package management systems), Puppet providers can opt for features. The set of features is resource type specific. All providers for a type can support one or more of the same group of features. For the package type, there are features such as versionable, purgeable, holdable, and so forth. You can set ensure => purged on any package resource like so:

package {
    'haproxy':
        ensure => 'purged'
}

However, if you are managing the HAproxy package through rpm, Puppet will fail to make any sense of that, because rpm has no notion of a purged state, and therefore the purgeable feature is missing from the rpm provider. Trying to use an unsupported feature will usually produce an error message. Some attributes (such as install_options) might just get ignored by Puppet instead. The official documentation on the Puppet Labs website holds a complete list of the core resource types and all their built-in providers, along with the respective feature matrices. It is very easy to find suitable providers and their capabilities; the documentation is at https://docs.puppetlabs.com/references/latest/type.html.

Providerless resource types

There are some resource types that use no providers, but they are rare among the core types. Most of the interesting management tasks that Puppet makes easy just work differently among operating systems, and providers enable this in a most elegant fashion. Even for straightforward tasks that are the same on all platforms, there might be a provider. For example, there is a host type to manage entries in the /etc/hosts file.
Its syntax is universal, so the code can technically just be implemented in the type. However, there are actual abstract base classes for certain kinds of providers in the Puppet code base. One of them makes it very easy to build providers that edit files if those files consist of single-line records with ordered fields. Therefore, it makes sense to implement a provider for the host type and base it on this provider class. For the curious, this is what a host resource looks like:

host { 'puppet':
    ip           => '10.144.12.100',
    host_aliases => [ 'puppet.example.net', 'master' ],
}

Summarizing types and providers

Puppet's resource types and their providers work together to form a solid abstraction layer over software configuration details. The type system is an extendable basis for Puppet's powerful DSL. It forms an elaborate interface for the polymorphous provider layer. The providers flexibly implement the actual management actions that Puppet is supposed to perform. They map the necessary synchronization steps to commands and system interactions. Many providers cannot satisfy every nuance that the resource type models. The feature system takes care of these disparities in a transparent fashion.

Putting it all together

Reading this far, you might have gotten the impression that this article is a rather odd mix of topics. While types and providers do belong closely together, the whole introduction to Facter might seem out of place in their context. This is deceptive however: facts do play a vital part in the type/provider structure. They are essential for Puppet to make good choices among providers. Let's look at an example from the Extending Facter with custom facts section once more. It was about fstab entries and the difference of Solaris, where those are found in /etc/vfstab instead of /etc/fstab. That section suggested a manifest that adapts according to a fact value. As you can imagine now, Puppet has a resource type to manage fstab content: the mount type.
However, for the small deviation of a different file path, there is no dedicated mount provider for Solaris. There is actually just one provider for all platforms, but on Solaris, it behaves differently. It does this by resolving Facter's osfamily value. The following code example was adapted from the actual provider code:

case Facter.value(:osfamily)
when "Solaris"
    fstab = "/etc/vfstab"
else
    fstab = "/etc/fstab"
end

In other cases, Puppet should use thoroughly different providers on different platforms, though. Package management is a classic example. On a Red Hat-like platform, you will want Puppet to use the yum provider in virtually all cases. It can be sensible to use rpm, and even apt might be available. However, if you tell Puppet to make sure a package is installed, you expect it to install it using yum if necessary. This is obviously a common theme. Certain management tasks need to be performed in different environments, with very different toolchains. In such cases, it is quite clear which provider would be best suited. To make this happen, a provider can declare itself the default if a condition is met. In the case of yum, it is the following:

defaultfor :operatingsystem => [:fedora, :centos, :redhat]

The conditions are based around fact values. If the operatingsystem value for a given agent is among those listed, yum will consider itself the default package provider. The operatingsystem and osfamily facts are the most popular choices for such queries in providers, but any fact is eligible. In addition to marking themselves as being default, there is more filtering of providers that relies on fact values. Providers can also confine themselves to certain combinations of values. For example, the yum alternative, zypper, confines itself to SUSE Linux distributions:

confine :operatingsystem => [:suse, :sles, :sled, :opensuse]

This provider method works just like the confine method in Facter, which was discussed earlier in this article.
The provider will not even be seen as valid if the respective facts on the agent machine have none of the white-listed values. If you find yourself looking through code for some core providers, you will notice confinement (and even declaring default providers) on feature values, although there is no Facter fact of that name. These features are not related to provider features either. They are from another layer of introspection similar to Facter, but hardcoded into the Puppet agent. These agent features are a number of flags that identify some system properties that need not be made available to manifests in the form of facts. For example, the posix provider for the exec type becomes the default in the presence of the corresponding feature:

defaultfor :feature => :posix

You will find that some providers forgo the confine method altogether, as it is not mandatory for correct agent operation. Puppet will also identify unsuitable providers when looking for their necessary operating system commands. For example, the pw provider for certain BSD flavors does not bother with a confine statement. It only declares its one required command:

commands :pw => "pw"

Agents that find no pw binary in their search path will not try and use this provider at all. This concludes the little tour of the inner workings of types and providers with the help of Facter.

Summary

Puppet gathers information about all agent systems using Facter. The information base consists of a large number of independent bits, called facts. Manifests can query the values of those facts to adapt to the respective agents that trigger their compilation. Puppet also uses facts to choose among providers, the work horses that make the abstract resource types functional across the wide range of supported platforms.
The resource types not only completely define the interface that Puppet exposes in the DSL, they also take care of all validation of input values, transformations that must be performed before handing values to the providers and other related tasks. The providers encapsulate all knowledge of actual operating systems and their respective toolchains. They implement the functionality that the resource types describe. The Puppet model's configurations apply to platforms, which vary from one another, so not every facet of every resource type can make sense for all agents. By exposing only the supported features, a provider can express such limitations.

Packt
19 Nov 2014
11 min read

A look into responsive design frameworks

In this article by Thoriq Firdaus, the author of Responsive Web Design by Example: Beginner's Guide, Second Edition, we will look into responsive web design, which is one of the most discussed topics in the web design and development community. So I believe many of you have heard about it to a certain extent. (For more resources related to this topic, see here.) Ethan Marcotte was the one who coined the term "Responsive Web Design". He suggests in his article, Responsive Web Design, that the web should seamlessly adjust and adapt to the environment where the users view the website, rather than addressing it exclusively for a specific platform. In other words, the website should be responsive; it should be presentable at any screen size, regardless of the platform on which the website is viewed. Take the Time website as an example: the web page fits nicely in a desktop browser with a large screen size and also in a mobile browser with a limited viewable area. The layout shifts and adapts as the viewport size changes. As you can see from the following screenshot, the header background color turns dark grey, the image is scaled down proportionally, and the Tap bar appears where Time hides the Latest news, Magazine, and Videos sections:

Yet, building a responsive website can be very tedious work. There are many measurements to consider when building a responsive website, one of which is creating the responsive grid. The grid helps us build websites with proper alignment. If you have ever used the 960.gs framework, which is one of the popular CSS frameworks, you will have experienced how easy it is to organize the web page layout by adding preset classes like grid_1 or push_1 to the elements. However, the 960.gs grid is set in a fixed unit, the pixel (px), which is not applicable when it comes to building a responsive website. We need a framework with the grid set in the percentage (%) unit to build responsive websites; we need a responsive framework.
A responsive framework provides the building blocks to build responsive websites. Generally, it includes the classes to assemble a responsive grid, the basic styles for typography and form inputs, and a few styles to address various browser quirks. Some frameworks even go further with a series of styles for creating common design patterns and web user interfaces such as buttons, navigation bars, and image sliders. These predefined styles allow us to develop responsive websites faster with less hassle. The following are a few other reasons why using a responsive framework is a favorable option for building responsive websites:

Browser compatibility: Assuring the consistency of a web page across different browsers is really painful and more distressing than developing the website itself. With a framework, however, we can minimize the work needed to address browser compatibility issues. The framework developers have most likely tested the framework in various desktop and mobile browsers with the most constrained environments prior to releasing it publicly.

Documentation: A framework, in general, also comes with comprehensive documentation that records the bits and pieces of using the framework. The documentation will be very helpful for entry-level users beginning to study the framework. It is also a great advantage when we are working in a team. We can refer to the documentation to get everyone on the same page and follow the standard code-writing conventions.

Community and extensions: Some popular frameworks like Bootstrap and Foundation have an active community that helps address the bugs in the framework and extend its functionality. jQuery UI Bootstrap is perhaps a good example in this case; it is a collection of styles for jQuery UI widgets to match the feel and look of Bootstrap's original theme. It is now common to find free WordPress and Joomla themes that are based on these frameworks.
The Responsive.gs framework

Responsive.gs is a lightweight responsive framework, merely 1kb in size when compressed. Responsive.gs is based on a width of 940px, and comes in three variants of grids: 12, 16, and 24 columns. What's more, Responsive.gs is shipped with a box-sizing polyfill that enables the CSS3 box-sizing property in Internet Explorer 6 to Internet Explorer 8, and makes it decently presentable in those browsers.

A polyfill is a piece of code that enables certain web features and capabilities that are not built into the browser natively; usually, it addresses older versions of Internet Explorer. For example, you can use HTML5 Shiv so that new HTML5 elements, such as <header>, <footer>, and <nav>, are recognized in Internet Explorer 6 to Internet Explorer 8.

CSS Box model

HTML elements, which are categorized as block-level elements, are essentially boxes drawn with the content width, height, margin, padding, and border through CSS. Prior to CSS3, we faced a constraint when specifying a box. For instance, when we specify a <div> with a width and height of 100px, as follows:

div {
    width: 100px;
    height: 100px;
}

The browser will render the div as a 100px square box. However, this will only be true if the padding and border have not been added in. Since a box has four sides, a padding of 10px (padding: 10px;) will actually add 20px to the width and height—10px for each side, as follows. While it takes up space on the page, the element's margin is space reserved outside the element rather than part of the element itself; thus, if we give an element a background color, the margin area will not take on that color.

CSS3 Box sizing

CSS3 introduced a new property called box-sizing that lets us specify how the browser should calculate the CSS box model. There are a couple of values that we can apply within the box-sizing property, which are:

content-box: this is the default value of the box model.
This value specifies the padding and the border box's thickness outside the specified width and height of the content, as we demonstrated in the preceding section.

border-box: this value does the opposite; it includes the padding and the border box in the width and height of the box.

padding-box: at the time of writing this article, this value is experimental and has just been added recently. This value specifies the box dimensions.

Let's take our preceding example again, but this time we will set the box-sizing model to border-box. As mentioned in the table above, the border-box value will retain the box's width and height at 100px, regardless of the padding and border addition. The following illustration shows a comparison between the outputs of the two different values, content-box (the default) and border-box.

The Bootstrap framework

Bootstrap was originally built by Mark Otto and was initially intended only for internal use at Twitter. To cut a long story short, Bootstrap was then launched for free for public consumption. Bootstrap has long been associated with Twitter, but since the author departed from Twitter, Bootstrap itself has grown beyond his expectations. Back at the initial development stage, the responsive feature was not yet added; it was added in version 2 along with the increasing demand for creating responsive websites. Bootstrap also comes with many more added features as compared to Responsive.gs. It is packed with preset user interface styles, which comprise common user interfaces used on websites, such as buttons, navigation bars, pagination, and forms, so you don't have to create them from scratch again when starting off a new project. On top of that, Bootstrap is also powered with some custom jQuery plugins such as an image slider, carousel, popover, and modal box. You can use and customize Bootstrap in many ways.
You can customize the Bootstrap theme and components directly through the CSS style sheets, the Bootstrap customization page, or the Bootstrap LESS variables and mixins, which are used to generate the style sheets.

The Foundation framework

Foundation is a framework created by ZURB, a design agency based in California. Similar to Bootstrap, Foundation is more than just a responsive CSS framework; it is shipped with a preset grid, components, and a number of jQuery plugins to present interactive features. Some high-profile brands have built their websites using Foundation, such as McAfee, which is one of the most respected brands for computer antivirus software. The Foundation style sheet is powered by Sass, a Ruby-based CSS pre-processor. There are many complaints that the code in responsive frameworks is excessive; since a framework like Bootstrap is used widely, it has to cover every design scenario, and thus it comes with some extra styles that you might not need for your website. Fortunately, we can easily minimize this issue by using the right tools, like CSS pre-processors, and following a proper workflow. Truth be told, there isn't a perfect solution, and certainly using a framework isn't for everyone. It all comes down to your needs, your website's needs, and in particular, your clients' needs and budgets. In reality, you will have to weigh these factors to decide whether or not you will go with a responsive framework. Jem Kremer has an extensive discussion in this regard in her article: Responsive Design Frameworks: Just Because You Can, Should You?

A brief introduction to CSS pre-processors

Both Bootstrap and Foundation use CSS pre-processors to generate their style sheets. Bootstrap uses LESS—though official support for Sass has just been released recently. Foundation, on the contrary, uses Sass as the only way to generate its style sheets. A CSS pre-processor is not an entirely new language.
If you know CSS, you will get accustomed to a CSS pre-processor immediately. A CSS pre-processor simply extends CSS by allowing the use of programming features like variables, functions, and operations. Below is an example of how we write CSS with the LESS syntax:

@color: #f3f3f3;
body {
    background-color: @color;
}
p {
    color: darken(@color, 50%);
}

When the above code is compiled, it takes the @color variable that we have defined and places the value in the output, as follows:

body {
    background-color: #f3f3f3;
}
p {
    color: #737373;
}

The variable is reusable throughout the style sheet, which enables us to retain style consistency and make the style sheet more maintainable.

Delve into responsive web design

Our discussion on responsive web design herein, though essential, is merely the tip of the iceberg. There is so much more to responsive web design than what we have covered in the preceding sections. I would suggest that you take your time to gain more insight into and apprehension of responsive web design, including the concept, the technicalities, and some constraints. The following are some of the best references to follow:

Responsive Web Design by Rachel Shillcock, which is also a good place to start
Don't Forget the Viewport Meta Tag by Ian Yates
How To Use CSS3 Media Queries To Create a Mobile Version of Your Website by Rachel Andrew
Read about the future standard for responsive images using the HTML5 picture element: Responsive Images Done Right: A Guide To <picture> And srcset by Eric Portis
A roundup of methods of making data tables responsive

Responsive web design inspiration sources

Now, before we jump into the next chapters and start off building responsive websites, it may be a good idea to spend some time looking for ideas and inspiration from responsive websites; to see how they are built and how the layout is organized in desktop browsers as well as in mobile browsers.
It's common for websites to be redesigned from time to time to stay fresh. So herein, instead of compiling a pile of website screenshots, which may no longer be relevant in a few months because of redesigns, we are better off going straight to the websites that curate websites. The following are the places to go:

MediaQueries
Awwwards
CSS Awards
WebDesignServed
Bootstrap Expo
Zurb Responsive

Summary

Using a framework is an easier and faster way to get responsive websites up and running than building everything from scratch on our own. Alas, as mentioned, using a framework also has some downsides. If it is not done properly, the end result could all go wrong. The website could be stuffed with unnecessary styles and JavaScript, which in the end makes the website load slowly and become hard to maintain. We need to set up the right tools; not only will they facilitate the project, but they will also help us make the website more maintainable.

Resources for Article: Further resources on this subject: Linking Dynamic Content from External Websites [article] Building Responsive Image Sliders [article] Top Features You Need to Know About – Responsive Web Design [article]
Packt
19 Nov 2014
11 min read
Managing public and private groups

In this article by Andrew Mallett, the author of CentOS System Administration Essentials, we will look at how we can manage public and private groups and set quotas. The Red Hat and, therefore, the CentOS user management systems deploy a private user group system. Each user created will also belong to an eponymous primary group; in other words, creating a user bob will also create a group bob, to which the user will be the only member. (For more resources related to this topic, see here.)

Linux groups

Firstly, we have to understand a little about Linux groups. A user has both a primary group and secondary groups. User ID and group ID (UID/GID) are used with the permission management structure in Linux. Every file in any filesystem will be owned by a user and a group by means of storing the UID and GID in the file's metadata. Permissions can be assigned to the user, group, or others. Each user has one UID and GID, but belonging to just one group would be a little restrictive, so users additionally have secondary groups. Users can change their current GID to one from their secondary groups using the /usr/bin/newgrp command, effectively switching their GID. In practice, this is not required, and this leads us to describing the differences between a user's primary group and secondary groups. When creating a new file, the user's UID and their current GID are used to create the ownership of the new file. If a user creates a new file, he/she will be the owner of that file, and the file will be group owned by his/her own private group, creating an inherently secure system without the need for user intervention. Secondary groups are used in all other situations, when accessing resources that currently exist. Users present all of their secondary groups when accessing a resource.
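The primary/secondary distinction described above can be inspected programmatically. As an illustrative sketch (the helper name groups_for is our own), Python's standard pwd and grp modules, available on Unix-like systems, resolve the same passwd and group databases:

```python
import pwd
import grp

def groups_for(username):
    """Return (primary_group, secondary_groups) for a user -- a rough
    analogue of what the `id` command reports."""
    user = pwd.getpwnam(username)                  # entry from the passwd database
    primary = grp.getgrgid(user.pw_gid).gr_name    # resolve the GID to a group name
    secondary = [g.gr_name for g in grp.getgrall() # groups listing the user as a member
                 if username in g.gr_mem]
    return primary, secondary

print(groups_for('root'))
```

A user with no extra memberships (like andrew in the first screenshot) would show an empty secondary list, while u1 would show users in it.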
In this way, a file that is readable by the users group but not by others will be accessible to a user whose GID is set to his/her own private group, but whose list of secondary groups includes the users group. When assessing a user's ID settings, the /usr/bin/id command can be very useful. Run without any options or arguments, it will display your own associated IDs. In the following screenshot, we can see that the user andrew belongs to only the private user group and has no additional secondary group memberships:

$ id

We will use the same command, but this time we will use the user u1 as an argument. The output will show the associated IDs of that account; this command can be run as a standard user:

$ id u1

From the following screenshot, we can see that the user u1 has the primary group or GID assigned to the private group u1; however, u1 additionally belongs to the users group. With the current IDs in place for the user u1, any new file created will be group owned by GID 501 (u1), but u1 can access any resource accessible to the users and u1 groups without any additional action on u1's part. From an administrative perspective, we need to make sure we assign the correct secondary IDs to our users. The same cannot be said for the first example that we looked at. The user andrew currently belongs only to andrew's private group, so he can only access resources where permissions are set to:

Their UID (andrew)
Their private GID (andrew)
Others

The user account andrew does not have access to permissions assigned to the users group in the same way that the user u1 does.

Adding users to groups

We can now see that the user u1 has the desired access to resources shared with the users group, but what about andrew? How can we help here? If the user already exists and we need to add him/her to a public group, then we can use the usermod command to add the user to an additional group.
When we add andrew to the users group, we will also want to maintain any existing secondary group memberships. Run the following command:

# usermod -G users andrew

If we choose to run the preceding command, then andrew would be added to the users group but, along with his primary group, this would be his only secondary group membership. In other words, if andrew belongs to multiple secondary groups, the -G option overwrites this group list, which is not a good thing. The id command can display current secondary groups with the -G option:

# id -G andrew

If we combine the two commands together, then we can effectively append the users group to the current group list of andrew. To do this, additionally, we have to translate the spaces in the group list supplied by the id command into commas:

# usermod -G $(id -G andrew | tr ' ' ','),users andrew

The commands in the parentheses are evaluated first. The id command creates a space-separated list of secondary groups, and the tr command will translate the spaces to commas (in this case). The group list we supply to usermod needs to be comma delimited but can use group names or IDs. More simply though, we can use the append option to usermod, as shown in the following code example:

# usermod -a -G users andrew

When creating new users, we can simply specify the secondary groups the user should belong to. We don't need to concern ourselves with the existing group membership:

# useradd -G users u4
# id u4

From the following output, we can see that the new user, u4, is created and added to the secondary group users.

Evaluating private group usage

You do not need to use private group schemes. They are the default but, as with all defaults, we can specify options to modify this. Using the -N option with useradd will not create the private group and, if not specified, the user's primary group or GID will be the users group.
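Before moving on, the id | tr transformation used above to build usermod's comma-delimited group list can be sketched in a few lines of Python (the helper name append_group is our own, purely for illustration):

```python
def append_group(current_gids, new_group):
    """Build the comma-separated list that `usermod -G` expects:
    the output of `id -G <user>` (space-separated GIDs) with one
    extra group appended."""
    # `id -G` prints something like "501 100"; split on whitespace,
    # append the new group, and rejoin with commas
    return ','.join(current_gids.split() + [new_group])

print(append_group('501 100', 'users'))  # → 501,100,users
```

This is exactly why `usermod -a -G` is the safer shortcut: it performs this append for you instead of overwriting the list.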
Let's execute the following commands:

# useradd -N u5
# id u5

The output is shown in the following screenshot, and we see that the user's primary group is the users group: The only security issue that we may need to contend with is that now, by default, any file created by the user u5 will be group owned by a shared group. Depending on the circumstances, this may not be desirable; however, having all files private to the user by default is no more desirable either. It is up to the administration team to decide which model suits the organizational needs best.

Getent

The /usr/bin/getent command will display a list of entries (Get Entries). The entries are resolved by the Name Service Switch libraries, which are configured in the /etc/nsswitch.conf file. This file has a list of databases and the libraries that will be used to access those databases. For example, we could use the getent passwd command to display all users, or getent group to display all groups. We could extend this, though, to commands such as getent hosts to display host file entries and getent aliases to display user aliases on the system. The nsswitch.conf file will define the libraries used to access the passwd database. On a standard CentOS system, /etc/passwd is often the only local file, but an enterprise system could include Lightweight Directory Access Protocol (LDAP) modules. We search the /etc/nsswitch file for the passwd database using grep:

# grep passwd /etc/nsswitch.conf

We can see that on my system, we just use the local files to resolve user names: The getent command is a very useful way to quickly list users or groups on your system, and the output can be filtered or sorted as required with the grep and sort commands.
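As a cross-check, Python's pwd and grp modules iterate the same NSS-resolved databases that getent passwd and getent group enumerate, so a quick sketch of the equivalent lookup looks like this:

```python
import pwd
import grp

# Rough analogue of `getent passwd`: walk every entry the NSS
# passwd database resolves, printing the first few in passwd format
for entry in pwd.getpwall()[:3]:
    print('%s:x:%d:%d' % (entry.pw_name, entry.pw_uid, entry.pw_gid))

# Rough analogue of `getent group`: collect all configured group names
group_names = [g.gr_name for g in grp.getgrall()]
print(len(group_names), 'groups configured')
```

On a system with LDAP modules configured in /etc/nsswitch.conf, these calls would return the remote entries too, just as getent would.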
For example, if we want to see all configured groups on our system that start with the letter u and have only one additional character in their names, we can use the following command:

# getent group | grep 'u.:' | sort

The following screenshot shows this command:

Quotas

In almost all areas of user management, we have to assign disk space quotas of some description in order to give the responsibility of disk space management back to the user. If we do not, then the user would never know the struggles that we have to face in providing them with unlimited disk space. Allowing the user to see that their space is filling up may then prompt them to carry out a little housekeeping. In Linux, disk quotas are applied to mount points; if you want to limit a user's space in their home directory, then the /home directory will need to be in its own partition. If it is part of the root filesystem, then a user's space will be restricted across all directories within that partition. Quota restrictions are implemented using tools from the quota package. You can use the yum command to verify that it is installed:

$ yum list quota

If the output of the command indicates that it is available rather than installed, then install the quota package with:

# yum install quota

Setting quotas

My system includes a partition for /home and has the quota package installed. We now need to set the correct mount options for the /home partition. Currently, it does not include quotas. To enable this, we will edit the /etc/fstab file and the mount options for the /home partition. The following two mount options should be added to enable journal quotas for the selected partition:

usrjquota=aquota.user,jqfmt=vfsv0

The usrjquota=aquota.user part specifies the quota file, and jqfmt=vfsv0 specifies the quota format. The line in question is shown in the following screenshot: We have enabled journal-based user quotas as we are using ext4, a journal-based filesystem.
User space restriction is checked when writing the journal rather than waiting until the changes are flushed to disk. We also set the format of the journal quotas. To make these settings effective, we can remount the /home partition using the following command:

# mount -o remount /home

We will now need to initialize the quota database; this was referenced in the mount options as aquota.user and will reside at the root of the partition where quotas are enabled. Enabling quotas on a filesystem may take some time, depending on the amount of data in the filesystem:

# quotacheck -muv /home

Using these options with the /sbin/quotacheck command, we set the following:

-m: This indicates not to remount as read-only during the operation
-u: This is for user quotas
-v: This is the verbose output
/home: This is the partition to work with, or use -a for all quota-enabled partitions

It may be worth adding the quotacheck command and options to your crontab to ensure that quotacheck is run perhaps once a day. Even though journal quotas are more reliable than traditional quotas, there is no harm in re-evaluating the file space used to ensure that the data maintained is accurate. Quotas can be set with the edquota or setquota command; I prefer the setquota command, but traditionally edquota is taught to new administrators. The /usr/sbin/edquota command takes you into your editor to make the changes, whereas /usr/sbin/setquota sets the quota directly from the command line:

# setquota -u u1 20000 25000 0 0 /home

The preceding command will set the quota for the user u1, giving the user a soft limit (just a warning when they exceed 20 MB, that is, 20,000 x 1k blocks) and implementing a hard limit of 25 MB, where the user cannot save any more data in /home. I have not limited the user u1 with either soft or hard limits on the number of files they may have, just the space they use.
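The soft/hard semantics configured by that setquota call can be summarized in a small sketch (the function quota_status is our own illustration, not part of the quota tools; a limit of 0 means "unlimited", matching setquota's convention):

```python
def quota_status(used_blocks, soft, hard):
    """Classify a user's block usage against soft/hard quota limits.
    Limits of 0 mean 'no limit', as with setquota."""
    if hard and used_blocks >= hard:
        return 'over hard limit: writes refused'
    if soft and used_blocks >= soft:
        return 'over soft limit: warning (grace period running)'
    return 'ok'

# setquota -u u1 20000 25000 0 0 /home -> soft 20000, hard 25000 (1k blocks)
print(quota_status(18000, 20000, 25000))  # → ok
print(quota_status(21000, 20000, 25000))  # → over soft limit: warning (grace period running)
print(quota_status(26000, 20000, 25000))  # → over hard limit: writes refused
```

The real kernel behavior also involves a grace period after the soft limit is crossed, which this sketch only hints at in the message text.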
The /usr/sbin/repquota command can be used to display disk usage:

# repquota -uv /home

The output from my system is shown in the following screenshot:

Summary

The big task for this article was to become more accustomed to the vagaries of CentOS group management and being able to properly differentiate between the primary group and secondary groups of a user. During this process, we took the time to evaluate the use of public and private group schemes and the use of the -N option to disable the user's private group during user creation. It was not long before we found ourselves in the depths of /etc/nsswitch.conf and the getent command (get entries). From here, we got straight down to business implementing user disk limits, or quotas.

Resources for Article: Further resources on this subject: Installing CentOS [article] Creating a sample C#.NET application [article] Introducing SproutCore [article]
Packt
18 Nov 2014
17 min read
The plot function

In this article by L. Felipe Martins, the author of the book IPython Notebook Essentials, we discuss the plot() function, an important part of matplotlib, a Python library for the production of publication-quality graphs. (For more resources related to this topic, see here.)

The plot() function is the workhorse of the matplotlib library. In this section, we will explore the line-plotting and formatting capabilities included in this function. To make things a bit more concrete, let's consider the formula for logistic growth, as follows:

This model is frequently used to represent growth that shows an initial exponential phase, and then is eventually limited by some factor. Examples are the population in an environment with limited resources, and new products and/or technological innovations, which initially attract a small and quickly growing market but eventually reach a saturation point. A common strategy to understand a mathematical model is to investigate how it changes as the parameters defining it are modified. Let's say we want to see what happens to the shape of the curve when the parameter b changes. To be able to do what we want more efficiently, we are going to use a function factory. This way, we can quickly create logistic models with arbitrary values for r, a, b, and c. Run the following code in a cell:

def make_logistic(r, a, b, c):
    def f_logistic(t):
        return a / (b + c * exp(-r * t))
    return f_logistic

The function factory pattern takes advantage of the fact that functions are first-class objects in Python. This means that functions can be treated as regular objects: they can be assigned to variables, stored in lists and dictionaries, and play the role of arguments and/or return values in other functions. In our example, we define the make_logistic() function, whose output is itself a Python function. Notice how the f_logistic() function is defined inside the body of make_logistic() and then returned in the last line.
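The factory above relies on exp() being in scope (imported via pylab in the notebook). A self-contained sketch of the same closure pattern, using only the standard library's math.exp so it runs outside the notebook, looks like this:

```python
from math import exp

def make_logistic(r, a, b, c):
    # f_logistic closes over r, a, b, and c: each call to the factory
    # produces a new function with its own captured parameter values
    def f_logistic(t):
        return a / (b + c * exp(-r * t))
    return f_logistic

f = make_logistic(0.15, 20.0, 2.0, 15.0)
# As t grows, exp(-r*t) -> 0 and the curve saturates at a/b = 10.0
print(round(f(1000), 6))  # → 10.0
```

The saturation value a/b is the "carrying capacity" role that the limiting factor plays in the model, which is why varying b changes where the curve levels off.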
Let's now use the function factory to create three functions representing logistic curves, as follows:

r = 0.15
a = 20.0
c = 15.0
b1, b2, b3 = 2.0, 3.0, 4.0
logistic1 = make_logistic(r, a, b1, c)
logistic2 = make_logistic(r, a, b2, c)
logistic3 = make_logistic(r, a, b3, c)

In the preceding code, we first fix the values of r, a, and c, and define three logistic curves for different values of b. The important point to notice is that logistic1, logistic2, and logistic3 are functions. So, for example, we can use logistic1(2.5) to compute the value of the first logistic curve at the time 2.5. We can now plot the functions using the following code:

tmax = 40
tvalues = linspace(0, tmax, 300)
plot(tvalues, logistic1(tvalues))
plot(tvalues, logistic2(tvalues))
plot(tvalues, logistic3(tvalues))

The first line in the preceding code sets the maximum time value, tmax, to be 40. Then, we define the set of times at which we want the functions evaluated with the assignment, as follows:

tvalues = linspace(0, tmax, 300)

The linspace() function is very convenient to generate points for plotting. The preceding code creates an array of 300 equally spaced points in the interval from 0 to tmax. Note that, contrary to other functions, such as range() and arange(), the right endpoint of the interval is included by default. (To exclude the right endpoint, use the endpoint=False option.) After defining the array of time values, the plot() function is called to graph the curves. In its most basic form, it plots a single curve in a default color and line style. In this usage, the two arguments are two arrays. The first array gives the horizontal coordinates of the points being plotted, and the second array gives the vertical coordinates. A typical example will be the following function call:

plot(x, y)

The variables x and y must refer to NumPy arrays (or any Python iterable values that can be converted into an array) and must have the same dimensions.
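The endpoint behavior of linspace() described above can be made concrete with a small pure-Python sketch of its spacing rule (linspace_list is our own name; NumPy's real implementation is vectorized):

```python
def linspace_list(start, stop, num, endpoint=True):
    """Pure-Python sketch of numpy.linspace's spacing rule."""
    # With endpoint=True the interval is split into num-1 steps so the
    # last point lands exactly on `stop`; with endpoint=False the step
    # is smaller and `stop` is excluded
    step = (stop - start) / ((num - 1) if endpoint else num)
    return [start + i * step for i in range(num)]

pts = linspace_list(0, 40, 300)
print(pts[0], round(pts[-1], 12), len(pts))  # → 0.0 40.0 300
```

This is the difference from range() and arange(), whose half-open convention would stop one step short of tmax.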
The points plotted have coordinates as follows:

x[0], y[0]
x[1], y[1]
x[2], y[2]
…

The preceding command will produce the following plot, displaying the three logistic curves: You may have noticed that before the graph is displayed, there is a line of text output that looks like the following:

[<matplotlib.lines.Line2D at 0x7b57c50>]

This is the return value of the last call to the plot() function, which is a list (with a single element) of objects of the Line2D type. One way to prevent the output from being shown is to enter None as the last row in the cell. Alternatively, we can assign the return value of the last call in the cell to a dummy variable:

_dummy_ = plot(tvalues, logistic3(tvalues))

The plot() function supports plotting several curves in the same function call. We need to change the contents of the cell as shown in the following code and run it again:

tmax = 40
tvalues = linspace(0, tmax, 300)
plot(tvalues, logistic1(tvalues),
     tvalues, logistic2(tvalues),
     tvalues, logistic3(tvalues))

This form saves some typing but turns out to be a little less flexible when it comes to customizing line options. Notice that the text output produced now is a list with three elements:

[<matplotlib.lines.Line2D at 0x9bb6cc0>,
 <matplotlib.lines.Line2D at 0x9bb6ef0>,
 <matplotlib.lines.Line2D at 0x9bb9518>]

This output can be useful in some instances. For now, we will stick with using one call to plot() for each curve, since it produces code that is clearer and more flexible. Let's now change the line options in the plot and set the plot bounds.
Change the contents of the cell to read as follows:

plot(tvalues, logistic1(tvalues),
     linewidth=1.5, color='DarkGreen', linestyle='-')
plot(tvalues, logistic2(tvalues),
     linewidth=2.0, color='#8B0000', linestyle=':')
plot(tvalues, logistic3(tvalues),
     linewidth=3.5, color=(0.0, 0.0, 0.5), linestyle='--')
axis([0, tmax, 0, 11.])
None

Running the preceding command lines will produce the following plots: The options set in the preceding code are as follows:

The first curve is plotted with a line width of 1.5, with the HTML color of DarkGreen, and a solid-line style.
The second curve is plotted with a line width of 2.0, colored with the RGB value given by the hexadecimal string '#8B0000', and a dotted-line style.
The third curve is plotted with a line width of 3.5, colored with the RGB components (0.0, 0.0, 0.5), and a dashed-line style.

Notice that there are different ways of specifying a fixed color: an HTML color name, a hexadecimal string, or a tuple of floating-point values. In the last case, the entries in the tuple represent the intensity of the red, green, and blue colors, respectively, and must be floating-point values between 0.0 and 1.0. A complete list of HTML color names can be found at http://www.w3schools.com/html/html_colornames.asp. Editor's Tip: For more insights on colors, check out https://dgtl.link/colors

Line styles are specified by a symbolic string. The allowed values are shown in the following table:

Symbol string         Line style
'-'                   Solid (the default)
'--'                  Dashed
':'                   Dotted
'-.'                  Dash-dot
'None', ' ', or ''    Not displayed

After the calls to plot(), we set the graph bounds with the function call:

axis([0, tmax, 0, 11.])

The argument to axis() is a four-element list that specifies, in this order, the minimum and maximum values of the horizontal coordinates, and the minimum and maximum values of the vertical coordinates. It may seem non-intuitive that the bounds for the variables are set after the plots are drawn.
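A brief aside on the color formats just mentioned: the hexadecimal string and the float tuple describe the same thing, and converting between them is a one-liner (hex_to_rgb is our own illustrative helper, not a matplotlib function):

```python
def hex_to_rgb(hex_string):
    """Convert '#8B0000' into the (r, g, b) float tuple form,
    with each channel scaled to the 0.0..1.0 range."""
    return tuple(int(hex_string[i:i + 2], 16) / 255.0 for i in (1, 3, 5))

# '#8B0000' (dark red): the red channel 0x8B is 139 out of 255
print(hex_to_rgb('#8B0000'))
```

So color='#8B0000' and color=(139/255.0, 0.0, 0.0) would select the same dark red in the plot calls above.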
In the interactive mode, matplotlib remembers the state of the graph being constructed, and graphics objects are updated in the background after each command is issued. The graph is only rendered when all computations in the cell are done, so that all previously specified options take effect. Note that starting a new cell clears all the graph data. This interactive behavior is part of the matplotlib.pyplot module, which is one of the components imported by pylab. Besides drawing a line connecting the data points, it is also possible to draw markers at specified points. Change the graphing commands as indicated in the following code snippet, and then run the cell again:

plot(tvalues, logistic1(tvalues),
     linewidth=1.5, color='DarkGreen', linestyle='-',
     marker='o', markevery=50, markerfacecolor='GreenYellow',
     markersize=10.0)
plot(tvalues, logistic2(tvalues),
     linewidth=2.0, color='#8B0000', linestyle=':',
     marker='s', markevery=50, markerfacecolor='Salmon',
     markersize=10.0)
plot(tvalues, logistic3(tvalues),
     linewidth=2.0, color=(0.0, 0.0, 0.5), linestyle='--',
     marker='*', markevery=50, markerfacecolor='SkyBlue',
     markersize=12.0)
axis([0, tmax, 0, 11.])
None

Now, the graph will look as shown in the following figure: The only difference from the previous code is that we have now added options to draw markers. The following are the options we use:

The marker option specifies the shape of the marker. Shapes are given as symbolic strings. In the preceding examples, we use 'o' for a circular marker, 's' for a square, and '*' for a star. A complete list of available markers can be found at http://matplotlib.org/api/markers_api.html#module-matplotlib.markers.
The markevery option specifies a stride within the data points for the placement of markers. In our example, we place a marker after every 50 data points.
The markerfacecolor option specifies the fill color of the marker.
The markersize option specifies the size of the marker.
The size is given in pixels. There are a large number of other options that can be applied to lines in matplotlib. A complete list is available at http://matplotlib.org/api/artist_api.html#module-matplotlib.lines.

Adding a title, labels, and a legend

The next step is to add a title and labels for the axes. Just before the None line, add the following three lines of code to the cell that creates the graph:

title('Logistic growth: a={:5.2f}, c={:5.2f}, r={:5.2f}'.format(a, c, r))
xlabel('$t$')
ylabel('$N(t)=a/(b+ce^{-rt})$')

In the first line, we call the title() function to set the title of the plot. The argument can be any Python string. In our example, we use a formatted string:

title('Logistic growth: a={:5.2f}, c={:5.2f}, r={:5.2f}'.format(a, c, r))

We use the format() method of the string class. The formats are placed between braces, as in {:5.2f}, which specifies a floating-point format with five spaces and two digits of precision. Each of the format specifiers is then associated sequentially with one of the data arguments of the method. Full documentation covering the details of string formatting is available at https://docs.python.org/2/library/string.html. The axis labels are set in the calls:

xlabel('$t$')
ylabel('$N(t)=a/(b+ce^{-rt})$')

As with the title() function, the xlabel() and ylabel() functions accept any Python string. Note that in the '$t$' and '$N(t)=a/(b+ce^{-rt})$' strings, we use LaTeX to format the mathematical formulas. This is indicated by the dollar signs, $...$, in the string. After the addition of a title and labels, our graph looks like the following: Next, we need a way to identify each of the curves in the picture. One way to do that is to use a legend, which is indicated as follows:

legend(['b={:5.2f}'.format(b1),
        'b={:5.2f}'.format(b2),
        'b={:5.2f}'.format(b3)])

The legend() function accepts a list of strings. Each string is associated with a curve in the order they are added to the plot.
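To see exactly what the {:5.2f} specifier used in these titles and legend labels produces, here is the formatting in isolation (with the same parameter values used earlier in the article):

```python
a, c, r = 20.0, 15.0, 0.15

# {:5.2f} = minimum field width 5, two digits after the decimal point;
# values narrower than 5 characters are padded on the left with spaces
title_text = 'Logistic growth: a={:5.2f}, c={:5.2f}, r={:5.2f}'.format(a, c, r)
print(title_text)  # → Logistic growth: a=20.00, c=15.00, r= 0.15
```

Note the leading space before 0.15: '0.15' is only four characters wide, so the width-5 field pads it, while 20.00 and 15.00 fill the field exactly.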
Notice that we are again using formatted strings. Unfortunately, the preceding code does not produce great results. The legend, by default, is placed in the top-right corner of the plot, which, in this case, hides part of the graph. This is easily fixed using the loc option in the legend function, as shown in the following code:

legend(['b={:5.2f}'.format(b1),
        'b={:5.2f}'.format(b2),
        'b={:5.2f}'.format(b3)], loc='upper left')

Running this code, we obtain the final version of our logistic growth plot, as follows: The legend location can be any of the strings: 'best', 'upper right', 'upper left', 'lower left', 'lower right', 'right', 'center left', 'center right', 'lower center', 'upper center', and 'center'. It is also possible to specify the location of the legend precisely with the bbox_to_anchor option. To see how this works, modify the code for the legend as follows:

legend(['b={:5.2f}'.format(b1),
        'b={:5.2f}'.format(b2),
        'b={:5.2f}'.format(b3)], bbox_to_anchor=(0.9, 0.35))

Notice that the bbox_to_anchor option, by default, uses a coordinate system that is not the same as the one we specified for the plot. The x and y coordinates of the box in the preceding example are interpreted as a fraction of the width and height, respectively, of the whole figure. A little trial-and-error is necessary to place the legend box precisely where we want it. Note that the legend box can be placed outside the plot area. For example, try the coordinates (1.32, 1.02). The legend() function is quite flexible and has quite a few other options that are documented at http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.legend.

Text and annotations

In this subsection, we will show how to add annotations to plots in matplotlib. We will build a plot demonstrating the fact that the tangent to a curve must be horizontal at the highest and lowest points.
We start by defining the function associated with the curve and the set of values at which we want the curve to be plotted, which is shown in the following code:

f = lambda x: (x**3 - 6*x**2 + 9*x + 3) / (1 + 0.25*x**2)
xvalues = linspace(0, 5, 200)

The first line in the preceding code uses a lambda expression to define the f() function. We use this approach here because the formula for the function is a simple, one-line expression. The general form of a lambda expression is as follows:

lambda <arguments> : <return expression>

This expression by itself creates an anonymous function that can be used in any place that a function object is expected. Note that the return value must be a single expression and cannot contain any statements. The formula for the function may seem unusual, but it was chosen by trial-and-error and a little bit of calculus so that it produces a nice graph in the interval from 0 to 5. The xvalues array is defined to contain 200 equally spaced points on this interval. Let's create an initial plot of our curve, as shown in the following code:

plot(xvalues, f(xvalues), lw=2, color='FireBrick')
axis([0, 5, -1, 8])
grid()
xlabel('$x$')
ylabel('$f(x)$')
title('Extreme values of a function')
None # Prevent text output

Most of the code in this segment is explained in the previous section. The only new bit is that we use the grid() function to draw a grid. Used with no arguments, the grid coincides with the tick marks on the plot. As with everything else in matplotlib, grids are highly customizable. Check the documentation at http://matplotlib.org/1.3.1/api/pyplot_api.html#matplotlib.pyplot.grid. When the preceding code is executed, the following plot is produced: Note that the curve has a highest point (maximum) and a lowest point (minimum). These are collectively called the extreme values of the function (on the displayed interval; this function actually grows without bounds as x becomes large). We would like to locate these on the plot with annotations.
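The approximate extreme-value coordinates used in the next step came from trial-and-error and a little calculus; as an alternative sketch, they can also be recovered numerically with a stdlib-only brute-force search over a fine grid (the grid density and variable names here are our own choices):

```python
f = lambda x: (x**3 - 6*x**2 + 9*x + 3) / (1 + 0.25*x**2)

# Evaluate f on 2001 evenly spaced points of [0, 5] and take the
# grid points where f is largest and smallest
xs = [i * 5.0 / 2000 for i in range(2001)]
x_max = max(xs, key=f)
x_min = min(xs, key=f)
print(round(x_max, 3), round(x_min, 3))
```

The results land close to the 0.698 and 3.213 values used below; for plotting purposes, this level of accuracy is plenty.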
We will first store the relevant points as follows:

x_min = 3.213
f_min = f(x_min)
x_max = 0.698
f_max = f(x_max)
p_min = array([x_min, f_min])
p_max = array([x_max, f_max])
print p_min
print p_max

The variables x_min and f_min are defined to be (approximately) the coordinates of the lowest point in the graph. Analogously, x_max and f_max represent the highest point. Don't be concerned with how these points were found. For the purposes of graphing, even a rough approximation by trial-and-error would suffice. Now, add the following code to the cell that draws the plot, right below the title() command, as shown in the following code:

arrow_props = dict(facecolor='DimGray', width=3, shrink=0.05,
                   headwidth=7)
delta = array([0.1, 0.1])
offset = array([1.0, .85])
annotate('Maximum', xy=p_max+delta, xytext=p_max+offset,
         arrowprops=arrow_props, verticalalignment='bottom',
         horizontalalignment='left', fontsize=13)
annotate('Minimum', xy=p_min-delta, xytext=p_min-offset,
         arrowprops=arrow_props, verticalalignment='top',
         horizontalalignment='right', fontsize=13)

Run the cell to produce the plot shown in the following diagram: In the code, we start by assigning the variables arrow_props, delta, and offset, which will be used to set the arguments in the calls to annotate(). The annotate() function adds a textual annotation to the graph with an optional arrow indicating the point being annotated. The first argument of the function is the text of the annotation. The next two arguments give the locations of the arrow and the text:

xy: This is the point being annotated and will correspond to the tip of the arrow. We want this to be the maximum/minimum points, p_max and p_min, but we add/subtract the delta vector so that the tip is a bit removed from the actual point.
xytext: This is the point where the text will be placed, as well as the base of the arrow. We specify this as offsets from p_max and p_min using the offset vector.
All other arguments of annotate() are formatting options:

arrowprops: This is a Python dictionary containing the arrow properties. We predefine the dictionary, arrow_props, and use it here. Arrows can be quite sophisticated in matplotlib, and you are directed to the documentation for details.
verticalalignment and horizontalalignment: These specify how the arrow should be aligned with the text.
fontsize: This signifies the size of the text. Text is also highly configurable, and the reader is directed to the documentation for details.

The annotate() function has a huge number of options; for complete details of what is available, users should consult the documentation at http://matplotlib.org/1.3.1/api/pyplot_api.html#matplotlib.pyplot.annotate. We now want to add a comment on what is being demonstrated by the plot by adding an explanatory textbox. Add the following code to the cell, right after the calls to annotate():

bbox_props = dict(boxstyle='round', lw=2, fc='Beige')
text(2, 6, 'Maximum and minimum points\nhave horizontal tangents',
     bbox=bbox_props, fontsize=12, verticalalignment='top')

The text() function is used to place text at an arbitrary position of the plot. The first two arguments are the position of the textbox, and the third argument is a string containing the text to be displayed. Notice the use of '\n' to indicate a line break. The other arguments are configuration options. The bbox argument is a dictionary with the options for the box. If omitted, the text will be displayed without any surrounding box. In the example code, the box is a rectangle with rounded corners, with a border width of 2 pixels and the face color, beige. As a final detail, let's add the tangent lines at the extreme points.
Add the following code: plot([x_min-0.75, x_min+0.75], [f_min, f_min],      color='RoyalBlue', lw=3) plot([x_max-0.75, x_max+0.75], [f_max, f_max],      color='RoyalBlue', lw=3) Since the tangents are segments of straight lines, we simply give the coordinates of the endpoints. The reason to add the code for the tangents at the top of the cell is that this causes them to be plotted first so that the graph of the function is drawn at the top of the tangents. This is the final result: The examples we have seen so far only scratch the surface of what is possible with matplotlib. The reader should read the matplotlib documentation for more examples. Summary In this article, we learned how to use matplotlib to produce presentation-quality plots. We covered two-dimensional plots and how to set plot options, and annotate and configure plots. You also learned how to add labels, titles, and legends. Edited on July 27, 2018 to replace a broken reference link. Resources for Article: Further resources on this subject: Installing NumPy, SciPy, matplotlib, and IPython [Article] SciPy for Computational Geometry [Article] Fast Array Operations with NumPy [Article]

Packt
18 Nov 2014
12 min read

Dart with JavaScript

In this article by Sergey Akopkokhyants, author of Mastering Dart, we will combine the simplicity of jQuery and the power of Dart in a real example. (For more resources related to this topic, see here.) Integrating Dart with jQuery For demonstration purposes, we have created the js_proxy package to help the Dart code to communicate with jQuery. It is available on the pub manager at https://pub.dartlang.org/packages/js_proxy. This package is layered on dart:js and has a library of the same name and sole class JProxy. An instance of the JProxy class can be created via the generative constructor where we can specify the optional reference on the proxied JsObject: JProxy([this._object]); We can create an instance of JProxy with a named constructor and provide the name of the JavaScript object accessible through the dart:js context as follows: JProxy.fromContext(String name) { _object = js.context[name]; } The JProxy instance keeps the reference on the proxied JsObject class and makes all the manipulation on it, as shown in the following code: js.JsObject _object;    js.JsObject get object => _object; How to create a shortcut to jQuery? We can use JProxy to create a reference to jQuery via the context from the dart:js library as follows: var jquery = new JProxy.fromContext('jQuery'); Another very popular way is to use the dollar sign as a shortcut to the jQuery variable as shown in the following code: var $ = new JProxy.fromContext('jQuery'); Bear in mind that the original jQuery and $ variables from JavaScript are functions, so our variables reference to the JsFunction class. From now, jQuery lovers who moved to Dart have a chance to use both the syntax to work with selectors via parentheses. Why JProxy needs a method call? Usually, jQuery send a request to select HTML elements based on IDs, classes, types, attributes, and values of their attributes or their combination, and then performs some action on the results. 
We can use the basic syntax to pass the search criteria in the jQuery or $ function to select the HTML elements: $(selector) Dart has syntactic sugar method call that helps us to emulate a function and we can use the call method in the jQuery syntax. Dart knows nothing about the number of arguments passing through the function, so we use the fixed number of optional arguments in the call method. Through this method, we invoke the proxied function (because jquery and $ are functions) and returns results within JProxy: dynamic call([arg0 = null, arg1 = null, arg2 = null,    arg3 = null, arg4 = null, arg5 = null, arg6 = null,    arg7 = null, arg8 = null, arg9 = null]) { var args = []; if (arg0 != null) args.add(arg0); if (arg1 != null) args.add(arg1); if (arg2 != null) args.add(arg2); if (arg3 != null) args.add(arg3); if (arg4 != null) args.add(arg4); if (arg5 != null) args.add(arg5); if (arg6 != null) args.add(arg6); if (arg7 != null) args.add(arg7); if (arg8 != null) args.add(arg8); if (arg9 != null) args.add(arg9); return _proxify((_object as js.JsFunction).apply(args)); } How JProxy invokes jQuery? The JProxy class is a proxy to other classes, so it marks with the @proxy annotation. We override noSuchMethod intentionally to call the proxied methods and properties of jQuery when the methods or properties of the proxy are invoked. The logic flow in noSuchMethod is pretty straightforward. It invokes callMethod of the proxied JsObject when we invoke the method on proxy, or returns a value of property of the proxied object if we call the corresponding operation on proxy. 
The code is as follows: @override dynamic noSuchMethod(Invocation invocation) { if (invocation.isMethod) {    return _proxify(_object.callMethod(      symbolAsString(invocation.memberName),      _jsify(invocation.positionalArguments))); } else if (invocation.isGetter) {    return      _proxify(_object[symbolAsString(invocation.memberName)]); } else if (invocation.isSetter) {    throw new Exception('The setter feature was not implemented      yet.'); } return super.noSuchMethod(invocation); } As you might remember, all map or Iterable arguments must be converted to JsObject with the help of the jsify method. In our case, we call the _jsify method to check and convert passed arguments aligned with a called function, as shown in the following code: List _jsify(List params) { List res = []; params.forEach((item) {    if (item is Map || item is List) {      res.add(new js.JsObject.jsify(item));    } else {      res.add(item);    } }); return res; } Before return, the result must be passed through the _proxify function as follows: dynamic _proxify(value) {    return value is js.JsObject ? new JProxy(value) : value; } This function wraps all JsObject within a JProxy class and passes other values as it is. An example project Now create the jquery project, open the pubspec.yaml file, and add js_proxy to the dependencies. 
Open the jquery.html file and make the following changes: <!DOCTYPE html> <html> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>jQuery</title> <link rel="stylesheet" href="jquery.css"> </head> <body> <h1>Jquery</h1> <p>I'm a paragraph</p> <p>Click on me to hide</p> <button>Click me</button> <div class="container"> <div class="box"></div> </div> </body> <script src="//code.jquery.com/jquery-1.11.0.min.js"></script> <script type="application/dart" src="jquery.dart"></script> <script src="packages/browser/dart.js"></script> </html> This project aims to demonstrate that: Communication is easy between Dart and JavaScript The syntax of the Dart code could be similar to the jQuery code In general, you may copy the JavaScript code, paste it in the Dart code, and probably make slightly small changes. How to get the jQuery version? It's time to add js_proxy in our code. Open jquery.dart and make the following changes: import 'dart:html'; import 'package:js_proxy/js_proxy.dart'; /** * Shortcut for jQuery. */ var $ = new JProxy.fromContext('jQuery'); /** * Shortcut for browser console object. */ var console = window.console; main() { printVersion(); } /** * jQuery code: * *   var ver = $().jquery; *   console.log("jQuery version is " + ver); * * JS_Proxy based analog: */ printVersion() { var ver = $().jquery; console.log("jQuery version is " + ver); } You should be familiar with jQuery and console shortcuts yet. The call to jQuery with empty parentheses returns JProxy and contains JsObject with reference to jQuery from JavaScript. The jQuery object has a jQuery property that contains the current version number, so we reach this one via noSuchMethod of JProxy. Run the application, and you will see the following result in the console: jQuery version is 1.11.1 Let's move on and perform some actions on the selected HTML elements. How to perform actions in jQuery? 
The syntax of jQuery is based on selecting the HTML elements and it also performs some actions on them: $(selector).action(); Let's select a button on the HTML page and fire the click event as shown in the following code: /** * jQuery code: * *   $("button").click(function(){ *     alert('You click on button'); *   }); * * JS_Proxy based analog: */ events() { // We remove 'function' and add 'event' here $("button").click((event) {    // Call method 'alert' of 'window'    window.alert('You click on button'); }); } All we need to do here is just remove the function keyword, because anonymous functions on Dart do not use it and add the event parameter. This is because this argument is required in the Dart version of the event listener. The code calls jQuery to find all the HTML button elements to add the click event listener to each of them. So when we click on any button, a specified alert message will be displayed. On running the application, you will see the following message: How to use effects in jQuery? The jQuery supports animation out of the box, so it sounds very tempting to use it from Dart. 
Let's take an example of the following code snippet: /** * jQuery code: * *   $("p").click(function() { *     this.hide("slow",function(){ *       alert("The paragraph is now hidden"); *     }); *   }); *   $(".box").click(function(){ *     var box = this; *     startAnimation(); *     function startAnimation(){ *       box.animate({height:300},"slow"); *       box.animate({width:300},"slow"); *       box.css("background-color","blue"); *       box.animate({height:100},"slow"); *       box.animate({width:100},"slow",startAnimation); *     } *   }); * * JS_Proxy based analog: */ effects() { $("p").click((event) {    $(event['target']).hide("slow",(){      window.alert("The paragraph is now hidden");    }); }); $(".box").click((event) {    var box = $(event['target']);    startAnimation() {      box.animate({'height':300},"slow");      box.animate({'width':300},"slow");      box.css("background-color","blue");      box.animate({'height':100},"slow");      box.animate({'width':100},"slow",startAnimation);    };    startAnimation(); }); } This code finds all the paragraphs on the web page to add a click event listener to each one. The JavaScript code uses the this keyword as a reference to the selected paragraph to start the hiding animation. The this keyword has a different notion on JavaScript and Dart, so we cannot use it directly in anonymous functions on Dart. The target property of event keeps the reference to the clicked element and presents JsObject in Dart. We wrap the clicked element to return a JProxy instance and use it to call the hide method. The jQuery is big enough and we have no space in this article to discover all its features, but you can find more examples at https://github.com/akserg/js_proxy. What are the performance impacts? Now, we should talk about the performance impacts of using different approaches across several modern web browsers. 
The algorithm must perform all the following actions: It should create 10000 DIV elements Each element should be added into the same DIV container Each element should be updated with one style All elements must be removed one by one This algorithm must be implemented in the following solutions: The clear jQuery solution on JavaScript The jQuery solution calling via JProxy and dart:js from Dart The clear Dart solution based on dart:html We implemented this algorithm on all of them, so we have a chance to compare the results and choose the champion. The following HTML code has three buttons to run independent tests, three paragraph elements to show the results of the tests, and one DIV element used as a container. The code is as follows: <div>  <button id="run_js" onclick="run_js_test()">Run JS</button> <button id="run_jproxy">Run JProxy</button> <button id="run_dart">Run Dart</button> </div>   <p id="result_js"></p> <p id="result_jproxy"></p> <p id="result_dart"></p> <div id="container"></div> The JavaScript code based on jQuery is as follows: function run_js_test() { var startTime = new Date(); process_js(); var diff = new Date(new Date().getTime() –    startTime.getTime()).getTime(); $('#result_js').text('jQuery tooks ' + diff +    ' ms to process 10000 HTML elements.'); }     function process_js() { var container = $('#container'); // Create 10000 DIV elements for (var i = 0; i < 10000; i++) {    $('<div>Test</div>').appendTo(container); } // Find and update classes of all DIV elements $('#container > div').css("color","red"); // Remove all DIV elements $('#container > div').remove(); } The main code registers the click event listeners and the call function run_dart_js_test. The first parameter of this function must be investigated. 
The second and third parameters are used to pass the selector of the result element and test the title: void main() { querySelector('#run_jproxy').onClick.listen((event) {    run_dart_js_test(process_jproxy, '#result_jproxy', 'JProxy'); }); querySelector('#run_dart').onClick.listen((event) {    run_dart_js_test(process_dart, '#result_dart', 'Dart'); }); } run_dart_js_test(Function fun, String el, String title) { var startTime = new DateTime.now(); fun(); var diff = new DateTime.now().difference(startTime); querySelector(el).text = '$title tooks ${diff.inMilliseconds} ms to process 10000 HTML elements.'; } Here is the Dart solution based on JProxy and dart:js: process_jproxy() { var container = $('#container'); // Create 10000 DIV elements for (var i = 0; i < 10000; i++) {    $('<div>Test</div>').appendTo(container.object); } // Find and update classes of all DIV elements $('#container > div').css("color","red"); // Remove all DIV elements $('#container > div').remove(); } Finally, a clear Dart solution based on dart:html is as follows: process_dart() { // Create 10000 DIV elements var container = querySelector('#container'); for (var i = 0; i < 10000; i++) {    container.appendHtml('<div>Test</div>'); } // Find and update classes of all DIV elements querySelectorAll('#container > div').forEach((Element el) {    el.style.color = 'red'; }); // Remove all DIV elements querySelectorAll('#container > div').forEach((Element el) {    el.remove(); }); } All the results are in milliseconds. Run the application and wait until the web page is fully loaded. Run each test by clicking on the appropriate button. My result of the tests on Dartium, Chrome, Firefox, and Internet Explorer are shown in the following table: Web browser jQuery framework jQuery via JProxy Library dart:html Dartium 2173 3156 714 Chrome 2935 6512 795 Firefox 2485 5787 582 Internet Explorer 12262 17748 2956 Now, we have the absolute champion—the Dart-based solution. 
Even the Dart code compiled to JavaScript and executed in Chrome, Firefox, and Internet Explorer works quicker than jQuery (four to five times) and much quicker than the dart:js and JProxy class-based solution (four to ten times). Summary This article showed you how to use Dart and JavaScript together to build web applications. It listed problems and solutions you can use to communicate between Dart and an existing JavaScript program. We compared jQuery, jQuery via JProxy and dart:js, and the pure dart:html solution to identify which one is the quickest. Resources for Article: Further resources on this subject: Handling the DOM in Dart [article] Dart Server with Dartling and MongoDB [article] Handle Web Applications [article]
Packt
18 Nov 2014
11 min read

Searching and Resolving Conflicts

This article, by Eric Pidoux, author of Git Best Practices Guide, covers a part of Git that you will definitely meet: conflicts. How can we resolve them? (For more resources related to this topic, see here.) While working together as a team on a project, you will work on the same files. The pull command won't work because there are conflicts, and you might have tried some Git commands and things got bad. In this chapter, we will find solutions to these conflicts and see how we can fix them. We will cover the following topics: Finding content inside your Git repository Stashing your changes Fixing errors by practical examples Finding content inside your repository Sometimes, you will need to find something inside all your files. You can, of course, find it with the search feature of your OS, but Git already knows all your files. Searching file content To search text inside your files, simply use the following command: Erik@server:~$ git grep "Something to find" Erik@server:~$ git grep -n body Master:Website.Index.html:4:       <bodyMaster:Website.Index.html:12:       </body> It will display every match to the given keyword inside your code. All lines use the [commitref]:[filepath]:[linenumber]:[matchingcontent] pattern. Notice that [commitref] isn't displayed on all Git versions. You can also specify the commit references that grep will use to search the keyword: Erik@server:~$ git grep -n body d32lf56 p88e03d HEAD~3 Master:Website.Index.html:4:       <body> Master:Website.Index.html:12:       </body> In this case, grep will look into the d32lf56, p88e03d, and third commit starting by the head pointer. Your repository has to be encoded in UTF-8; otherwise, the grep command won't work. Git allows you to use regex inside the search feature by replacing somethingToFind with a regex. 
You can use the logical operators (or and and), as shown in the following command: Erik@server:~$ git grep -e myRegex1 --or -e myRegex2 Erik@server:~$ git grep -e myRegex1 --and -e myRegex2 Let's see this with an example. We only have a test.html page inside our last commit, and we want to find whether or not there is a word with an uppercase alphabetic value and numeric values: Erik@server:~$ git grep -e [A-Z] --and -e [0-9] HEAD Master:Website.Test.html:6:       TEST01 With the grep command, you can delve deeper, but it's not necessary to discuss this topic here because you won't use it every day! Showing the current status The git status command is helpful if you have to analyze your repository: Erik@server:~$ git status # On branch master # Your branch is ahead of 'origin/master' by 2 commits # (use "git push" to publish your local commits) # Changes not staged for commit: #   (use "git add<file>..." to update what will be committed) #   (use "git checkout -- <file>..." to discard changes in working directory) # # modified:   myFile1 # modified:   myFile2 # # Untracked files: #   (use "git add<file>..." to include in what will be committed) # # newFile.txt no changes added to commit (use "git add" and/or "git commit -a") Git analyzes the local repository in comparison to the remote repository. In this case, you have to add newFile.txt, commit myFile1 and myFile2, and push them to the remote repository. Exploring the repository history The best way to explore the past commits inside your repository is to use the git log command. For this part, we will assume that there are only two commits. To display all commits, use the following commands: Erik@server:~$ git log --all Commit xxxxxxxxxxx Author: Jim <jim@mail.com> Date: Sun Jul 20 15:10:12 2014 -0300 Fix front bugs on banner   Commit xxxxxxxxxxx Author: Erik <erik@mail.com> Date: Sat Jul 19 07:06:14 2014 -0300 Add the crop feature on website backend This is probably not what you want. 
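The grep examples above can be reproduced end to end in a throw-away repository. This is a sketch: the temporary repository, user identity, and file content are made up for the demo.

```shell
# Sketch: exercise git grep in a disposable repository.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email erik@mail.com
git config user.name Erik

cat > test.html <<'EOF'
<html>
<body>
TEST01
</body>
</html>
EOF
git add test.html
git commit -q -m 'add test page'

git grep -n body                           # one line per match: file:line:content
git grep -e '[A-Z]' --and -e '[0-9]' HEAD  # uppercase AND digits: finds TEST01
```

Note that without a commit reference, git grep searches the working tree; with HEAD, it searches the committed content.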
After several days of work, you will have plenty of these commits, so how will you filter it? The power of the git log command is that you can quickly find anything in all commits. Let's go for a quick overview of what Git is able to find. We will start by finding the last commit: Erik@server:~$ git log -1 Commit xxxxxxxxxxx Author: Jim <jim@mail.com> Date: Sun Jul 20 15:10:12 2014 -0300 Fix front bugs on banner The number after the git log command indicates that it is the first commit from Head. Too easy! Let's try to find what the last commit of Erik is: Erik@server:~$ git log --author=Erik -1 Commit xxxxxxxxxxx Author: Erik <erik@mail.com> Date: Sat Jul 19 07:06:14 2014 -0300 Add the crop feature on website backend Now, let's find it between two dates: Erik@server:~$ git log --author=Erik --before "2014-07-20" --after "2014-07-18" Commit xxxxxxxxxxx Author: Erik <erik@mail.com> Date: Sat Jul 19 07:06:14 2014 -0300 Add the crop feature on website backend As I told you earlier, there are a lot of parameters to the git log command. You can see all of them using the git help log command. The stat parameter is really useful: Erik@server:~$ git log --author=Jim --stat Commit xxxxxxxxxxx Author: Jim <jim@mail.com> Date: Sun Jul 20 15:10:12 2014 -0300 Fix front bugs on banner   index.php | 1 +      1 file changed, 1 insertion(+) This parameter allows you to view a summary of the changes made in each commit. If you want to see the full changes, try the -p parameter. Remember that the git log command has a file parameter to restrict the search to the git log [file] file. Viewing changes There are two ways to see changes in a repository: git diff and git show. The git diff command lets you see the changes that are not committed. For example, we have an index.phpfile and replace the file content by a line. Just before the lines, you will see a plus (+) or minus (-) sign. 
The + sign means that content was added and the – sign denotes that it was removed: Erik@server:~$ git diff diff --git a/index.php b/index.php indexb4d22ea..748ebb2 100644 --- a/index.php +++ b/index.php @@ -1,11 +1 @@ -<html> - -<head> -<title>Git is great!</title> -</head> -<body> -<?php - echo 'Git is great'; -?> -</body> -</html> +<b> I added a line</b> If you want to analyze a commit, I suggest you to use the git show command. It will display the full list of changes of the commit: Erik@server:~$ git show commitId There is a way to do the opposite, that is, to display commits for a file with git blame: Erik@server:~$ git blameindex.php e4bac680 (Erik 2014-07-20 19:00:47 +0200 1) <b> I added a line</b> Cleaning your mistakes The first thing to know is that you can always clean your mistake with Git. Sometimes this will be hard or painful for your code, but you can do it! Let's start this section with how to remove untracked files: Erik@server:~$ git clean -n The –n option will make a dry-run (it's always important to see what will happen before you regret it). If you want to also remove directories and hidden files, use this one: Erik@server:~$ git clean -fdx With these options, you will delete new directories (-d) and hidden files (-x) and be able to force them (-f). The git reset command The git reset command will allow you to go back to a previous state (for example, commit). The git reset command has three options (soft, hard, or mixed, by default). In general, the git reset command's aim is to take the current branch, reset it to point somewhere else, and possibly bring the index and work tree along. More concretely, if the master branch (currently checked out) looks like the first row (in the following figure) and you want it to point to B and not C, you will use this command: Erik@server:~$ git reset B The following diagram shows exactly what happened with the previous command. 
The HEAD pointer was reset from C to B:
The following table explains what the options really move:

Option   Head pointer   Working tree   Staging area
Soft     Yes            No             No
Mixed    Yes            No             Yes
Hard     Yes            Yes            Yes

The three options that you can provide to the reset command can be easily explained:
--hard: This option is the simplest. It will restore the content to the given commit. All the local changes will be erased. The git reset --hard command means git reset --hard HEAD, which will reset your files to the previous version and erase your local changes.
--mixed: This option resets the index, but not the work tree. It will reset your local files, but the differences found during the process will be marked as local modifications if you analyze them using git status. It's very helpful if you made some mistakes in previous commits and want to keep your local changes.
--soft: This option will keep all your files intact, like --mixed. If you use git status, the changes will appear as changes to commit. You can use this option when you have not committed the files as expected, but your work is correct. So you just have to recommit it the way you want.
The git reset command doesn't remove untracked files; use git clean instead.
Canceling a commit
The git revert command allows you to "cancel" your last unpushed commit. I used quotes around cancel because Git doesn't drop the commit; it creates a new commit that executes the opposite of your commit. A pushed commit is irreversible, so you cannot change it. Firstly, let's have a look at the last commits:
Erik@server:~$ git log
commit e4bac680c5818c70ced1205cfc46545d48ae687e
Author: Eric Pidoux
Date:   Sun Jul 20 19:00:47 2014 +0200
replace all
commit 0335a5f13b937e8367eff35d78c259cf2c4d10f7
Author: Eric Pidoux
Date:   Sun Jul 20 18:23:06 2014 +0200
commit index.php
We want to cancel the 0335… commit:
Erik@server:~$ git revert 0335a5f13
To cancel this commit, it isn't necessary to enter the full commit ID; just the first characters are enough.
Git will find it, but you will have to enter at least six characters to be sure that there isn't another commit that starts with the same characters. Solving merge conflicts When you are working with several branches, a conflict will probably occur while merging them. It appears if two commits from different branches modify the same content and Git isn't able to merge them. If it occurs, Git will mark the conflict and you have to resolve it. For example, Jim modified the index.html file on a feature branch and Erik has to edit it on another branch. When Erik merges the two branches, the conflict occurs. Git will tell you to edit the file to resolve the conflict. In this file, you will find the following: <<<<<<< HEAD Changes from Erik ======= Changes from Jim >>>>>>> b2919weg63bfd125627gre1911c8b08127c85f8 The <<<<<<< characters indicate the start of the merge conflict, the ====== characters indicate the break points used for comparison, and >>>>>>> indicate the end of the conflict. To resolve a conflict, you have to analyze the differences between the two changes and merge them manually. Don't forget to delete the signs added by Git. After resolving it, simply commit the changes. If your merge conflict is too complicated to resolve because you can't easily find the differences, Git provides a useful tool to help you. Git's diff helps you to find differences: Diff --git erik/mergetestjim/mergetest Index.html 88h3d45..92f62w 130634 --- erik/mergetest +++ jim/mergetest @@ -1,3 +1,4 @@ <body> +I added this code between This is the file content -I added a third line of code +And this is the last one So, what happened? The command displays some lines with the changes, with the + mark coming from origin/master; those marked with – are from your local repository, and of course, the lines without a mark are common to both repositories. 
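The whole conflict scenario above can be reproduced in a scratch repository. This is a sketch: the branch name feature, the user identity, and the file content are made up, and the resolution simply keeps both changes.

```shell
# Sketch: manufacture a merge conflict between two branches, then resolve it.
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email erik@mail.com
git config user.name Erik

echo 'This is the file content' > index.html
git add index.html
git commit -q -m 'initial content'

git checkout -q -b feature             # Jim's branch
echo 'Changes from Jim' > index.html
git commit -q -am "Jim's edit"

git checkout -q -                      # back to the original branch
echo 'Changes from Erik' > index.html
git commit -q -am "Erik's edit"

git merge feature || echo 'CONFLICT, as expected'
grep -c '=======' index.html           # the conflict markers are now in the file

# Resolve by hand: keep both changes, drop the markers, then commit.
printf 'Changes from Erik\nChanges from Jim\n' > index.html
git add index.html
git commit -q -m 'merge feature: conflict resolved manually'
```

After the final commit, the merge is concluded and the working tree is clean again.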
Summary In this article, we covered all tips and commands that are useful to fix mistakes, resolve conflicts, search inside the commit history, and so on. Resources for Article: Further resources on this subject: Configuration [Article] Parallel Dimensions – Branching with Git [Article] Issues and Wikis in GitLab [Article]

Packt
17 Nov 2014
19 min read

Building a Beowulf Cluster

A Beowulf cluster is nothing more than a bunch of computers interconnected by Ethernet and running with a Linux or BSD operating system. A key feature is the communication over IP (Internet Protocol) that distributes problems among the boards. The entity of the boards or computers is called a cluster and each board or computer is called a node. In this article, written by Andreas Joseph Reichel, the author of Building a BeagleBone Black Super Cluster, we will first see what is really required for each board to run inside a cluster environment. You will see examples of how to build a cheap and scalable cluster housing and how to modify an ATX power supply in order to use it as a power source. I will then explain the network interconnection of the Beowulf cluster and have a look at its network topology. The article concludes with an introduction to the microSD card usage for installation images and additional swap space as well as external network storage. The following topics will be covered: Describing the minimally required equipment Building a scalable housing Modifying an ATX power source Introducing the Beowulf network topology Managing microSD cards Using external network storage We will first start with a closer look at the utilization of a single BBB and explain the minimal hardware configuration required. (For more resources related to this topic, see here.) Minimal configuration and optional equipment BBB is a single-board computer that has all the components needed to run Linux distributions that support ARMhf platforms. Due to the very powerful network utilities that come with Linux operating systems, it is not necessary to install a mouse or keyboard. Even a monitor is not required in order to install and configure a new BBB. First, we will have a look at the minimal configuration required to use a single board over a network. Minimal configuration A very powerful interface of Linux operating systems is its standard support for SSH. 
SSH is the abbreviation of Secure Shell, and it enables users to establish an authenticated and encrypted network connection to a remote PC that provides a Shell. Its command line can then be utilized to make use of the PC without any local monitor or keyboard. SSH is the secure replacement for the telnet service. The following diagram shows you the typical configuration of a local area network using SSH for the remote control of a BBB board: The minimal configuration for the SSH control SSH is a key feature of Linux and comes preinstalled on most distributions. If you use Microsoft ® Windows™ as your host operating system, you will require additional software such as putty, which is an SSH client that is available at http://www.putty.org. On Linux and Mac OS, there is usually an SSH client already installed, which can be started using the ssh command. Using a USB keyboard It is practical for several boards to be configured using the same network computer and an SSH client. However, if a system does not boot up, it can be hard for a beginner to figure out the reason. If you get stuck with such a problem and don't find a solution using SSH, or the SSH login is not possible for some reason anymore, it might be helpful to use a local keyboard and a local monitor to control the problematic board such as a usual PC. Installing a keyboard is possible with the onboard USB host port. A very practical way is to use a wireless keyboard and mouse combination. In this case, you only need to plug the wireless control adapter into the USB host port. Using the HDMI adapter and monitor The BBB board supports high definition graphics and, therefore, uses a mini HDMI port for the video output. In order to use a monitor, you need an adapter for mini HDMI to HDMI, DVI, or VGA, respectively. Building a scalable board-mounting system The following image shows you the finished board housing with its key components as well as some installed BBBs. 
Here, a indicates the threaded rod with the straw as the spacer, b indicates BeagleBone Black, c indicates the Ethernet cable, d indicates 3.5" hard disc cooling fans, e indicates the 5 V power cable, and f indicates the plate with drilled holes.
The finished casing with installed BBBs
One of the most important things that you have to consider before building a super computer is the space you require. It is not only important to provide stable and practical housing for some BBB boards, but also to keep in mind that you might want to upgrade the system to more boards in the future. This means that you require a scalable system that is easy to upgrade. Also, you need to keep in mind that every single board requires its own power and has to be accessible by hand (reset, boot-selection, and the power button, as well as the memory card, and so on). The networking cables also need some space depending on their lengths. There are also flat Ethernet cables that need less space. The tidier the system is built, the easier it will be to track down errors or exchange faulty boards, cables, or memory cards. However, there is a more important point. Although the BBB boards are very power-efficient, they get quite warm depending on their utilization. If you have 20 boards stacked onto each other and do not provide sufficient space for air flow, your system will overheat and suffer from data loss or malfunctions. Insufficient air flow can result in the burning of devices and other permanent hardware damage. Please remember that I am not liable for any damages resulting from an insufficient cooling system. Depending on your taste, you can spend a lot of money on your server housing and put some lights inside and make it glow like a Christmas tree. However, I will show you very cheap housing, which is easy and fast to build and still robust enough, scalable, and practical to use.
Board-holding rods

The key idea of my board installation is to use the existing corner holes of the BBB boards and attach the boards to four rods in order to build a horizontal stack. This stack is then held by two side plates and a base plate. Usually, when I experiment and want to build a prototype, it is helpful not to predefine every single measurement and then invest money into the sawing and cutting of these parts. Instead, I look around in hobby markets and see whether I can use the parts they have on offer. Drilling some holes, however, is unavoidable.

When it comes to drilling holes and using screws and threads, you might know that there are two different systems: the metric system and the English system. The BBB board has four holes, and their size fits 1/8" screws in the English system or M3 in the metric system. According to the international standard, this article will only name metric dimensions.

For an easy and quick installation of the boards, I used four M3 threaded rods that are obtainable at model making or hobby shops. I got mine at Conrad Electronic. For the base plates, I went to a local raw material store. The following diagram shows you the mounting hole positions for the side walls with the dimensions of the BBB (dashed line). The measurements are given for the English and metric systems.

The mounting hole's positions

Board spacers

As mentioned earlier, it is important to leave enough space between the boards in order to provide finger access to the buttons and, of course, for airflow. At first, I mounted each board with eight nuts. However, when you have 16 boards installed and want to uninstall the eighth board from the left, it will take you a lot of time and nerves to move the nuts along the threaded rods. A simple solution with enough stability is to use short pieces of straw: buy some thick drinking straws and cut them into equally long parts, each two or three centimeters in length.
Then, you can put them between the boards onto the threaded rods in order to use them as spacers. Of course, this is not the most stable solution, but it is sufficient, cheap, and widely available.

Cooling system

One nice possibility I found for cooling the system is to use hard disc fans. They are not so cheap, but I had some lying around for years. Usually, they are mounted to the lower side of 3.5" hard discs, and their width is approximately the length of one BBB. So, they are suitable for the base plate of our casing and can provide enough air flow to cool the whole system. I installed two coolers with two fans each for eight boards, and a third one for future upgrades. The following image shows you my system with eight boards installed:

A board housing with BBBs and cooling system

Once you have built the housing with a cooling system, you can install your boards. The next step is the connection of each board to a power source as well as the network interconnection. Both are described in the following sections.

Using a low-cost power source

I have seen a picture on the Web where somebody powered a dozen older Beagle Boards with a lot of single DC adapters and built everything into a portable case. The result was a huge mess of cables. You should always try to keep your cables well organized in order to save space and improve the cooling performance. Using an ATX power supply with a cable tree can save you a lot of money compared to buying several standalone power supplies. ATX supplies are stable and can also provide some protection for the hardware, which cheap DC adapters don't always do. In the following sections, I will explain the power requirements and how to modify an ATX power supply to fit our needs.

Power requirements

If you do not use an additional keyboard and mouse and use only the onboard flash memory, one board needs around 500 mA at 5 V, which gives you a total power of 2.5 Watts for one board.
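The per-board figure above scales linearly, so the supply can be sized with a quick back-of-the-envelope calculation. The sketch below assumes eight boards at the 500 mA / 5 V figure given above; adjust the numbers for your own setup.

```shell
#!/bin/sh
BOARDS=8            # number of BBBs in the stack
MA_PER_BOARD=500    # ~500 mA per board (onboard flash only, no USB devices)
VOLTS=5             # supply voltage of the 5 V power jack

TOTAL_MA=$((BOARDS * MA_PER_BOARD))      # total current in mA
TOTAL_W=$((TOTAL_MA * VOLTS / 1000))     # total power in Watts

echo "${BOARDS} boards draw about ${TOTAL_MA} mA, i.e. ${TOTAL_W} W at ${VOLTS} V"
```

For eight boards, this gives 4 A and 20 W, which is far below the 5 V rail rating of a typical ATX supply and leaves headroom for memory cards and USB devices.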
Depending on the installed memory card or other additional hardware, you might need more. Please note that the Linux distribution described in this article is not compatible with the USB-client port power supply. You have to use the 5 V power jack.

Power cables

If you want to use an ATX power supply, you need to build an adapter from the standard PATA or SATA power plugs to a low voltage plug that fits the 5 V jack of the board. You need a 5.5/2.1 mm low voltage plug; they are obtainable from VOLTCRAFT with cables already attached. I got mine from Conrad Electronics (item number 710344). Once you have your power cables, you can build a small distribution box.

Modifying the ATX power supply

ATX power supplies are widely available and power-efficient. They cost around 60 dollars and provide more than 500 Watts of output power. For our purpose, we need most of the power on the 5 V rail and some on the 12 V rail for the fans. It is not difficult to modify an ATX supply. The trick is to provide the soft-on signal, because ATX supplies are turned on from the mainboard via a soft-on signal on the green wire. If this green wire is connected to ground, the supply turns on; if the connection is broken, it turns off. The following image shows you which wires of the ATX mainboard plug have to be cut and attached to a manual switch in order to build a manual on/off switch:

The ATX power plug with a green and black wire, as indicated by the red circle, cut and soldered to a switch

As we are using the 5 V and most probably the 12 V rail (for the cooling fans) of the power supply, it is not necessary to add load resistors. If the output voltage of the supply is far too low, this means that not enough current is flowing for its internal regulation circuitry. If this happens, you can add a 1/4 Watt 200 Ohms resistor between any +5 V (red) and GND (neighboring black) pin to drain a current of 25 mA.
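The 25 mA figure for this dummy load follows directly from Ohm's law, and it is worth checking that the resistor's power rating is not exceeded. A small sketch of the arithmetic:

```shell
#!/bin/sh
MV=5000      # rail voltage in millivolts (5 V)
OHMS=200     # dummy load resistor in Ohms

MA=$((MV / OHMS))        # I = U / R -> current in mA
MW=$((MV * MA / 1000))   # P = U * I -> dissipated power in mW

echo "Dummy load draws ${MA} mA and dissipates ${MW} mW"
```

The resistor dissipates 125 mW, which is only half of the 250 mW (1/4 Watt) rating, so the part will not overheat.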
This should never happen when driving the BBB boards, as their power requirements are much higher and the supply should regulate well. The following image shows you the power cable distribution box. I soldered the power cables together with the cut ends of a PATA connector onto a PCB board.

The power cable distribution box

What could happen is that the resistance of one PATA wire is too high and the voltage drop leads to a supply voltage below 4.5 Volts. If that happens, some of the BBB boards will not power up. Either you need to retry booting these boards separately with their power buttons after all the others have booted up, or you need to use two PATA wires instead of one to decrease the resistance. Please check whether this is possible with your power supply and whether the two 5 V lines you want to connect do not belong to different regulation circuitries.

Setting up the network backbone

To interconnect the BBB boards via Ethernet, we need a switch or a hub. There is a difference in functionality between a switch and a hub:

With hubs, computers can communicate with each other. Every computer is connected to the hub with a separate Ethernet cable. The hub is nothing more than a multiport repeater: it simply repeats all the information it receives on all other ports, and every connected PC has to decide whether the data is for it or not. This produces a lot of network traffic and can slow down the overall speed.

Switches, in comparison, can control the flow of network traffic based on the address information in each packet. A switch learns which traffic packets are received by which PC and then forwards them only to the proper port. This allows simultaneous communication across the switch and improves the bandwidth. This is the reason why switches are the preferred choice of network interconnection for our BBB Beowulf cluster.
The following table summarizes the main differences between a hub and a switch:

                   Hub    Switch
  Traffic control  No     Yes
  Bandwidth        Low    High

I bought a 24-port Ethernet switch on eBay with 100 Megabit/s ports, which is enough for the BBB boards. The total bandwidth of the switch is 2.4 Gigabit/s.

The network topology

The typical network topology is a star configuration: every BBB board has its own connection to the switch, and the switch itself is connected to the local area network (LAN). On most Beowulf clusters, there is one special board called the master node. This master node provides the bridge between the cluster and the rest of the LAN. All users (if several persons use the cluster) log in to the master node, and it is only responsible for user management and for starting the correct programs on the specified nodes. It usually doesn't contribute to any calculation tasks. However, as the BBB has only one network connector, it is not possible to use it as a bridge, because a bridge requires two network ports:

One connected to the LAN
The other connected to the switch of the cluster

Because of this, we define one node as the master node, which provides some special software features but also contributes to the calculations of the cluster. This way, all BBBs contribute to the overall calculation power, and we do not need any special hardware to build a network bridge. Regarding security, we can manage everything with SSH login rules and the kernel firewall, if required.

The following diagram shows you the network topology used in this article. Every BBB has its own IP address, and you have to reserve the required number of IP addresses in your LAN. They do not have to be successive; however, it makes things easier if you note down the IP of every board. You can give the boards hostnames such as node1, node2, node3, and so on to make them easier to follow.
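Noting down the IP of every board is easiest if the addresses also live in each node's /etc/hosts, so that the boards can be addressed by their hostnames. The sketch below only prints the entries; the 192.168.1.10x range is an assumption, so use the addresses you actually reserved in your LAN.

```shell
#!/bin/sh
BASE=100          # node1 -> 192.168.1.101, node2 -> 192.168.1.102, ...
COUNT=4           # number of nodes in the cluster

n=1
while [ $n -le $COUNT ]; do
    echo "192.168.1.$((BASE + n))  node$n"
    n=$((n + 1))
done
```

Appending these lines to /etc/hosts on every board (and on your workstation) lets a command such as `ssh debian@node3` work without remembering IP addresses.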
The network topology

The RJ45 network cables

There is only one thing you have to keep in mind regarding RJ45 Ethernet cables and 100 Megabit/s transmission speed: there are crossover cables and normal ones. Crossover cables have crossed lines for data transmission and reception, which means that one such cable can connect two PCs directly without a hub or switch. Most modern switches can detect when data packets collide, that is, when they are received on the transmitting ports, and then automatically switch over the lines. This feature is called auto MDI-X or auto-uplink. If you have a newer switch, you don't need to pay attention to which sort of cable you buy. Usually, normal RJ45 cables without crossover are the preferred choice.

The Ethernet multiport switch

As described earlier, we use an Ethernet switch rather than a hub. When buying a switch, you have to decide how many ports you want. For future upgrades, you can also buy an 8-port switch, for example, and later, if you want to go from seven boards (one port is needed for the uplink) to 14 boards, you can upgrade with a second 8-port switch and connect both to the LAN. If you want to build a big system from the beginning, you might want to buy a 24-port switch or an even bigger one. The following image shows you my 24-port Ethernet switch with some connected RJ45 cables below the board housing:

A 24-port cluster switch

The storage memory

One thing you might want to think about from the beginning is the amount of space you require for applications and data. The standard version of the BBB has 2 GB of onboard flash memory, and newer ones have 4 GB. A critical feature of computational nodes is the amount of RAM they have installed; on the BBB, this is only 512 MB. If this is not enough for your tasks, you can extend the available memory by installing Linux on an external SD card and creating a swap partition on it.
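If you create such a swap partition on the microSD card, it can be activated at every boot with an /etc/fstab entry. This is a hedged config sketch, not a definitive setup: the device name /dev/mmcblk0p2 is an assumption that depends on how you partitioned your card, and the partition has to be formatted once with mkswap before first use.

```
# /etc/fstab -- activate the swap partition on the microSD card at boot.
# Run `mkswap /dev/mmcblk0p2` once beforehand; the device name is an
# assumption and may differ on your system.
/dev/mmcblk0p2  none  swap  sw  0  0
```

After a reboot (or a manual `swapon -a`), the free command should list the additional swap space.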
However, you have to keep in mind that the external swap space is much slower than the DDR3 memory (MB/s compared to GB/s). If the software is nicely programmed, data can always be sufficiently distributed over the nodes, and each node does not need much RAM. With more complicated libraries and tasks, though, you might want to upgrade some day.

Installing images on microSD cards

For the installation of Linux, we need Linux root filesystem images on the microSD card. It is always a good idea to keep these cards for future repair or extension purposes. I keep one installation SD for the master node and one for all slave nodes. When I upgrade the system with more slave nodes, I can just insert the installation SD and easily incorporate the new board with a few commands.

The swap space on an SD card

Usually, it should be possible to boot Linux from the internal memory and utilize the external microSD card solely as swap space. However, I had problems utilizing the additional space, as it was not properly recognized by the system. I obtained the best results when booting from the same card I wanted the swap partition on.

The external network storage

To reduce the size of the used software, it is always a good idea to compile it dynamically. Each node you want to use for computations has to start the same program, and programs have to be accessible by each node, which means that you would have to install every program on every node. This is a lot of work when adding additional boards and is a waste of memory in general. To circumvent this problem, I use external network storage on the basis of Samba. The master node accesses the Samba share and creates a share for all the client nodes by itself. This way, each node has access to the same software and data, upgrades can be performed easily, and the need for local storage memory is reduced.
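On each client node, such a share can be mounted at boot via /etc/fstab as well. This is a sketch under stated assumptions: the server name master, the share name cluster, the mount point /mnt/cluster, and the credentials file are all placeholders for whatever your own Samba configuration exports, and the cifs-utils package must be installed on the node.

```
# /etc/fstab -- mount the master node's Samba share on a client node.
# Server, share, mount point, and credentials file are placeholders.
//master/cluster  /mnt/cluster  cifs  credentials=/etc/cluster.cred,_netdev  0  0
```

The _netdev option tells the system to wait for the network before mounting; create the mount point once with `mkdir /mnt/cluster` before the first boot.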
Important libraries that have to be present on each local filesystem can be provided by symbolic links pointing to the network storage location (hard links cannot span filesystems). The following image shows you the storage system of my BBB cluster:

The storage topology

Some of you might worry that I chose Samba over NFS, or think that 100 Megabit networking is too slow for cluster computations. First of all, I chose Samba because I was used to it and it is well known to most hobbyists. It is very easy to install, and I have used it for over 10 years. The only thing you have to keep in mind is that using Samba causes the filesystem to treat capital and small letters equally, so your Linux filenames (ext2, ext3, ext4, and so on) will behave like FAT/NTFS filenames. Regarding the network bandwidth, a double value requires 8 bytes (64 bits) of memory, so a 24-port switch with 100 Megabit/s per port and a total bandwidth of 2.4 Gigabit/s can transfer a maximum of around 37.5 million double values per second. Additionally, the libraries are optimized to keep the network talk as low as possible and to solve as much as possible on the local CPU memory system. Thus, for most applications, the construction described earlier will be sufficient.

Summary

In this article, you were introduced to the whole cluster concept regarding its hardware and interconnection. You were shown a working system configuration using only the minimally required amount of equipment, as well as some optional possibilities. A very basic housing including a cooling system was described as an example of a cheap yet nicely scalable way to mount the boards. You also learned how to build a cost-efficient power supply using a widely available ATX supply and how to modify it to power several BBBs. Finally, you were introduced to the network topology and the purpose of network switches. A short description of the storage system used ended this article.
If you interconnect everything as described in this article, you have created the hardware basis of a super computer cluster.

Resources for Article:

Further resources on this subject:
Protecting GPG Keys in BeagleBone [article]
Making the Unit Very Mobile - Controlling Legged Movement [article]
Pulse width modulator [article]