Packt
22 Aug 2013
24 min read

Learning How to Classify with Real-world Examples

The Iris dataset

The Iris dataset is a classic dataset from the 1930s; it is one of the first modern examples of statistical classification. The setting is that of Iris flowers, of which there are multiple species that can be identified by their morphology. Today, the species would be defined by their genomic signatures, but in the 1930s, DNA had not even been identified as the carrier of genetic information.

The following four attributes of each plant were measured:

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width

In general, we will call any measurement from our data a feature. Additionally, for each plant, the species was recorded. The question now is: if we saw a new flower out in the field, could we make a good prediction about its species from its measurements?

This is the supervised learning or classification problem; given labeled examples, we can design a rule that will eventually be applied to other examples. This is the same setting that is used for spam classification; given the examples of spam and ham (non-spam e-mail) that the user gave the system, can we determine whether a new, incoming message is spam or not?

For the moment, the Iris dataset serves our purposes well. It is small (150 examples, 4 features each) and can easily be visualized and manipulated.

The first step is visualization

Because this dataset is so small, we can easily plot all of the points and all two-dimensional projections on a page. We will thus build intuitions that can then be extended to datasets with many more dimensions and datapoints. Each subplot in the following screenshot shows all the points projected into two of the dimensions. The outlying group (triangles) are the Iris Setosa plants, while Iris Versicolor plants are in the center (circles) and Iris Virginica plants are indicated with "x" marks.
We can see that there are two large groups: one is of Iris Setosa and another is a mixture of Iris Versicolor and Iris Virginica.

We are using Matplotlib; it is the most well-known plotting package for Python. We present the code to generate the top-left plot. The code for the other plots is similar to the following code:

    from matplotlib import pyplot as plt
    from sklearn.datasets import load_iris
    import numpy as np

    # We load the data with load_iris from sklearn
    data = load_iris()
    features = data['data']
    feature_names = data['feature_names']
    target = data['target']
    # Map the integer targets to species names ('setosa', 'versicolor', 'virginica')
    labels = data['target_names'][target]

    for t, marker, c in zip(range(3), ">ox", "rgb"):
        # We plot each class on its own to get different colored markers
        plt.scatter(features[target == t, 0],
                    features[target == t, 1],
                    marker=marker,
                    c=c)

Building our first classification model

If the goal is to separate the three types of flower, we can immediately make a few suggestions. For example, the petal length seems to be able to separate Iris Setosa from the other two flower species on its own. We can write a little bit of code to discover where the cutoff is as follows:

    plength = features[:, 2]
    # use numpy operations to get setosa features
    is_setosa = (labels == 'setosa')
    # This is the important step:
    max_setosa = plength[is_setosa].max()
    min_non_setosa = plength[~is_setosa].min()
    print('Maximum of setosa: {0}.'.format(max_setosa))
    print('Minimum of others: {0}.'.format(min_non_setosa))

This prints 1.9 and 3.0. Therefore, we can build a simple model: if the petal length is smaller than two, this is an Iris Setosa flower; otherwise, it is either Iris Virginica or Iris Versicolor.

    # 'example' is the feature vector of a single flower
    if example[2] < 2:
        print('Iris Setosa')
    else:
        print('Iris Virginica or Iris Versicolor')

This is our first model, and it works very well in that it separates the Iris Setosa flowers from the other two species without making any mistakes.

What we had here was a simple structure: a simple threshold on one of the dimensions. We then searched for the best dimension and threshold.
We performed this visually and with some calculation; machine learning happens when we write code to perform this for us.

The example where we distinguished Iris Setosa from the other two species was very easy. However, we cannot immediately see what the best threshold is for distinguishing Iris Virginica from Iris Versicolor. We can even see that we will never achieve perfect separation. We can, however, try to do it the best possible way. For this, we will perform a little computation.

We first select only the non-Setosa features and labels:

    features = features[~is_setosa]
    labels = labels[~is_setosa]
    virginica = (labels == 'virginica')

Here we are heavily using NumPy operations on the arrays. is_setosa is a Boolean array, and we use it to select a subset of the other two arrays, features and labels. Finally, we build a new Boolean array, virginica, using an equality comparison on labels.

Now, we run a loop over all possible features and thresholds to see which one results in the best accuracy. Accuracy is simply the fraction of examples that the model classifies correctly:

    best_acc = -1.0
    for fi in range(features.shape[1]):
        # We are going to generate all possible thresholds for this feature
        thresh = features[:, fi].copy()
        thresh.sort()
        # Now test all thresholds:
        for t in thresh:
            pred = (features[:, fi] > t)
            acc = (pred == virginica).mean()
            if acc > best_acc:
                best_acc = acc
                best_fi = fi
                best_t = t

The last few lines select the best model. First we compare the predictions, pred, with the actual labels, virginica. The little trick of computing the mean of the comparisons gives us the fraction of correct results, the accuracy. At the end of the for loop, all possible thresholds for all possible features have been tested, and the best_fi and best_t variables hold our model.

To apply it to a new example, we perform the following:

    if example[best_fi] > best_t:
        print('virginica')
    else:
        print('versicolor')

What does this model look like?
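In code, such a model is nothing more than a feature index and a threshold. The following is a minimal sketch, not the article's own implementation, that packages the search described above into reusable functions. The names learn_model, apply_model, and accuracy, as well as the toy data, are assumptions made for illustration.

```python
import numpy as np

def learn_model(features, labels):
    """Exhaustive search for the (feature index, threshold) pair with the
    best training accuracy. `labels` is a Boolean array (True = positive)."""
    best_acc = -1.0
    best_fi, best_t = 0, 0.0
    for fi in range(features.shape[1]):
        # Every observed value of this feature is a candidate threshold
        for t in np.unique(features[:, fi]):
            pred = features[:, fi] > t
            acc = (pred == labels).mean()
            if acc > best_acc:
                best_acc, best_fi, best_t = acc, fi, t
    return best_fi, best_t

def apply_model(features, model):
    """Return Boolean predictions for each row of `features`."""
    fi, t = model
    return features[:, fi] > t

def accuracy(features, labels, model):
    """Fraction of rows that the model classifies correctly."""
    return (apply_model(features, model) == labels).mean()

# Made-up data (not the Iris measurements): only the second feature
# separates the two classes cleanly.
toy_features = np.array([[5.0, 1.0], [6.0, 1.2], [5.5, 3.0], [6.1, 3.4]])
toy_labels = np.array([False, False, True, True])

model = learn_model(toy_features, toy_labels)
toy_accuracy = accuracy(toy_features, toy_labels, model)
print(model, toy_accuracy)
```

Keeping learning and prediction in separate functions makes it straightforward to train on one subset of the data and evaluate on another.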
If we run it on the whole data, the best model that we get is split on the petal length. We can visualize the decision boundary. In the following screenshot, we see two regions: one is white and the other is shaded in grey. Anything that falls in the white region will be called Iris Virginica and anything that falls on the shaded side will be classified as Iris Versicolor:

In a threshold model, the decision boundary will always be a line that is parallel to one of the axes. The plot in the preceding screenshot shows the decision boundary and the two regions where the points are classified as either white or grey. It also shows (as a dashed line) an alternative threshold that will achieve exactly the same accuracy. Our method chose the first threshold, but that was an arbitrary choice.

Evaluation – holding out data and cross-validation

The model discussed in the preceding section is a simple model; it achieves 94 percent accuracy on its training data. However, this evaluation may be overly optimistic. We used the data to define what the threshold would be, and then we used the same data to evaluate the model. Of course, the model will perform better than anything else we have tried on this dataset. The logic is circular.

What we really want to do is estimate the ability of the model to generalize to new instances. We should measure its performance on instances that the algorithm has not seen during training. Therefore, we are going to do a more rigorous evaluation and use held-out data. For this, we are going to break up the data into two blocks: on one block, we'll train the model, and on the other (the one we held out of training), we'll test it. The output is as follows:

    Training error was 96.0%.
    Testing error was 90.0% (N = 50).

Despite the wording of this output, the figures are accuracies rather than error rates: the accuracy on the testing data is lower than the accuracy on the training data. This may surprise an inexperienced machine learner, but it is expected and typical. To see why, look back at the plot that showed the decision boundary.
Consider what would happen if some of the examples close to the boundary were not there, or if one of the examples in between the two lines were missing. It is easy to imagine that the boundary would then move a little bit to the right or to the left so as to put them on the "wrong" side of the border.

The error on the training data is called the training error and is always an overly optimistic estimate of how well your algorithm is doing. We should always measure and report the testing error: the error on a collection of examples that were not used for training.

These concepts will become more and more important as the models become more complex. In this example, the difference between the two errors is not very large. When using a complex model, it is possible to get 100 percent accuracy in training and do no better than random guessing on testing!

One possible problem with what we did previously, which was to hold out data from training, is that we only used part of the data (in this case, we used half of it) for training. On the other hand, if we use too little data for testing, the error estimation is performed on a very small number of examples. Ideally, we would like to use all of the data for training and all of the data for testing as well.

We can achieve something quite similar by cross-validation. One extreme (but sometimes useful) form of cross-validation is leave-one-out. We will take an example out of the training data, learn a model without this example, and then see if the model classifies this example correctly. Here, learn_model and apply_model are assumed to be helper functions that wrap the threshold search and the threshold rule shown earlier:

    error = 0.0
    for ei in range(len(features)):
        # select all but the one at position 'ei':
        training = np.ones(len(features), bool)
        training[ei] = False
        testing = ~training
        model = learn_model(features[training], virginica[training])
        # predict the held-out example without looking at its label
        predictions = apply_model(features[testing], model)
        error += np.sum(predictions != virginica[testing])

At the end of this loop, we will have tested a series of models on all the examples.
However, there is no circularity problem because each example was tested on a model that was built without taking that example into account. Therefore, the overall estimate is a reliable estimate of how well the models would generalize.

The major problem with leave-one-out cross-validation is that we are now being forced to perform much more work: 100 times more here, one model for each of our 100 examples. In fact, we must learn a whole new model for each and every example, and this cost will grow as our dataset grows.

We can get most of the benefits of leave-one-out at a fraction of the cost by using x-fold cross-validation; here, "x" stands for a small number, say, five. In order to perform five-fold cross-validation, we break up the data into five groups, that is, five folds.

Then we learn five models, leaving one fold out of each. The resulting code will be similar to the code given earlier in this section, but here we leave 20 percent of the data out instead of just one element. We test each of these models on the left-out fold and average the results:

The preceding figure illustrates this process for five blocks; the dataset is split into five pieces. Then for each fold, you hold out one of the blocks for testing and train on the other four. You can use any number of folds you wish. Five or ten folds are typical; this corresponds to training with 80 or 90 percent of your data and should already be close to what you would get from using all the data. In an extreme case, if you have as many folds as datapoints, you are simply performing leave-one-out cross-validation.

When generating the folds, you need to be careful to keep them balanced. For example, if all of the examples in one fold come from the same class, the results will not be representative. We will not go into the details of how to do this because the machine learning packages will handle it for you.

We have now generated several models instead of just one. So, what final model do we return and use for the new data?
The simplest solution is now to train a single overall model on all your training data. The cross-validation loop gives you an estimate of how well this model should generalize.

A cross-validation schedule allows you to use all your data to estimate if your methods are doing well. At the end of the cross-validation loop, you can use all your data to train a final model.

Although it was not properly recognized when machine learning was starting out, nowadays it is seen as a very bad sign to even discuss the training error of a classification system. This is because the results can be very misleading. We always want to measure and compare either the error on a held-out dataset or the error estimated using a cross-validation schedule.

Building more complex classifiers

In the previous section, we used a very simple model: a threshold on one of the dimensions. Throughout this article, you will see many other types of models, and we're not even going to cover everything that is out there.

What makes up a classification model? We can break it up into three parts:

  • The structure of the model: Here, we use a threshold on a single feature.
  • The search procedure: Here, we try every possible combination of feature and threshold.
  • The loss function: Using the loss function, we decide which of the possibilities is less bad (because we can rarely talk about the perfect solution). We can use the training error or just define this point the other way around and say that we want the best accuracy. Traditionally, people want the loss function to be minimized.

We can play around with these parts to get different results. For example, we can attempt to build a threshold that achieves minimal training error, but we will only test three values for each feature: the mean value of the feature, the mean plus one standard deviation, and the mean minus one standard deviation.
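As a rough sketch of this restricted search (the function name learn_model_coarse and the toy data are assumptions made for illustration), the structure and the loss function stay the same as before; only the search procedure changes, so that three candidate thresholds are tried per feature instead of every observed value:

```python
import numpy as np

def learn_model_coarse(features, labels):
    """Threshold search restricted to mean - std, mean, and mean + std
    of each feature. Returns ((feature index, threshold), accuracy)."""
    best_acc, best_model = -1.0, None
    for fi in range(features.shape[1]):
        column = features[:, fi]
        mean, std = column.mean(), column.std()
        # Only three candidates per feature, however many examples we have
        for t in (mean - std, mean, mean + std):
            acc = ((column > t) == labels).mean()
            if acc > best_acc:
                best_acc, best_model = acc, (fi, t)
    return best_model, best_acc

# Made-up data: the first feature separates the two classes.
toy_features = np.array([[1.0, 7.0], [2.0, 3.0], [8.0, 5.0], [9.0, 4.0]])
toy_labels = np.array([False, False, True, True])

coarse_model, coarse_acc = learn_model_coarse(toy_features, toy_labels)
print(coarse_model, coarse_acc)
```

The number of accuracy evaluations is now three per feature rather than one per example per feature, at the price of possibly missing the best threshold.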
This could make sense if testing each value was very costly in terms of computer time (or if we had millions and millions of datapoints). Then the exhaustive search we used would be infeasible, and we would have to perform an approximation like this.

Alternatively, we might have different loss functions. It might be that one type of error is much more costly than another. In a medical setting, false negatives and false positives are not equivalent. A false negative (when the result of a test comes back negative, but that is false) might lead to the patient not receiving treatment for a serious disease. A false positive (when the test comes back positive even though the patient does not actually have that disease) might lead to additional tests for confirmation purposes or unnecessary treatment (which can still have costs, including side effects from the treatment). Therefore, depending on the exact setting, different trade-offs can make sense. At one extreme, if the disease is fatal and treatment is cheap with very few negative side effects, you want to minimize the false negatives as much as you can. With spam filtering, we may face the same problem; incorrectly deleting a non-spam e-mail can be very dangerous for the user, while letting a spam e-mail through is just a minor annoyance.

What the cost function should be is always dependent on the exact problem you are working on. When we present a general-purpose algorithm, we often focus on minimizing the number of mistakes (achieving the highest accuracy). However, if some mistakes are more costly than others, it might be better to accept a lower overall accuracy to minimize overall costs.

Finally, we can also have other classification structures. A simple threshold rule is very limiting and will only work in the very simplest cases, such as with the Iris dataset.

A more complex dataset and a more complex classifier

We will now look at a slightly more complex dataset.
This will motivate the introduction of a new classification algorithm and a few other ideas.

Learning about the Seeds dataset

We will now look at another agricultural dataset; it is still small, but now too big to comfortably plot exhaustively as we did with Iris. This is a dataset of the measurements of wheat seeds. Seven features are present, as follows:

  • Area (A)
  • Perimeter (P)
  • Compactness (C = 4πA/P²)
  • Length of kernel
  • Width of kernel
  • Asymmetry coefficient
  • Length of kernel groove

There are three classes that correspond to three wheat varieties: Canadian, Kama, and Rosa. As before, the goal is to be able to classify the variety based on these morphological measurements.

Unlike the Iris dataset, which was collected in the 1930s, this is a very recent dataset, and its features were automatically computed from digital images.

This is how image pattern recognition can be implemented: you can take images in digital form, compute a few relevant features from them, and use a generic classification system. Later, we will work through the computer vision side of this problem and compute features in images. For the moment, we will work with the features that are given to us.

UCI Machine Learning Dataset Repository

The University of California at Irvine (UCI) maintains an online repository of machine learning datasets (at the time of writing, they are listing 233 datasets). Both the Iris and Seeds datasets used in this article were taken from there. The repository is available online: http://archive.ics.uci.edu/ml/

Features and feature engineering

One interesting aspect of these features is that the compactness feature is not actually a new measurement, but a function of the previous two features, area and perimeter. It is often very useful to derive new combined features.
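A small sketch of such a derived feature follows. It assumes the usual definition of compactness, C = 4πA/P², which is 1 for a perfect circle; the shapes below are made-up geometric examples, not rows from the Seeds dataset.

```python
import math

def compactness(area, perimeter):
    """Derived feature combining two raw measurements.
    Assumes the definition C = 4 * pi * A / P**2."""
    return 4.0 * math.pi * area / perimeter ** 2

# Two circles of different size: same shape, so (nearly) the same value.
circle_small = compactness(math.pi * 1.0 ** 2, 2 * math.pi * 1.0)
circle_large = compactness(math.pi * 2.0 ** 2, 2 * math.pi * 2.0)

# A long, thin 10 x 0.1 rectangle: very elongated, so the value is near zero.
thin_rectangle = compactness(10.0 * 0.1, 2 * (10.0 + 0.1))

print(circle_small, circle_large, thin_rectangle)
```

The feature ignores overall size and responds only to shape, which is exactly the kind of behavior one hopes for when combining raw measurements.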
This is a general area normally termed feature engineering; it is sometimes seen as less glamorous than algorithms, but it may matter more for performance (a simple algorithm on well-chosen features will perform better than a fancy algorithm on not-so-good features).

In this case, the original researchers computed the "compactness", which is a typical feature for shapes (also called "roundness"). This feature will have the same value for two kernels, one of which is twice as big as the other one, but with the same shape. However, it will have different values for kernels that are very round (when the feature is close to one) as compared to kernels that are elongated (when the feature is close to zero).

The goals of a good feature are to simultaneously vary with what matters and be invariant with what does not. For example, compactness does not vary with size but varies with the shape. In practice, it might be hard to achieve both objectives perfectly, but we want to approximate this ideal.

You will need to use background knowledge to intuit which will be good features. Fortunately, for many problem domains, there is already a vast literature of possible features and feature types that you can build upon. For images, all of the previously mentioned features are typical, and computer vision libraries will compute them for you. In text-based problems too, there are standard solutions that you can mix and match. Often though, you can use your knowledge of the specific problem to design a specific feature.

Even before you have data, you must decide which data is worthwhile to collect. Then, you need to hand all your features to the machine to evaluate and compute the best classifier.

A natural question is whether or not we can select good features automatically. This problem is known as feature selection. There are many methods that have been proposed for this problem, but in practice, very simple ideas work best.
It does not make sense to use feature selection in these small problems, but if you had thousands of features, throwing out most of them might make the rest of the process much faster.

Nearest neighbor classification

With this dataset, even if we just try to separate two classes using the previous method, we do not get very good results. Let me introduce, therefore, a new classifier: the nearest neighbor classifier.

If we consider that each example is represented by its features (in mathematical terms, as a point in N-dimensional space), we can compute the distance between examples. We can choose different ways of computing the distance, for example:

    def distance(p0, p1):
        'Computes squared euclidean distance'
        return np.sum((p0 - p1)**2)

Now when classifying, we adopt a simple rule: given a new example, we look in the dataset for the point that is closest to it (its nearest neighbor) and look at its label:

    def nn_classify(training_set, training_labels, new_example):
        # distance from the new example to every training point
        dists = np.array([distance(t, new_example) for t in training_set])
        nearest = dists.argmin()
        return training_labels[nearest]

In this case, our model involves saving all of the training data and labels and computing everything at classification time. A better implementation would index these at learning time to speed up classification, but such an implementation requires a complex algorithm.

Now, note that this model performs perfectly on its training data! For each point, its closest neighbor is itself, and so its label matches perfectly (unless two examples have exactly the same features but different labels, which can happen). Therefore, it is essential to test using a cross-validation protocol.

Using ten folds for cross-validation for this dataset with this algorithm, we obtain 88 percent accuracy. As we discussed in the earlier section, the cross-validation accuracy is lower than the training accuracy, but this is a more credible estimate of the performance of the model.
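The gap between training accuracy and held-out accuracy for a nearest neighbor classifier can be seen on a handful of made-up points. This is only an illustrative sketch: the points and labels are invented (they are not Seeds measurements), and leave-one-out is used instead of ten folds because the toy set is so small.

```python
import numpy as np

def distance(p0, p1):
    """Squared Euclidean distance."""
    return np.sum((p0 - p1) ** 2)

def nn_classify(training_set, training_labels, new_example):
    dists = np.array([distance(t, new_example) for t in training_set])
    return training_labels[dists.argmin()]

# Made-up points; the third one carries label 0 but sits among the 1s.
points = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.1],
                   [1.0, 1.0], [1.2, 0.9], [0.1, 0.2]])
labels = np.array([0, 0, 0, 1, 1, 0])

# Training accuracy: every point finds itself at distance zero.
train_hits = sum(int(nn_classify(points, labels, p) == l)
                 for p, l in zip(points, labels))

# Leave-one-out: each point is classified without itself in the training set.
loo_hits = 0
for i in range(len(points)):
    keep = np.arange(len(points)) != i
    loo_hits += int(nn_classify(points[keep], labels[keep], points[i]) == labels[i])

print(train_hits, loo_hits, len(points))
```

On the training set every point is "classified" by itself, so the count of hits says nothing about generalization; the leave-one-out count is the one that exposes the two points lying next to a neighbor of the other class.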
We will now examine the decision boundary. For this, we will be forced to simplify and look at only two dimensions (just so that we can plot it on paper).

In the preceding screenshot, the Canadian examples are shown as diamonds, Kama seeds as circles, and Rosa seeds as triangles. Their respective areas are shown as white, black, and grey. You might be wondering why the regions are so horizontal, almost weirdly so. The problem is that the x axis (area) ranges from 10 to 22 while the y axis (compactness) ranges from 0.75 to 1.0. This means that a small change in x is actually much larger than a small change in y. So, when we compute the distance according to the preceding function, we are, for the most part, only taking the x axis into account.

If you have a physics background, you might have already noticed that we had been summing up lengths, areas, and dimensionless quantities, mixing up our units (which is something you never want to do in a physical system). We need to normalize all of the features to a common scale. There are many solutions to this problem; a simple one is to normalize to Z-scores. The Z-score of a value is how far away from the mean it is in terms of units of standard deviation. It comes down to this simple pair of operations:

    # subtract the mean for each feature:
    features -= features.mean(axis=0)
    # divide each feature by its standard deviation
    features /= features.std(axis=0)

Independent of what the original values were, after Z-scoring, a value of zero is the mean and positive values are above the mean and negative values are below it.

Now every feature is in the same unit (technically, every feature is now dimensionless; it has no units) and we can mix dimensions more confidently. In fact, if we now run our nearest neighbor classifier, we obtain 94 percent accuracy!
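To see concretely why the scale matters, here is a small sketch on made-up two-feature data (an area-like column with a large range and a compactness-like column with a small one; these are not actual Seeds rows). The helper names zscore and nearest_index are assumptions for illustration. The nearest neighbor of the first point changes once both columns are put on the same scale.

```python
import numpy as np

def zscore(features):
    """Return a normalized copy: zero mean and unit standard deviation per column."""
    return (features - features.mean(axis=0)) / features.std(axis=0)

def nearest_index(features, query_index):
    """Index of the row closest to row `query_index` (excluding itself)."""
    diffs = features - features[query_index]
    dists = np.sum(diffs ** 2, axis=1)
    dists[query_index] = np.inf  # a point is not its own neighbor here
    return int(dists.argmin())

# Columns: area-like (roughly 10 to 22), compactness-like (roughly 0.8 to 1.0)
raw = np.array([[10.0, 0.80],
                [10.5, 0.99],
                [13.0, 0.81],
                [22.0, 0.90]])

neighbor_before = nearest_index(raw, 0)        # dominated by the first column
normalized = zscore(raw)
neighbor_after = nearest_index(normalized, 0)  # both columns now count

print(neighbor_before, neighbor_after)
```

Unlike the in-place operations shown above, this version returns a new array, which is convenient when the raw values are still needed afterwards.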
Look at the decision space again in two dimensions; it looks as shown in the following screenshot:

The boundaries are now much more complex and there is interaction between the two dimensions. In the full dataset, everything is happening in a seven-dimensional space that is very hard to visualize, but the same principle applies: where before a few dimensions were dominant, now they are all given the same importance.

The nearest neighbor classifier is simple, but sometimes good enough. We can generalize it to a k-nearest neighbor classifier by considering not just the closest point but the k closest points. All k neighbors vote to select the label. k is typically a small number, such as 5, but can be larger, particularly if the dataset is very large.

Binary and multiclass classification

The first classifier we saw, the threshold classifier, was a simple binary classifier (the result is either one class or the other as a point is either above the threshold or it is not). The second classifier we used, the nearest neighbor classifier, was a naturally multiclass classifier (the output can be one of several classes).

It is often simpler to define a simple binary method than one that works on multiclass problems. However, we can reduce the multiclass problem to a series of binary decisions. This is what we did earlier in the Iris dataset in a haphazard way; we observed that it was easy to separate one of the initial classes and focused on the other two, reducing the problem to two binary decisions:

  • Is it an Iris Setosa (yes or no)?
  • If no, check whether it is an Iris Virginica (yes or no).

Of course, we want to leave this sort of reasoning to the computer. As usual, there are several solutions to this multiclass reduction.

The simplest is to use a series of "one classifier versus the rest of the classifiers". For each possible label l, we build a classifier of the type "is this l or something else?".
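The one-versus-rest idea can be sketched as follows. Everything here is an assumption made for illustration: the binary learner is a hypothetical stand-in that compares distances to two centroids, the data is made up, and picking the label with the highest score is just one possible way of settling cases where several classifiers (or none) answer "yes".

```python
import numpy as np

def learn_binary(features, is_positive):
    # Stand-in binary learner: remember the centroid of the positive
    # examples and the centroid of everything else.
    return features[is_positive].mean(axis=0), features[~is_positive].mean(axis=0)

def score_binary(model, example):
    # Positive score means "yes, this is my label"; larger means more confident.
    pos_centroid, rest_centroid = model
    return float(np.sum((example - rest_centroid) ** 2)
                 - np.sum((example - pos_centroid) ** 2))

def learn_one_vs_rest(features, labels):
    # One binary model per label: "is this l or something else?"
    return {str(l): learn_binary(features, labels == l) for l in np.unique(labels)}

def classify_one_vs_rest(models, example):
    scores = {l: score_binary(m, example) for l, m in models.items()}
    return max(scores, key=scores.get)

# Made-up data with three labels
toy_features = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0],
                         [5.0, 1.0], [0.0, 9.0], [1.0, 9.0]])
toy_labels = np.array(['a', 'a', 'b', 'b', 'c', 'c'])

models = learn_one_vs_rest(toy_features, toy_labels)
query = np.array([4.5, 0.5])
yes_votes = [l for l, m in models.items() if score_binary(m, query) > 0]
prediction = classify_one_vs_rest(models, query)
print(yes_votes, prediction)
```

For this query point two of the three binary classifiers answer "yes", so a bare yes/no vote would be ambiguous; comparing scores gives a single label.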
When applying the rule, exactly one of the classifiers would say "yes" and we would have our solution. Unfortunately, this does not always happen, so we have to decide how to deal with either multiple positive answers or no positive answers.

Alternatively, we can build a classification tree. Split the possible labels in two and build a classifier that asks "should this example go to the left or the right bin?" We can perform this splitting recursively until we obtain a single label. The preceding diagram depicts the tree of reasoning for the Iris dataset. Each diamond is a single binary classifier. It is easy to imagine we could make this tree larger and encompass more decisions. This means that any classifier that can be used for binary classification can also be adapted to handle any number of classes in a simple way.

There are many other possible ways of turning a binary method into a multiclass one. There is no single method that is clearly better in all cases. However, which one you use normally does not make much of a difference to the final result.

Most classifiers are binary systems while many real-life problems are naturally multiclass. Several simple protocols reduce a multiclass problem to a series of binary decisions and allow us to apply the binary models to our multiclass problem.

Summary

In a sense, this was a very theoretical article, as we introduced generic concepts with simple examples. We went over a few operations with a classic dataset. This, by now, is considered a very small problem. However, it has the advantage that we were able to plot it out and see what we were doing in detail. This is something that will be lost when we move on to problems with many dimensions and many thousands of examples. The intuitions we gained here will all still be valid.

Classification means generalizing from examples to build a model (that is, a rule that can automatically be applied to new, unclassified objects).
It is one of the fundamental tools in machine learning.

We also learned that the training error is a misleading, over-optimistic estimate of how well the model does. We must, instead, evaluate it on testing data that was not used for training. In order to not waste too many examples in testing, a cross-validation schedule can get us the best of both worlds (at the cost of more computation).

We also had a look at the problem of feature engineering. Features are not predefined for you; choosing and designing them is an integral part of designing a machine-learning pipeline. In fact, it is often the area where you can get the most improvements in accuracy, as better data beats fancier methods.

In this article, we wrote all of our own code (except when we used NumPy, of course). We needed to build up intuitions on simple cases to illustrate the basic concepts.

Packt
22 Aug 2013
10 min read

What is Hazelcast?

Starting out as usual

In most modern software systems, data is the key. For more traditional architectures, the role of persisting and providing access to your system's data tends to fall to a relational database. Typically this is a monolithic beast, perhaps with a degree of replication, although this tends to be more for resilience than for performance.

For example, here is what a traditional architecture might look like (which hopefully looks rather familiar):

This presents us with an issue in terms of application scalability, in that it is relatively easy to scale our application layer by throwing more hardware at it to increase the processing capacity. But the monolithic constraints of our data layer would only allow us to do this so far before diminishing returns or resource saturation stunted further performance increases; so what can we do to address this?

In the past and in legacy architectures, the only solution would be to increase the performance capability of our database infrastructure, potentially by buying a bigger, faster server or by further tweaking and fettling the utilization of currently available resources. Both options are dramatic in terms of financial cost, manpower, or both; so what else could we do?

Data deciding to hang around

In order for us to gain a bit more performance out of our existing setup, we can hold copies of our data away from the primary database and use these in preference wherever possible. There are a number of different strategies we could adopt, from transparent second-level caching layers to external key-value object storage. The detail and exact use of each varies significantly depending on the technology or its place in the architecture, but the main aim of these systems is to sit alongside the primary database infrastructure and attempt to protect it from excessive load.
This would then tend to lead to increased performance of the primary database by reducing the overall dependency on it. However, this strategy tends to be valuable only as a short-term solution, effectively buying us a little more time before the database once again starts to reach saturation. The other downside is that it only protects our database from read-based load; if our application is predominantly write-heavy, this strategy has very little to offer.

So our expanded architecture could look a bit like the following figure:

Therein lies the problem

However, in insulating the database from the read load, we have introduced a cache consistency problem: how does our local data cache deal with data changing underneath it within the primary database? The answer is rather depressing: it can't! The exact manifestation of any issues will largely depend on the data needs of the application and how frequently the data changes; but typically, caching systems will operate in one of the two following modes to combat the problem:

  • Time bound cache: Holds entries for a defined period (time-to-live or TTL)
  • Write through cache: Holds entries until they are invalidated by subsequent updates

Time bound caches almost always have consistency issues, but at least the amount of time that the issue would be present is limited to the expiry time of each entry. However, we must consider the application's access to this data, because if a particular entry is accessed less often than it expires, the cache is providing no real benefit.

Write through caches are consistent in isolation and can be configured to offer strict consistency, but if multiple write through caches exist within the overall architecture, then there will be consistency issues between them.
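The two modes can be illustrated with a short, self-contained sketch. It is written in Python for brevity and is not tied to any particular caching product; the class name TimeBoundCache, the dictionary standing in for the database, and the injectable clock are all assumptions made for illustration.

```python
import time

class TimeBoundCache:
    """Read cache that holds entries for a fixed time-to-live (TTL)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the example is deterministic
        self.entries = {}       # key -> (value, expiry time)

    def get(self, key, load_from_database):
        entry = self.entries.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]     # cache hit; the value may already be stale
        value = load_from_database(key)
        self.entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        # Hook for the write-through style: drop the entry when it is updated
        self.entries.pop(key, None)

database = {"user:1": "Alice"}   # stand-in for the primary database
fake_now = [0.0]
cache = TimeBoundCache(ttl_seconds=10, clock=lambda: fake_now[0])

first = cache.get("user:1", database.get)
database["user:1"] = "Bob"       # the data changes underneath the cache
stale = cache.get("user:1", database.get)     # still served from the cache
fake_now[0] = 11.0               # the entry has now expired
fresh = cache.get("user:1", database.get)

database["user:1"] = "Carol"
cache.invalidate("user:1")       # an update that also invalidates the entry
after_invalidation = cache.get("user:1", database.get)
print(first, stale, fresh, after_invalidation)
```

The stale read lasts at most one TTL period, whereas the explicit invalidation removes it immediately, but only in the cache instance that was told about the update.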
We can avoid this inconsistency between caches by having a more intelligent cache, one that features a communication mechanism between nodes so that they can propagate entry invalidations to each other. In practice, an ideal cache would combine both features, so that entries are held for a known maximum time but invalidations are also passed around as changes are made.

So our evolved architecture would look a bit like the following figure:

So far we've had a look through the general issues in scaling our data layer, and introduced strategies to help combat the trade-offs we will encounter along the way; however, the real world isn't quite as simple. There are various cache servers and in-memory database products in this area; however, most of these are stand-alone single instances, perhaps with some degree of distribution bolted on or provided by other supporting technologies. This tends to bring about the same issues we experienced with just our primary database: we could encounter resource saturation or capacity issues if the product is a single instance, or, if the distribution doesn't provide consistency control, inconsistent data that might harm our application.

Breaking the mould

Hazelcast is a radical new approach to data, designed from the ground up around distribution. It embraces a new, scalable way of thinking in which data should be shared around for both resilience and performance, while allowing us to configure the trade-offs surrounding consistency as the data requirements dictate.

The first major feature to understand about Hazelcast is its masterless nature; each node is configured to be functionally the same. The oldest node in the cluster is the de facto leader and manages the membership, automatically deciding which node is responsible for what data. As new nodes join or drop out, the process is repeated and the cluster rebalances accordingly.
This makes Hazelcast incredibly simple to get up and running, as the system is self-discovering, self-clustering, and works straight out of the box.

However, the second feature to remember is that we are persisting data entirely in memory; this makes it incredibly fast, but this speed comes at a price. When a node is shut down, all the data that was held on it is lost. We combat this risk to resilience through replication, by holding enough copies of a piece of data across multiple nodes that, in the event of failure, the overall cluster will not suffer any data loss. By default, the standard backup count is 1, so we can immediately enjoy basic resilience. But don't pull the plug on more than one node at a time; wait until the cluster has reacted to the change in membership and re-established the appropriate number of backup copies of the data.

So when we introduce our new masterless distributed cluster, we get something like the following figure:

We previously identified that multi-node caches tend to suffer from either saturation or consistency issues. In the case of Hazelcast, each node is the owner of a number of partitions of the overall data, so the load will be fairly spread across the cluster. Hence, any saturation would be at the cluster level rather than at any individual node, and we can address it simply by adding more nodes. In terms of consistency, by default the backup copies of the data are internal to Hazelcast and not directly used; as such, we enjoy strict consistency. This does mean that we have to interact with a specific node to retrieve or update a particular piece of data; however, exactly which node that is remains an internal operational detail that can vary over time, and we as developers never actually need to know.
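
The idea of keys mapping to partitions, and partitions to owning nodes, can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: the PartitionSketch class is hypothetical, and the use of hashCode() together with round-robin placement of owners and backups is a stand-in chosen for brevity. Hazelcast's real hashing and assignment logic is internal and more sophisticated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical key -> partition -> node lookup; not Hazelcast's actual algorithm. */
final class PartitionSketch {
    private final int partitionCount;
    private final List<String> nodes;

    PartitionSketch(int partitionCount, List<String> nodes) {
        if (partitionCount <= 0 || nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one partition and one node");
        }
        this.partitionCount = partitionCount;
        this.nodes = new ArrayList<>(nodes); // defensive copy of the current membership
    }

    /** The same key always lands in the same partition, whatever the cluster size. */
    int partitionFor(Object key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Assumption: partitions are dealt out to nodes round-robin. */
    String ownerOf(int partitionId) {
        return nodes.get(partitionId % nodes.size());
    }

    /** Assumption: the single backup lives on the next node along (same node if only one exists). */
    String backupOf(int partitionId) {
        return nodes.get((partitionId + 1) % nodes.size());
    }
}
```

A caller would resolve a key with partitionFor(key) and then ask ownerOf(...) which node to talk to. When membership changes, only the partition-to-node mapping needs to be rebuilt; the key-to-partition mapping stays the same, which is what allows the cluster to rebalance without the application noticing.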
If we imagine that our data is split into a number of partitions, and that each partition slice is owned by one node and backed up on another, we could then visualize the interactions like the following figure:

This means that for data belonging to Partition 1, our application will have to communicate with Node 1; for data belonging to Partition 2, with Node 2; and so on. The slicing of the data into partitions is dynamic, so in practice, where there are more partitions than nodes, each node will own a number of different partitions and hold backups for others. As we have mentioned before, all of this is an internal operational detail that our application does not need to know, but it is important that we understand what is going on behind the scenes.

Moving to new ground

So far we have been talking mostly about simple persisted data and caches, but in reality, we should not think of Hazelcast as purely a cache, as it is much more powerful than just that. It is an in-memory data grid that supports a number of distributed collections and features. We can load data from various sources into differing structures, send messages across the cluster, take out locks to guard against concurrent activity, and listen to the goings-on inside the workings of the cluster. Most of these implementations correspond to a standard Java collection, or function in a manner comparable to other similar technologies, but all with the distribution and resilience capabilities already built in.

Standard utility collections
Map: Key-value pairs
List: Collection of objects
Set: Non-duplicated collection
Queue: Offer/poll FIFO collection

Specialized collection
Multi-Map: Key-list of values collection
Lock: Cluster wide mutex
Topic: Publish/subscribe messaging

Concurrency utilities
AtomicNumber: Cluster-wide atomic counter
IdGenerator: Cluster-wide unique identifier generation
Semaphore: Concurrency limitation
CountdownLatch: Concurrent activity gate-keeping
Listeners: Application notifications as things happen

In addition to data storage collections, Hazelcast also features a distributed executor service, allowing runnable tasks to be created that can be run anywhere on the cluster to obtain, manipulate, and store results. We could have a number of collections containing source data, then spin up a number of tasks to process the disparate data (for example, averaging or aggregating) and output the results into another collection for consumption.

Again, just as we can scale up our data capacity by adding more nodes, we can also increase the execution capacity in exactly the same way. This essentially means that by building our data layer around Hazelcast, if our application's needs rapidly increase, we can continue to increase the number of nodes to satisfy seemingly extensive demands, all without having to redesign or re-architect the actual application.

With Hazelcast, we are dealing more with a technology than a server product: a library to build a system around, rather than something to bolt on retrospectively or an off-the-shelf commercial system to connect to blindly. While it is possible (and in some simple cases quite practical) to run Hazelcast as a separate server-like cluster and connect to it remotely from our application, some of the greatest benefits come when we develop our own classes and tasks to run within it and alongside it.

With such a large range of generic capabilities, there is an entire world of problems that Hazelcast can help solve.
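
The pattern described above, spinning up tasks over several source collections and gathering the results, can be sketched with the standard java.util.concurrent API. The example below runs in a single JVM on a local thread pool purely as a stand-in. The AveragingTasks class is a hypothetical name, and we assume that a distributed executor would be driven in much the same way, with the additional requirement that tasks can be shipped to other nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example: one averaging task per source collection, run on an executor. */
final class AveragingTasks {

    /** Submits one task per source list and returns the averages in the same order. */
    static List<Double> averageAll(List<List<Integer>> sources, ExecutorService executor) {
        List<Future<Double>> pending = new ArrayList<>();
        for (final List<Integer> source : sources) {
            // An empty source averages to 0.0 by convention in this sketch.
            Callable<Double> task =
                () -> source.stream().mapToInt(Integer::intValue).average().orElse(0.0);
            pending.add(executor.submit(task));
        }
        List<Double> results = new ArrayList<>();
        for (Future<Double> future : pending) {
            try {
                results.add(future.get()); // blocks until that task has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a task", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("averaging task failed", e.getCause());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2); // local stand-in for a cluster
        try {
            List<List<Integer>> sources =
                Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(10, 20));
            System.out.println(averageAll(sources, executor)); // prints [2.0, 15.0]
        } finally {
            executor.shutdown();
        }
    }
}
```

In this local version, capacity grows by enlarging the thread pool; the point made in the text is that with a clustered executor the equivalent step is adding nodes, with no change to the task code itself.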
We can use the technology in many ways: in isolation to hold data such as user sessions, alongside a more long-term persistent data store to increase capacity, or as a platform for performing high-performance, scalable operations on our data. By moving more and more responsibility away from monolithic systems to such a generic, scalable one, we can unlock a great deal more performance. This allows us to keep our application and data layers separate while still being able to scale them up independently as our application grows. It will help our application avoid becoming a victim of its own success, while hopefully taking the world by storm.

Summary

In this article, we learned about Hazelcast. With such a large range of generic capabilities, Hazelcast can solve a world of problems.

Resources for Article:

Further resources on this subject:
JBoss AS Perspective [Article]
Drools Integration Modules: Spring Framework and Apache Camel [Article]
JBoss RichFaces 3.3 Supplemental Installation [Article]

Packt
22 Aug 2013
7 min read

Pentaho – Using Formulas in Our Reports

At the end of the article, we propose that you make some modifications to the report created in this article.

Starting practice

In this article, we will create a copy of an existing report and then make the necessary changes to its layout; the final result is as follows:

As we can observe in the previous screenshot, the rectangle to the left of each title changes color. We'll see how to do this, and much more, shortly.

Time for action – making a copy of the previous report

In this article, we will use an already created report. To do so, we will open it and save it with the name 09_Using_Formulas.prpt. Then we will modify its layout to fit this article. Finally, we will establish default values for our parameters. The steps for making a copy of the previous report are as follows:

We open the report 07_Adding_Parameters.prpt that we created. Next, we create a copy by going to File | Save As... and saving it with the name 09_Using_Formulas.prpt.

We will modify our report so that it looks like the following screenshot:

As you can see, we have just added a rectangle in the Details section, a label (Total) in the Details Header section, and we have modified the name of the label found in the Report Header section. To easily differentiate this report from the one used previously, we have also changed its colors to grayscale. Later in this article, we will make the color of the rectangle vary according to a formula, so it is important that the rest of the report does not have too many colors, so that the results are easy for the end user to see.

We will establish default values in our parameters so we can preview the report without delays caused by having to choose the values for ratings, year, and month. We go to the Data tab, select the SelectRating parameter, right-click on it, and choose the Edit Parameter... option:

In Default Value, we type the value [G]:

Next, we click on OK to continue.
We should do something similar for SelectYear and SelectMonth:

For SelectYear, the Default Value will be 2005.
For SelectMonth, the Default Value will be 5. Remember that the selector shows the names of the months, but internally the months' numbers are used; so, 5 represents May.

What just happened?

We created a copy of the report 07_Adding_Parameters.prpt and saved it with the name 09_Using_Formulas.prpt. We changed the layout of the report, adding new objects and changing the colors. Then we established default values for the parameters SelectRating, SelectYear, and SelectMonth.

Formulas

To manage formulas, PRD implements the open standard OpenFormula. According to OpenFormula's specifications:

"OpenFormula is an open format for exchanging recalculated formulas between office application implementations, particularly for spreadsheets. OpenFormula defines the types, syntax, and semantics for calculated formulas, including many predefined functions and operations, so that formulas can be exchanged between applications and produce substantively equal outputs when recalculated with equal inputs. Both closed and open source software can implement OpenFormula."

For more information on OpenFormula, refer to the following links:

Wikipedia: http://en.wikipedia.org/wiki/OpenFormula
Specifications: https://www.oasis-open.org/committees/download.php/16826/openformula-spec-20060221.html
Web: http://www.openformula.org/
Pentaho wiki: http://wiki.pentaho.com/display/Reporting/Formula+Expressions

Formulas are used for greatly varied purposes, and their use depends on the result one wants to obtain. Formulas let us carry out simple and complex calculations based on fixed and variable values. They include predefined functions for working with text, databases, and dates and times, and for making calculations, as well as general information functions and user-defined functions. They also support logical operators (AND, OR, and so on) and comparison operators (>, <, and so on).
Creating formulas

There are two ways to create formulas:

By creating a new function and going to Common | Open Formula
By pressing the button in a section's or an object's Style or Attributes tab, or when configuring some feature

In the report we are creating in this article, we will create formulas using both methods.

Using the first method, general-use formulas can be created. That is, the result will be an object that can either be included directly in our report or used as a value in another function, style, or attribute. We can create objects that make calculations at a general level to be included in sections such as Report Header, Group Footer, and so on, or we can make calculations to be included in the Details section. In this last case, the formula will make its calculation row by row. This is an important difference from aggregate functions, which usually can only calculate totals and subtotals.

Using the second method, we create specific-use functions that affect the value of the style or attribute of an individual object. The way to use these functions is simple: just choose the value you want to modify in the Style and Attributes tabs and click on the button that appears on its right. In this way, you can create formulas that dynamically assign values to an object's color, position, width, length, format, visibility, and so on. Using this technique, stoplights can be created by assigning different values to an object according to a calculation, progress bars can be created by changing an object's length, and dynamic images can be placed in the report using the result of a formula to calculate the image's path.

As we have seen in the examples, using formulas in our reports gives us great flexibility in applying styles and attributes to objects and to the report itself, as well as the possibility of creating our own objects based on complex calculations.
By using formulas correctly, you will be able to give life to your reports and adapt them to changing contexts. For example, depending on which user executes the report, a certain image can appear in the Report Header section, or graphics and subreports can be hidden if the user does not have sufficient permissions.

The formula editor

The formula editor has a very intuitive and easy-to-use UI that, in addition to guiding us in creating formulas, tells us, whenever possible, the value that the formula will return. In the following screenshot, you can see the formula editor:

We will explain its layout with an example. Let's suppose that we added a new label and we want to create a formula that returns the value of Attributes.Value. For this purpose, we do the following:

Select the option to the right of Attributes.Value. This will open the formula editor.

In the upper-left corner, there is a selector where we can specify the category of functions that we want to see. Below this, we find a list of the functions that we can use to create our own formulas. In the lower-left section, we can see more information about the selected function; that is, the type of value that it will return and a general description:

We choose the CONCATENATE function by double-clicking on it, and in the lower-right section, we can see the formula (Formula:) that we will use. We type in =CONCATENATE(Any), and an assistant will open in the upper-right section that will guide us in entering the values we want to concatenate.

We could complete the CONCATENATE function by adding some fixed values and some variables; take the following example:

If there is an error in the text of the formula, a warning will appear. Otherwise, the formula editor will try to show us the result that our formula will return. When it is not possible to visualize the result that a formula will return, this is usually because the values used are calculated during the execution of the report.
Formulas should always begin with the = sign. Initially, one tends to use the help that the formula editor provides, but later, with more practice, it will become evident that it is much faster to type the formula directly. Also, if you need to enter complex formulas or add various functions with logical operators, the formula editor will not be of use.

Packt
21 Aug 2013
9 min read

Planning Your Sprints using GreenHopper

Creating an Epic

An Epic is a large piece of product functionality that needs to be delivered and that can be further divided into user stories. An Epic can span multiple Sprints until it is finished.

As shown in the following screenshot, click on the + icon in the Epics panel to create an Epic:

Create an Epic using the Epic issue type and enter the relevant details for your Epic.

In the Epics panel, the Epic name you entered while creating the Epic is displayed along with issue details representing the Epic. The panel also displays the total number of issues (stories, improvements, bugs, and so on) assigned to an Epic along with the total estimate (in our case, Story points for the Epic). For Epics created with a missing Epic name, the text unlabelled Epic is displayed.

Use the drag-and-drop functionality in the Epics panel to rank the Epic within your backlog. Keep the high-priority Epic, which you will be working on first, at the top.

To edit an Epic name, click on Edit name, which allows inline editing, as shown in the following screenshot:

You can also distinguish an Epic with a specific color, and the corresponding Epic name will always be highlighted with that color in the view.

Creating a Story

A user Story in Scrum is a user/actor conversation or requirement, or a small piece of functionality, that can easily be unit tested and delivered within a single Sprint. The Story belongs to an Epic.

To create a Story for an Epic, click on the create issue in epic link in the Epics panel.

As shown in the preceding screenshot, to create a Story, select the relevant issue type and enter the Story details. The newly created Story will be listed under the Epic and will also be visible in the Plan mode.

If you select an Epic in the Epics panel, all the issues related to that Epic will be displayed in the backlog panel. If you select one of the newly created stories, the Story panel is displayed on the right-hand side.
As shown in the preceding screenshot, the issue details panel is opened. You can perform all the relevant operations available for the issue in the same panel. The tag panel on the left-hand side allows you to perform the corresponding operations on the selected Story. The Story details panel supports inline editing of the Story and related details. Using the Actions panel, you can edit the Story and perform multiple operations related to it.

Creating subtasks

Technical tasks are the deliverable tasks performed by the developers to deliver a Story in a Sprint.

To add subtasks to a Story in the Plan mode, select the Story, and its details panel will become visible.

As shown in the preceding screenshot, click on the Create Sub-Task button on the issue details panel to add a subtask to an issue. The same panel also displays the list of existing subtasks for an issue.

If time tracking is enabled for the Jira system, you will be able to add hour estimates for the technical tasks, and the Story details panel also displays the total effort required for all the subtasks, as shown at the bottom of the previous screenshot.

Ranking the backlog

By now you have your backlog ready with most of the required Epics, which are further divided into different user stories to be delivered. One of the important tasks in managing and grooming the backlog is ranking, or ordering, the different backlog items. Not all functionality has the same business value: some functionality is must-have, and some is good to have but carries less business value.

As stated in the earlier section, you can rank Epics by drag-and-drop in the Epics panel, and Epics will be ranked relative to each other in the panel. You will then be able to focus on the Epic in the backlog that you are currently working on.

To rank Story and other issue types in the backlog list, drag-and-drop them vertically in the list based on priority.
As shown in the preceding screenshot, you should be able to drag-and-drop each issue to prioritize it relative to the others.

You can select multiple items in the backlog list by using Ctrl + Click or Shift + Click, either to move them within the list or to assign them to a Sprint.

As shown in the preceding screenshot, you can do bulk operations on the selected issues. The following options are available:

Send to: Moves multiple items to a selected Sprint during the Sprint planning event.
Top of Backlog: Bulk prioritizes the selected items by moving them to the top of the backlog, with the highest ranking.
Bottom of Backlog: Bulk prioritizes the selected items by moving them to the bottom of the backlog, with the lowest ranking.
View in Issue Navigator: Shows the selected items in the Jira issue navigator.
Bulk Change: Bulk changes the selected items, for example by editing issue details.

You can also rank the technical tasks in the Work mode to order items by priority, so that teams can work on technical tasks according to the pre-set priority.

Creating a Sprint

A Sprint in Scrum is an iteration that delivers a committed set of functionality for a product in a time box of a month or less. We now have our backlog ready in the proper prioritized order. The next step is to estimate and pick a set of backlog items from the product backlog to deliver in a particular Sprint.

To create a Sprint, click on the Create Sprint button in the Plan mode under the Backlog panel, as shown in the following screenshot:

It will create a blank Sprint for you. Click on the Sprint name to edit it inline to match your current Sprint number.

To plan the Sprint, click on the date fields to edit them inline and set the start and end dates for the Sprint.

To add Story items to a Sprint, drag stories, following their preset ranking order, and drop them in the panel of the Sprint you are currently planning.
You can also select multiple items from the backlog panel, right-click, and send those to the newly created Sprint.

To start a Sprint, click on the Start Sprint link on the Sprint panel header in Plan mode, as shown in the following screenshot:

If you haven't set the Sprint timelines yet, you will be asked for the start date and end date in a Sprint start popup once you start the Sprint. The Sprint start and end dates are used as the Sprint timelines to generate different reports, such as the Burndown chart in the report panel.

Starting the Sprint will move you from the Plan mode to the Work mode in the current board.

For a single board, you can currently have only one running (active) Sprint. You can still create multiple Sprints in the Plan mode, but those will remain inactive (you can't start working on them).

One practical scenario is a project with multiple teams that work on the same project backlog but each create a team backlog out of the big project backlog. To achieve this, you can create team backlogs in several ways, and use multiple boards to run separate Sprints for multiple teams. For example, you can use the Labels field, the Component field, or a custom field to store team information.

Take the example of running the following teams: Orange, Green, and Blue.

As shown in the preceding screenshot, you can create multiple team boards designated specifically for each team. For example, Team Blue Scrum Board, Team Green Scrum Board, and Team Orange Scrum Board are displayed in the screenshot.

You will be able to start separate Sprints for each team working on a team backlog that is part of the project backlog. You can assign each backlog item to a team backlog only during Sprint planning, so that the same backlog item is not available to other teams.
Different teams take different approaches to using Jira, and GreenHopper's customizable and flexible nature helps teams achieve what best suits their requirements.

One additional feature of working with boards is that the GreenHopper view of items and the Jira view are interchangeable. You can access your backlog items from boards directly in the issue navigator in Jira. There are multiple selection options available to switch to the issue navigator from the GreenHopper view. In a similar way, while browsing an issue in Jira, you can also switch to the GreenHopper view.

As shown in the preceding screenshot, the Issues in Epic panel and the Agile panel are displayed in the Jira view for an Epic issue type. The Issues in Epic panel lists all the issues associated with the Epic in view. The Agile panel option View on Board lets you select a board on which the issue is listed in GreenHopper, so you can easily switch to your board view.

Summary

We created Epic and Story issues as part of the product backlog. We also covered the creation of technical task items as part of the Sprint planning meeting. We moved items in the Plan mode to rank them based on business priority. Using the Plan mode, the team created a Sprint and committed to the backlog items to be completed in it. The team then started the Sprint and set its timelines.

Resources for Article:

Further resources on this subject:
Advanced JIRA 5.2 Features [Article]
Getting Started on UDK with iOS [Article]
Mission Running in EVE Online [Article]

Packt
21 Aug 2013
10 min read

Romeo and Juliet

Mission Briefing

To create the Processing sketches for this project, we will need to install the Processing library ttslib. This library is a wrapper around the FreeTTS Java library that helps us to write a sketch that reads out text. We will learn how to change the voice parameters of the kevin16 voice of the FreeTTS package to make our robots' voices distinguishable. We will also create a parser that is able to read the Shakespeare script and that generates text-line objects, which allow our sketch to know which line is read by which robot.

A Drama thread will be used to control the text-to-speech objects, and the draw() method of our sketch will print the script on the screen while our robots perform it, just in case one of them forgets a line. Finally, we will use some cardboard boxes and a pair of cheap speakers to create the robots and their stage. The following figure shows how the robots work:

Why Is It Awesome?

Since the 18th century, inventors have tried to build talking machines (with varying success). Talking toys swamped the market in the 1980s and 90s. In every decent Sci-Fi novel, computers and robots are capable of speaking. So how could building talking robots not be awesome? And what could be more appropriate for putting these speaking capabilities to the test than performing a Shakespeare play? So as you see, building actor robots is officially awesome, just in case your non-geek family members should ask.

Your Hotshot Objectives

We will split this project into four tasks that will guide you through the generation of the robots from beginning to end. Here is a short overview of what we are going to do:

Making Processing talk
Reading Shakespeare
Adding more actors
Building robots

Making Processing talk

Since Processing has no speaking capabilities out of the box, our first task is adding an external library using the new Processing Library Manager.
We will use the ttslib package, which is a wrapper library around the FreeTTS library. We will also create a short, speaking Processing sketch to check the installation.

Engage Thrusters

Processing can be extended by contributed libraries. Most of these additional libraries can be installed by navigating to Sketch | Import Library... | Add Library..., as shown in the following screenshot:

In the Library Manager dialog, enter ttslib in the search field to filter the list of libraries. Click on the ttslib entry and then on the Install button, as shown in the following screenshot, to download and install the library:

To use the new library, we need to import it to our sketch. We do this by clicking on the Sketch menu and choosing Import Library... and then ttslib.

We will now add the setup() and draw() methods to our sketch. We will leave the draw() method empty for now and instantiate a TTS object in the setup() method. Your sketch should look like the following code snippet:

    import guru.ttslib.*;

    TTS tts;

    void setup() {
      tts = new TTS();
    }

    void draw() {
    }

Now we will add a mousePressed() method to our sketch, which will get called if someone clicks on our sketch window. In this method, we call the speak() method of the TTS object we created in the setup() method.

    void mousePressed() {
      tts.speak("Hello, I am a Computer");
    }

Click on the Run button to start the Processing sketch. A little gray window should appear.

Turn on your speakers or put on your headphones, and click on the gray window. If nothing went wrong, a friendly male computer voice named kevin16 should greet you now.

Objective Complete - Mini Debriefing

In steps 1 to 3, we installed an additional library to Processing. The ttslib is a wrapper library around the FreeTTS text-to-speech engine. Then we created a simple Processing sketch that imports the installed library and creates an instance of the TTS class. The TTS objects match the speakers we need in our sketches.
In this case, we created only one speaker and added a mousePressed() method that calls the speak() method of our tts object.

Reading Shakespeare

In this part of the project, we are going to create a Drama thread and teach Processing how to read a Shakespeare script. This thread runs in the background and controls the performance. We focus on reading and executing the play in this task, and add the speakers in the next one.

Prepare for Lift Off

Our sketch needs to know which line of the script is read by which robot, so we need to convert the Shakespeare script into a more machine-readable format. We take the script and add the letter J and a separation character that is used nowhere else in the script (here, #) in front of every line our Juliet-Robot should speak, and we add R and the separation character in front of every line our Romeo-Robot should speak. After all these steps, our text file looks something like the following:

    R# Lady, by yonder blessed moon I vow,
    R# That tips with silver all these fruit-tree tops --
    J# O, swear not by the moon, the inconstant moon,
    J# That monthly changes in her circled orb,
    J# Lest that thy love prove likewise variable.
    R# What shall I swear by?
    J# Do not swear at all.
    J# Or if thou wilt, swear by thy gracious self,
    J# Which is the god of my idolatry,
    J# And I'll believe thee.

Engage Thrusters

Let's write our parser:

Let's start a new sketch by navigating to File | New.

Add a setup() and a draw() method.

Now add the prepared script to the Processing sketch by navigating to Sketch | Add File and selecting the file you just downloaded.

Add the following line to your setup() method:

    void setup() {
      String[] rawLines = loadStrings( "romeo_and_juliet.txt" );
    }

If you renamed your text file, change the filename accordingly.

Create a new tab by clicking on the little arrow icon on the right and choosing New Tab. Name the class Line.
This class will hold our text lines and the speaker.

Add the following code to the tab we just created:

    public class Line {
      String speaker;
      String text;

      public Line( String speaker, String text ) {
        this.speaker = speaker;
        this.text = text;
      }
    }

Switch back to our main tab and add the following highlighted lines of code to the setup() method:

    void setup() {
      String[] rawLines = loadStrings( "romeo_and_juliet.txt" );
      ArrayList lines = new ArrayList();
      for ( int i=0; i<rawLines.length; i++) {
        if (!"".equals(rawLines[i])) {
          String[] tmp = rawLines[i].split("#");
          lines.add( new Line( tmp[0], tmp[1].trim() ));
        }
      }
    }

We have read our text lines and parsed them into the lines array list, but we still need a class that does something with our text lines. So create another tab by clicking on the arrow icon and choosing New Tab from the menu; name it Drama.

Our Drama class will be a thread that runs in the background and tells each of the speaker objects to read one line of text. Add the following lines of code to your Drama class:

    public class Drama extends Thread {
      int current;
      ArrayList lines;
      boolean running;

      public Drama( ArrayList lines ) {
        this.lines = lines;
        current = 0;
        running = false;
      }

      public int getCurrent() {
        return current;
      }

      public Line getLine( int num ) {
        if ( num >=0 && num < lines.size()) {
          return (Line)lines.get( num );
        } else {
          return null;
        }
      }

      public boolean isRunning() {
        return running;
      }
    }

Now we add a run() method that gets executed in the background if we start our thread. Since we have no speaker objects yet, we will print the lines on the console and include a little pause after each line.

    public void run() {
      running = true;
      for ( int i =0; i < lines.size(); i++) {
        current = i;
        Line l = (Line)lines.get(i);
        System.out.println( l.text );
        delay( 1 );
      }
      running = false;
    }

Switch back to the main sketch tab and add the highlighted code to the setup() method to create a drama thread object, and then feed it the parsed text-lines.
    Drama drama;

    void setup() {
      String[] rawLines = loadStrings( "romeo_and_juliet.txt" );
      ArrayList lines = new ArrayList();
      for ( int i=0; i<rawLines.length; i++) {
        if (!"".equals(rawLines[i])) {
          String[] tmp = rawLines[i].split("#");
          lines.add( new Line( tmp[0], tmp[1].trim() ));
        }
      }
      drama = new Drama( lines );
    }

So far our sketch parses the text lines and creates a Drama thread object. What we need next is a method to start it. So add a mousePressed() method to start the drama thread.

    void mousePressed() {
      if ( !drama.isRunning()) {
        drama.start();
      }
    }

Now add a little bit of text to the draw() method to tell the user what to do. Add the following code to the draw() method:

    void draw() {
      background(255);
      textAlign(CENTER);
      fill(0);
      text( "Click here for Drama", width/2, height/2 );
    }

Currently, our sketch window is way too small to contain the text, and we also want to use a bigger font. To change the window size, we simply add the following line to the setup() method:

    void setup() {
      size( 800, 400 );
      String[] rawLines = loadStrings( "romeo_and_juliet.txt" );
      ArrayList lines = new ArrayList();
      for ( int i=0; i<rawLines.length; i++) {
        if (!"".equals(rawLines[i])) {
          String[] tmp = rawLines[i].split("#");
          lines.add( new Line( tmp[0], tmp[1].trim() ));
        }
      }
      drama = new Drama( lines );
    }

To change the used font, we need to tell Processing which font to use. The easiest way to find out the names of the fonts that are currently installed on the computer is to create a new sketch, type the following line, and run the sketch:

    println(PFont.list());

Copy one of the font names you like and add the following line to the Romeo and Juliet sketch:

    void setup() {
      size( 800, 400 );
      textFont( createFont( "Georgia", 24 ));
      ...

Replace the font name in the code lines with one of the fonts on your computer.

Objective Complete - Mini Debriefing

In this section, we wrote the code that parses a text file and generates a list of Line objects.
These objects are then used by a Drama thread that runs in the background as soon as anyone clicks on the sketch window. Currently, the Drama thread prints out the text line on the console. In steps 6 to 8, we created the Line class. This class is a very simple, so-called Plain Old Java Object (POJO) that holds our text lines, but it doesn't add any functionality. The code that is controlling the performance of our play was created in steps 10 to 12. We created a thread that is able to run in the background, since in the next step we want to be able to use the draw() method and some TTS objects simultaneously. The code block in step 12 defines a Boolean variable named running, which we used in the mousePressed() method to check if the sketch is already running or should be started. Classified Intel In step 17, we used the list() method of the PFont class to get a list of installed fonts. This is a very common pattern in Processing. You would use the same approach to get a list of installed midi-interfaces, web-cams, serial-ports, and so on.
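The parsing step of this task can also be tried outside Processing. The following is a minimal plain-Java sketch, assuming the same speaker#text format as the prepared script. The names ScriptLine and ScriptParser, and the decision to skip lines without a separator, are illustrative assumptions and not part of the original sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Line class of the sketch.
class ScriptLine {
    final String speaker;
    final String text;

    ScriptLine(String speaker, String text) {
        this.speaker = speaker;
        this.text = text;
    }
}

class ScriptParser {
    // Splits each raw line at the first '#' into a speaker and a text part.
    static List<ScriptLine> parse(String[] rawLines) {
        List<ScriptLine> lines = new ArrayList<>();
        for (String raw : rawLines) {
            // The sketch only skips the empty string; whitespace-only lines are skipped here too.
            if (raw == null || raw.trim().isEmpty()) {
                continue;
            }
            String[] parts = raw.split("#", 2);
            if (parts.length < 2) {
                continue; // assumption: ignore lines that carry no separator
            }
            lines.add(new ScriptLine(parts[0].trim(), parts[1].trim()));
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] raw = {
            "R# What shall I swear by?",
            "",
            "J# Do not swear at all."
        };
        for (ScriptLine line : parse(raw)) {
            System.out.println(line.speaker + " says: " + line.text);
        }
    }
}
```

Limiting the split to two parts keeps any further # characters inside the spoken text, which the original split("#") call would cut off.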
Packt
21 Aug 2013
18 min read

Citrix XenApp Performance Essentials

Optimizing Session Startup

The most frequent complaint that system administrators receive from users about XenApp is that the applications start slowly. Users usually do not consider that, at least the first time they launch an application published by XenApp, an entire login process takes place.

In this article you'll learn:
  • Which steps form the login process and which systems are involved
  • The most common causes of logon delays and how to mitigate them
  • The use of some advanced XenApp features, like session pre-launch

The logon process

Let's briefly review the logon process that starts when a user launches an application through the Web Interface or through a link created by the Receiver. The following diagram explains the logon process:

[Figure: The logon process]

  • Resolution: The user launches an application (A) and the Web Interface queries the Data Collector (B) that returns the least-loaded server for the requested application (C). The Web Interface generates an ICA file and passes it to the client (D).
  • Connection: The Citrix client running on the user's PC establishes a connection to the session-host server specified in the ICA file. In the handshake process, client and server agree on the security level and capabilities.
  • Remote Desktop Services (RDS) license: Windows Server validates that an RDS/Terminal Server (TS) license is available.
  • AD authentication: Windows Server authenticates the user against the Active Directory (AD). If the authentication is successful, the server queries account details from the AD, including Group Policies (GPOs) and roaming profiles.
  • Citrix license: XenApp validates that a Citrix license is available.
  • Session startup: If the user has a roaming profile, Windows downloads it from the specified location (usually a file server). Windows then applies any GPOs and XenApp applies any Citrix policies.
Windows executes applications included in the Startup menu and finally launches the requested application.

Some other steps may be necessary if other Citrix components (for example, the Citrix Access Gateway) are included in your infrastructure.

Analysing the logon process

Users perceive the overall duration of the process from the time when they click on the icon until the appearance of the application on their desktops. To troubleshoot slowness, a system administrator must know the duration of the individual steps.

Citrix EdgeSight

Citrix EdgeSight is a performance and availability management solution for XenApp and XenDesktop. If you own an Enterprise or Platinum XenApp license, you're entitled to install EdgeSight Basic (for Enterprise licensing) or Advanced (for Platinum licensing). You can also license it as a standalone product.

If you deployed Citrix EdgeSight in your farm, you can run the Session Startup Duration Detail report, which includes information on the duration of both server-side and client-side steps. This report is available only with EdgeSight Advanced. For each session, you can drill down the report to display information about server-side and client-side startup processes. An example is shown in the following screenshot:

[Screenshot: EdgeSight's Session Startup Duration Detail report]

The columns report the time (in milliseconds) spent by the startup process in the different steps. SSD is the total server-side time, while CSD is the total client-side time. You can find a full description of the available reports and the meaning of the different acronyms in the EdgeSight Report List at http://community.citrix.com/display/edgesight/EdgeSight+5.4+Report+List.

In the preceding example most of the time was spent in the Profile Load (PLSD) and Login Script Execution (LSESD) steps on the server and in the Session Creation (SCCD) step on the client.

EdgeSight is a very valuable tool to analyze your farm.
The available reports cover all the critical areas and give detailed information about the "hidden" work of Citrix XenApp. With the Session Startup Duration Detail report you can identify which steps cause a slow session startup. If you want to understand why server-side steps take too much time, like the Profile Load step in the preceding example that lasted more than 15 seconds, you need a different tool.

Windows Performance Toolkit

Windows Performance Toolkit (WPT) is a tool included in the Windows ADK, freely downloadable from the Microsoft website (http://www.microsoft.com/en-us/download/details.aspx?id=30652).

You need an Internet connection to install Windows ADK. You can run the setup on a client with Internet access and configure it to download all the required components in a folder. Move the folder to your server and perform an offline installation.

WPT has two components:
  • Windows Performance Recorder, which is used to record all the performance data in an .etl file
  • Windows Performance Analyzer, a graphical program to analyze the recorded data

Run the following command from the WPT installed folder to capture slow logons:

    C:\WPT>xperf -on base+latency+dispatcher+NetworkTrace+Registry+FileIO -stackWalk CSwitch+ReadyThread+ThreadCreate+Profile -BufferSize 128 -start UserTrace -on "Microsoft-Windows-Shell-Core+Microsoft-Windows-Wininit+Microsoft-Windows-Folder Redirection+Microsoft-Windows-User Profiles Service+Microsoft-Windows-GroupPolicy+Microsoft-Windows-Winlogon+Microsoft-Windows-Security-Kerberos+Microsoft-Windows-User Profiles General+e5ba83f6-07d0-46b1-8bc7-7e669a1d31dc+63b530f8-29c9-4880-a5b4-b8179096e7b8+2f07e2ee-15db-40f1-90ef-9d7ba282188a" -BufferSize 1024 -MinBuffers 64 -MaxBuffers 128 -MaxFile 1024

After having recorded at least one slow logon, run the following command to stop recording and save the performance data to an .etl file:

    C:\WPT>xperf -stop -stop UserTrace -d merged.etl

This command creates a file called merged.etl in the WPT
install folder. You can open this file with Windows Performance Analyzer. The Windows Performance Analyzer timeline is shown in the following screenshot: Windows Performance Analyzer timeline Windows Performance Analyzer shows a timeline with the duration of each step; for any point in time you can view the running processes, the usage of CPU and memory, the number of I/O operations, and the bytes sent or received through the network. WPT is a great tool to identify the reason for delays in Windows; it, however, has no visibility of Citrix processes. This is why EdgeSight is still necessary for complete troubleshooting. Common causes of logon delays After having analyzed many logon problems, I found that the slowness was usually caused by one or more of the following reasons: Authentication issues Profile issues GPO and logon script issues In the next paragraphs, you'll learn how to identify those issues and how to mitigate them. Even if you can't use the advanced tools (EdgeSight, WPT, and so on) described in the preceding sections, you can follow the guidelines in the next sections and best practices to solve or prevent most of the problems related to the logon process. Authentication issues During the logon process, authentication happens at multiple stages; at minimum when a user logs on to the Web Interface and when the session-host server creates a session for launching the requested application. Citrix XenApp integrates with Active Directory. The authentication is therefore performed by a Domain Controller (DC) server of your domain. Slowness in the Domain Controller response, caused by an overloaded server, can slow down the entire process. Worse, if the Domain Controller is unavailable, a domain member server may try to connect for 30 seconds before timing out and choosing a different DC. Domain member servers choose the Domain Controller for authenticating users based on their membership to Active Directory Sites. 
If sites are not correctly configured or don't reflect the real topology of your network, a domain member server may decide to use a remote Domain Controller, through a slow WAN link, instead of using a Domain Controller on the same LAN.

Profile issues

Each user has a profile that is a collection of personal files and settings. Windows offers different types of profiles, with advantages and disadvantages as shown in the following table:

    Type        Description
    Local       The profile folder is local to each server.
    Roaming     The profile folder is saved on a central storage (usually a file server).
    Mandatory   A read-only profile is assigned to users; changes are not saved across sessions.

From the administrator's point of view, mandatory profiles are the best option because they are simple to maintain, allow fast logon, and users can't modify Windows or application settings. This option, however, is not often feasible. I could use mandatory profiles only in specific cases, for example, when users have to run only a single application without the need to customize it.

Local profiles are almost never used in a XenApp environment because even if they offer the fastest logon time, they are not consistent across servers and sessions. Furthermore, you'll end up with all your session-host servers storing local profiles for all your users, and that is a waste of disk space. Finally, if you're provisioning your servers with Provisioning Server, this option is not applicable, as local profiles would be saved in the local cache, which is deleted every time the server reboots.

System administrators usually choose roaming profiles for their users. Roaming profiles indeed allow consistency across servers and sessions and preserve user settings.

Roaming profiles are, however, the most significant cause of slow logons. Without continuous control, they can rapidly grow to a large size. A small profile with a large number of files, for example, a profile with many cookies, can cause delays too.
Roaming profiles also suffer from the last write wins problem. In a distributed environment like a XenApp farm, it is not unlikely that users are connected to different servers at the same time. Profiles are updated when users log off, so with different sessions on different servers, some settings could be overwritten, or worse, the profile could be corrupted.

Folder redirection

To reduce the size of roaming profiles, you can redirect most of the user folders to a different location. Instead of saving files in the user's profile, you can configure Windows to save them on a file share. The advantages of using folder redirection are:
  • The data in the redirected folders is not included in the synchronization job of the roaming profile, making the user logon and logoff processes faster
  • Using disk quotas and redirecting folders to different disks, you can limit how much space is taken up by single folders instead of the whole profile
  • Windows Offline Files technology allows users to access their files even when no network connection is available
  • You can redirect some folders (for example, the Start Menu) to a read-only share, giving all your users the same content

Folder redirection is configured through group policies as shown in the following screenshot:

[Screenshot: Configuring Folder Redirection]

For each folder, you can choose to redirect it to a fixed location (useful if you want to provide the same content to all your users), to a subfolder (named as the username) under a fixed root path, to the user's home directory, or to the local user profile location. You may also apply different redirections based on group membership and define advanced settings for the Documents folder.

In my experience, folder redirection plays a key role in speeding up the logon process. You should enable it at least for the Desktop and My Documents folders if you're using roaming profiles.
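The last write wins problem described earlier for roaming profiles can be illustrated with a toy model. This is a simplified sketch, assuming a profile is just a map of settings that is copied at logon and written back as a whole at logoff. The class and method names are hypothetical, and the model does not reflect how Windows actually stores or synchronizes profiles.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the central profile store (for example, a file server share).
class ProfileStore {
    private Map<String, String> stored = new HashMap<>();

    // Logon: the session receives its own copy of the whole profile.
    Map<String, String> load() {
        return new HashMap<>(stored);
    }

    // Logoff: the session's copy replaces the stored profile entirely.
    void save(Map<String, String> sessionCopy) {
        stored = new HashMap<>(sessionCopy);
    }

    String get(String key) {
        return stored.get(key);
    }
}

class LastWriteWinsDemo {
    static ProfileStore simulate() {
        ProfileStore store = new ProfileStore();
        Map<String, String> sessionA = store.load(); // logon on server A
        Map<String, String> sessionB = store.load(); // logon on server B
        sessionA.put("wallpaper", "blue");           // change made in session A
        sessionB.put("homepage", "intranet");        // change made in session B
        store.save(sessionA);                        // logoff from server A
        store.save(sessionB);                        // logoff from server B replaces A's copy
        return store;
    }

    public static void main(String[] args) {
        ProfileStore store = simulate();
        System.out.println("wallpaper = " + store.get("wallpaper"));
        System.out.println("homepage  = " + store.get("homepage"));
    }
}
```

In this model the setting changed in session A is lost because session B logged off last and wrote back a copy that never contained it.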
Background upload

With Windows 2008 R2, Microsoft added the ability to perform periodic upload of the user's profile registry file (NTUSER.DAT) to the file share. Even if this option wasn't added to address the last write wins problem, it may help to avoid profile corruption, and Microsoft recommends enabling it through a GPO as shown in the following screenshot:

[Screenshot: Enabling Background upload]

Citrix Profile Management

Citrix developed its own solution for managing profiles, Citrix Profile Management. You're entitled to use Citrix Profile Management if you have an active Subscription Advantage for the following products:
  • XenApp Enterprise and Platinum edition
  • XenDesktop Advanced, Enterprise, and Platinum edition

You need to install the software on each computer whose user profiles you want to manage. In a XenApp farm, install it on your session-host servers.

Features

Citrix Profile Management was designed specifically to solve some of the problems Windows roaming profiles introduced. Its main features are:
  • Support for multiple sessions, without the last write wins problem
  • Ability to manage large profiles, without the need to perform a full sync when the user logs on
  • Support for v1 (Windows XP/2003) and v2 (Windows Vista/7/2008) profiles
  • Ability to define inclusion/exclusion lists
  • Extended synchronization can include files and folders external to the profile to support legacy applications

Configuring

Citrix Profile Management is configured using Windows Group Policy. In the Profile Management package, downloadable from the Citrix website, you can find the administrative template (.admx) and its language file (.adml). Copy the ADMX file to C:\Windows\PolicyDefinitions and the ADML file to C:\Windows\PolicyDefinitions\lang (for example, on English operating systems the lang folder is en-US).
A new Profile Management folder in Citrix is then available in your GPOs as shown in the following screenshot:

[Screenshot: Profile Management's settings in Windows GPOs]

Profile Management settings are in the Computer section; therefore, link the GPO to the Organizational Unit (OU) that contains your session-host servers.

Profiles priority order

If you deployed Citrix Profile Management, it takes precedence over any other profile assignment method. The priority order on a XenApp server is the following:
  1. Citrix Profile Management
  2. Remote Desktop Services profile assigned by a GPO
  3. Remote Desktop Services profile assigned by a user property
  4. Roaming profile assigned by a GPO
  5. Roaming profile assigned by a user property

Troubleshooting

Citrix Profile Management includes a logging functionality that you can enable via GPO as shown in the following screenshot:

[Screenshot: Enabling the logging functionality]

With the Log settings setting, you can also enable verbose logging for specific events or actions. Logs are usually saved in %SystemRoot%\System32\Logfiles\UserProfileManager but you can change the path with the Path to log file property.

Profile Management's logs are also useful to troubleshoot slow logons. Each step is logged with a timestamp, so by analyzing those logs you can find which steps take a long time.

GPO and logon script issues

In a Windows environment, it's common to apply settings and customizations via Group Policy Objects (GPOs) or using logon scripts. Numerous GPOs and long-running scripts can significantly impact the speed of the logon process. It's sometimes hard to find which GPOs or scripts are causing delays. A suggestion is to move the XenApp server or a test user account to a new Organizational Unit, with no policies applied and blocked inheritance. Comparing the logon time in this scenario with the normal logon time can help you understand how GPOs and scripts affect the logon process.
The following are some of the best practices when working with GPOs and logon scripts: Reduce the number of GPOs by merging them when possible. The time Windows takes to apply 10 GPOs is much more than the time to apply a single GPO including all their settings. Disable unused GPOs sections. It's common to have GPOs with only computer or user settings. Explicitly disabling the unused sections can speed up the time required to apply the GPOs. Use GPOs instead of logon scripts. Windows 2008 introduced Group Policy Preferences, which can be used to perform common tasks (map network drives, change registry keys, and so on) previously performed by logon scripts. The following screenshot displays configuring drive maps through GPOs. Configuring drive maps through GPO Session pre-launch, sharing, and lingering Setting up a session is the most time-consuming task Citrix and Windows have to perform when a user requests an application. In the latest version of XenApp, Citrix added some features to anticipate the session setup (pre-launch) and to improve the sharing of the same session between different applications (lingering). Session pre-launch Session pre-launch is a new feature of XenApp 6.5. Instead of waiting for the user to launch an application, you can configure XenApp to set up a session as soon as the user logs on to the farm. At the moment, session pre-launch works only if the user logs on using the Receiver, not through the Web Interface. This means that when the user requests an application, a session is already loaded and all the steps of the logon process you've learned have already taken place. The application can start without any delay. From my experience, this is a feature much appreciated by users and I use it in the production farms. Please note that if you enable session pre-launch, a license is consumed at the time the user logs on. Configuring A session pre-launch is based on a published application on your farm. 
A common mistake is thinking that when you configure a pre-launch application, Citrix effectively launches that application when the user logs on. The application is actually used as a template for the session. Citrix uses some of its settings, like users, servers/worker groups, color depth, and so on.

To create a pre-launch session, right-click on the application and choose Other Tasks | Create pre-launch application as shown in the following screenshot:

[Screenshot: Creating pre-launch application]

To avoid confusion, I suggest renaming the configured pre-launch session and removing the actual application name; for example, you can name it Pre-launch WGProd.

A pre-launched session can be used to run applications that have the same settings as the application you chose when you created the session. For example, it can be used by applications that run on the same servers. If you published different groups of applications, usually assigned to different worker groups, you should create pre-launch sessions choosing one application for each group to be sure that all your users benefit from this feature.

Life cycle of a session

If you configured a pre-launch session, when the Receiver passes the user credentials to the XenApp server a new session is created. The server that will host the session is chosen in the usual way by the Data Collector. In Citrix AppCenter, you can identify pre-launched sessions from the value in the Application State column as shown in the following screenshot:

[Screenshot: Pre-launched session]

Using Citrix policies, you can set the maximum time a pre-launch session is kept:
  • Pre-launch Disconnect Timer Interval is the time after which the pre-launch session is put in disconnected state
  • Pre-launch Terminate Timer Interval is the time after which the pre-launch session is terminated

Session sharing

Session sharing occurs when a user has an open session on a server and launches an application that is published on the same server.
The launch time for the second application is quicker because Citrix doesn't need to set up a new session for it. Session sharing is enabled by default if you publish your applications in seamless window mode. In this mode, applications appear in their own windows without a containing ICA session window. They seem physically installed on the client.

Session sharing fails if applications are published with different settings (for example, color depth, encryption, and so on). Make sure to publish your applications using the same settings if possible.

Session sharing takes precedence over load balancing; the only exception is if the server reports full load. You can force XenApp to override the load check and to also use fully loaded servers for session sharing. Refer to CTX126839 for the required registry changes. This is, however, not a recommended configuration; a fully loaded server can lead to poor performance.

Session lingering

If a user closes all the applications running in a session, the session is ended too. Sometimes it would be useful to keep the session open to speed up the start of new applications. With XenApp 6.5 you can configure a lingering time. XenApp waits before closing a session even if all the running applications are closed.

Configuring

With Citrix user policies, you can configure two timers for session lingering:
  • Linger Disconnect Timer Interval is the time after which a session without applications is put in disconnected state
  • Linger Terminate Timer Interval is the time after which a session without applications is terminated

If you're running an older version of XenApp, you can keep a session open even if users close all the running applications with the KeepMeLoggedIn tool; refer to CTX128579.

Summary

The optimization of the logon process can greatly improve the user experience. With EdgeSight and Windows Performance Toolkit you can perform a deep analysis and detect any causes of delay.
If you can't use those tools, you are still able to implement some guidelines and best practices that will surely make users' logon faster. Setting up a session is a time-consuming task. With XenApp 6.5, Citrix implemented some new features to improve session management. With session pre-launch and session lingering you can maximize the reuse of existing sessions when users request an application, speeding up its launch time. Resources for Article: Further resources on this subject: Managing Citrix Policies [Article] Getting Started with XenApp 6 [Article] Getting Started with the Citrix Access Gateway Product Family [Article]
Packt
20 Aug 2013
20 min read

Testing with Xtext and Xtend

Introduction to testing

Writing automated tests is a fundamental methodology when developing software. It will help you write quality software where most aspects (possibly all aspects) are somehow verified in an automatic and continuous way. Although successful tests do not guarantee that the software is bug free, automated tests are a necessary condition for professional programming (see Beck 2002, Martin 2002, 2008, 2011 for some insightful reading about this subject).

Tests will also document your code, whether it is a framework, a library, or an application; tests are a form of documentation that does not risk becoming stale with respect to the implementation itself. Javadoc comments will likely not be kept in synchronization with the code they document, and manuals will tend to become obsolete if not updated consistently, while tests will fail if they are not up-to-date.

The Test Driven Development (TDD) methodology fosters the writing of tests even before writing production code. When developing a DSL one can relax this methodology by not necessarily writing the tests first. However, one should write tests as soon as a new functionality is added to the DSL implementation. This must be taken into consideration right from the beginning; thus, you should not try to write the complete grammar of a DSL, but proceed gradually: write a few rules to parse a minimal program, and immediately write tests for parsing some test input programs. Only when these tests pass should you go on to implementing other parts of the grammar. Moreover, if some validation rules can already be implemented with the current version of the DSL, you should write tests for the current validator checks as well.

Ideally, one does not have to run Eclipse to manually check whether the current implementation of the DSL works as expected. Using tests will then make the development much faster.
The number of tests will grow as the implementation grows, and tests should be executed each time you add a new feature or modify an existing one. You will see that since tests will run automatically, executing them over and over again will require no additional effort besides triggering their execution (imagine instead having to check manually that what you added or modified did not break something).

This also means that you will not be scared to touch something in your implementation; after you did some changes, just run the whole test suite and check whether you broke something. If some tests fail, you will just need to check whether the failure is actually expected (and, if so, fix the test) or whether your modifications have to be fixed.

It is worth noting that using a version control system (such as Git) is essential to easily get back to a known state; just experimenting with your code and finding errors using tests does not mean you can easily backtrack.

You will not even be scared to port your implementation to a new version of the used frameworks. For example, when a new version of Xtext is released, it is likely that some API has changed and your DSL implementation might not build anymore with the new version. Surely, running the MWE2 workflow is required. But after your sources compile again, your test suite will tell you whether the behavior of your DSL is still the same. In particular, if some of the tests fail, you can get an immediate idea of which parts need to be changed to conform to the new version of Xtext.

Moreover, if your implementation relies on a solid test suite, it will be easier for contributors to provide patches and enhancements for your DSL; they can run the test suite themselves or they can add further tests for a specific bugfix or for a new feature. It will also be easy for the main developers to decide whether to accept the contributions by running the tests.
Last but not least, you will discover that writing tests right from the beginning will force you to write modular code (otherwise you will not be able to easily test it) and it will make programming much more fun.

Xtext and Xtend themselves are developed with a test driven approach.

JUnit 4

JUnit is the most popular unit test framework for Java and it is shipped with the Eclipse JDT. In particular, the examples in this article are based on JUnit version 4.

To implement JUnit tests, you just need to write a class with methods annotated with @org.junit.Test. We will call such methods simply test methods. Such Java (or Xtend) classes can then be executed in Eclipse using the "JUnit test" launch configuration; all methods annotated with @Test will then be executed by JUnit. In test methods you can use assert methods provided by JUnit to implement a test. For example, assertEquals(expected, actual) checks whether the two arguments are equal; assertTrue(expression) checks whether the passed expression evaluates to true. If an assertion fails, JUnit will record such failure; in particular, in Eclipse, the JUnit view will provide you with a report about tests that failed. Ideally, no test should fail (and you should see the green bar in the JUnit view).

All test methods can be executed by JUnit in any order; thus, you should never write a test method which depends on another one; all test methods should be executable independently from each other.

If you annotate a method with @Before, that method will be executed before each test method in that class; thus, it can be used to prepare a common setup for all the test methods in that class. Similarly, a method annotated with @After will be executed after each test method (even if it fails); thus, it can be used to clean up the environment. A static method annotated with @BeforeClass will be executed only once before the start of all test methods (@AfterClass has the complementary intuitive functionality).
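To make the execution order concrete without depending on the JUnit library, the following toy sketch mimics the lifecycle with locally defined annotations and reflection. The annotations Before, Test, and After declared here, as well as SampleTest and MiniRunner, are hypothetical stand-ins written only for this illustration; a real test class would import org.junit.Before, org.junit.Test, and org.junit.After and be run by JUnit itself.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local stand-ins for the JUnit 4 annotations of the same names.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Before {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Test {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface After {}

class SampleTest {
    // Shared log so the order of lifecycle calls can be inspected afterwards.
    static final List<String> log = new ArrayList<>();

    @Before void setUp() { log.add("before"); }
    @Test void passing() { log.add("test"); }
    @Test void failing() { log.add("test"); throw new IllegalStateException("deliberate failure"); }
    @After void tearDown() { log.add("after"); }
}

class MiniRunner {
    // Runs each @Test method on a fresh instance, wrapped in @Before and @After.
    // Returns the number of test methods that threw an exception.
    static int run(Class<?> testClass) {
        int failures = 0;
        for (Method test : testClass.getDeclaredMethods()) {
            if (!test.isAnnotationPresent(Test.class)) {
                continue;
            }
            Object instance;
            try {
                instance = testClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            invokeAll(instance, Before.class);
            try {
                test.setAccessible(true);
                test.invoke(instance);
            } catch (InvocationTargetException e) {
                failures++; // the test method itself threw
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            } finally {
                invokeAll(instance, After.class); // runs even when the test failed
            }
        }
        return failures;
    }

    private static void invokeAll(Object instance, Class<? extends Annotation> marker) {
        for (Method m : instance.getClass().getDeclaredMethods()) {
            if (m.isAnnotationPresent(marker)) {
                try {
                    m.setAccessible(true);
                    m.invoke(instance);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    public static void main(String[] args) {
        int failures = run(SampleTest.class);
        System.out.println("failures: " + failures + ", calls: " + SampleTest.log);
    }
}
```

Reflection returns the test methods in no guaranteed order, which matches the point made above: each test is wrapped in its own setup and cleanup, so no test may rely on another one having run first.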
The ISetup interface

Running tests means we somehow need to bootstrap the environment to make it support EMF and Xtext in addition to the implementation of our DSL. This is done with a suitable implementation of ISetup. We need to configure things differently depending on how we want to run tests: with or without Eclipse, and with or without the Eclipse UI being present. The way to set up the environment is quite different when Eclipse is present, since many services are shared and already part of the Eclipse environment.

When setting up the environment for non-Eclipse use (also referred to as standalone), there are a few things that must be configured, such as creating a Guice injector and registering information required by EMF. The method createInjectorAndDoEMFRegistration in the ISetup interface is there to do exactly this. Besides creating an Injector, this method also performs all the initialization of the EMF global registries, so that after invoking it, the EMF API to load and store models of your language can be fully used, even without a running Eclipse.

Xtext generates an implementation of this interface, named after your DSL, which can be found in the runtime plug-in project. For our Entities DSL it is called EntitiesStandaloneSetup. The name "standalone" expresses the fact that this class has to be used when running outside Eclipse. Thus, createInjectorAndDoEMFRegistration must never be called when running inside Eclipse (otherwise the EMF registries will become inconsistent).
In a plain Java application, the typical steps to set up the DSL (for example, our Entities DSL) can be sketched as follows:

    // create the injector and initialize the EMF registries (standalone only)
    Injector injector = new EntitiesStandaloneSetup().createInjectorAndDoEMFRegistration();
    // obtain a resource set and ask it to resolve all cross references on load
    XtextResourceSet resourceSet = injector.getInstance(XtextResourceSet.class);
    resourceSet.addLoadOption(XtextResource.OPTION_RESOLVE_ALL, Boolean.TRUE);
    // load the file and access the root of the parsed model
    Resource resource = resourceSet.getResource(URI.createURI("/path/to/my.entities"), true);
    Model model = (Model) resource.getContents().get(0);

This standalone setup class is also especially useful for JUnit tests, which can then be run without an Eclipse instance. This will speed up the execution of tests. Of course, in such tests you will not be able to test UI features.

As we will see in this article, Xtext provides many utility classes for testing which do not require us to set up the runtime environment explicitly. However, it is important to know about the existence of the setup class in case you either need to tweak the generated standalone compiler or you need to set up the environment in a specific way for unit tests.

Implementing tests for your DSL

Xtext strongly encourages unit testing, and this is reflected by the fact that, by default, the MWE2 workflow generates a specific plug-in project for testing your DSL. In fact, tests should usually reside in a separate project, since they should not be deployed as part of your DSL implementation. The name of this additional project ends with the .tests suffix; thus, for our Entities DSL, it is org.example.entities.tests. The tests plug-in project has the needed dependencies on the required Xtext utility bundles for testing.

We will use Xtend to write JUnit tests.

In the src-gen directory of the tests project, you will find the injector providers for both headless and UI tests. You can use these providers to easily write JUnit test classes without having to worry about setting up the injection mechanism.
The JUnit tests that use the injector provider will typically have the following shape (using the Entities DSL as an example):

    @RunWith(typeof(XtextRunner))
    @InjectWith(typeof(EntitiesInjectorProvider))
    class MyTest {
        @Inject MyClass
        ...

As hinted in the preceding code, in this class you can rely on injection; we used @InjectWith and declared that EntitiesInjectorProvider has to be used to create the injector. EntitiesInjectorProvider will transparently provide the correct configuration for a standalone environment. As we will see later in this article, when we want to test UI features, we will use EntitiesUiInjectorProvider (note the "Ui" in the name).

Testing the parser

The first tests you might want to write are the ones which concern parsing. This reflects the fact that the grammar is the first thing you must write when implementing a DSL. You should not try to write the complete grammar before you start testing: write only a few rules and then immediately write tests to check whether those rules actually parse an input test program as you expect.

The nice thing is that you do not have to store the test input in a file (though you could do that); the input to pass to the parser can be a string, and since we use Xtend, we can use multi-line strings.

The Xtext test framework provides the class ParseHelper to easily parse a string. The injection mechanism will automatically tell this class to parse the input string with the parser of your DSL. To parse a string, we inject an instance of ParseHelper<T>, where T is the type of the root class in our DSL's model; in our Entities example, this class is called Model. The method ParseHelper.parse will return an instance of T after parsing the input string given to it.

By injecting the ParseHelper class as an extension, we can directly use its methods on the strings we want to parse.
Thus, we can write:

    @RunWith(typeof(XtextRunner))
    @InjectWith(typeof(EntitiesInjectorProvider))
    class EntitiesParserTest {

        @Inject extension ParseHelper<Model>

        @Test
        def void testParsing() {
            val model = '''
                entity MyEntity {
                    MyEntity attribute;
                }
            '''.parse
            val entity = model.entities.get(0)
            Assert::assertEquals("MyEntity", entity.name)
            val attribute = entity.attributes.get(0)
            Assert::assertEquals("attribute", attribute.name);
            Assert::assertEquals("MyEntity",
                (attribute.type.elementType as EntityType).entity.name);
        }
        ...

In this test, we parse the input and test that the expected structure was constructed as a result of parsing. These tests do not add much value in the Entities DSL, but in a more complex DSL you do want to test that the structure of the parsed EMF model is as you expect.

You can now run the test: right-click on the Xtend file and select Run As | JUnit Test as shown in the following screenshot. The test should pass and you should see the green bar in the JUnit view.

Note that the parse method returns an EMF model even if the input string contains syntax errors (it tries to parse as much as it can); thus, if you want to make sure that the input string is parsed without any syntax error, you have to check that explicitly. To do that, you can use another utility class, ValidationTestHelper. This class provides many assert methods that take an EObject argument. You can use an extension field and simply call assertNoErrors on the parsed EMF object. Alternatively, if you do not need the EMF object but you just need to check that there are no parsing errors, you can simply call it on the result of parse, for example:

    class EntitiesParserTest {

        @Inject extension ParseHelper<Model>
        @Inject extension ValidationTestHelper
        ...
        @Test
        def void testCorrectParsing() {
            '''
                entity MyEntity {
                    MyEntity attribute
                }
            '''.parse.assertNoErrors
        }

If you try to run the tests again, you will get a failure for this new test, as shown in the following screenshot:

The reported error should be clear enough: we forgot to add the terminating ";" in our input program. We can fix it and run the test again; this time the green bar should be back.

You can now write other @Test methods for testing the various features of the DSL (see the sources of the examples). Depending on the complexity of your DSL, you may have to write many of them.

Tests should test one specific thing at a time; lumping things together (to reduce the overhead of having to write many test methods) usually makes things harder later.

Remember that you should follow this methodology while implementing your DSL, not after having implemented all of it. If you follow this strictly, you will not have to launch Eclipse to manually check that you implemented a feature correctly, and you will notice that this methodology lets you program really fast.

Ideally, you should start with a grammar with a single rule, especially if the grammar contains nonstandard terminals. The very first task is to write a grammar that just parses all terminals. Write a test for that to ensure there are no overlapping terminals before proceeding; this is not needed if you do not add terminals to the standard ones. After that, add as few rules as possible in each round of development/testing until the grammar is complete.

Testing the validator

Earlier we used the ValidationTestHelper class to test that it was possible to parse without errors. Of course, we also need to test that errors and warnings are detected. In particular, we should test any error situation handled by our own validator. The ValidationTestHelper class contains utility methods (besides assertNoErrors) that allow us to test whether the expected errors are correctly issued.
For instance, for our Entities DSL, we wrote a custom validator method that checks that the entity hierarchy is acyclic. Thus, we should write a test that, given an input program with a cycle in the hierarchy, checks that such an error is indeed raised during validation.

Although not strictly required, it is better to separate JUnit test classes according to the tested features; thus, we write another JUnit class, EntitiesValidatorTest, which contains tests related to validation. The start of this new JUnit test class should look familiar:

    @RunWith(typeof(XtextRunner))
    @InjectWith(typeof(EntitiesInjectorProvider))
    class EntitiesValidatorTest {

        @Inject extension ParseHelper<Model>
        @Inject extension ValidationTestHelper
        ...

We are now going to use the assertError method from ValidationTestHelper, which, besides the EMF model element to validate, requires the following arguments:

  • The EClass of the object which contains the error (which is usually retrieved through the EMF EPackage class generated when running the MWE2 workflow)
  • The expected Issue Code
  • An optional string describing the expected error message

Thus, we parse input containing an entity extending itself and we pass the arguments to assertError according to the error generated by checkNoCycleInEntityHierarchy in EntitiesValidator:

    @Test
    def void testEntityExtendsItself() {
        '''
            entity MyEntity extends MyEntity {
            }
        '''.parse.assertError(EntitiesPackage::eINSTANCE.entity,
            EntitiesValidator::HIERARCHY_CYCLE,
            "cycle in hierarchy of entity 'MyEntity'"
        )
    }

Note that the EObject argument is the one returned by the parse method (we use assertError as an extension method). Since the error concerns an Entity object, we specify the corresponding EClass (retrieved using EntitiesPackage), the expected Issue Code, and finally, the expected error message. This test should pass.
We can now write another test which checks the same validation error on a more complex input, with a cycle in the hierarchy involving more than one entity; in this test we make sure that our validator issues an error for each of the entities involved in the hierarchy cycle:

    @Test
    def void testCycleInEntityHierarchy() {
        val model = '''
            entity A extends B {}
            entity B extends C {}
            entity C extends A {}
        '''.parse
        model.assertError(EntitiesPackage::eINSTANCE.entity,
            EntitiesValidator::HIERARCHY_CYCLE,
            "cycle in hierarchy of entity 'A'"
        )
        model.assertError(EntitiesPackage::eINSTANCE.entity,
            EntitiesValidator::HIERARCHY_CYCLE,
            "cycle in hierarchy of entity 'B'"
        )
        model.assertError(EntitiesPackage::eINSTANCE.entity,
            EntitiesValidator::HIERARCHY_CYCLE,
            "cycle in hierarchy of entity 'C'"
        )
    }

Note that this time we must store the parsed EMF model in a variable, since we will call assertError several times.

We can also test that NamesAreUniqueValidator detects elements with the same name:

    @Test
    def void testDuplicateEntities() {
        val model = '''
            entity MyEntity {}
            entity MyEntity {}
        '''.parse
        model.assertError(EntitiesPackage::eINSTANCE.entity,
            null,
            "Duplicate Entity 'MyEntity'"
        )
    }

In this case, we pass null for the issue argument, since no Issue Code is reported by NamesAreUniqueValidator.

Similarly, we can write a test where the input has two attributes with the same name:

    @Test
    def void testDuplicateAttributes() {
        val model = '''
            entity MyEntity {
                MyEntity attribute;
                MyEntity attribute;
            }
        '''.parse
        model.assertError(EntitiesPackage::eINSTANCE.attribute,
            null,
            "Duplicate Attribute 'attribute'"
        )
    }

Note that in this test we pass the EClass corresponding to Attribute, since duplicate attributes are involved in the expected error.

Do not worry if it seems tricky to get the arguments for assertError right the first time; writing a test that fails the first time it is executed is expected in Test Driven Development.
The error of the failing test should put you on the right track to specify the arguments correctly. However, by inspecting the error of the failing test, you must first make sure that the actual output is what you expected; otherwise something is wrong either with your test or with the implementation of the component that you are testing.

Testing the formatter

As we said previously, the formatter is also used in a non-UI environment (indeed, we implemented it in the runtime plug-in project); thus, we can test the formatter for our DSL with plain JUnit tests. At the moment, there is no helper class in the Xtext framework for testing the formatter, so we need to do some additional work to set up the tests for the formatter. This example will also provide some more details on Xtext and EMF, and it will introduce unit test methodologies that are useful in many testing scenarios where you need to test whether a string output is as you expect.

First of all, we create another JUnit test class for testing the formatter. This time we do not need the helper for the validator; we will inject INodeModelFormatter as an extension field, since this is the class internally used by Xtext to perform formatting.

One of the main principles of unit testing (which is also its main strength) is that you should test a single functionality in isolation. Thus, to test the formatter, we must not run a UI test that opens an Xtext editor on an input file and calls the menu item which performs the formatting; we just need to test the class to which the formatting is delegated, and we do not need a running Eclipse for that.

    import static extension org.junit.Assert.*

    @RunWith(typeof(XtextRunner))
    @InjectWith(typeof(EntitiesInjectorProvider))
    class EntitiesFormatterTest {

        @Inject extension ParseHelper<Model>
        @Inject extension INodeModelFormatter;

Note that we import all the static methods of the JUnit Assert class as extension methods.
Then, we write the code that actually performs the formatting given an input string. Since we will write several tests for formatting, we isolate such code in a reusable method. This method is not annotated with @Test; thus it will not be automatically executed by JUnit as a test method. This is the Xtend code that returns the formatted version of the input string:

    (input.parse.eResource as XtextResource).parseResult.
        rootNode.format(0, input.length).formattedText

The method ParseHelper.parse returns the EMF model object, and each EObject has a reference to the containing EMF resource; we know that this is actually an XtextResource (a specialized version of an EMF resource). We retrieve the result of parsing, that is, an IParseResult object, from the resource. The result of parsing contains the node model; recall that the node model carries the syntactical information, that is, offsets and spaces of the textual input. The root of the node model, an ICompositeNode, can be passed to the formatter to get the formatted version (we can even specify to format only a part of the input program).

Now we can write a reusable method that takes an input char sequence and an expected char sequence and tests that the formatted version of the input program is equal to what we expect:

    def void assertFormattedAs(CharSequence input, CharSequence expected) {
        expected.toString.assertEquals(
            (input.parse.eResource as XtextResource).parseResult.
                rootNode.format(0, input.length).formattedText)
    }

The reason why we convert the expected char sequence into a string will be clear in a minute. Note the use of Assert.assertEquals as an extension method.

We can now write our first formatting test using our extension method assertFormattedAs:

    @Test
    def void testEntities() {
        '''
            entity E1 { } entity E2 {}
        '''.assertFormattedAs(
            '''...'''
        )
    }

Why did we specify "..." as the expected formatted output? Why did we not try to specify what we really expect as the formatted output?
Well, we could have written the expected output, and probably we would have gotten it right on the first try, but why not simply make the test fail and see the actual output? We can then copy that into our test once we are convinced that it is correct. So let's run the test, and when it fails, the JUnit view tells us what the actual result is, as shown in the following screenshot:

If you now double-click on the line showing the comparison failure in the JUnit view, you will get a dialog showing a line-by-line comparison, as shown in the following screenshot:

You can verify that the actual output is correct, copy it, and paste it into your test as the expected output. The test will now succeed:

    @Test
    def void testEntities() {
        '''
            entity E1 { } entity E2 {}
        '''.assertFormattedAs(
    '''entity E1 {
    }

    entity E2 {
    }'''
        )
    }

We did not indent the expected output in the multi-line string since it is easy to paste it like that from the JUnit dialog.

Using this technique you can easily write JUnit tests that deal with comparisons. However, the "Result Comparison" dialog appears only if you pass String objects to assertEquals; that is why we converted the char sequence into a string in the implementation of assertFormattedAs.

We now add a test for the formatting of attributes; the final result will be:

    @Test
    def void testAttributes() {
        '''
            entity E1 { int i ; string s; boolean b ;}
        '''.assertFormattedAs(
    '''entity E1 {
        int i;
        string s;
        boolean b;
    }'''
        )
    }

Summary

In this article we introduced unit testing for languages implemented with Xtext. Being able to test most of the DSL aspects without having to start an Eclipse environment really speeds up development. Test Driven Development is an important programming methodology that helps you make your implementations more modular, more reliable, and more resilient to changes in the libraries used by your code.

Packt
20 Aug 2013
5 min read

Highcharts

Creating a line chart with a time axis and two Y axes

We will now create the code for this chart:

You start the creation of your chart by implementing the constructor of your Highcharts chart:

    var chart = $('#myFirstChartContainer').highcharts({});

We will now set the different sections inside the constructor. We start with the chart section. Since we'll be creating a line chart, we define the type element with the value line. Then, we implement the zoom feature by setting the zoomType element. You can set the value to x, y, or xy depending on which axes you want to be able to zoom. For our chart, we will make it possible to zoom on the x-axis:

    chart: {
        type: 'line',
        zoomType: 'x'
    },

We define the title of our chart:

    title: {
        text: 'Energy consumption linked to the temperature'
    },

Now, we create the x-axis. We set the type to datetime because we are using time data, and we remove the title by setting the text to null. You need to set a null value in order to disable the title of the xAxis:

    xAxis: {
        type: 'datetime',
        title: {
            text: null
        }
    },

We then configure the y-axes. We add two y-axes with the titles Temperature and Electricity consumed (in KWh), and we override the default minimum of each with a value of 0. We set the opposite parameter to true for the second axis in order to have the second y-axis on the right side:

    yAxis: [{
        title: {
            text: 'Temperature'
        },
        min: 0
    }, {
        title: {
            text: 'Electricity consumed (in KWh)'
        },
        opposite: true,
        min: 0
    }],

We will now customize the tooltip section. We use the crosshairs option in order to have a line for our tooltip that we will use to follow the values of both series. Then, we set the shared value to true in order to have the values of both series in the same tooltip:

    tooltip: {
        crosshairs: true,
        shared: true
    },

Next, we set the series section. For datetime axes, you can set your series section in two different ways.
You can use the first way when your data follows a regular time interval, and the second way when your data doesn't necessarily follow a regular time interval. We will use both ways by setting the two series with two different options.

The first series follows a regular interval. For this series, we set the pointInterval parameter, where we define the data interval in milliseconds. For our chart, we set an interval of one day. We set the pointStart parameter to the date of the first value. We then set the data section with our values. The tooltip section is set with the valueSuffix element, where we define the suffix to be added after the value inside our tooltip. We set our yAxis element to the axis we want to associate with our series. Because we want to assign this series to the first axis, we set the value to 0 (zero).

For the second series, we will use the second way because our data does not necessarily follow a regular interval. You can also use this way even if your data does follow a regular interval. We set our data as pairs, where the first element represents the date and the second element represents the value. We also override the tooltip section of the second series. We then set the yAxis element to the value 1 because we want to associate this series with the second axis.

For your chart, you can also set your date values with a timestamp value instead of using the JavaScript function Date.UTC.
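When the interval is regular, the two ways of supplying data describe the same points. As a minimal sketch in plain JavaScript (no Highcharts required; the helper name expandRegularSeries is hypothetical and not part of the Highcharts API), the following expands a pointStart/pointInterval series into the explicit [timestamp, value] pairs that the second way uses:

```javascript
// Hypothetical helper, not part of Highcharts: it only illustrates how a
// regular-interval series maps onto explicit [timestamp, value] pairs.
function expandRegularSeries(pointStart, pointInterval, data) {
  return data.map(function (value, i) {
    return [pointStart + i * pointInterval, value];
  });
}

var ONE_DAY = 24 * 3600 * 1000;   // one day in milliseconds
var start = Date.UTC(2013, 0, 1); // 1 January 2013 (months are zero-based)

var pairs = expandRegularSeries(start, ONE_DAY, [17.5, 16.2, 16.1]);

// The second pair falls exactly on 2 January 2013, 00:00 UTC.
console.log(pairs[1][0] === Date.UTC(2013, 0, 2)); // true
console.log(new Date(pairs[2][0]).toISOString());  // 2013-01-03T00:00:00.000Z
```

Date.UTC returns the number of milliseconds since 1 January 1970 UTC, which is why a plain timestamp can stand in for the Date.UTC call.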
The complete series section is as follows:

    series: [{
        name: 'Temperature',
        pointInterval: 24 * 3600 * 1000,
        pointStart: Date.UTC(2013, 0, 1),
        data: [17.5, 16.2, 16.1, 16.1, 15.9, 15.8, 16.2],
        tooltip: {
            valueSuffix: ' °C'
        },
        yAxis: 0
    }, {
        name: 'Electricity consumption',
        data: [
            [Date.UTC(2013, 0, 1), 8.1],
            [Date.UTC(2013, 0, 2), 6.2],
            [Date.UTC(2013, 0, 3), 7.3],
            [Date.UTC(2013, 0, 5), 7.1],
            [Date.UTC(2013, 0, 6), 12.3],
            [Date.UTC(2013, 0, 7), 10.2]
        ],
        tooltip: {
            valueSuffix: ' KWh'
        },
        yAxis: 1
    }]

You should have this as the final code:

    $(function () {
        var chart = $('#myFirstChartContainer').highcharts({
            chart: {
                type: 'line',
                zoomType: 'x'
            },
            title: {
                text: 'Energy consumption linked to the temperature'
            },
            xAxis: {
                type: 'datetime',
                title: {
                    text: null
                }
            },
            yAxis: [{
                title: {
                    text: 'Temperature'
                },
                min: 0
            }, {
                title: {
                    text: 'Electricity consumed'
                },
                opposite: true,
                min: 0
            }],
            tooltip: {
                crosshairs: true,
                shared: true
            },
            series: [{
                name: 'Temperature',
                pointInterval: 24 * 3600 * 1000,
                pointStart: Date.UTC(2013, 0, 1),
                data: [17.5, 16.2, 16.1, 16.1, 15.9, 15.8, 16.2],
                tooltip: {
                    valueSuffix: ' °C'
                },
                yAxis: 0
            }, {
                name: 'Electricity consumption',
                data: [
                    [Date.UTC(2013, 0, 1), 8.1],
                    [Date.UTC(2013, 0, 2), 6.2],
                    [Date.UTC(2013, 0, 3), 7.3],
                    [Date.UTC(2013, 0, 5), 7.1],
                    [Date.UTC(2013, 0, 6), 12.3],
                    [Date.UTC(2013, 0, 7), 10.2]
                ],
                tooltip: {
                    valueSuffix: ' KWh'
                },
                yAxis: 1
            }]
        });
    });

You should have the expected result as shown in the following screenshot:

Summary

In this article, we created a line chart with a time axis and two y-axes, using some of the most commonly used Highcharts features along the way: zooming, a shared tooltip with crosshairs, and series defined with both regular and irregular time intervals.

Packt
20 Aug 2013
3 min read

Lucene.NET: Optimizing and merging index segments

How to do it…

Index optimization is accomplished by calling the Optimize method on an instance of IndexWriter. The example for this recipe demonstrates the use of the Optimize method to clean up the storage of the index data on the physical disk. The general steps to optimize an index and merge its segments are the following:

  1. Create/open an index.
  2. Add or delete documents from the index.
  3. Examine the values returned by the MaxDoc and NumDocs methods of the IndexWriter class.
  4. If the index is deemed to be too dirty, call the Optimize method of the IndexWriter class.

The following example for this recipe demonstrates taking these steps to create, modify, and then optimize an index.

    namespace Lucene.NET.HowTo._12_MergeAndOptimize {
        // ...

        // build facade and an initial index of 5 documents
        var facade = new LuceneDotNetHowToExamplesFacade()
            .buildLexicographicalExampleIndex(maxDocs: 5)
            .createIndexWriter();

        // report MaxDoc and NumDocs
        Trace.WriteLine(string.Format("MaxDoc=={0}", facade.IndexWriter.MaxDoc()));
        Trace.WriteLine(string.Format("NumDocs=={0}", facade.IndexWriter.NumDocs()));

        // delete one document
        facade.IndexWriter.DeleteDocuments(new Term("filename", "0.txt"));
        facade.IndexWriter.Commit();

        // report MaxDoc and NumDocs
        Trace.WriteLine("After delete / commit");
        Trace.WriteLine(string.Format("MaxDoc=={0}", facade.IndexWriter.MaxDoc()));
        Trace.WriteLine(string.Format("NumDocs=={0}", facade.IndexWriter.NumDocs()));

        // optimize the index
        facade.IndexWriter.Optimize();

        // report MaxDoc and NumDocs
        Trace.WriteLine("After Optimize");
        Trace.WriteLine(string.Format("MaxDoc=={0}", facade.IndexWriter.MaxDoc()));
        Trace.WriteLine(string.Format("NumDocs=={0}", facade.IndexWriter.NumDocs()));
        Trace.Flush();

        // ...
    }

How it works…

When this program is run, you will see output similar to that in the following screenshot:

This program first creates an index with five files.
It then reports the values returned by the MaxDoc and NumDocs methods of the instance of IndexWriter. MaxDoc is the total number of documents in the index, still counting documents that have been deleted but whose space has not yet been reclaimed by a merge. NumDocs is the current number of live (non-deleted) documents stored in the index. At this point these values are 5 and 5, respectively.

The next step deletes a single document named 0.txt from the index, and the changes are committed to disk. MaxDoc and NumDocs are written to the console again and now report 5 and 4, respectively. This makes sense, as one file has been deleted and there is now "slop" in the index where space is being taken up by a previously deleted document. The reference to the document index information has been removed, but the space is still used on the disk.

The final two steps are to call Optimize and to write the MaxDoc and NumDocs values to the console for the final time. These are now 4 and 4, respectively, as Lucene.NET has merged the index segments and reclaimed the disk space formerly used by deleted document index information.

Summary

A Lucene.NET index physically contains one or more segments, each of which is its own index and holds a subset of the overall indexed content. As documents are added to the index, new segments are created as index writers flush buffered content into the index's directory and file structure. Over time this fragmentation will cause searches to slow down, requiring a merge/optimization to be performed to regain performance.

Packt
20 Aug 2013
4 min read

Working with remote data

Getting ready

Create a new document in your editor.

How to do it...

Copy the following code into your new document:

    <!DOCTYPE html>
    <html>
    <head>
        <title>Kendo UI Grid How-to</title>
        <link rel="stylesheet" type="text/css" href="kendo/styles/kendo.common.min.css">
        <link rel="stylesheet" type="text/css" href="kendo/styles/kendo.default.min.css">
        <script src="kendo/js/jquery.min.js"></script>
        <script src="kendo/js/kendo.web.min.js"></script>
    </head>
    <body>
        <h3 style="color:#4f90ea;">Exercise 12- Working with Remote Data</h3>
        <p><a href="index.html">Home</a></p>
        <script type="text/javascript">
            $(document).ready(function () {
                var serviceURL = "http://gonautilus.com/kendogen/KENDO.cfc?method=";
                var myDataSource = new kendo.data.DataSource({
                    transport: {
                        read: {
                            url: serviceURL + "getArt",
                            dataType: "JSONP"
                        }
                    },
                    pageSize: 20,
                    schema: {
                        model: {
                            id: "ARTISTID",
                            fields: {
                                ARTID: { type: "number" },
                                ARTISTID: { type: "number" },
                                ARTNAME: { type: "string" },
                                DESCRIPTION: { type: "CLOB" },
                                PRICE: { type: "decimal" },
                                LARGEIMAGE: { type: "string" },
                                MEDIAID: { type: "number" },
                                ISSOLD: { type: "boolean" }
                            }
                        }
                    }
                });
                $("#myGrid").kendoGrid({
                    dataSource: myDataSource,
                    pageable: true,
                    sortable: true,
                    columns: [
                        { field: "ARTID", title: "Art ID" },
                        { field: "ARTISTID", title: "Artist ID" },
                        { field: "ARTNAME", title: "Art Name" },
                        { field: "DESCRIPTION", title: "Description" },
                        { field: "PRICE", title: "Price", template: '#= kendo.toString(PRICE,"c") #' },
                        { field: "LARGEIMAGE", title: "Large Image" },
                        { field: "MEDIAID", title: "Media ID" },
                        { field: "ISSOLD", title: "Sold" }
                    ]
                });
            });
        </script>
        <div id="myGrid"></div>
    </body>
    </html>

How it works...

This example shows you how to access a JSONP remote datasource. JSONP allows you to work with cross-domain remote datasources. The JSONP format is like JSON except it adds padding, which is what the "P" in JSONP stands for.
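To make the padding concrete, here is a minimal sketch in plain JavaScript of what a JSONP exchange amounts to. The callback name handleArt, the helper padJson, and the sample record are all made up for illustration; a real client library such as jQuery generates the callback name itself and loads the response through a script element rather than evaluating text as done here.

```javascript
// Simulates the server side: wrap a JSON payload in a call to the callback
// named by the client. The wrapping "callbackName( ... )" is the padding.
function padJson(callbackName, payload) {
  return callbackName + "(" + JSON.stringify(payload) + ")";
}

// Hypothetical callback and data, for illustration only.
var received = null;
function handleArt(data) {
  received = data;
}

var body = padJson("handleArt", [{ ARTID: 1, ARTNAME: "Sample" }]);
console.log(body); // handleArt([{"ARTID":1,"ARTNAME":"Sample"}])

// A browser would execute the response as a script; here we emulate that by
// evaluating the text with the callback passed in explicitly.
new Function("handleArt", body)(handleArt);
console.log(received[0].ARTNAME); // Sample
```

Because the response is executed as a script rather than read as data, it is not subject to the same-origin restriction that applies to ordinary AJAX requests, which is what makes the cross-domain call possible.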
The padding can be seen if you look at the result of the AJAX call being made by the Kendo Grid. The server simply responds with the name of the callback argument that was passed and wraps the JSON in parentheses.

You'll notice that we created a serviceURL variable that points to the service we are calling to return our data. In the transport.read configuration, you'll see that we are calling the getArt method and specifying the value of dataType as JSONP. Everything else should look familiar.

There's more...

Generally, the most common format used for remote data is JavaScript Object Notation (JSON). You'll find several examples of using OData on the Kendo UI demo website. You'll also find examples of performing create, update, and delete operations on that site.

Outputting JSON with ASP.NET MVC

In an ASP.NET MVC or ASP.NET application, you'll want to set up your datasource like the following example. ASP.NET has certain security requirements that force you to use POST instead of the default GET request when making AJAX calls. ASP.NET also requires that you explicitly define the value of contentType as application/json when requesting JSON. By default, when you create a service in ASP.NET MVC that has a JsonResult action, ASP.NET will nest the JSON data in an element named d:

    var dataSource = new kendo.data.DataSource({
        transport: {
            read: {
                type: "POST",
                url: serviceURL,
                dataType: "JSON",
                contentType: "application/json",
                data: serverData
            },
            parameterMap: function (data, operation) {
                return kendo.stringify(data);
            }
        },
        schema: {
            data: "d"
        }
    });

Summary

This article discussed how to work with remote data in the Kendo UI Grid, using a cross-domain JSONP datasource as an example, and how to configure the datasource for JSON services in ASP.NET.
Packt
20 Aug 2013
13 min read

Load Balancing and HA for ownCloud

The key strategy

If we look closely, for the purpose of load balancing, we will see three components in an ownCloud instance:

  • A user data storage (until now, we were using the system hard disk)
  • A web server, for example, Apache or IIS
  • A database; MySQL is a good choice for demonstration

The user data storage

Whenever a user creates a file or directory in ownCloud or uploads something, the data gets stored in the data directory. If we want to ensure that our ownCloud instance is always able to store and serve this data, we have to make this storage redundant. Luckily for us, ownCloud supports a lot of options out of the box other than local disk storage. We can use a Samba backend, an FTP backend, an OpenStack Swift backend, Amazon S3, WebDAV, and a lot more.

Configuring WebDAV

Web Distributed Authoring and Versioning (WebDAV) is an extension of HTTP. It is described by the IETF in RFC 4918 at http://tools.ietf.org/html/rfc4918. It provides the functionality of editing and managing documents over the web. It essentially makes the web readable and writable.

To enable custom backend support, we first have to go to the familiar Apps section and enable the External Storage Support app. After this app is enabled, when we open the ownCloud admin panel, we will see an external storage section on the page. Just choose WebDAV from the drop-down menu and fill in the credentials. Choose mount point as 0 and put the root as $user/. We are doing this so that for each user, a directory is created on the WebDAV server with their username, and whenever users log in, they are sent to this directory. Just to verify, check the config/mount.php file for ownCloud.

The web server

Assuming that we have taken care of backend storage, let's now handle the frontend web server. A very obvious way is to do DNS-level load balancing by round robin or geographical distribution.
In a round-robin DNS scheme, the resolution of a name returns a list of IP addresses instead of a single IP. These IP addresses are returned in round-robin fashion, which means that the order of the list is permuted on every query. This helps to distribute the traffic, since clients usually use the first IP. Another way to order the list is to match the IP address of the client to the closest IP in the list, and then make that the first IP in the response to the DNS query.

The biggest advantage of DNS-based load distribution is that it is application agnostic. It does not care whether the request is for an Apache server running PHP or an IIS server running ASP. It just rotates the IPs, and the server is responsible for handling the request appropriately. So far, it all sounds good, so why don't we use it all the time? Is it sufficient to balance the entire load? Well, this strategy is great for load distribution, but what happens if one of the servers fails? We run into a major problem, because DNS servers usually do not do health checks. So if one of our servers fails, we either have to fix it very fast, which is not always easy, or remove its IP from the DNS. However, DNS answers are cached by several intermediate caching-only DNS servers. They will continue to serve the stale IP, and our clients will continue to visit the bad server. Another way is to move the IP from the bad server to a good server, so that the good server has two IP addresses. That means that it has to handle twice the load, since DNS will keep on sending traffic to both addresses after permuting the IPs in round-robin fashion. Due to these and several other problems with DNS-level load balancing, we generally either avoid using it or use it along with other load-balancing mechanisms.
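The rotation, and the missing health check, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real DNS server: the `RoundRobinDNS` class, the `pick_server` helper, the two example addresses, and the `healthy` set are all made up for illustration.

```python
from collections import deque


class RoundRobinDNS:
    """Toy resolver: each lookup returns the address list rotated by one."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # the next lookup starts with the next address
        return answer


def pick_server(answer):
    """Clients usually take the first address in the answer."""
    return answer[0]


# Example addresses; assume the second server has just failed.
dns = RoundRobinDNS(["192.168.10.10", "192.168.10.11"])
healthy = {"192.168.10.10"}

for _ in range(4):
    ip = pick_server(dns.resolve())
    status = "ok" if ip in healthy else "FAILED (the resolver does no health check)"
    print(ip, status)
```

Every second lookup still hands out the failed address first, which is the weakness described above: the resolver keeps rotating without knowing which servers are alive.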
Load balancing Apache

For the sake of this example, let's assume that we have ownCloud served by two Apache web servers at 192.168.10.10 and 192.168.10.11. Starting with Apache 2.1, a module known as mod_proxy_balancer was introduced. On CentOS, the default Apache package ships with this module, so installing it is not a problem. If we have Apache running from the yum repo, then we already have this module. mod_proxy_balancer supports three algorithms for load distribution, which are as follows:

Request Counting

With this algorithm, incoming requests are distributed among backend workers in such a way that each backend gets a proportional number of requests, defined in the configuration by the loadfactor variable. For example, consider this Apache config snippet (comments sit on their own lines, because Apache does not allow a comment on the same line as a directive):

    <Proxy balancer://ownCloud>
        # Balancer member 1
        BalancerMember http://192.168.10.11/ loadfactor=1
        # Balancer member 2
        BalancerMember http://192.168.10.10/ loadfactor=3
        ProxySet lbmethod=byrequests
    </Proxy>

In this example, one request out of every four will be sent to 192.168.10.11, and three will be sent to 192.168.10.10. This might be an appropriate configuration for a site with two servers, one of which is more powerful than the other.

Weighted Traffic Counting

The Weighted Traffic Counting algorithm is similar to the Request Counting algorithm, with one difference: Weighted Traffic Counting considers the number of bytes instead of the number of requests. In the following configuration example, the number of bytes processed by 192.168.10.10 will be three times that of 192.168.10.11:

    <Proxy balancer://ownCloud>
        # Balancer member 1
        BalancerMember http://192.168.10.11/ loadfactor=1
        # Balancer member 2
        BalancerMember http://192.168.10.10/ loadfactor=3
        ProxySet lbmethod=bytraffic
    </Proxy>

Pending Request Counting

The Pending Request Counting algorithm is the latest and the most sophisticated algorithm provided by Apache for load balancing.
It is available from Apache 2.2.10 onward. In this algorithm, the scheduler keeps track of the number of requests that are assigned to each backend worker at any given time. Each new incoming request is sent to the backend that has the fewest pending requests, in other words, to the backend worker that is relatively least loaded. This helps to keep the request queues even among the backend workers, and each request generally goes to the worker that can process it the fastest. If two workers are equally lightly loaded, the scheduler uses the Request Counting algorithm to break the tie. A configuration example is as follows:

    <Proxy balancer://ownCloud>
        # Balancer member 1
        BalancerMember http://192.168.10.11/
        # Balancer member 2
        BalancerMember http://192.168.10.10/
        ProxySet lbmethod=bybusyness
    </Proxy>

Enable the Balancer Manager

Sometimes, we may need to change our load balancing configuration, but that may not be easy to do without affecting the running servers. For such situations, the Balancer Manager module provides a web interface to change the status of backend workers on the fly. We can use Balancer Manager to put a worker in offline mode or change its loadfactor, but we must have mod_status installed in order to use Balancer Manager. A sample config, which should be defined in /etc/httpd/httpd.conf, might look similar to the following code:

    <Location /balancer-manager>
        SetHandler balancer-manager
        Order Deny,Allow
        Deny from all
        Allow from .owncloudbook.com
    </Location>

Once we add directives similar to the preceding ones to httpd.conf and restart Apache, we can open the Balancer Manager by pointing a browser at http://owncloudbook.com/balancer-manager.

Load balancing IIS

Load balancing IIS is quite easy using the Windows GUI. Windows Server editions come with a nifty tool for this, known as Network Load Balancing (NLB). It balances the load by distributing incoming requests among a cluster of servers.
Each server in a cluster emits a heartbeat, a kind of "I am operational" message. NLB ensures that no request goes to a server that is not sending this heartbeat, thereby ensuring that all the requests are processed by operational servers.

Let's now configure the NLB by performing the following steps:

1. We need to turn it on first. We can do so by following the given steps:
   a. Go to Server Manager.
   b. Click on the Features section in the left-side bar.
   c. Then click on Add Features.
   d. Select Network Load Balancing from the list.
2. Once we have chosen Network Load Balancing, we click on Next >, and then click on Install to get this feature on the servers.
3. Once we are done here, we open Network Load Balancing Manager from the Administrative Tools section in the Start menu.
4. In the manager window, we need to right-click on the Network Load Balancing Clusters option to create a new cluster, as shown in the following screenshot:
5. Now we need to give the address of the server that is actually running the web server, and then connect to it, as shown in the following screenshot:
6. Choose the appropriate interface (in this example, we have only one), and then click on the Next > button.
7. On the next window, we are shown the host parameters, where we have to assign a priority to this host, as shown in the following screenshot:
8. Now click on the Add button, and a dialog opens where we have to assign an IP, which will be shared by all the hosts, as shown in the following screenshot. (Network Load Balancing Manager will configure this IP on all the machines.)
9. On the next dialog, choose a cluster IP, as shown in the following screenshot. This is the IP that users will use to log in to ownCloud.
10. Now that we have given it an IP, we define the cluster parameters to use unicast. Multicast and broadcast can be used, but they are not supported by all vendors and require more effort.

Now everything is done.
We are ready to use the Network Load Balancing feature. These steps are to be repeated on all the machines that are going to be part of this cluster. So there! We have also load balanced IIS.

The MySQL database

MySQL Cluster is a separate component of MySQL, which is not shipped with the standard MySQL server but can be downloaded freely from http://dev.mysql.com/downloads/cluster/. MySQL Cluster helps with scalability and with ensuring high uptime. It is write scalable and ACID compliant, and it doesn't have a single point of failure because of the way it is designed, with multiple masters and high distribution of data. This is perfect for our requirements, so let's start with its installation.

Basic terminologies

  • Management node: This node performs the basic management functions. It starts and stops other nodes and performs backups. It is always a good idea to start this node before starting anything else in the cluster.
  • Data node: This node stores the cluster data. There should always be more than one data node to provide redundancy.
  • SQL node: This node accesses the cluster data. It uses the NDBCLUSTER storage engine.

The default MySQL server does not ship with the NDBCLUSTER storage engine and other required features, so it is mandatory to download a server binary that supports the MySQL Cluster feature. We have to download the appropriate source for MySQL Cluster from http://dev.mysql.com/downloads/cluster/ if Linux is the host OS, or the binary if Windows is under consideration.

For the purpose of this demonstration, we will assume one Management node, one SQL node, and two Data nodes. Note that a node is a logical concept here; it need not be a physical machine. In fact, the nodes can reside on the same machine as separate processes, but then the whole purpose of high availability is defeated. Let's start by installing the MySQL Cluster nodes.

Data node

Setting up a Data node is fairly simple.
Just copy the ndbd and ndbmtd binaries from the bin directory of the archive to /usr/local/bin/ and make them executable as follows:

    cp bin/ndbd /usr/local/bin/ndbd
    cp bin/ndbmtd /usr/local/bin/ndbmtd
    chmod +x bin/ndbd /usr/local/bin/ndbd
    chmod +x bin/ndbmtd /usr/local/bin/ndbmtd

Management node

The Management node needs only two binaries, ndb_mgmd and ndb_mgm:

    cp bin/ndb_mgm* /usr/local/bin
    chmod +x /usr/local/bin/ndb_mgm*

SQL node

First of all, we need to create a user for MySQL as follows:

    useradd mysql

Now extract the tar.gz archive file downloaded before. Conventionally, the MySQL documentation uses the /usr/local/ directory to unpack the archive, but it can be done anywhere. We'll follow the MySQL conventions here and also create a symbolic link for easier access and better manageability as follows:

    tar -C /usr/local -xzvf mysql-cluster-gpl-7.2.12-linux2.6.tar.gz
    ln -s /usr/local/mysql-cluster-gpl-7.2.12-linux2.6-i686 /usr/local/mysql

We need to set write permissions for the mysql user, which we created before, as follows:

    chown -R root /usr/local/mysql
    chown -R mysql /usr/local/mysql/data
    chgrp -R mysql /usr/local/mysql

The preceding commands ensure that the permission to start and stop the MySQL instance remains with the root user, but the mysql user can write data to the data directory.
Now, change to the MySQL installation directory (/usr/local/mysql) and create the system databases with the script in its scripts directory as follows:

    scripts/mysql_install_db --user=mysql

Configuring the Data node and SQL node

We can configure the Data node and SQL node by editing /etc/my.cnf (for example, with vim /etc/my.cnf) as follows:

    [mysqld]
    # Options for mysqld process:
    ndbcluster                      # run NDB storage engine

    [mysql_cluster]
    # Options for MySQL Cluster processes:
    ndb-connectstring=192.168.20.10 # location of management server

Configuring the Management node

We can configure the Management node by editing /var/lib/mysql-cluster/config.ini (for example, with vim /var/lib/mysql-cluster/config.ini) as follows:

    [ndbd default]
    # Options affecting ndbd processes on all data nodes:
    NoOfReplicas=2    # Number of replicas
    DataMemory=200M   # How much memory to allocate for data storage
    IndexMemory=50M   # How much memory to allocate for index storage
                      # For DataMemory and IndexMemory, we have used the
                      # default values. Since the "world" database takes up
                      # only about 500KB, this should be more than enough for
                      # this example Cluster setup.

    [tcp default]
    # TCP/IP options:
    portnumber=2202

    [ndb_mgmd]
    # Management process options:
    hostname=192.168.20.10          # Hostname or IP address of MGM node
    datadir=/var/lib/mysql-cluster  # Directory for MGM node log files

    [ndbd]
    # Options for data node "A":
    # (one [ndbd] section per data node)
    hostname=192.168.20.12          # Hostname or IP address
    datadir=/usr/local/mysql/data   # Directory for this data node's data files

    [ndbd]
    # Options for data node "B":
    hostname=192.168.0.40           # Hostname or IP address
    datadir=/usr/local/mysql/data   # Directory for this data node's data files

    [mysqld]
    # SQL node options:
    hostname=192.168.20.11          # Hostname or IP address

Summary

We now have an idea of how to ensure high availability of the ownCloud server components. We have seen load balancing for the backend data store, the frontend web server, and the database. We have seen some common approaches, and we can now provide a reliable ownCloud service to our users.
Packt
20 Aug 2013
6 min read

Creating sheet objects and starting new list using Qlikview 11

How it works...

To add the list box for a company, right-click in the blank area of the sheet, and choose New Sheet Object | List Box, as shown in the following screenshot:

As you can see in the drop-down menu, there are multiple types of sheet objects to choose from, such as List Box, Statistics Box, Chart, Input Box, Current Selections Box, Multi Box, Table Box, Button, Text Object, Line/Arrow Object, Slider/Calendar Object, and Bookmark Object. We will only cover a few of them in the course of this article. The Help menu and the extended examples that are available on the QlikView website will allow you to explore ideas beyond the scope of this article. The Help documentation for any item can be obtained by using the Help menu on the top menu bar.

Choose the List Box sheet object to add the company dimension to our analysis. The New List Box wizard has eight tabs: General, Expressions, Sort, Presentation, Number, Font, Layout, and Caption, as shown in the following screenshot:

Give the new List Box the title Company. The Object ID will be system generated. We choose the Company field from the fields available in the datafile that we loaded. We can check the Show Frequency box to show the frequency in percent, which only tells us what share of the account lines loaded for October belongs to each company.

In the Expressions tab, we can add formulas for analyzing the data. Here, click on Add and choose Average. Since we only have numerical data in the Amount field, we will use the Average aggregation for the Amount field. Don't forget to click on the Paste button to move your expression into the expression checker. The expression checker will tell you whether the expression format is valid or whether there is a syntax problem. If you forget to move your expression into the expression checker with the Paste button, the expression will not be saved and will not appear in your application.
The Sort tab allows you to change the Sort criteria from text to numeric or dates. We will not change the Sort criteria here. The Presentation tab allows you to adjust things such as column or row header wrap, cell borders, and background pictures. The Number tab allows us to override the default format and tell the sheet to format the data as, for example, money, a percentage, or a date. We will use this tab on our table box, currently labeled Sum(Amount), to format the amount as money after we have finished creating our new company list box. The Font tab lets us choose the font that we want to use, its display size, and whether to make it bold. The Layout tab allows us to establish and apply themes, and to format the appearance of the sheet object, in this case, the list box. The Caption tab further formats the sheet object and, in the case of the list box, allows you to choose the icons that will appear in the top menu of the list box, so that we can use those icons to make and clear selections in our list box. In this example, we have selected search, select all, and clear.

We can see that the percentage contribution to the amount and the average amount are displayed in our list box. Now, we need to edit our straight table sheet object, which shows the amount. Right-click on the straight table sheet object and choose Properties from the pop-up menu. In the General tab, give the table a suitable name. In this case, use Sum of Accounts. Then move over to the Number tab and choose Money for the number format. Click on Apply to immediately apply the number format, and click on OK to close the wizard. Now our straight table sheet object has easier-to-read dollar amounts.
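For readers who think in code, the aggregations used so far (Sum(Amount), the Average expression, the frequency in percent, and the Money format) can be approximated outside QlikView. The following Python sketch is only an analogy: the rows are made-up sample data, and the `aggregate` and `money` helpers are hypothetical names, not QlikView functions.

```python
from collections import defaultdict

# Made-up rows standing in for the loaded account lines: (company, amount).
rows = [
    ("Cheyenne Holding", 100.00),
    ("Cheyenne Holding", -100.00),
    ("Cheyenne Manufacturing", 250.50),
    ("Cheyenne Manufacturing", 49.50),
]


def aggregate(rows):
    """Per company: sum, average, and share of all rows (the frequency)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for company, amount in rows:
        totals[company] += amount
        counts[company] += 1
    return {
        company: {
            "sum": totals[company],
            "avg": totals[company] / counts[company],
            "freq": counts[company] / len(rows),
        }
        for company in totals
    }


def money(value):
    """Format a number the way a Money number format would display it."""
    return f"${value:,.2f}"


for company, stats in aggregate(rows).items():
    print(company, money(stats["sum"]), money(stats["avg"]), f"{stats['freq']:.0%}")
```

A company whose amounts sum to zero, like the first one in this sample, is "in balance" in the sense used in the analysis that follows.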
One of the things we notice immediately in our analysis is that we are out of balance by one dollar and fifty-nine cents, as shown in the following screenshot:

We can analyze our data using just the list boxes, by selecting a company from the Company list and seeing which account groups and which cost centers are included (white) and which are excluded (gray). Our selected Company is highlighted in green:

By selecting Cheyenne Holding, we can see that it is indeed a holding company and has no manufacturing groups, sales accounting groups, or cost centers. Also, the company is in balance.

But what about a more graphic visual analysis? To create a chart to further visualize and analyze our data, we are going to create a new sheet object. This time, we are going to create a bar chart so that we can see the various company contributions to administrative costs or sales by the Acct.5 field and the account number. Just as when we created the company list box, we right-click on the sheet and choose New Sheet Object | Chart. This opens the following Chart Properties wizard for us:

We follow the steps through the chart wizard by giving the chart a name and selecting the chart type and the dimensions we want to use. Again, our expression is going to be SUM(Amount), but we will use the Label option and name it Total Amount in the Expression tab. We have selected the Company and Acct.5 dimensions in the Dimension tab, and we take the defaults for the rest of the wizard tabs.

When we close the wizard, the new bar chart appears on our sheet, and we can continue our analysis. In the following screenshot, we have chosen Cheyenne Manufacturing for our Company and all Sales/COS Trade to Mexico Branch as Account Groups. These two selections then show us, in our straight table, the cost centers that are associated with sales/COS trade to Mexico branch.
In our bar chart, we see the individual accounts associated with sales/COS trade to Mexico branch and Cheyenne Manufacturing, along with the related amounts posted for these accounts.

Summary

We created more sheet objects, starting with a new list box, to begin analyzing our loaded data. We also added dimensions for analysis.
Packt
20 Aug 2013
17 min read

AppFog Top Features You Need to Know

Auto reconfigure

The life cycle of most applications involves using different databases in different environments. For example, you may use one database locally for development, but when you deploy to production, you will most likely have a production database on high-end machines. Managing these changes during each deployment can be a very tedious task. AppFog supports the auto reconfigure feature, which automatically detects the database settings in your application and rewrites them using the bound service's credentials and settings. However, AppFog supports auto reconfigure only for some frameworks, such as Ruby on Rails and the Java Spring framework.

Enabling auto reconfigure

AppFog turns on auto reconfigure automatically if you deploy a Spring application with a javax.sql.DataSource bean defined in the Spring context XML file. AppFog parses this file and overrides the driver class, URL, username, and password to match the service bound to the application. The following is an example snippet of the Spring context XML that enables the AppFog auto reconfigure feature during deployment:

    <bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"
          destroy-method="close">
      <property name="driverClassName" value="com.mysql.jdbc.Driver" />
      <property name="url" value="jdbc:mysql://127.0.0.1:3306/test" />
      <property name="username" value="spring" />
      <property name="password" value="spring" />
    </bean>

This works because the file includes a reference to the org.apache.commons.dbcp.BasicDataSource bean, which implements the javax.sql.DataSource interface and therefore turns on the auto reconfigure feature. This feature is very helpful because it enables developers to deploy to AppFog without changing a single line of code. AppFog also supports auto reconfigure for Spring applications with other services, such as MongoDB, Redis, and RabbitMQ.
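Conceptually, auto reconfigure is an override of the application's own connection settings with those of the bound service. The Python sketch below illustrates only that idea and is not AppFog's implementation: the `auto_reconfigure` function, the shape of the service credentials, and the sample values are assumptions made for this example. The check for exactly one bound service mirrors the requirement listed just below.

```python
def auto_reconfigure(datasource, bound_services):
    """Return a copy of datasource rewritten to use the single bound service."""
    if len(bound_services) != 1:
        # With zero or several candidate services there is no unambiguous
        # match, so the application's own settings are left untouched.
        return dict(datasource)
    service = bound_services[0]
    updated = dict(datasource)
    updated["url"] = "jdbc:mysql://{host}:{port}/{name}".format(**service)
    updated["username"] = service["username"]
    updated["password"] = service["password"]
    return updated


# Settings as written in the application (taken from the snippet above).
local = {
    "driverClassName": "com.mysql.jdbc.Driver",
    "url": "jdbc:mysql://127.0.0.1:3306/test",
    "username": "spring",
    "password": "spring",
}

# Hypothetical credentials of the bound MySQL service.
bound = [{"host": "10.0.0.5", "port": 3306, "name": "d123",
          "username": "u456", "password": "p789"}]

print(auto_reconfigure(local, bound)["url"])
```

The application's own file is never edited in this model; the rewritten settings exist only in the deployed copy, which is why the same code can run locally and on the platform.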
There are a couple of requirements to enable auto reconfigure on AppFog:

  • Only one javax.sql.DataSource bean definition is allowed in the Spring context XML file.
  • Only one service of a given type should be bound. For example, for a relational database, auto reconfigure is enabled only if exactly one MySQL or PostgreSQL service is bound.

Disabling auto reconfigure

In some situations, you may not want to use the auto reconfigure feature; for example, you may want to use other database solutions such as MongoDB from MongoLab, MySQL from RDS by Amazon Web Services, or your own database installation on the same infrastructure.

Java Spring application

If you don't want to enable the AppFog auto reconfigure feature for your Java Spring application, you can just select JavaWeb instead of Spring while creating the project.

Ruby on Rails

If you do not want to enable the AppFog auto reconfigure feature for your Rails application database, you can disable it easily by creating a new file, config/cloudfoundry.yml, and adding the following line:

    autoconfig: false

All in all, AppFog's auto reconfigure feature is a great time-saving option that allows you to deploy your app without even knowing the details involved. You still remain in control, so if you don't want to use it, you can just disable it as we have seen earlier.

Custom SSL

If you are dealing with any sensitive information such as login credentials or credit card information, then having SSL is essential. SSL encrypts your data before it is sent, which can prevent man-in-the-middle attacks, where people intercept data that would otherwise be transferred in plain text. AppFog provides a default SSL for applications that use an AppFog-provided subdomain under *.af.cm. The AppFog platform also enables developers to deploy applications with custom SSL. This feature is only available for a paid plan.
At the time of writing, the cheapest plan that offers SSL is priced at $50 per month with one end point. Be aware that AppFog custom SSL support is currently available only for applications on AppFog that are hosted in the Amazon Web Services infrastructure; applications that are deployed to Rackspace, Windows Azure, and HP Cloud cannot use this feature, even on a paid plan.

Install tool

To generate an RSA private key, you need to have OpenSSL installed on your machine. Most Linux distros install OpenSSL by default. To install OpenSSL on Windows, you need to download the installer from http://gnuwin32.sourceforge.net/packages/openssl.htm and then install it according to the installer instructions. With OpenSSL installed, we can move on to generating our own private key.

Generating a private key

It is easy to generate an RSA private key using OpenSSL; openssl genrsa is the command to use. Make sure OpenSSL's bin folder is in the PATH environment variable, or use the console to navigate to OpenSSL's bin folder. The location of the bin folder on my machine is c:\Program Files (x86)\GnuWin32\bin.

    c:\Program Files (x86)\GnuWin32\bin>openssl genrsa -des3 -out server.key 1024
    Loading 'screen' into random state - done
    Generating RSA private key, 1024 bit long modulus
    ...........++++++
    ..........++++++
    e is 65537 (0x10001)
    Enter pass phrase for server.key:
    Verifying - Enter pass phrase for server.key:

The preceding command generates an RSA private key with 1024-bit strength. The pass phrase is required to generate the key, but we will remove it later.

Generating Certificate Signing Request

In Public Key Infrastructure (PKI) systems, a Certificate Signing Request is a message sent from an applicant to the Certificate Authority in order to apply for a digital identity certificate. So we need to generate a Certificate Signing Request for the Certificate Authority to sign.
To generate a Certificate Signing Request, use the openssl req command:

    c:\Program Files (x86)\GnuWin32\bin>openssl req -new -key server.key -out server.csr
    Loading 'screen' into random state - done
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:MY
    State or Province Name (full name) [Some-State]:Kuala Lumpur
    Locality Name (eg, city) []:Kuala Lumpur
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:Dream and Me
    Organizational Unit Name (eg, section) []:IT
    Common Name (eg, YOUR name) []:Dream and Me
    Email Address []:appfog@deamand.me
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:

If you are on a Windows machine and encounter the following error:

    Unable to load config info from /usr/local/ssl/openssl.cnf

then you need to add the following option to the command to load the config file from the custom location:

    -config "C:\Program Files (x86)\GnuWin32\share\openssl.cnf"

Please note that the path to the config file might be different based on your installation. You can send the created server.csr file to the SSL certificate provider to sign it.

The next step is to remove the pass phrase protection so that we can use the key on AppFog:

    c:\Program Files (x86)\GnuWin32\bin>openssl rsa -in server.key -out server.key
    Enter pass phrase for server.key:
    writing RSA key

The new private key file is created without the pass phrase protection. You will need to upload this new private key to AppFog.

Installing the SSL certificate

To install our new SSL certificate on AppFog, we need to log in to the AppFog web console. On the main page, open the SSL tab and click on the Get Started button.
On the new page, you will need to upload the server.csr file along with your server.key private key. Once they are uploaded, AppFog will provide you with an SSL terminator that will look like the following:

    af-ssl-term-0-000000000.us-east-1.elb.amazonaws.com

You will need to sign in to your domain provider and create or modify a CNAME record to point to the SSL terminator provided to you. It could take a while for the DNS change to propagate, but once it is done, you will have your custom SSL set up! This will give your users more confidence in your application, and it is a lot more secure than plain HTTP.

Teams

Most applications are developed by teams, and deploying them is no exception. As such, AppFog allows you to create and manage teams with permissions for starting, stopping, and restarting applications. This feature is still in beta and only available on the paid plans, as of the time of writing this article.

Once you start using a paid plan, you can navigate to the Teams tab and start to invite people to join your team. Once the invitation is approved by the user, he or she can start to manage your application! You can also manage team members from the console as follows:

    c:\opt\appfog-starter\appfog-blog>af -u team@dreamand.me update appfogblog

Currently, the team feature only supports basic permission controls, but in the future, AppFog will implement more complex authorizations, such as roles and groups, which will allow different permissions for different environments, for example, allowing QA engineers to manage only the QA environment but not production applications.

Third-party add-ons

AppFog provides third-party services that you can easily install and tie into your application. A list of add-ons can be found in the Add-ons tab of the application's details page.
These add-ons can be very useful for developers. One example is the Mailgun add-on, which lets developers send e-mails via the cloud without setting up a mail server or relying on Gmail SMTP, which limits the number of requests. Another useful add-on is Blitz, a cloud-based load testing tool that helps developers find performance bottlenecks. This add-on is easy to set up, and you can start load testing in minutes!

Installing an add-on

A great example for showing off the process of using an add-on is the Logentries add-on, which allows you to manage your application's logs from the cloud. To install it, just go to the Add-ons page and hit the Install button, which you will find right under the description, as shown in the following screenshot:

Managing add-ons

After successfully installing an add-on, you will see two buttons that help you to manage the add-on:

After clicking on the Manage button, you will be signed in to Logentries. AppFog and the add-on provider are integrated using single sign-on, which allows you to sign in to the add-on console without a separate username and password.

Configuring Rails to use Logentries

The next step is to set up your application to use Logentries. For a Rails application, you need to add the le gem to the Gemfile and then install it with Bundler using the bundle install command. Once it is installed, we need to configure the logger by modifying the config/environment.rb file. All you have to do is add the following lines:

    if Rails.env.development?
      Rails.logger = Le.new('LOGENTRIES_TOKEN', true)
    else
      Rails.logger = Le.new('LOGENTRIES_TOKEN')
    end

Replace LOGENTRIES_TOKEN with the token created in the Logentries UI. The second parameter tells the app whether the log should be dumped to the console instead.
So for development, we will just be printing our errors to the console, whereas in production, we will be logging to the Logentries service. Messages are then logged through the usual Rails logger calls:

Rails.logger.info("information message")
Rails.logger.warn("warning message")
Rails.logger.debug("debug message")

For more information on Logentries, you can view its documentation page at https://logentries.com/doc/. There are many other add-ons, such as Redis Cloud, IronWorker, Blitz, and Mailgun. All of these add-ons provide good documentation on how to install, configure, and use them in your application. This is another example of how AppFog speeds up the development process beyond just providing an infrastructure to work on.

Tunnel

AppFog secures its services, such as databases, from outside access, which is great in most situations, as only your application should have access to the database. However, there are situations where you will need remote access, for example, when running ad-hoc queries against your database for a one-time analysis. For these situations, you will need to tunnel into the AppFog environment to access the resources locally.

Installing the caldecott gem

To begin with, we need to install the caldecott gem, which allows us to connect through TCP over an HTTP tunnel. If you run af tunnel without it, the tool tells you what is missing:

c:\opt\appfog-starter\appfog-blog>af tunnel appfog-blog-data
To use af tunnel, you must first install caldecott:
gem install caldecott
Note that you'll need a C compiler. If you're on OS X, Xcode will provide one. If you're on Windows, try DevKit.

To install caldecott, simply run gem install caldecott from the console. With it installed, we are ready to create a tunnel.

Tunnel to service

You can use the af tool to create a tunnel and bind a local port to the remote port on the AppFog infrastructure. Just run af tunnel <servicename> [--port], where <servicename> is the name of the service you want to tunnel to, and you can optionally specify the port number to bind to.
c:\opt\appfog-starter\appfog-blog>af tunnel appfog-blog-data
Getting tunnel connection info: OK
Service connection info:
username : uf52effea2387407ba14bb0d94b820af1
password : pf79b4c9e197841298386f2543a5d7857
name : d5198ab07a6434a68adeaa9162e31e8d5
infra : rs
Starting tunnel to appfog-blog-data on port 10000.
1: none
2: psql
Which client would you like to start?: 1
Open another shell to run command-line clients or
use a UI tool to connect using the displayed information.
Press Ctrl-C to exit...

During the tunneling process, you can choose to either run the psql client or none. In my example, I have chosen none since I will use pgAdmin3 to manage the database. The following table shows the clients that caldecott can start. You need to make sure the client executable is in the PATH environment variable.

Service      Client
MongoDB      mongo
MySQL        mysql
PostgreSQL   psql

If your favorite client is not in the list, you can choose none. The af tool will output the credentials, so you can paste them into your favorite client to manage the databases. The following is an example of me using pgAdmin3 to sign in.

Once connected, you can use the database as if it were local, view data, and even create new tables. AppFog provides a secure channel for you to manage and tunnel into your data service. Moreover, you can use your favorite database client, such as MySQL Workbench or pgAdmin3.

Export/import service

One feature you should know about is the export/import service. It allows you to export an existing service's data and import that data into a new service. This is very helpful for cloning production data to another service for other purposes, such as analyzing data or using it as a development database. At the time of writing, AppFog only provides the af tool to export/import services.
You can export a service using the af export-service <service> command:

c:\opt\appfog-starter\appfog-blog>af export-service appfog-blog-data
Exporting data from 'appfog-blog-data': OK
http://dl.rs.af.cm/serialized/postgresql/dcb9c83b851524c17bfc9778ba8f5c1ac/snapshots/1629?token=PEXKgYJy9B8e

After running the export command, you will be provided with a link to the snapshot. You can download it and keep it as a backup. Using this link, you can also import the snapshot into a new service to initialize it with the data. To import a service, use the af import-service <service> <url> command, where <service> is the new service's name and <url> is the link you exported from another service. For example, if you want to name the new service appfog-blog-data-singapore, you can use the following command:

c:\opt\appfog-starter\appfog-blog>af import-service appfog-blog-data-singapore http://dl.rs.af.cm/serialized/postgresql/dcb9c83b851524c17bfc9778ba8f5c1ac/snapshots/1629?token=PEXKgYJy9B8e
Importing data into 'appfog-blog-data-singapore': OK

It's worth noting that the snapshot tools can only create a service of the same type. For example, you cannot create a MySQL database from the snapshot of a PostgreSQL database.

Cloning

We have just seen some features for cloning the database services. AppFog also offers a similar feature for your application itself. The cloning abilities allow you to replicate your application, optionally including its services. The difference between this and the previous export/import method is, of course, that here you clone the application as well. When cloning your application, you can choose a different infrastructure. So, for instance, you may have deployed your app to the HP infrastructure, but the clone feature allows you to replicate it on, say, the AWS cloud with zero downtime.
To clone a complete application including its services, you can use the Clone tab on the application's admin section, as shown in the following screenshot:

Choosing an infrastructure

The first step is to choose the infrastructure, which, at the time of writing, had the following options:

AWS Asia Southeast
AWS Europe West
AWS US East
HP Openstack AZ 2
MS Azure AZ 1

Choosing a subdomain

Your new application needs a new subdomain to map to. Currently, the AppFog clone feature is only able to map to the *.af.cm subdomain when you clone, but once the application is set up, you can map your own custom domain.

To clone an application from the command line, you can use the af clone <src-app> <dest-app> [infra] command. To view a list of the available infrastructures, run the af infras command:

c:\opt\appfog-starter\appfog-blog>af infras
+--------+-------------------------+
| Name   | Description             |
+--------+-------------------------+
| aws    | AWS US East - Virginia  |
| eu-aws | AWS EU West - Ireland   |
| ap-aws | AWS Asia SE - Singapore |
| hp     | HP AZ 2 - Las Vegas     |
+--------+-------------------------+

So, to clone your application to AWS Singapore, just execute the following:

c:\opt\appfog-starter\appfog-blog>af clone appfog-blog appfog-blog-singapore-clone ap-aws
1: AWS US East - Virginia
2: AWS EU West - Ireland
3: AWS Asia SE - Singapore
4: HP AZ 2 - Las Vegas
Select Infrastructure: 3
Application Deployed URL [appfog-blog-singapore-clone.ap01.aws.af.cm]:
Pulling last pushed source code: OK
Cloning 'appfog-blog' to 'appfog-blog-singapore-clone':
Uploading Application:
Checking for available resources: OK
Packing application: OK
Uploading (33K): OK
Push Status: OK
Exporting data from appfog-blog-data: OK
Creating service appfog-blog-singapore-clone-data: OK
Binding service appfog-blog-singapore-clone-data: OK
Importing data to appfog-blog-singapore-clone-data: OK
Staging Application 'appfog-blog-singapore-clone': OK
Starting Application 'appfog-blog-singapore-clone': OK

This simple one-line
command just cloned your entire application from one infrastructure to another within minutes. You can then view the new application from the web console and check its status. The clone feature needs to be used carefully on a production application, but it still has many use cases that will ease the developer's workload.

As I hope you have now seen, AppFog is not just a simple PaaS that allows you to deploy applications. AppFog extends this basic functionality with features such as custom domains, app cloning, and multiple data center options. Besides the features aimed at developers, AppFog also offers features that benefit your customers, such as deploying close to their location, for example in Singapore, which will decrease latency across Asia. Third-party add-ons are another useful feature of the AppFog platform. For example, MongoLab/MongoHQ provide free add-ons to AppFog users with a maximum of 500 MB of storage, which is enough for small production deployments. Moreover, add-ons such as Blitz and Logentries allow you to rapidly develop and test your backend with load testing and logging features.

Summary

This article introduced you to the AppFog features and showed how to use them in a real-world environment. The features included load balancing, SSL, add-ons, teams, clones, tunnels, and so on.
Packt
19 Aug 2013
7 min read

Monitoring additional servers

Step 1 – Installing munin-node

First we need to connect to the server we want to monitor and install munin-node. In our examples, we will use muninnode as the name of our additional server. Your server will probably have a different name, so every time you see muninnode in an example, replace it with the name of the server you are using. Examples will also use the term username, which you should replace with your own username. But first, let's install munin-node.

For Debian or Ubuntu, use the following commands:

ssh username@muninnode
sudo apt-get install munin-node

For Red Hat or Fedora, use:

ssh username@muninnode
sudo yum install munin-node

Next, we will take a look at the generated configuration file. It is located at /etc/munin/munin-node.conf. Please open it up in your favorite editor. The first thing we have to take care of is the fact that we want our master to be able to connect to this node. For security reasons, munin-node defaults to allowing only connections from the localhost to query its data. So, let's scroll down to the allow section and add a line beneath it. If your master has a static IP address, please enter it in the allow section in the following format:

allow 10.0.0.200

This will grant the master at 10.0.0.200 access to the data of this node. If your master has a dynamic IP address or you want to trust your entire network range, you can either add a single line for every possible IP address or use a cidr_allow section. Please note that you can only use this if you have the Net::CIDR Perl module installed. Most systems will have this by default, but if you are having problems, you should check that.

cidr_allow 10.0.0.0/24

This will allow any host connecting from an address between 10.0.0.0 and 10.0.0.255 to fetch all the information available on this node. After you have done this, you need to save the file and restart the munin-node daemon.
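As a quick sanity check on the cidr_allow example, the range covered by a CIDR block can be computed with Python's standard ipaddress module. This is only an illustrative sketch: the helper names are made up for this example, and it is not how munin-node itself evaluates the rule.

```python
import ipaddress


def allowed_range(cidr):
    """Return (first address, last address, address count) for a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1]), network.num_addresses


def is_allowed(client_ip, cidr):
    """True if client_ip falls inside the CIDR block (hypothetical helper)."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    first, last, count = allowed_range("10.0.0.0/24")
    print(f"{first} - {last} ({count} addresses)")
    print(is_allowed("10.0.0.200", "10.0.0.0/24"))
    print(is_allowed("10.0.1.5", "10.0.0.0/24"))
```

A check like this can help you confirm that the master's address actually falls inside the block before you restart the daemon.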
For older versions of Debian or Ubuntu, use the following command:

sudo invoke-rc.d munin-node restart

For newer versions of Debian or Ubuntu, and for Red Hat or Fedora, use:

sudo service munin-node restart

Step 2 – Testing your munin-node installation

Now that we have installed the node, it is a good idea to check whether it functions correctly. We will do this by connecting to the node and fetching some information.

ssh username@muninnode
telnet localhost 4949
version
list
quit

You should get the following output:

ssh username@muninnode
Welcome to muninnode
username@muninnode:~$ telnet localhost 4949
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
# munin node at muninnode.
version
munins node on muninnode. version: 2.0.9-2
list
cpu df df_inode entropy forks fw_packets http_loadtime if_err_eth0
if_eth0 interrupts iostat iostat_ios irqstats load memory
munin_stats ntp_kernel_err ntp_kernel_pll_freq ntp_kernel_pll_off
ntp_offset open_files open_inodes proc_pri processes sensors swap
threads uptime users vmstat
quit
Connection closed by foreign host.
username@muninnode:~$

Please note that the node might be a bit impatient with you. If you connect to it using Telnet and then give no further instructions for a few seconds, munin-node will automatically disconnect you, thinking you are just wasting its time. If this happens, just go ahead and try again.

Now that we know that munin-node is running, we want to make sure it is functioning correctly. Munin-node keeps its log file at /var/log/munin/munin-node.log. Let's take a look at that.

ssh username@muninnode
tail /var/log/munin/munin-node.log

You should be able to see your connection attempt in the log; it should look something like the following:

2013/01/01-12:30:10 CONNECT TCP Peer: "127.0.0.1:44363" Local: "127.0.0.1:4949"

If you have a node that is experiencing problems with connections or a plugin, make sure to look at this log file for exceptions or error messages.
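The manual Telnet check above can also be scripted. The following is a minimal sketch in Python, not part of Munin itself: the function name query_munin_node is made up for this example, and it assumes that the node sends a one-line banner on connect and answers version and list with a single line each.

```python
import socket


def query_munin_node(host, port=4949, commands=("version", "list"), timeout=5.0):
    """Send each command to a munin-node and return {command: first reply line}.

    Assumption: every command used here is answered with exactly one line.
    """
    replies = {}
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        stream.readline()  # discard the "# munin node at ..." banner
        for command in commands:
            stream.write(command + "\n")
            stream.flush()
            replies[command] = stream.readline().strip()
        stream.write("quit\n")  # end the session politely
        stream.flush()
    return replies


if __name__ == "__main__":
    # "muninnode" is the placeholder host name used throughout this article.
    for command, reply in query_munin_node("muninnode").items():
        print(f"{command}: {reply}")
```

Because the script sends its commands immediately, it also sidesteps the idle timeout mentioned above.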
Step 3 – Installing additional plugins

When munin-node was installed, it ran its autodetect script to enable plugins from its standard library if they were applicable to your system. If you have installed new software on this machine, you can easily re-run this script to see if Munin can help you monitor the new software. If you have, for example, installed MySQL or PostgreSQL, then this is what you do:

ssh username@muninnode
sudo munin-node-configure --suggest
sudo munin-node-configure --shell

The first command will show you all the plugins your munin-node has out of the box and whether they apply to your system. The second command will display the commands you will have to execute to create the symbolic links that enable those suggestions. Please note that not all plugins support this, and therefore, not all applicable plugins will automatically be enabled. Munin-node has to be restarted after you've added new plugins; otherwise, these changes will not take effect.

Step 4 – Adding the new node to the master

Now that we've completely configured the node and tested that it works, we are ready to add the node to our master. To do this, we have to go to our master and test whether we can connect back to our munin-node.

ssh username@muninmaster
telnet muninnode 4949
version
list
quit

This should display the version and the capabilities of the munin-node running on the muninnode server. If this does not work, make sure you have started munin-node on the muninnode server and also check whether firewalls allow you to connect to it on port 4949. Also go ahead and recheck the allowed IP addresses in the munin-node configuration as described in Step 1. If this is working correctly, go ahead and open up the file at /etc/munin/munin.conf.
Here, we scroll down until we see the following host tree:

# a simple host tree
[localhost.localdomain]
address 127.0.0.1
use_node_name yes

We need to add our new munin-node to this host tree as follows, using the IP address of your own node on the address line:

# the host tree of our local network
[localhost.localdomain]
address 127.0.0.1
use_node_name yes

[muninnode.localdomain]
address 10.0.0.200
use_node_name yes

Now, we'll have to wait at least 10 minutes before we will be able to see our new node on the Munin master's website. Go ahead and point your browser to your Munin master at http://localhost/munin or at http://your_munin_master/munin; you should see something like the following screenshot:

After a couple of minutes, you should be able to see graphs for your node and even compare the nodes of your cluster side by side.

Troubleshooting

It could very well be that it isn't working for you. Here are a few things you should check first:

Check the Munin master log at /var/log/munin/munin.log for errors.
Check the Munin node log at /var/log/munin/munin-node.log on the node for access calls and errors.
Try to connect from your Munin master to your node using Telnet on port 4949. If you can connect, type nodes and check whether the name of your node is there.
Still in Telnet, type list muninnode.localdomain and check whether you get a list of plugins. If not, add your hostname to /etc/munin/munin-node.conf (see the Munin node configuration section).

Summary

We looked at the first step in expanding your Munin cluster. Once you know how to add one server, you will be able to add all of them!
Packt
19 Aug 2013
10 min read

Mailbox Database Management

Determining the average mailbox size per database

PowerShell is very flexible and gives you the ability to generate very detailed reports. When generating mailbox database statistics, we can utilize data returned from multiple cmdlets provided by the Exchange Management Shell. This section shows an example of this: you will learn how to calculate the average mailbox size per database using PowerShell.

How to do it...

To determine the average mailbox size for a given database, use the following one-liner:

Get-MailboxStatistics -Database DB1 | ForEach-Object {$_.TotalItemSize.value.ToMB()} | Measure-Object -Average | Select-Object -ExpandProperty Average

How it works...

Calculating an average is as simple as performing some basic math, but PowerShell gives us the ability to do this quickly with the Measure-Object cmdlet. The example uses the Get-MailboxStatistics cmdlet to retrieve statistics for all the mailboxes in the DB1 database. We then loop through each one, retrieving only the TotalItemSize property, and inside the ForEach-Object script block we convert the total item size to megabytes. The results from all mailboxes are then averaged using the Measure-Object cmdlet. At the end of the command, the Select-Object cmdlet is used to retrieve only the value of the Average property. The number returned here is the average mailbox size across regular mailboxes, archive mailboxes, and any mailboxes that have been disconnected.
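To make the arithmetic behind the pipeline explicit, here is a small sketch in Python. It is only an illustration of what Measure-Object -Average computes; the function name and the sample sizes are invented for this example and do not come from a real database.

```python
from statistics import mean


def average_size_mb(sizes_mb):
    """Average of per-mailbox sizes in MB; returns 0.0 when there are none."""
    sizes = list(sizes_mb)
    return mean(sizes) if sizes else 0.0


if __name__ == "__main__":
    # Hypothetical sizes in megabytes, standing in for the values that
    # TotalItemSize.value.ToMB() would produce for each mailbox.
    sample_sizes = [10, 3, 7, 20]
    print(average_size_mb(sample_sizes))
```

The empty-list guard is a design choice for the sketch only; it avoids an error when a database contains no mailboxes.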
If you want to be more specific, you can filter out these mailboxes after running the Get-MailboxStatistics cmdlet:

Get-MailboxStatistics -Database DB1 | Where-Object{!$_.DisconnectDate -and !$_.IsArchive} | ForEach-Object {$_.TotalItemSize.value.ToMB()} | Measure-Object -Average | Select-Object -ExpandProperty Average

Notice that, in the preceding example, we have added the Where-Object cmdlet to filter out any mailboxes that have a DisconnectDate defined or where the IsArchive property is $true.

Another thing that you may want to do is round the average. Let's say the DB1 database contained 42 mailboxes and the total size of the mailboxes was around 392 megabytes. The value returned from the preceding command would look roughly like 9.33333333333333. Rarely are all those extra decimal places of any use. Here are a couple of ways to make the output a little cleaner:

$MBAvg = Get-MailboxStatistics -Database DB1 | ForEach-Object {$_.TotalItemSize.value.ToMB()} | Measure-Object -Average | Select-Object -ExpandProperty Average
[Math]::Round($MBAvg,2)

You can see that this time, we stored the result of the one-liner in the $MBAvg variable. We then use the Round method of the Math class in the .NET Framework to round the value, specifying that the result should only contain two decimal places. Based on the previous information, the result of the preceding command would be 9.33. We can also use string formatting to specify the number of decimal places to be used:

[PS] "{0:n2}" -f $MBAvg
9.33

Keep in mind that this command will return a string, so if you need to be able to sort on this value, cast it to double:

[PS] [double]("{0:n2}" -f $MBAvg)
9.33

The -f format operator is documented in PowerShell's help system in about_operators.

There's more...

The previous examples have only shown how to determine the average mailbox size for a single database.
To determine this information for all mailbox databases, we can use the following code (save it to a file called size.ps1):

$results = foreach($DB in Get-MailboxDatabase) {
  Get-MailboxStatistics -Database $DB |
    ForEach-Object {$_.TotalItemSize.value.ToMB()} |
    Measure-Object -Average |
    Select-Object @{n="Name";e={$DB.Name}},
      @{n="AvgMailboxSize";e={[Math]::Round($_.Average,2)}}
}
$results | Sort-Object AvgMailboxSize -Desc

The result of this command would look something like this:

This example is very similar to the one we looked at previously. The difference is that, this time, we are running our one-liner inside a foreach loop for every mailbox database in the organization and collecting the results in the $results variable. When each mailbox database has been processed, we sort the output based on the AvgMailboxSize property.

Restoring data from a recovery database

When it comes to recovering data from a failed database, you have several options depending on what kind of backup product you are using or how you have deployed Exchange 2013. The ideal method for enabling redundancy is to use a DAG, which will replicate your mailbox databases to one or more servers and provide automatic failover in the event of a disaster. However, you may need to pull old data out of a database restored from a backup. In this section, we will take a look at how you can create a recovery database and restore data from it using the Exchange Management Shell.

How to do it...

First, restore the failed database using the steps required by your current backup solution. For this example, let's say that we have restored the DB1 database file to E:\Recovery\DB1 and the database has been brought to a clean shutdown state.
We can use the following steps to create a recovery database and restore mailbox data:

Create a recovery database using the New-MailboxDatabase cmdlet:

New-MailboxDatabase -Name RecoveryDB `
-EdbFilePath E:\Recovery\DB1\DB1.edb `
-LogFolderPath E:\Recovery\DB01 `
-Recovery `
-Server MBX1

When you run the preceding command, you will see a warning that the recovery database was created using the existing database file. The next step is to check the state of the database, followed by mounting the database:

Eseutil /mh .\DB1.edb
Eseutil /R E02 /D
Mount-Database -Identity RecoveryDB

Next, query the recovery database for all mailboxes that reside in the database RecoveryDB:

Get-MailboxStatistics -Database RecoveryDB | fl DisplayName,MailboxGUID,LegacyDN

Lastly, we will use the New-MailboxRestoreRequest cmdlet to restore the data from the recovery database for a single mailbox:

New-MailboxRestoreRequest -SourceDatabase RecoveryDB `
-SourceStoreMailbox "Joe Smith" `
-TargetMailbox joe.smith

When running the eseutil commands, make sure you are in the folder where the restored mailbox database and logs are placed.

How it works...

When you restore the database file from your backup application, you may need to ensure that the database is in a clean shutdown state. For example, if you are using Windows Server Backup as your backup solution, you will need to use the Eseutil.exe database utility to play any uncommitted logs into the database to get it into a clean shutdown state. Once the data is restored, we can create a recovery database using the New-MailboxDatabase cmdlet, as shown in the first example. Notice that when we ran the command we used several parameters. First, we specified the path to the EDB file and the logfiles, both of which are in the same location where we restored the files. We have also used the -Recovery switch parameter to specify that this is a special type of database that will only be used for restoring data and should not be used for production mailboxes.
Finally, we specified which mailbox server the database should be hosted on using the -Server parameter. Make sure to run the New-MailboxDatabase cmdlet from the mailbox server that you are specifying in the -Server parameter, and then mount the database using the Mount-Database cmdlet. The last step is to restore data from one or more mailboxes. As we saw in the previous example, New-MailboxRestoreRequest is the tool to use for this task. This cmdlet was introduced in Exchange 2010 SP1, so if you have used this process in the past, the procedure is the same with Exchange 2013.

There's more…

When you run the New-MailboxRestoreRequest cmdlet, you need to specify the identity of the mailbox you wish to restore using the -SourceStoreMailbox parameter. There are three possible values you can use to provide this information: DisplayName, MailboxGuid, and LegacyDN. To retrieve these values, you can use the Get-MailboxStatistics cmdlet once the recovery database is online and mounted:

Get-MailboxStatistics -Database RecoveryDB | fl DisplayName,MailboxGUID,LegacyDN

Here we have specified that we want to retrieve all three of these values for each mailbox in the RecoveryDB database.

Understanding target mailbox identity

When restoring data with the New-MailboxRestoreRequest cmdlet, you also need to provide a value for the -TargetMailbox parameter. The mailbox needs to already exist before running this command. If you are restoring data from a backup for an existing mailbox that has not changed since the backup was done, you can simply provide the typical identity values for a mailbox for this parameter. If you want to restore data to a mailbox that was not the original source of the data, you need to use the -AllowLegacyDNMismatch switch parameter. This will be useful if you are restoring data to another user's mailbox, or if you've recreated the mailbox since the backup was taken.
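The rule for when -AllowLegacyDNMismatch is needed can be written down as a small decision function. The sketch below is in Python and is purely illustrative: restore_request_args is a hypothetical helper that only assembles the argument list for the cmdlet described above (using the parameters named in this section) and does not run anything against Exchange.

```python
def restore_request_args(source_db, source_mailbox, target_mailbox,
                         target_is_original=True):
    """Build the argument list for a New-MailboxRestoreRequest call.

    target_is_original should be False when the target is another user's
    mailbox or a mailbox recreated since the backup was taken.
    """
    args = [
        "New-MailboxRestoreRequest",
        "-SourceDatabase", source_db,
        "-SourceStoreMailbox", source_mailbox,
        "-TargetMailbox", target_mailbox,
    ]
    if not target_is_original:
        # The target was not the original source of the data.
        args.append("-AllowLegacyDNMismatch")
    return args


if __name__ == "__main__":
    print(" ".join(restore_request_args("RecoveryDB", '"Joe Smith"', "joe.smith")))
    print(" ".join(restore_request_args("RecoveryDB", '"Joe Smith"', "jane.doe",
                                        target_is_original=False)))
```

Here jane.doe is a made-up target used to show the mismatch case.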
Learning about other useful parameters

The New-MailboxRestoreRequest cmdlet can be used to granularly control how data is restored out of a mailbox. The following parameters may be useful to customize the behavior of your restores:

ConflictResolutionOption: This parameter specifies the action to take if multiple matching messages exist in the target mailbox. The possible values are KeepSourceItem, KeepLatestItem, or KeepAll. If no value is specified, KeepSourceItem will be used by default.
ExcludeDumpster: Use this switch parameter to indicate that the dumpster should not be included in the restore.
SourceRootFolder: Use this parameter to restore data only from a root folder of a mailbox.
TargetIsArchive: You can use this switch parameter to perform a mailbox restore to a mailbox archive.
TargetRootFolder: This parameter can be used to restore data to a specific folder in the root of the target mailbox. If no value is provided, the data is restored and merged into the existing folders, and, if they do not exist, they will be created in the target mailbox.

These are just a few of the useful parameters that can be used with this cmdlet, but there are more. For a complete list of all the available parameters and full details on each one, run Get-Help New-MailboxRestoreRequest -Detailed.

Understanding mailbox restore request cmdlets

There is an entire cmdlet set for mailbox restore requests in addition to the New-MailboxRestoreRequest cmdlet.
The remaining available cmdlets are outlined as follows:

Get-MailboxRestoreRequest: Provides a detailed status of mailbox restore requests
Remove-MailboxRestoreRequest: Removes fully or partially completed restore requests
Resume-MailboxRestoreRequest: Resumes a restore request that was suspended or failed
Set-MailboxRestoreRequest: Can be used to change the restore request options after the request has been created
Suspend-MailboxRestoreRequest: Suspends a restore request any time after the request was created but before the request reaches the status of Completed

For complete details and examples for each of these cmdlets, run the Get-Help cmdlet against the appropriate cmdlet with the -Full switch parameter.

Taking it a step further

Let's say that you have restored your database from backup, you have created a recovery database, and now you need to restore each mailbox in the backup to the corresponding target mailboxes that are currently online. We can use the following script to accomplish this:

$mailboxes = Get-MailboxStatistics -Database RecoveryDB
foreach($mailbox in $mailboxes) {
  New-MailboxRestoreRequest -SourceDatabase RecoveryDB `
    -SourceStoreMailbox $mailbox.DisplayName `
    -TargetMailbox $mailbox.DisplayName
}

Here you can see that first we use the Get-MailboxStatistics cmdlet to retrieve all the mailboxes in the recovery database and store the results in the $mailboxes variable. We then loop through each mailbox and restore the data to the original mailbox. You can track the status of these restores using the Get-MailboxRestoreRequest cmdlet and the Get-MailboxRestoreRequestStatistics cmdlet.

Summary

In this article, we covered a small but useful part of mailbox database management: determining the average mailbox size per database and restoring data from a recovery database.