PowerShell Troubleshooting: Replacing the foreach loop with the foreach-object cmdlet

Packt
27 Nov 2014
8 min read
In this article by Michael Shepard, author of PowerShell Troubleshooting Guide, we will see how to replace the foreach loop with the ForEach-Object cmdlet.

When you write a function to process a file, a typical approach might look like this:

```powershell
function process-file {
    param($filename)
    $contents = get-content $filename
    foreach ($line in $contents) {
        # do something interesting
    }
}
```

This pattern works well for small files, but for really large files this kind of processing performs very badly and can crash with an out-of-memory exception. For instance, running this function against a 500 MB text file on my laptop took over five seconds, despite the fact that the loop doesn't actually do anything. To determine how long a command takes to run, we can use the Measure-Command cmdlet. Note that the result is a TimeSpan object, and its TotalSeconds property has the value we are looking for.

You might not have any large files handy, so I wrote the following quick function to create large text files that are approximately the size you ask for:

```powershell
function new-bigfile {
    param([string]$path,
          [int]$sizeInMB)
    if (test-path $path) {
        remove-item $path
    }
    new-item -ItemType File -Path $path | out-null
    $line = 'A' * 78
    $page = "$line`r`n" * 1280000
    1..($sizeInMB/100) | foreach { $page | out-file $path -Append -Encoding ascii }
}
```

The code works by creating a large string using string multiplication, which can be handy in situations like this, and then writing that string to the file as many times as necessary. The files come out pretty close to the requested size if the size is over 100 MB, but they are not exact. Fortunately, we aren't really concerned about the exact size, just that the files are very large.
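The same string-multiplication trick for generating test files works in most languages. Here is a rough Python sketch of the idea (scaled down to ~100 KB chunks so it runs quickly; the function name and chunk sizes are illustrative, not from the article):

```python
import os
import tempfile

def new_big_file(path, size_in_mb):
    """Create a text file of roughly size_in_mb megabytes by repeating
    a fixed 80-byte line, mirroring the string-multiplication trick above."""
    line = "A" * 78 + "\r\n"      # 80 bytes per line, like the $line variable
    page = line * 1280            # ~100 KB chunk built by string multiplication
    with open(path, "wb") as f:
        for _ in range(size_in_mb * 10):   # ten ~100 KB chunks per requested MB
            f.write(page.encode("ascii"))

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
new_big_file(path, 2)
size_mb = os.path.getsize(path) / (1024 * 1024)
os.remove(path)
print(round(size_mb, 2))   # close to the requested 2 MB, but not exact
```

As with the PowerShell version, the result is close to the requested size rather than exact, which is fine for this purpose.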
A better approach is to utilize the streaming functionality of the pipeline and use the ForEach-Object cmdlet instead of reading the contents into a variable. Since Get-Content outputs objects as they are being read, processing them one at a time allows us to process the file without ever holding it all in memory at once. An example that is similar to the previous code is this:

```powershell
function process-file2 {
    param($filename)
    get-content $filename | foreach-object {
        $line = $_
        # do something interesting
    }
}
```

Note that since we're using the ForEach-Object cmdlet instead of the foreach loop, we have to use the $_ automatic variable to refer to the current object. By assigning that immediately to a variable, we can use exactly the same code as we would have in the foreach loop example (in place of the # do something interesting comment). In PowerShell version 4.0, we can use the -PipelineVariable common parameter to simplify this code. As with all parameters where you supply the name of a variable, you don't use the dollar sign:

```powershell
function process-file3 {
    param($filename)
    get-content $filename -PipelineVariable line | foreach-object {
        # do something interesting
    }
}
```

With either of these constructions, I have been able to process files of any length without any noticeable memory usage. One way to measure memory usage (without simply watching the process monitor) is to use the Get-Process cmdlet to find the current process and report on its WorkingSet64 property. It is important to use the 64-bit version rather than the WorkingSet property or its alias, WS. A function to get the current shell's memory usage looks like this:

```powershell
function get-shellmemory {
    (get-process -id $pid | select -expand WorkingSet64)/1MB
}
new-alias mem get-shellmemory
```

I've included an alias (mem) for this function to make it quicker to call on the command line.
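The WorkingSet64 idea translates to other runtimes too. As a rough, Unix-only Python analogue (using the standard resource module; the counter and its units differ by platform, so this is an illustration rather than an exact equivalent of the PowerShell function):

```python
import resource
import sys

def get_shell_memory_mb():
    """Peak resident set size of the current process, in MB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS,
    so we normalise per platform.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / (1024 * 1024)
    return rss / 1024

mem_mb = get_shell_memory_mb()
print(round(mem_mb, 1))   # varies by machine
```

Like the PowerShell version, this is a quick command-line diagnostic rather than a precise profiler.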
I try to avoid using aliases in scripts because they can make code harder to understand, but for command-line use, aliases really are a time-saver. Running mem after processing a 500 MB file shows that although the function processed the entire file, it used only a little over 3 MB of memory in doing so.

Combining the function to determine memory usage with Measure-Command gives us a general-purpose function to measure time and memory usage:

```powershell
function get-performance {
    param([scriptblock]$block)
    $pre_mem = get-shellmemory
    $elapsedTime = measure-command -Expression $block
    $post_mem = get-shellmemory
    write-output "the process took $($elapsedTime.TotalSeconds) seconds"
    write-output "the process used $($post_mem - $pre_mem) megabytes of memory"
}
new-alias perf get-performance
```

One thing to note about measuring memory this way is that since the PowerShell host is a garbage-collected .NET process, a garbage-collection operation may occur while the script block is running. If that happens, the process may end up using less memory than when it started. Because of this, memory usage statistics are only guidelines, not absolute indicators. Adding an explicit call to the garbage collector will make unusual memory readings less likely, but the situation is ultimately in the hands of the .NET Framework, not ours. You will find that the memory used by a particular function varies quite a bit from run to run, but the general performance characteristics are what matter. In this section, we're concerned with whether memory usage grows proportionally with the size of the input file. With the first version of the code, which used the foreach loop, memory use did grow with the size of the input file, which limits the usefulness of that technique.
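The combine-time-and-memory pattern of get-performance can be sketched in Python for comparison. This version uses the standard time and tracemalloc modules, so it measures Python-level allocations rather than the process working set; it is an illustration of the pattern, not a port of the PowerShell function:

```python
import time
import tracemalloc

def get_performance(block):
    """Run a zero-argument callable and report elapsed time plus the peak
    Python-level allocation during the call."""
    tracemalloc.start()
    start = time.perf_counter()
    block()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak / (1024 * 1024)

# Allocating a million-element list should show up as several MB of peak usage.
seconds, peak_mb = get_performance(lambda: [0] * 1_000_000)
print(f"the process took {seconds:.3f} seconds")
print(f"the process used {peak_mb:.1f} megabytes of memory")
```

The same caveat applies as in the article: allocator and garbage-collector behaviour make such figures guidelines, not absolute indicators.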
For reference, here is a summary of the performance on my computer using the foreach loop and the ForEach-Object cmdlet:

| Input size | Loop time | Loop memory | Cmdlet time | Cmdlet memory |
|------------|-----------|-------------|-------------|---------------|
| 100 MB     | 1.1s      | 158 MB      | 1.5s        | 1.5 MB        |
| 500 MB     | 6.1s      | 979 MB      | 8.7s        | 12.9 MB       |
| 1 GB       | 38.5s     | 1987 MB     | 16.7s       | 7.4 MB        |
| 2 GB       | Failed    |             | 51.2s       | 8.6 MB        |
| 4 GB       | Failed    |             | 132s        | 12.7 MB       |

While these specific numbers are highly dependent on the hardware and software configuration of my computer, the takeaway is that by using the ForEach-Object cmdlet you can avoid the high memory usage involved in reading large files into memory.

Although the discussion here has been about the Get-Content cmdlet, the same is true of any cmdlet that returns objects in a streaming fashion. For example, Import-CSV can have exactly the same performance characteristics as Get-Content. The following code is a typical approach to reading CSV files, which works very well for small files:

```powershell
function process-CSVfile {
    param($filename)
    $objects = import-CSV $filename
    foreach ($object in $objects) {
        # do something interesting
    }
}
```

To see the performance, we will need some large CSV files to work with. Here's a simple function that creates CSV files of approximately the requested size, which will be appropriate for this test.
Note that the multipliers used in the function were determined by trial and error, but they give a reasonable 10-column CSV file that is close to the requested size:

```powershell
function new-bigCSVfile {
    param([string]$path,
          [int]$sizeInMB)
    if (test-path $path) {
        remove-item $path
    }
    new-item -ItemType File -Path $path | out-null
    $header = "Column1"
    2..10 | foreach { $header += ",Column$_" }
    $header += "`r`n"
    $header | out-file $path -encoding Ascii
    $page = $header * 12500
    1..($sizeInMB) | foreach { $page | out-file $path -Append -Encoding ascii }
}
```

Rewriting the process-CSVfile function to use the streaming property of the pipeline looks similar to the rewritten Get-Content example:

```powershell
function process-CSVfile2 {
    param($filename)
    import-CSV $filename |
        foreach-object -pipelinevariable object {
            # do something interesting
        }
}
```

Now that we have the get-performance function, we can easily construct a table of results for the two implementations:

| Input size | Loop time | Loop memory | Cmdlet time | Cmdlet memory |
|------------|-----------|-------------|-------------|---------------|
| 10 MB      | 9.4s      | 278 MB      | 20.9s       | 4.1 MB        |
| 50 MB      | 62.4s     | 1335 MB     | 116.4s      | 10.3 MB       |
| 100 MB     | 165.5s    | 2529 MB     | 361.0s      | 21.5 MB       |
| 200 MB     | Failed    |             | 761.8s      | 25.8 MB       |

It's clear that trying to load the entire file into memory is not a scalable operation. In this case, the memory usage is even higher and the times much slower than with Get-Content. It would be simple to construct similarly poorly performing examples with cmdlets such as Get-EventLog and Get-WinEvent, and replacing the foreach loop with the ForEach-Object cmdlet will have the same kind of effect there as well. Having tools like the get-performance and get-shellmemory functions can be a great help in diagnosing memory scaling problems like this. One other thing to note is that using the pipeline is slower than using the loop, so if you know that the input files are small, the loop might be a better choice.

Summary

In this article we saw how to replace the foreach loop with the ForEach-Object cmdlet.
Logistic regression

Packt
27 Nov 2014
9 min read
This article is written by Breck Baldwin and Krishna Dayanidhi, the authors of Natural Language Processing with Java and LingPipe Cookbook. In this article, we will cover logistic regression.

Logistic regression is probably responsible for the majority of industrial classifiers, with the possible exception of naïve Bayes classifiers. It is almost certainly one of the best-performing classifiers available, albeit at the cost of slow training and considerable complexity in configuration and tuning. Logistic regression is also known as maximum entropy, neural network classification with a single neuron, and by other names. The classifiers covered so far have been based on the underlying characters or tokens, but logistic regression uses unrestricted feature extraction, which allows arbitrary observations of the situation to be encoded in the classifier. This article closely follows a more complete tutorial at http://alias-i.com/lingpipe/demos/tutorial/logistic-regression/read-me.html.

How logistic regression works

All that logistic regression does is take a vector of feature weights over the data, apply a vector of coefficients, and do some simple math, which results in a probability for each class encountered in training. The complicated bit is in determining what the coefficients should be.

The following are some of the features produced by our training example for 21 tweets annotated as English (e) or non-English (n). There are relatively few features because feature weights are being pushed to 0.0 by our prior, and once a weight is 0.0, the feature is removed. Note that one category, n, is set to 0.0 for all of its features: this is a property of the logistic regression process, which fixes one category's features at 0.0 and adjusts all the other categories' features with respect to them.

```
FEATURE     e       n
I           0.37    0.0
!           0.30    0.0
Disney      0.15    0.0
"           0.08    0.0
to          0.07    0.0
anymore     0.06    0.0
isn         0.06    0.0
'           0.06    0.0
t           0.04    0.0
for         0.03    0.0
que        -0.01    0.0
moi        -0.01    0.0
_          -0.02    0.0
,          -0.08    0.0
pra        -0.09    0.0
?          -0.09    0.0
```

Take the string "I luv Disney", which will have only two non-zero features: I=0.37 and Disney=0.15 for e, and zeros for n. Since there is no feature that matches luv, it is ignored. The probability that the tweet is English breaks down to:

```
vectorMultiply(e, [I, Disney]) = exp(0.37*1 + 0.15*1) = 1.68
vectorMultiply(n, [I, Disney]) = exp(0*1 + 0*1) = 1
```

We rescale to probabilities by summing the outcomes and dividing by the total:

```
p(e|[I, Disney]) = 1.68 / (1.68 + 1) = 0.62
p(n|[I, Disney]) = 1 / (1.68 + 1) = 0.38
```

This is how the math works when running a logistic regression model. Training is another issue entirely.

Getting ready

This example assumes the same framework that we have been using all along to get training data from .csv files, train the classifier, and run it from the command line. Setting up to train the classifier is a bit complex because of the number of parameters and objects used in training. The main() method starts with what should be familiar classes and methods:

```java
public static void main(String[] args) throws IOException {
    String trainingFile = args.length > 0 ? args[0] : "data/disney_e_n.csv";
    List<String[]> training
        = Util.readAnnotatedCsvRemoveHeader(new File(trainingFile));
    int numFolds = 0;
    XValidatingObjectCorpus<Classified<CharSequence>> corpus
        = Util.loadXValCorpus(training, numFolds);
    TokenizerFactory tokenizerFactory
        = IndoEuropeanTokenizerFactory.INSTANCE;
```

Note that we are using XValidatingObjectCorpus when a simpler implementation such as ListCorpus would do. We will not take advantage of any of its cross-validation features, because setting the numFolds parameter to 0 makes training visit the entire corpus. We are trying to keep the number of novel classes to a minimum, and we tend to always use this implementation in real-world gigs anyway.
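The probability arithmetic in the worked "I luv Disney" example above is easy to verify with a short Python sketch. The feature names and coefficients come from the table; everything else here (function name, structure) is just illustration:

```python
import math

# Non-zero coefficients for category e from the feature table; n is fixed at 0.0.
weights_e = {"I": 0.37, "Disney": 0.15}

def p_english(tokens):
    # Sum the coefficients of the features present, exponentiate per category,
    # then normalise the two unnormalised scores into probabilities.
    score_e = math.exp(sum(weights_e.get(t, 0.0) for t in tokens))
    score_n = math.exp(0.0)   # every n coefficient is 0.0
    return score_e / (score_e + score_n)

p = p_english(["I", "luv", "Disney"])   # 'luv' has no feature, so it is ignored
print(round(p, 2), round(1 - p, 2))     # → 0.63 0.37 (0.6271/0.3729 unrounded)
```

The unrounded values match the classifier's reported 0.6269/0.3731 to within the two-decimal precision used in the worked example.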
Now, we will start to build the configuration for our classifier. The FeatureExtractor<E> interface provides a mapping from data to features; this will be used to both train and run the classifier. In this case, we are using a TokenFeatureExtractor() method, which creates features based on the tokens found by the tokenizer supplied during construction. This is similar to what naïve Bayes reasons over:

```java
FeatureExtractor<CharSequence> featureExtractor
    = new TokenFeatureExtractor(tokenizerFactory);
```

The minFeatureCount item is usually set to a number higher than 1, but with small training sets, a value of 1 is needed to get any performance at all. The thought behind filtering feature counts is that logistic regression tends to overfit low-count features that, just by chance, exist in only one category of the training data. As the training data grows, the minFeatureCount value is adjusted, usually by paying attention to cross-validation performance:

```java
int minFeatureCount = 1;
```

The addInterceptFeature Boolean controls whether a category feature exists that models the prevalence of the category in training. The default name of the intercept feature is *&^INTERCEPT%$^&**, and you will see it in the weight vector output if it is being used. By convention, the intercept feature is set to 1.0 for all inputs. The idea is that if a category is just very common or very rare, there should be a feature that captures just this fact, independent of other features that might not be as cleanly distributed. This models the category probability in naïve Bayes in some way, but the logistic regression algorithm will decide how useful it is, as it does with all other features:

```java
boolean addInterceptFeature = true;
boolean noninformativeIntercept = true;
```

These Booleans control what happens to the intercept feature if it is used. Priors, discussed with the following code, are typically not applied to the intercept feature; this is the result if this parameter is true. Set the Boolean to false, and the prior will be applied to the intercept.
Next is the RegressionPrior instance, which controls how the model is fit. What you need to know is that priors help prevent logistic regression from overfitting the data by pushing coefficients towards 0. There is a non-informative prior that does not do this, with the consequence that if there is a feature that applies to just one category, it will be scaled to infinity, because the model keeps fitting better as that coefficient is increased in the numeric estimation. Priors, in this context, function as a way to avoid being overconfident in observations about the world.

Another dimension of the RegressionPrior instance is the expected variance of the features. Low variance pushes coefficients to zero more aggressively. The prior returned by the static laplace() method tends to work well for NLP problems. There is a lot going on here, but it can be managed without a deep theoretical understanding:

```java
double priorVariance = 2;
RegressionPrior prior
    = RegressionPrior.laplace(priorVariance, noninformativeIntercept);
```

Next, we will control how the algorithm searches for an answer:

```java
AnnealingSchedule annealingSchedule
    = AnnealingSchedule.exponential(0.00025, 0.999);
double minImprovement = 0.000000001;
int minEpochs = 100;
int maxEpochs = 2000;
```

AnnealingSchedule is best understood by consulting the Javadoc, but what it does is change how much the coefficients are allowed to vary when fitting the model. The minImprovement parameter sets the amount by which the model fit has to improve for the search not to terminate, because the algorithm has converged. The minEpochs parameter sets a minimal number of iterations, and maxEpochs sets an upper limit if the search does not converge as determined by minImprovement. Next is some code that allows for basic reporting/logging.
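As an aside, the two effects of the Laplace prior described above (coefficients pulled toward zero, and a perfectly separating feature no longer running off toward infinity) can be sketched with a toy one-feature logistic regression. This is plain gradient ascent with L1 soft-thresholding, purely for illustration; it is not LingPipe's actual estimator, and the data and learning rates are made up:

```python
import math

def train(xs, ys, l1=0.0, lr=0.1, epochs=2000):
    """One-feature logistic regression by gradient ascent, with an optional
    Laplace (L1) prior applied as soft-thresholding after each step."""
    w = 0.0
    for _ in range(epochs):
        grad = sum((y - 1.0 / (1.0 + math.exp(-w * x))) * x
                   for x, y in zip(xs, ys)) / len(xs)
        w += lr * grad
        if l1 > 0.0:
            # Shrink the coefficient toward zero, clamping at zero.
            w = math.copysign(max(abs(w) - lr * l1, 0.0), w)
    return w

# The feature appears only in positive examples: a perfectly separating feature.
xs = [1, 1, 1, 0, 0, 0]
ys = [1, 1, 1, 0, 0, 0]
w_no_prior = train(xs, ys)            # keeps growing as the fit "improves"
w_laplace = train(xs, ys, l1=0.5)     # the prior pins it near zero
print(w_no_prior > w_laplace)
```

Without the prior, the separating feature's weight grows without bound the longer we train; with the Laplace penalty, it is held at zero, which is exactly why the article's feature table is so short.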
LogLevel.INFO will report a great deal of information about the progress of the classifier as it tries to converge:

```java
PrintWriter progressWriter = new PrintWriter(System.out, true);
progressWriter.println("Reading data.");
Reporter reporter = Reporters.writer(progressWriter);
reporter.setLevel(LogLevel.INFO);
```

Here ends the Getting ready section of one of our most complex classes; next, we will train and run the classifier.

How to do it...

It has been a bit of work setting up to train and run this class. We will just go through the steps to get it up and running. Note that there is a more complex 14-argument train method as well that extends configurability; this is the 10-argument version:

```java
LogisticRegressionClassifier<CharSequence> classifier
    = LogisticRegressionClassifier.<CharSequence>train(corpus,
        featureExtractor,
        minFeatureCount,
        addInterceptFeature,
        prior,
        annealingSchedule,
        minImprovement,
        minEpochs,
        maxEpochs,
        reporter);
```

The train() method, depending on the LogLevel constant, will produce anything from nothing with LogLevel.NONE to prodigious output with LogLevel.ALL.
While we are not going to use it here, this is how to serialize the trained model to disk:

```java
AbstractExternalizable.compileTo(classifier,
    new File("models/myModel.LogisticRegression"));
```

Once trained, we apply the standard classification loop with:

```java
Util.consoleInputPrintClassification(classifier);
```

Run the preceding code in the IDE of your choice or use the command line:

```
java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar:lib/opencsv-2.4.jar com.lingpipe.cookbook.chapter3.TrainAndRunLogReg
```

The result is a big dump of information about the training:

```
Reading data.
:00 Feature Extractor class=class com.aliasi.tokenizer.TokenFeatureExtractor
:00 min feature count=1
:00 Extracting Training Data
:00 Cold start
:00 Regression callback handler=null
:00 Logistic Regression Estimation
:00 Monitoring convergence=true
:00 Number of dimensions=233
:00 Number of Outcomes=2
:00 Number of Parameters=233
:00 Number of Training Instances=21
:00 Prior=LaplaceRegressionPrior(Variance=2.0,noninformativeIntercept=true)
:00 Annealing Schedule=Exponential(initialLearningRate=2.5E-4,base=0.999)
:00 Minimum Epochs=100
:00 Maximum Epochs=2000
:00 Minimum Improvement Per Period=1.0E-9
:00 Has Informative Prior=true
:00 epoch=    0 lr=0.000250000 ll= -20.9648 lp=-232.0139 llp= -252.9787 llp*= -252.9787
:00 epoch=    1 lr=0.000249750 ll= -20.9406 lp=-232.0195 llp= -252.9602 llp*= -252.9602
```

The epoch reporting goes on until either the maximum number of epochs is reached or the search converges. In the following case, the number of epochs was met:

```
:00 epoch= 1998 lr=0.000033868 ll= -15.4568 lp= -233.8125 llp= -249.2693 llp*= -249.2693
:00 epoch= 1999 lr=0.000033834 ll= -15.4565 lp= -233.8127 llp= -249.2692 llp*= -249.2692
```

Now, we can play with the classifier a bit:

```
Type a string to be classified. Empty string to quit.
I luv Disney
Rank  Category  Score               P(Category|Input)
0=e   0.626898085027528   0.626898085027528
1=n   0.373101914972472   0.373101914972472
```

This should look familiar; it is exactly the same result as the worked example at the start. That's it! You have trained up and used the world's most relevant industrial classifier. However, there's a lot more to harnessing the power of this beast.

Summary

In this article, we learned how to do logistic regression.
About MongoDB

Packt
27 Nov 2014
17 min read
In this article by Amol Nayak, the author of MongoDB Cookbook, we describe various features of MongoDB. MongoDB is a document-oriented database and the most popular NoSQL database. The rankings at http://db-engines.com/en/ranking show that MongoDB sits at fifth place overall as of August 2014 and is the first NoSQL product in the list. It is currently used in production by a long list of companies in various domains, handling terabytes of data efficiently. MongoDB is developed to scale horizontally and cope with increasing data volumes. It is very simple to use and get started with, is backed by good support from its company, MongoDB, and has a vast array of open source and proprietary tools built around it to improve developer and administrator productivity.

In this article, we will cover the following recipes:

- Single node installation of MongoDB with options from the config file
- Viewing database stats
- Creating an index and viewing plans of queries

Single node installation of MongoDB with options from the config file

Providing options from the command line does the job, but it starts getting awkward as soon as the number of options increases. We have a nice and clean alternative: providing the startup options from a configuration file rather than as command-line arguments.

Getting ready

Assuming that we have downloaded the MongoDB binaries from the download site, extracted them, and have the bin directory of MongoDB in the operating system's path variable (this is not mandatory, but it really is convenient), the binaries can be downloaded from http://www.mongodb.org/downloads after selecting your host operating system.

How to do it…

The /data/mongo/db directory for the database and /logs/ for the logs should be created and present on your filesystem, with the appropriate permissions to write to them.
Let's take a look at the steps in detail:

1. Create a config file, which can have any arbitrary name. In our case, let's say we create the file at /conf/mongo.conf. Edit the file and add the following lines to it:

```
port = 27000
dbpath = /data/mongo/db
logpath = /logs/mongo.log
smallfiles = true
```

2. Start the Mongo server using the following command:

```
> mongod --config /conf/mongo.conf
```

How it works…

The properties are specified as <property name> = <value>. For all those options that are just switches on the command line and don't take values, such as the smallfiles option, the value given is the Boolean true. If you need verbose output, add v = true (or multiple v's to make it more verbose) to the config file. If you already know the command-line option, it is pretty easy to guess the property name in the file: it is the same as the command-line option, with the hyphen removed.

Viewing database stats

In this recipe, we will see how to get the statistics of a database.

Getting ready

To find the stats of a database, we need to have a server up and running; a single node should be enough. The data on which we will be operating needs to be imported into the database. Once these steps are completed, we are all set to go ahead with this recipe.

How to do it…

We will be using the test database for the purpose of this recipe. It already has the postalCodes collection in it.
Let's take a look at the steps in detail:

1. Connect to the server using the Mongo shell by typing the following command in the operating system terminal (it is assumed that the server is listening on port 27017):

```
$ mongo
```

2. On the shell, execute the following command and observe the output:

```
> db.stats()
```

3. Now, execute the same command, but this time with the scale parameter, and observe the output:

```
> db.stats(1024)
{
    "db" : "test",
    "collections" : 3,
    "objects" : 39738,
    "avgObjSize" : 143.32699179626553,
    "dataSize" : 5562,
    "storageSize" : 16388,
    "numExtents" : 8,
    "indexes" : 2,
    "indexSize" : 2243,
    "fileSize" : 196608,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1
}
```

How it works…

Let's start by looking at the collections field. If you look carefully at the number, and also execute the show collections command on the Mongo shell, you will find one more collection in the stats than the command lists. The difference is one hidden collection, named system.namespaces. You may execute db.system.namespaces.find() to view its contents.

Getting back to the output of the stats operation on the database, the objects field in the result has an interesting value too. If we find the count of documents in the postalCodes collection, we see that it is 39732. The count shown here is 39738, which means there are six more documents. These six documents come from the system.namespaces and system.indexes collections; executing a count query on these two collections will confirm it. Note that the test database doesn't contain any collection apart from postalCodes; the figures will change if the database contains more collections with documents in them. The scale parameter, which is a parameter to the stats function, divides the number of bytes by the given scale value. In this case it is 1024, and hence all the size values are in KB.
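The scale rule is simple enough to mimic. Here is a Python sketch using raw byte values roughly consistent with the output above (the exact rounding behaviour of the shell is an assumption; integer division is used here for illustration):

```python
# Raw byte values roughly consistent with the db.stats(1024) output above.
raw = {
    "objects": 39738,
    "dataSize": 5562 * 1024,       # bytes
    "storageSize": 16388 * 1024,
    "indexSize": 2243 * 1024,
}

def scaled_stats(raw, scale=1):
    """Divide every size field by the scale; counts are never scaled."""
    stats = {k: v // scale for k, v in raw.items() if k != "objects"}
    stats["objects"] = raw["objects"]
    # The quirk noted later in the article: avgObjSize is always reported
    # in bytes, regardless of the scale argument.
    stats["avgObjSize"] = raw["dataSize"] / raw["objects"]
    return stats

s = scaled_stats(raw, scale=1024)
print(s["dataSize"], round(s["avgObjSize"], 2))   # → 5562 143.33
```

Passing scale=1024 turns the size fields into KB while avgObjSize stays in bytes, matching what the shell shows.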
Let's analyze the output:

```
> db.stats(1024)
{
    "db" : "test",
    "collections" : 3,
    "objects" : 39738,
    "avgObjSize" : 143.32699179626553,
    "dataSize" : 5562,
    "storageSize" : 16388,
    "numExtents" : 8,
    "indexes" : 2,
    "indexSize" : 2243,
    "fileSize" : 196608,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
        "major" : 4,
        "minor" : 5
    },
    "ok" : 1
}
```

The following table shows the meaning of the important fields:

| Field | Description |
|-------|-------------|
| db | The name of the database whose stats are being viewed. |
| collections | The total number of collections in the database. |
| objects | The count of documents across all collections in the database. If we find the stats of a collection by executing db.\<collection\>.stats(), we get the count of documents in that collection; this attribute is the sum of the counts of all the collections in the database. |
| avgObjSize | The size (in bytes) of all the objects in all the collections in the database, divided by the count of documents across all the collections. This value is not affected by the scale provided, even though it is a size field. |
| dataSize | The total size of the data held across all the collections in the database. This value is affected by the scale provided. |
| storageSize | The total amount of storage allocated to collections in this database for storing documents. This value is affected by the scale provided. |
| numExtents | The count of all the extents in the database across all the collections. This is basically the sum of numExtents in the collection stats for all collections in this database. |
| indexes | The sum of the number of indexes across all collections in the database. |
| indexSize | The size of all the indexes of all the collections in the database. This value is affected by the scale provided. |
| fileSize | The sum of the sizes of all the database files you should find on the filesystem for this database. The files will be named test.0, test.1, and so on for the test database. This value is affected by the scale provided. |
| nsSizeMB | The size, in MB, of the .ns file of the database. |

One more thing to note is the value of avgObjSize; there is something weird about it. Unlike the same field in a collection's stats, which is affected by the scale provided, in the database stats this value is always in bytes. This is pretty confusing, and one cannot really be sure why it is not scaled according to the provided scale.

Creating an index and viewing plans of queries

In this recipe, we will look at querying data, analyzing its performance by explaining the query plan, and then optimizing it by creating indexes.

Getting ready

For the creation of indexes, we need to have a server up and running. A simple single node is all we need. The data with which we will be operating needs to be imported into the database. Once we have this prerequisite, we are good to go.

How to do it…

We will try to write a query that finds all the zip codes in a given state.
To do this, perform the following steps:

1. Execute the following query to view its plan:

```
> db.postalCodes.find({state:'Maharashtra'}).explain()
```

   Take note of the cursor, n, nscannedObjects, and millis fields in the result of the explain plan operation.

2. Execute the same query again, but this time limit the results to only 100:

```
> db.postalCodes.find({state:'Maharashtra'}).limit(100).explain()
```

   Again, take note of the cursor, n, nscannedObjects, and millis fields in the result.

3. Now create an index on the state and pincode fields as follows:

```
> db.postalCodes.ensureIndex({state:1, pincode:1})
```

4. Execute the following query:

```
> db.postalCodes.find({state:'Maharashtra'}).explain()
```

   Again, take note of the cursor, n, nscannedObjects, millis, and indexOnly fields in the result.

5. Since we want only the pin codes, modify the query as follows and view its plan:

```
> db.postalCodes.find({state:'Maharashtra'}, {pincode:1, _id:0}).explain()
```

   Take note of the cursor, n, nscannedObjects, nscanned, millis, and indexOnly fields in the result.

How it works…

There is a lot to explain here. We will first discuss what we just did and how to analyze the stats. Next, we will discuss some points to keep in mind for index creation, and some gotchas.

Analysis of the plan

Let's look at the first step and analyze the output we got:

```
> db.postalCodes.find({state:'Maharashtra'}).explain()
```

The output on my machine is as follows (I am skipping the nonrelevant fields for now):

```
{
    "cursor" : "BasicCursor",
    "n" : 6446,
    "nscannedObjects" : 39732,
    "nscanned" : 39732,
    ...
    "millis" : 55,
    ...
}
```

The value of the cursor field in the result is BasicCursor, which means a full collection scan (all the documents scanned one after another) happened in order to search for the matching documents in the entire collection. The value of n is 6446, which is the number of results that matched the query.
The nscanned and nscannedObjects fields have the value 39,732, which is the number of documents in the collection scanned to retrieve the results. This is also the total number of documents present in the collection, and all of them were scanned for the result. Finally, millis is the number of milliseconds taken to retrieve the result.

Improving the query execution time

So far, the query doesn't look too good in terms of performance, and there is great scope for improvement. To demonstrate how a limit applied to the query affects the query plan, we can find the query plan again, still without the index but with the limit clause:

```
> db.postalCodes.find({state:'Maharashtra'}).limit(100).explain()
{
    "cursor" : "BasicCursor",
    ...
    "n" : 100,
    "nscannedObjects" : 19951,
    "nscanned" : 19951,
    ...
    "millis" : 30,
    ...
}
```

The query plan this time around is interesting. Though we still haven't created an index, we see an improvement in the time the query took to execute and in the number of objects scanned to retrieve the results. This is because Mongo does not scan the remaining documents once the number of documents specified in the limit function is reached. We can thus conclude that it is recommended to use the limit function to restrict the number of results when the maximum number of documents to be accessed is known upfront. This might give better query performance. The word "might" is important, as in the absence of an index the collection may still be scanned completely if the number of matches is not met.

Improvement using indexes

Moving on, we create a compound index on state and pincode. The order of the index is ascending in this case (as the value is 1); the order is not significant unless we plan to execute a multikey sort. It is, however, a deciding factor as to whether the result can be sorted using only the index or whether Mongo needs to sort it in memory later, before returning the results.
As far as the plan of the query is concerned, we can see that there is a significant improvement:

{
  "cursor" : "BtreeCursor state_1_pincode_1",
  …
  "n" : 6446,
  "nscannedObjects" : 6446,
  "nscanned" : 6446,
  …
  "indexOnly" : false,
  …
  "millis" : 16,
  …
}

The cursor field now has the value BtreeCursor state_1_pincode_1, which shows that the index is indeed used now. As expected, the number of results stays the same at 6446. The number of entries scanned in the index and documents scanned in the collection have now come down to the same number of documents as in the result. This is because we now used an index that gave us the starting document from which to scan, after which only the required number of documents were scanned. This is similar to using a book's index to find a word rather than scanning the entire book for it. The millis value has come down too, as expected.

Improvement using covered indexes

This leaves us with one field, indexOnly, and we will see what it means. To understand this value, we need to look briefly at how indexes operate. Indexes store a subset of the fields of the original document in the collection. The fields present in the index are the same as those on which the index is created. The fields, however, are kept sorted in the index in the order specified during the creation of the index. Apart from the fields, there is an additional value stored in the index; this acts as a pointer to the original document in the collection. Thus, whenever the user executes a query, if the query contains fields on which an index is present, the index is consulted to get a set of matches. The pointer stored with the index entries that match the query is then used to make another IO operation to fetch the complete document from the collection; this document is then returned to the user.
The value of indexOnly being false indicates that the data requested by the user in the query is not entirely present in the index; an additional IO operation is needed to retrieve the entire document from the collection by following the pointer from the index. Had the value been present in the index itself, the additional operation to retrieve the document from the collection would not be necessary, and the data from the index would be returned. This is called a covered index, and the value of indexOnly, in this case, would be true. In our case, we just need the pin codes, so why not use projection in our queries to retrieve just what we need? This also makes the index covered, as the index entry just has the state's name and pin code, and the required data can be served completely without retrieving the original document from the collection. The plan of the query in this case is interesting too. Executing the following query results in the following plan:

db.postalCodes.find({state:'Maharashtra'}, {pincode:1, _id:0}).explain()

{
  "cursor" : "BtreeCursor state_1_pincode_1",
  …
  "n" : 6446,
  "nscannedObjects" : 0,
  "nscanned" : 6446,
  …
  "indexOnly" : true,
  …
  "millis" : 15,
  …
}

The values of the nscannedObjects and indexOnly fields are something to be observed. As expected, since the data we requested in the projection in the find query is the pin code only, which can be served from the index alone, the value of indexOnly is true. In this case, we scanned 6,446 entries in the index, and thus, the nscanned value is 6446. We, however, didn't reach out to any document in the collection on the disk, as this query was covered by the index alone and no additional IO was needed to retrieve the entire document. Hence, the value of nscannedObjects is 0. As this collection in our case is small, we do not see a significant difference in the execution time of the query. This would be more evident on larger collections.
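The mechanism just described (index entries carrying the indexed keys plus a pointer back to the full document) can be illustrated with a small Python sketch. This is a toy model, not Mongo internals; the documents and values are invented. The covered query reads everything from the index, while the non-covered one follows the pointer back to the collection:

```python
import bisect

# The "collection": full documents, addressed by position (the pointer)
collection = [
    {"state": "Goa", "pincode": 403001, "city": "Panaji"},
    {"state": "Maharashtra", "pincode": 400001, "city": "Mumbai"},
    {"state": "Maharashtra", "pincode": 411001, "city": "Pune"},
]

# The index on (state, pincode): sorted keys plus a pointer into the collection
index = sorted((doc["state"], doc["pincode"], ptr)
               for ptr, doc in enumerate(collection))

def pincodes_covered(state):
    """Covered query: everything requested lives in the index itself."""
    lo = bisect.bisect_left(index, (state, float("-inf"), -1))
    out = []
    while lo < len(index) and index[lo][0] == state:
        out.append(index[lo][1])   # pincode read straight from the index entry
        lo += 1
    return out                     # no collection access at all

def cities_not_covered(state):
    """Non-covered query: the index finds pointers, then each document is fetched."""
    lo = bisect.bisect_left(index, (state, float("-inf"), -1))
    out = []
    while lo < len(index) and index[lo][0] == state:
        out.append(collection[index[lo][2]]["city"])  # extra fetch per match
        lo += 1
    return out

print(pincodes_covered("Maharashtra"))    # [400001, 411001]
print(cities_not_covered("Maharashtra"))  # ['Mumbai', 'Pune']
```

The second function does one extra document fetch per match, which is exactly the additional IO that indexOnly: true avoids.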
Making use of indexes is great and gives good performance. Making use of covered indexes gives even better performance. Another thing to remember is that, wherever possible, try to use projection to retrieve only the fields we need. The _id field is retrieved every time by default; unless we plan to use it, set _id:0 in the projection so that it is not retrieved, since it is not part of the index. Executing a covered query is the most efficient way to query a collection.

Some gotchas of index creation

We will now see some pitfalls in index creation and some facts to bear in mind when an array field is used in an index. Some of the operators that do not use the index efficiently are $where, $nin, and $exists. Whenever these operators are used in a query, one should bear in mind a possible performance bottleneck when the data size increases. Similarly, the $in operator should be preferred over the $or operator, as both can be used, more or less, to achieve the same result. As an exercise, try to find the pin codes in the states of Maharashtra and Gujarat from the postalCodes collection. Write two queries: one using the $or operator and the other using the $in operator. Explain the plan for both of these queries.

What happens when an array field is used in the index? Mongo creates an index entry for each element present in the array field of a document. So, if there are 10 elements in an array in a document, there will be 10 index entries, one for each element in the array. However, there is a constraint while creating indexes that contain array fields. When creating indexes using multiple fields, not more than one field can be of the array type. This is done to prevent a possible explosion in the number of index entries on adding even a single element to an array used in the index. If we think about it carefully, for each element in the array, an index entry is created.
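The entry-per-element rule is easy to put numbers on. A quick Python sketch follows, including a hypothetical count for the forbidden case of two array fields sharing one compound index (the documents are invented for illustration):

```python
from math import prod

def multikey_entries(doc, indexed_field):
    """Number of index entries one document contributes to a multikey index."""
    value = doc[indexed_field]
    return len(value) if isinstance(value, list) else 1

def compound_entries(doc, fields):
    """Hypothetical entry count if several array fields shared one index:
    the product of the array lengths."""
    return prod(len(doc[f]) if isinstance(doc[f], list) else 1 for f in fields)

doc = {"name": "roads",
       "tags": ["highway", "paved", "toll", "north", "south",
                "east", "west", "new", "old", "busy"]}
print(multikey_entries(doc, "tags"))  # 10 entries for a 10-element array

# Two 10-element arrays in one compound index would mean 100 entries
# for a single document, which is exactly what Mongo forbids:
doc2 = {"tags": ["a"] * 10, "zones": ["z"] * 10}
print(compound_entries(doc2, ["tags", "zones"]))  # 100
```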
If multiple fields of the array type were allowed to be part of an index, we would have a very large number of entries in the index, equal to the product of the lengths of these array fields. For example, a document added with two array fields, each of length 10, would add 100 entries to the index, had it been allowed to create one index using these two array fields. This should be good enough for now to scratch the surface of plain vanilla indexes.

Summary

This article provided detailed recipes that describe how to use the different features of MongoDB. MongoDB is a document-oriented, leading NoSQL database, which offers linear scalability, thus making it a good contender for high-volume, high-performance systems across all business domains. It has an edge over the majority of NoSQL solutions for its ease of use, high performance, and rich features. In this article, we learned how to start single-node installations of MongoDB with options from the config file. We also learned how to create an index from the shell and how to view the plans of queries.

Resources for Article:

Further resources on this subject:
Ruby with MongoDB for Web Development [Article]
MongoDB data modeling [Article]
Using Mongoid [Article]
Packt
27 Nov 2014
8 min read

Setting up Qt Creator for Android

This article by Ray Rischpater, the author of the book Application Development with Qt Creator Second Edition, focuses on setting up Qt Creator for Android. Android's functionality is delimited in API levels; Qt for Android supports Android level 10 and above: that's Android 2.3.3, a variant of Gingerbread. Fortunately, most devices in the market today are at least Gingerbread, making Qt for Android a viable development platform for millions of devices.

Downloading all the pieces

To get started with Qt Creator for Android, you're going to need to download a lot of stuff. Let's get started:

1. Begin with a release of Qt for Android; you can download it from http://qt-project.org/downloads.
2. The Android developer tools require the current version of the Java Development Kit (JDK) (not just the runtime, the Java Runtime Environment, but the whole kit and caboodle); you can download it from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html.
3. You need the latest Android Software Development Kit (SDK), which you can download for Mac OS X, Linux, or Windows at http://developer.android.com/sdk/index.html.
4. You need the latest Android Native Development Kit (NDK), which you can download at http://developer.android.com/tools/sdk/ndk/index.html.
5. You need the current version of Ant, the Java build tool, which you can download at http://ant.apache.org/bindownload.cgi.

Download, unzip, and install each of these, in the given order. On Windows, I installed the Android SDK and NDK by unzipping them to the root of my hard drive and installed the JDK at the default location I was offered.

Setting environment variables

Once you install the JDK, you need to be sure that you've set your JAVA_HOME environment variable to point to the directory where it was installed.
How you will do this differs from platform to platform; on a Mac OS X or Linux box, you'd edit .bashrc, .tcshrc, or the like; on Windows, go to System Properties, click on Environment Variables, and add the JAVA_HOME variable. The path should point to the base of the JDK directory; for me, it was C:\Program Files\Java\jdk1.7.0_25, although the path for you will depend on where you installed the JDK and which version you installed. (Make sure you set the path with the trailing directory separator; the Android SDK is pretty fussy about that sort of thing.) Next, you need to update your PATH to point to all the stuff you just installed. Again, this is an environment variable, and you'll need to add the following:

- The bin directory of your JDK
- The tools directory of your Android SDK
- The platform-tools directory of your Android SDK

For me, on my Windows 8 computer, my PATH now includes this:

…C:\Program Files\Java\jdk1.7.0_25\bin;C:\adt-bundle-windows-x86_64-20130729\sdk\tools;C:\adt-bundle-windows-x86_64-20130729\sdk\platform-tools;…

Don't forget the separators: on Windows, it's a semicolon (;), while on Mac OS X and Linux, it's a colon (:). An environment variable is a variable maintained by your operating system that affects its configuration; see http://en.wikipedia.org/wiki/Environment_variable for more details. At this point, it's a good idea to restart your computer (if you're running Windows) or log out and log in again (on Linux or Mac OS X) to make sure that all these settings take effect.
If you're on a Mac OS X or Linux box, you might be able to start a new terminal (or reload your shell configuration file) and have the same effect instead, but I like the idea of restarting at this point to ensure that the next time I start everything up, it'll work correctly.

Finishing the Android SDK installation

Now, we need to use the Android SDK tools to ensure that you have a full version of the SDK for at least one Android API level installed. We'll need to start Eclipse, the Android SDK's development environment, and run the Android SDK manager. To do this, follow these steps:

1. Find Eclipse. It's probably in the eclipse directory of the directory where you installed the Android SDK. If Eclipse doesn't start, check your JAVA_HOME and PATH variables; the odds are that Eclipse will not find the Java environment it needs to run.
2. Click on OK when Eclipse prompts you for a workspace. This doesn't matter; you won't use Eclipse except to download Android SDK components.
3. Click on the Android SDK Manager button in the Eclipse toolbar (circled in the next screenshot):
4. Make sure that you have at least one Android API level above API level 10 installed, along with the Google USB Driver (you'll need this to debug on the hardware).
5. Quit Eclipse.

Next, let's see whether the Android Debug Bridge—the software component that transfers your executables to your Android device and supports on-device debugging—is working as it should. Fire up a shell prompt and type adb. If you see a lot of output and no errors, the bridge is correctly installed. If not, go back and check your PATH variable to be sure it's correct. While you're at it, you should developer-enable your Android device too so that it'll work with adb. Follow the steps provided at http://bit.ly/1a29sal.

Configuring Qt Creator

Now, it's time to tell Qt Creator about all the stuff you just installed. Perform the following steps: Start Qt Creator, but don't create a new project.
Under the Tools menu, select Options and then click on Android. Fill in the blanks, as shown in the next screenshot. They should be:

- The path to the SDK directory, in the directory where you installed the Android SDK.
- The path to where you installed the Android NDK.
- Check Automatically create kits for Android tool chains.
- The path to Ant; here, enter either the path to the Ant executable itself on Mac OS X and Linux platforms or the path to ant.bat in the bin directory of the directory where you unpacked Ant.
- The directory where you installed the JDK (this might be automatically picked up from your JAVA_HOME variable), as shown in the following screenshot:

Click on OK to close the Options window. You should now be able to create a new Qt GUI or Qt Quick application for Android! Do so, and ensure that Android is a target option in the wizard, as the next screenshot shows; be sure to choose at least one ARM target, one x86 target, and one target for your desktop environment:

If you want to add Android build configurations to an existing project, the process is slightly different. Perform the following steps:

1. Load the project as you normally would.
2. Click on Projects in the left-hand side pane. The Projects pane will open.
3. Click on Add Kit and choose the desired Android (or other) device build kit.

The following screenshot shows you where the Projects and Add Kit buttons are in Qt Creator:

Building and running your application

Write and build your application normally. A good idea is to build the Qt Quick Hello World application for Android first before you go to town and make a lot of changes, and to test the environment by compiling for the device. When you're ready to run on the device, perform the following steps: Navigate to Projects (on the left-hand side) and then select the Android for arm kit's Run Settings. Under Package Configurations, ensure that the Android SDK level is set to the SDK level of the SDK you installed.
Ensure that the Package name reads something similar to org.qtproject.example, followed by your project name. Connect your Android device to your computer using the USB cable. Select the Android for arm run target and then click on either Debug or Run to debug or run your application on the device.

Summary

Qt for Android gives you an excellent leg up on mobile development, but it's not a panacea. If you're planning to target mobile devices, you should be sure to have a good understanding of the usage patterns of your application's users as well as the CPU, GPU, memory, and network constraints under which a mobile application must run. Once we understand these, however, all of our skills with Qt Creator and Qt carry over to the mobile arena. To develop for Android, begin by installing the JDK, Android SDK, Android NDK, and Ant, and then develop applications as usual: compiling for the device and running on the device frequently to iron out any unexpected problems along the way.

Resources for Article:

Further resources on this subject:
Reversing Android Applications [article]
Building Android (Must know) [article]
Introducing an Android platform [article]
Packt
27 Nov 2014
16 min read

OGC for ESRI Professionals

In this article by Stefano Iacovella, author of GeoServer Cookbook, we look into a brief comparison between GeoServer and ArcGIS for Server, a map server created by ESRI. The importance of adopting OGC standards when building a geographical information system is stressed. We will also learn how OGC standards let us create a system where different pieces of software cooperate with each other. (For more resources related to this topic, see here.)

ArcGIS versus GeoServer

As an ESRI professional, you surely know the server product from this vendor that compares well with GeoServer. It is called ArcGIS for Server, and in many ways it can play the same role as GeoServer; the opposite is true as well, of course. Undoubtedly, the big question for you is: why should I use GeoServer and not stand safely on the vendor side, leveraging the integration with the other software members of the big ArcGIS family? Listening to colleagues, asking experts, and browsing the Internet, you'll find a lot of different answers to this question, often supported by strong arguments and sometimes by a rather religious, fanatical approach. There are a few benchmarks available on the Internet that compare the performance of GeoServer and other open source map servers with that of ArcGIS for Server. Although they're not definitively authoritative, a reasonably objective advantage of GeoServer and its open source cousins over ArcGIS for Server is recognizable. Anyway, I don't think you should overestimate the importance of performance in your choice. I'm sorry, but my answer to your original question is another question: why should you choose a particular piece of software? This may sound puzzling, so let me elaborate a bit on the topic. Let's say you are an IT architect and a customer has asked you to design a solution for a GIS portal. Of course, in that specific case, you have to give him or her a detailed response, containing the specific software that'll be used for data publication. Also, as a professional, you'll arrive at the solution by accurately considering all the requirements and constraints that can be inferred from the talks and by surveying what is already up and running at the customer site.
Also, as a professional, you'll arrive to the solution by accurately considering all requirements and constraints that can be inferred from the talks and surveying what is already up and running at the customer site. Then, a specific answer to what the software best suited for the task is should exist in any specific case. However, if you consider the question from a more general point of view, you should be aware that a map server, which is the best choice for any specific case, does not exist. You may find that the licensing costs a limit in some case or the performances in some other cases will lead you to a different choice. Also, as in any other job, the best tool is often the one you know better, and this is quite true when you are in a hurry and your customer can't wait to have the site up and running. So the right approach, although a little bit generic, is to keep your mind open and try to pick the right tool for any scenario. However, a general answer does exist. It's not about the vendor or the name of the piece of software you're going to use; it's about the way the components or your system communicate among them and with external systems. It's about standard protocol. This is a crucial consideration for any GIS architect or developer; nevertheless, if you're going to use an ESRI suite of products or open source tools, you should create your system with special care to expose data with open standards. Understanding standards Let's take a closer look at what standards are and why they're so important when you are designing your GIS solution. The term standard as mentioned in Wikipedia (http://en.wikipedia.org/wiki/ Technical_standard) may be explained as follows: "An established norm or requirement in regard to technical systems. It is usually a formal document that establishes uniform engineering or technical criteria, methods, processes and practices. In contrast, a custom, convention, company product, corporate standard, etc. 
that becomes generally accepted and dominant is often called a de facto standard."

Obviously, a lot of standards exist if you consider the Information Technology domain. Standards are usually formalized by standards organizations, which usually involve several members from different areas, such as government agencies, private companies, education, and so on. In the GIS world, an authoritative organization is the Open Geospatial Consortium (OGC), which you will find cited often in this book in many links to reference information. In recent years, OGC has been publishing several standards that cover the interaction of GIS systems and the details of how data is transferred from one piece of software to another. We'll focus on three of them that are widely used and particularly important for GeoServer and ArcGIS for Server:

- WMS: This is the acronym for Web Map Service. This standard describes how a server should publish data for mapping purposes, which is a static representation of data.
- WFS: This is the acronym for Web Feature Service. This standard describes the details of publishing data for feature streaming to a client.
- WCS: This is the acronym for Web Coverage Service. This standard describes the details of publishing data for raster data streaming to a client. It's the equivalent of WFS applied to raster data.

Now let's dive into these three standards. We'll explore the similarities and differences between GeoServer and ArcGIS for Server.

WMS versus the mapping service

As an ESRI user, you surely know how to publish some data in a map service. This lets you create a web service that can be used by a client who wants to show the map and data. This is the proprietary equivalent of exposing data through a WMS service.
With WMS, you can inquire the server for its capabilities with an HTTP request:

$ curl -XGET -H 'Accept: text/xml' 'http://localhost:8080/geoserver/wms?service=WMS&version=1.1.1&request=GetCapabilities' -o capabilitiesWMS.xml

Browsing through the XML document, you'll know which data is published and how it can be represented. If you're using the proprietary way of exposing map services with ESRI, you can perform a similar query that starts from the root:

$ curl -XGET 'http://localhost/arcgis/rest/services?f=pjson' -o capabilitiesArcGIS.json

The output, in this case formatted as JSON, is a text file containing the first level of services and folders available to an anonymous user. It looks like the following code snippet:

{
  "currentVersion": 10.22,
  "folders": [
    "Geology",
    "Cultural data",
    …
    "Hydrography"
  ],
  "services": [
    {"name": "SampleWorldCities", "type": "MapServer"}
  ]
}

At a glance, you can recognize two big differences here. Firstly, there are logical items, the folders, that work only as containers for services. Secondly, there is no complete definition of the items, just a list of the elements contained at a certain level of the publishing tree. To obtain specific information about an element, you can perform another request pointing to the item:

$ curl -XGET 'http://localhost/arcgis/rest/services/SampleWorldCities/MapServer?f=pjson' -o SampleWorldCities.json

Setting up an ArcGIS site is out of the scope of this book; besides, this appendix assumes that you are familiar with the software and its terminology. Anyway, all the examples use the SampleWorldCities service, which is a default service created by the standard installation.
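Since the root listing is plain JSON, a thin client needs nothing more than a JSON parser to walk it. A sketch follows, using a trimmed copy of the response shown above (the ellipsis in the folders list is dropped so the string parses):

```python
import json

# Trimmed copy of the root response from the text
root = json.loads("""
{"currentVersion": 10.22,
 "folders": ["Geology", "Cultural data", "Hydrography"],
 "services": [{"name": "SampleWorldCities", "type": "MapServer"}]}
""")

# Folders are only logical containers; services carry a name and a type
map_services = [s["name"] for s in root["services"] if s["type"] == "MapServer"]
print(map_services)     # ['SampleWorldCities']
print(root["folders"])  # the containers, with no definition attached
```

A real client would fetch each folder and service in turn, exactly as the curl requests above do by hand.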
In the new JSON file, you'll find a lot of information about the specific service:

{
  "currentVersion": 10.22,
  "serviceDescription": "A sample service just for demonstation.",
  "mapName": "World Cities Population",
  "description": "",
  "copyrightText": "",
  "supportsDynamicLayers": false,
  "layers": [
    {
      "id": 0,
      "name": "Cities",
      "parentLayerId": -1,
      "defaultVisibility": true,
      "subLayerIds": null,
      "minScale": 0,
      "maxScale": 0
    },
  …
  "supportedImageFormatTypes": "PNG32,PNG24,PNG,JPG,DIB,TIFF,EMF,PS,PDF,GIF,SVG,SVGZ,BMP",
  …
  "capabilities": "Map,Query,Data",
  "supportedQueryFormats": "JSON, AMF",
  "exportTilesAllowed": false,
  "maxRecordCount": 1000,
  "maxImageHeight": 4096,
  "maxImageWidth": 4096,
  "supportedExtensions": "KmlServer"
}

Please note the information about the image formats supported. We're, in fact, dealing with a map service. As for the operations supported, this one shows three different operations: Map, Query, and Data. For the first two, you can probably recognize the equivalents of the GetMap and GetFeatureInfo operations of WMS, while the third one is a little bit more mysterious. In fact, it is not relevant to map services, and we'll explore it in the next paragraph. If you're familiar with the GeoServer REST interface, you can see the similarities in the way you can retrieve information. We don't want to explore the ArcGIS for Server interface in detail or how to handle it. What is important to understand is the huge difference from the standard WMS capabilities document. If you're going to create a client to interact with maps produced by a mix of ArcGIS for Server and GeoServer, you would need to create different interfaces for each: in one case, you interact with the proprietary REST interface, and in the other, with the standard WMS for GeoServer. However, there is good news for you: ESRI also supports standards. If you go to the map service parameters page, you can change the way the data is published.

The situation shown in the previous screenshot is the default capabilities configuration.
As you can see, there are options for WMS, WFS, and WCS, so you can expose your data with ArcGIS for Server according to the OGC standards. If you enable the WMS option, you can now perform this query:

$ curl -XGET 'http://localhost/arcgis/services/SampleWorldCities/MapServer/WMSServer?SERVICE=WMS&VERSION=1.3.0&REQUEST=GetCapabilities' -o capabilitiesArcGISWMS.xml

The information contained is very similar to that of the GeoServer capabilities. A point of attention is the fundamental difference in data publishing between the two pieces of software. In ArcGIS for Server, you always start from a map project. A map project is a collection of datasets, containing vector or raster data, with a drawing order, a coordinate reference system, and rules to draw. It is, in fact, very similar to a map project you can prepare with a GIS desktop application. Actually, in the ESRI world, you should use ArcGIS for Desktop to prepare the map project and then publish it on the server. In GeoServer, the map concept doesn't exist. You publish data, setting several parameters, and the map composition is left entirely to the client. You can only mimic a map, server side, by using a layer group for a logical merge of several layers into a single entity. In ArcGIS for Server, the map is central to the publication process; even if you just want to publish a single dataset, you have to create a map project containing just that dataset and publish it. Always remember this different approach; when using WMS, you can use the same operations on both servers. A GetMap request on the previous map service will look like this:

$ curl -XGET 'http://localhost/arcgis/services/SampleWorldCities/MapServer/WMSServer?service=WMS&version=1.1.0&request=GetMap&layers=fields&styles=&bbox=47.130647,8.931116,48.604188,29.54223&srs=EPSG:4326&height=445&width=1073&format=image/png' -o map.png

Please note that you can filter which layers will be drawn on the map.
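Because both servers speak the same WMS syntax, a GetMap URL like the one above can be assembled once and pointed at either of them. A sketch using Python's urllib follows; the endpoint and layer name are taken from the example, so substitute your own:

```python
from urllib.parse import urlencode

def getmap_url(endpoint, layers, bbox, size, srs="EPSG:4326", fmt="image/png"):
    """Build a WMS 1.1.0 GetMap URL; `layers` filters what gets drawn."""
    params = {
        "service": "WMS",
        "version": "1.1.0",
        "request": "GetMap",
        "layers": ",".join(layers),
        "styles": "",
        "bbox": ",".join(str(v) for v in bbox),
        "srs": srs,
        "width": size[0],
        "height": size[1],
        "format": fmt,
    }
    return endpoint + "?" + urlencode(params)

url = getmap_url(
    "http://localhost/arcgis/services/SampleWorldCities/MapServer/WMSServer",
    ["fields"],
    (47.130647, 8.931116, 48.604188, 29.54223),
    (1073, 445),
)
print(url)
```

Swapping the endpoint for a GeoServer WMS address and changing the layer names is all it takes to reuse the same builder against the other server.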
By default, all the layers contained in the map service definition will be drawn.

WFS versus feature access

If you open the capabilities panel for the ArcGIS service again, you will note that there is an option called Feature Access. This lets you enable feature streaming to a client. With this option enabled, your clients can retrieve features and symbology information from ArcGIS and render them directly on the client side. In fact, feature access can also be used to edit features; that is, you can modify the features on the client and then post the changes back to the server. When you check the Feature Access option, many specific settings appear. In particular, you'll note that, by default, the Update operation is enabled but Geometry Updates is disabled, so you can't edit the shape of each feature. If you want to stream features using a standard approach, you should instead turn on the WFS option. ArcGIS for Server supports versions 1.1 and 1.0 of WFS. Moreover, the transactional option, also known as WFS-T, is fully supported.

As you can see in the previous screenshot, when you check the WFS option, several more options appear. In the lower part of the panel, you'll find the option to enable transactions, which is the editing feature. In this case, there is no separate option for geometry and attributes; you can only decide to enable editing on any part of your features. After you enable WFS, you can access the capabilities from this address:

$ curl -XGET 'http://localhost/arcgis/services/SampleWorldCities/MapServer/WFSServer?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetCapabilities' -o capabilitiesArcGISWFS.xml

Also, a request for features is shown as follows:

$ curl -XGET "http://localhost/arcgis/services/SampleWorldCities/MapServer/WFSServer?service=wfs&version=1.1.0&request=GetFeature&TypeName=SampleWorldCities:cities&maxFeatures=1" -o getFeatureArcGIS.xml

This will output GML code as a result of your request. As with WMS, the syntax is the same.
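A WFS client typically consumes the GML returned by GetFeature with a namespace-aware XML parser. Here is a minimal Python sketch against a condensed feature collection modeled on such a response; the namespace URIs below are illustrative placeholders, so in a real client take them from the xmlns declarations of the actual document:

```python
import xml.etree.ElementTree as ET

# Condensed, illustrative version of a GetFeature response
gml_doc = """
<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
                       xmlns:gml="http://www.opengis.net/gml"
                       xmlns:swc="http://example.com/SampleWorldCities">
  <gml:featureMember>
    <swc:cities>
      <swc:CITY_NAME>Cuiaba</swc:CITY_NAME>
      <swc:POP>521934</swc:POP>
    </swc:cities>
  </gml:featureMember>
</wfs:FeatureCollection>
"""

ns = {
    "gml": "http://www.opengis.net/gml",
    "swc": "http://example.com/SampleWorldCities",
}
root = ET.fromstring(gml_doc)
cities = [
    (m.findtext("swc:cities/swc:CITY_NAME", namespaces=ns),
     int(m.findtext("swc:cities/swc:POP", namespaces=ns)))
    for m in root.findall("gml:featureMember", ns)
]
print(cities)  # [('Cuiaba', 521934)]
```

The same parsing loop works for responses from GeoServer, which is the whole point of the shared WFS syntax.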
You only need to pay attention to the difference between the service and the contained layers:

<wfs:FeatureCollection xsi:schemaLocation="http://localhost/arcgis/services/SampleWorldCities/MapServer/WFSServer http://localhost/arcgis/services/SampleWorldCities/MapServer/WFSServer?request=DescribeFeatureType%26version=1.1.0%26typename=cities http://www.opengis.net/wfs http://schemas.opengis.net/wfs/1.1.0/wfs.xsd">
  <gml:boundedBy>
    <gml:Envelope srsName="urn:ogc:def:crs:EPSG:6.9:4326">
      <gml:lowerCorner>-54.7919921875 -176.1514892578125</gml:lowerCorner>
      <gml:upperCorner>78.2000732421875 179.221923828125</gml:upperCorner>
    </gml:Envelope>
  </gml:boundedBy>
  <gml:featureMember>
    <SampleWorldCities:cities gml_id="F4__1">
      <SampleWorldCities:OBJECTID>1</SampleWorldCities:OBJECTID>
      <SampleWorldCities:Shape>
        <gml:Point>
          <gml:pos>-15.614990234375 -56.093017578125</gml:pos>
        </gml:Point>
      </SampleWorldCities:Shape>
      <SampleWorldCities:CITY_NAME>Cuiaba</SampleWorldCities:CITY_NAME>
      <SampleWorldCities:POP>521934</SampleWorldCities:POP>
      <SampleWorldCities:POP_RANK>3</SampleWorldCities:POP_RANK>
      <SampleWorldCities:POP_CLASS>500,000 to 999,999</SampleWorldCities:POP_CLASS>
      <SampleWorldCities:LABEL_FLAG>0</SampleWorldCities:LABEL_FLAG>
    </SampleWorldCities:cities>
  </gml:featureMember>
</wfs:FeatureCollection>

Publishing raster data with WCS

The WCS option is always present in the panel used to configure services. As we already noted, WCS is used to publish raster data, so this may sound odd to you. Indeed, ArcGIS for Server lets you enable the WCS option only if the map project for the service contains one of the following:

- A map containing raster or mosaic layers
- A raster or mosaic dataset
- A layer file referencing a raster or mosaic dataset
- A geodatabase that contains raster data

If you try to enable the WCS option on SampleWorldCities, you won't get an error.
Then, try to ask for the capabilities:

$ curl -XGET "http://localhost/arcgis/services/SampleWorldCities/MapServer/WCSServer?SERVICE=WCS&VERSION=1.1.1&REQUEST=GetCapabilities" -o capabilitiesArcGISWCS.xml

You'll get a proper document, compliant with the standard and well formatted, but containing no reference to any dataset. Indeed, the sample service does not contain any raster data:

<Capabilities xsi:schemaLocation="http://www.opengis.net/wcs/1.1.1 http://schemas.opengis.net/wcs/1.1/wcsGetCapabilities.xsd http://www.opengis.net/ows/1.1/ http://schemas.opengis.net/ows/1.1.0/owsAll.xsd" version="1.1.1">
  <ows:ServiceIdentification>
    <ows:Title>WCS</ows:Title>
    <ows:ServiceType>WCS</ows:ServiceType>
    <ows:ServiceTypeVersion>1.0.0</ows:ServiceTypeVersion>
    <ows:ServiceTypeVersion>1.1.0</ows:ServiceTypeVersion>
    <ows:ServiceTypeVersion>1.1.1</ows:ServiceTypeVersion>
    <ows:ServiceTypeVersion>1.1.2</ows:ServiceTypeVersion>
    <ows:Fees>NONE</ows:Fees>
    <ows:AccessConstraints>None</ows:AccessConstraints>
  </ows:ServiceIdentification>
  ...
  <Contents>
    <SupportedCRS>urn:ogc:def:crs:EPSG::4326</SupportedCRS>
    <SupportedFormat>image/GeoTIFF</SupportedFormat>
    <SupportedFormat>image/NITF</SupportedFormat>
    <SupportedFormat>image/JPEG</SupportedFormat>
    <SupportedFormat>image/PNG</SupportedFormat>
    <SupportedFormat>image/JPEG2000</SupportedFormat>
    <SupportedFormat>image/HDF</SupportedFormat>
  </Contents>
</Capabilities>

If you want to try out WCS beyond the GetCapabilities operation, you need to publish a service with raster data; or, you may take a look at the sample service from ESRI arcgisonline™.
Try the following request:

$ curl -XGET "http://sampleserver3.arcgisonline.com/ArcGIS/services/World/Temperature/ImageServer/WCSServer?SERVICE=WCS&VERSION=1.1.0&REQUEST=GETCAPABILITIES" -o capabilitiesArcGISWCS.xml

Parsing the XML file, you'll find that the Contents section now lists a coverage, that is, raster data that you can retrieve from that server:

...
<Contents>
  <CoverageSummary>
    <ows:Title>Temperature1950To2100_1</ows:Title>
    <ows:Abstract>Temperature1950To2100</ows:Abstract>
    <ows:WGS84BoundingBox>
      <ows:LowerCorner>-179.99999999999994 -55.5</ows:LowerCorner>
      <ows:UpperCorner>180.00000000000006 83.5</ows:UpperCorner>
    </ows:WGS84BoundingBox>
    <Identifier>1</Identifier>
  </CoverageSummary>
  <SupportedCRS>urn:ogc:def:crs:EPSG::4326</SupportedCRS>
  <SupportedFormat>image/GeoTIFF</SupportedFormat>
  <SupportedFormat>image/NITF</SupportedFormat>
  <SupportedFormat>image/JPEG</SupportedFormat>
  <SupportedFormat>image/PNG</SupportedFormat>
  <SupportedFormat>image/JPEG2000</SupportedFormat>
  <SupportedFormat>image/HDF</SupportedFormat>
</Contents>

You can, of course, use all the operations supported by the standard. The following request returns a full description of one or more coverages within the service, in the GML format:

$ curl -XGET "http://sampleserver3.arcgisonline.com/ArcGIS/services/World/Temperature/ImageServer/WCSServer?SERVICE=WCS&VERSION=1.1.0&REQUEST=DescribeCoverage&COVERAGE=1" -o describeCoverageArcGISWCS.xml

You can also, obviously, request the data itself, using requests that return the coverage in one of the supported formats, namely GeoTIFF, NITF, HDF, JPEG, JPEG2000, and PNG.
Another URL example is shown as follows:

$ curl -XGET "http://sampleserver3.arcgisonline.com/ArcGIS/services/World/Temperature/ImageServer/WCSServer?SERVICE=WCS&VERSION=1.0.0&REQUEST=GetCoverage&COVERAGE=1&CRS=EPSG:4326&RESPONSE_CRS=EPSG:4326&BBOX=-158.203125,-105.46875,158.203125,105.46875&WIDTH=500&HEIGHT=500&FORMAT=jpeg" -o coverage.jpeg

Summary

In this article, we started with the differences between ArcGIS and GeoServer and then moved on to understanding standards. We then compared WMS with the mapping service, and WFS with feature access. Finally, we successfully published a raster dataset with WCS.

Resources for Article:

Further resources on this subject:
- Getting Started with GeoServer [Article]
- Enterprise Geodatabase [Article]
- Sending Data to Google Docs [Article]

Packt
26 Nov 2014
14 min read

High Availability Scenarios

"Live Migration between hosts in a Hyper-V cluster is very straightforward and requires no specific configuration, apart from the type and number of simultaneous Live Migrations. If you add multiple clusters and standalone Hyper-V hosts into the mix, I strongly advise you to configure Kerberos Constrained Delegation for all hosts and clusters involved."

Hans Vredevoort – MVP Hyper-V

This article, written by Benedict Berger, the author of Hyper-V Best Practices, will guide you through the installation of Hyper-V clusters and their best practice configuration. After installing the first Hyper-V host, it may be necessary to add another layer of availability to your virtualization services. With Failover Clusters, you gain independence from hardware failures and are protected from planned or unplanned service outages. This article covers the prerequisites and implementation of Failover Clusters.

(For more resources related to this topic, see here.)

Preparing for High Availability

Like every project, a High Availability (HA) scenario starts with a planning phase. Virtualization projects often raise the question of additional availability for the first time in an environment. In traditional data centers with physical server systems and local storage, the outage of a hardware component affects only one server hosting one service; the source of the outage can be localized quickly and the affected parts replaced in a short amount of time. Server virtualization brings great benefits, such as improved operating efficiency and reduced hardware dependencies, but a single component failure can impact many virtualized systems at once. By adding redundant systems, these single points of failure can be avoided.

Planning a HA environment

The most important factor in deciding whether you need a HA environment is your business requirements.
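Business requirements of this kind are usually expressed as an availability percentage, a number of "nines". To make that concrete, the following short sketch (illustrative, not from the original article) converts an availability class into the downtime it allows per year:

```python
def downtime_per_year(availability_percent):
    """Return the maximum allowed downtime in minutes per year
    for a given availability percentage (e.g. 99.999)."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return (1 - availability_percent / 100) * minutes_per_year

# Each additional nine cuts the allowed downtime by a factor of ten:
for nines in ["99.9", "99.99", "99.999"]:
    print(f"{nines}% -> {downtime_per_year(float(nines)):.1f} minutes/year")
# 99.9%   -> 525.6 minutes/year
# 99.99%  -> 52.6 minutes/year
# 99.999% -> 5.3 minutes/year
```

Seeing that "five nines" leaves barely five minutes of downtime per year makes it easier to argue why every extra nine multiplies the complexity and cost of the design.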
You need to find out how often and how long an IT-related production service can be interrupted, planned or unplanned, without causing a serious problem to your business. These requirements are defined in a company's central IT strategy as well as in IT-driven process definitions. They include the Service Level Agreements of critical business services run in the various departments of your company. If these definitions do not exist or are unavailable, talk to the process owners to find out the level of availability needed.

High Availability is structured in different classes, measured by the total uptime in a defined timespan, for example 99.999 percent in a year. Every nine in this figure adds a huge amount of complexity and cost, so take the time to find out the real availability your services need, and resist the temptation to run every service on multi-redundant, geo-spread cluster systems, as it may not fit in the budget. Be sure to plan for additional capacity in a HA environment, so that you can lose hardware components without sacrificing application performance.

Overview of the Failover Cluster

A Hyper-V Failover Cluster consists of two or more Hyper-V compute nodes. Technically, it's possible to build a Failover Cluster with just one compute node; however, it provides no availability advantage over a standalone host and is typically used only in migration scenarios. A Failover Cluster hosts roles, such as Hyper-V virtual machines, on its compute nodes. If one node fails due to a hardware problem, it no longer answers cluster heartbeat communication, so the service interruption is detected almost instantly. The virtual machines running on that node are powered off immediately by the hardware failure of their compute node.
The remaining cluster nodes then take over these VMs in an unplanned failover process and start them on their own hardware. After a successful boot of their operating systems and applications, the virtual machines are back up and running in just a few minutes. Hyper-V Failover Clusters work under the condition that all compute nodes have access to a shared storage instance holding the virtual machine configuration data and virtual hard disks.

In case of a planned failover, for example when patching compute nodes, it's possible to move running virtual machines from one cluster node to another without interrupting the VM. All cluster nodes can run virtual machines at the same time, as long as there is enough failover capacity to run all services when a node goes down. Even though a Hyper-V cluster is still called a Failover Cluster (utilizing the Windows Server Failover Clustering feature), it is indeed capable of running Active/Active. Ensuring that all these capabilities of a Failover Cluster actually work demands an accurate planning and implementation process.

Failover Cluster prerequisites

To successfully implement a Hyper-V Failover Cluster, we need suitable hardware, software, permissions, and network and storage infrastructure, as outlined in the following sections.

Hardware

The hardware used in a Failover Cluster environment needs to be validated against the Windows Server Catalog. Microsoft will only support Hyper-V clusters in which all components are certified for Windows Server 2012 R2. The servers used to run our HA virtual machines should ideally be identical hardware models with identical components. It is possible, and supported, to run servers with different hardware components in the same cluster, for example different amounts of RAM; however, due to the higher level of complexity, this is not best practice. Special planning considerations are needed to address the CPU requirements of a cluster.
To ensure maximum compatibility, all CPUs in a cluster should be exactly the same model. While it's technically possible to mix Intel and AMD CPUs in the same cluster, you will lose core cluster capabilities such as Live Migration due to their different architectures. Choosing a single vendor for your CPUs is not enough: even with different CPU models from the same vendor, your cluster nodes may use different sets of CPU instruction set extensions, and with different instruction sets, Live Migration won't work either. There is a compatibility mode that disables most of the extended instruction sets on all CPUs on all cluster nodes; however, this has a negative impact on performance and should be avoided. A better approach is to build a second cluster from the legacy CPUs, running smaller or non-production workloads, without affecting your high-performance production workloads.

If you want to extend your cluster after some time, you may find that exactly the same hardware is no longer available to purchase. Choose the current revision of the model or product line you are already using in your cluster and manually compare the CPU instruction sets at http://ark.intel.com/ and http://products.amd.com/, respectively. Choose the current CPU model that best fits the original CPU features of your cluster and have this design validated by your hardware partner. Ensure that your servers are equipped with compatible CPUs, the same amount of RAM, and the same network cards and storage controllers.

The network design

Mixing different vendors of network cards in a single server is fine and best practice for availability, but make sure all your Hyper-V hosts use an identical hardware setup. A network adapter should be used exclusively for either LAN traffic or storage traffic. Do not mix these two types of communication in any basic scenario.
There are some more advanced scenarios involving converged networking that can enable mixed traffic, but in most cases, this is not a good idea. A Hyper-V Failover Cluster requires multiple layers of communication between its nodes and storage systems. Hyper-V networking and storage options have changed dramatically through the different releases of Hyper-V, and with Windows Server 2012 R2, the network design options are nearly endless. In this article, we will work with a typical basic network design: at least six Network Interface Cards (NICs) per server, each with a bandwidth of 1 Gb/s. If you have more than five interface cards available per server, use NIC teaming to ensure network availability, or use converged networking; converged networking will also be your choice if you have fewer than five network adapters available.

- The first NIC will be used exclusively for host communication to our Hyper-V host and will not be involved in VM network traffic or cluster communication at any time. It carries Active Directory and management traffic to our Management OS.
- The second NIC will be used for Live Migration of virtual machines between our cluster nodes.
- The third NIC will be used for VM traffic. Our virtual machines will be connected to the various production and lab networks through this NIC.
- The fourth NIC will be used for internal cluster communication.

The first four NICs can either be teamed through Windows Server NIC teaming or abstracted from the physical hardware through Windows Server network virtualization and a converged fabric design.

- The fifth NIC will be reserved for storage communication. As advised, we will isolate storage and production LAN communication from each other. If you do not use iSCSI or SMB3 storage communication, this NIC will not be necessary. If you use Fibre Channel SAN technology, use an FC HBA instead.
If you leverage Direct Attached Storage (DAS), use the appropriate connector for storage communication.

- The sixth NIC will also be used for storage communication, as a redundancy. The redundancy will be established via MPIO, not via NIC teaming.

There is no need for a dedicated heartbeat network as in older versions of Windows Server with Hyper-V; all cluster networks are automatically used for sending heartbeat signals to the other cluster members. If you don't have 1 Gb/s interfaces available, or if you use 10 GbE adapters, it's best practice to implement a converged networking solution.

Storage design

All cluster nodes must have access to the virtual machines residing on a centrally shared storage medium. This could be a classic setup with a SAN or a NAS, or a more modern concept with Windows Scale-Out File Servers hosting virtual machine files on SMB3 file shares. In this article, we will use a NetApp SAN system that is capable of providing a classic SAN approach, with LUNs mapped to our hosts, as well as SMB3 file shares; however, any other SAN validated for Windows Server 2012 R2 will fulfill the requirements.

In our first setup, we will utilize Cluster Shared Volumes (CSVs) to store several virtual machines on the same storage volume. Creating a single volume per virtual machine is no longer practical these days due to the massive management overhead. A good rule of thumb is to create one CSV per cluster node; in larger environments with more than eight hosts, create one CSV per two to four cluster nodes. To utilize CSVs, follow these steps:

1. Ensure that all components (SAN, firmware, HBAs, and so on) are validated for Windows Server 2012 R2 and are up to date.
2. Connect your SAN physically to all your Hyper-V hosts via iSCSI or Fibre Channel connections.
3. Create two LUNs on your SAN for hosting virtual machines.
4. Activate Hyper-V performance options for these LUNs if possible (that is, on a NetApp, by setting the LUN type to Hyper-V).
5. Size the LUNs with enough capacity to host all your virtual hard disks.
6. Label the LUNs CSV01 and CSV02, with appropriate LUN IDs.
7. Create another small LUN, 1 GB in size, and label it Quorum.
8. Make the LUNs available to all Hyper-V hosts in this specific cluster by mapping them on the storage device. Do not make these LUNs available to any other host or cluster.
9. Prepare storage DSMs and drivers (that is, MPIO) for Hyper-V host installation.
10. Refresh the disk configuration on the hosts, install drivers and DSMs, and format the volumes as NTFS (quick).
11. Install Microsoft Multipath I/O when using redundant storage paths:

Install-WindowsFeature -Name Multipath-IO -ComputerName ElanityHV01, ElanityHV02

In this example, I added the MPIO feature to two Hyper-V hosts with the computer names ElanityHV01 and ElanityHV02. SANs are typically equipped with two storage controllers for redundancy reasons; make sure to disperse your workloads over both controllers for optimal availability and performance.

If you leverage file servers providing SMB3 shares, the preceding steps do not apply to you. Perform the following steps instead:

1. Create a storage space with the desired disk types; use storage tiering if possible.
2. Create a new SMB3 file share for applications.
3. Customize the permissions to include all Hyper-V servers from the planned cluster, as well as the Hyper-V cluster object itself, with full control.

Server and software requirements

To create a Failover Cluster, you need to install a second Hyper-V host. Use the same unattended file, but change the IP address and the hostname. Join both Hyper-V hosts to your Active Directory domain if you have not done so yet. Hyper-V can be clustered without leveraging Active Directory, but it then lacks several key components, such as Live Migration, so this should not be done on purpose.
The major benefit of the Active Directory independence of Failover Clusters is the ability to successfully boot a domain-joined Hyper-V cluster without any Active Directory domain controller being present at boot time. Ensure that you create a Hyper-V virtual switch, as shown earlier, with the same name on both hosts to ensure cluster compatibility, and that both nodes are installed with all updates. If you have System Center 2012 R2 in place, use System Center Virtual Machine Manager to create the Hyper-V cluster.

Implementing Failover Clusters

After preparing our Hyper-V hosts, we will now create a Failover Cluster using PowerShell. I'm assuming your hosts are installed, storage and network connections are prepared, and the Hyper-V role is already active, with up-to-date drivers and firmware on your hardware. First, we need to ensure that the server name, date, and time of our hosts are correct. Time and timezone configuration should occur via Group Policy. For automatic network configuration later on, it's important to rename the network connections from their defaults to their designated roles using PowerShell, as seen in the following commands:

Rename-NetAdapter -Name "Ethernet" -NewName "Host"
Rename-NetAdapter -Name "Ethernet 2" -NewName "LiveMig"
Rename-NetAdapter -Name "Ethernet 3" -NewName "VMs"
Rename-NetAdapter -Name "Ethernet 4" -NewName "Cluster"
Rename-NetAdapter -Name "Ethernet 5" -NewName "Storage"

The Network Connections window should look like the following screenshot:

Hyper-V host Network Connections

Next comes the IP configuration of the network adapters. If you are not using DHCP for your servers, manually set the IP configuration (different subnets) of the specified network cards.
Here is a great blog post on how to automate this step: http://bit.ly/Upa5bJ

Next, we need to activate the necessary Failover Clustering features on both of our Hyper-V hosts:

Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools -ComputerName ElanityHV01, ElanityHV02

Before actually creating the cluster, we launch a cluster validation via PowerShell:

Test-Cluster ElanityHV01, ElanityHV02

Test-Cluster cmdlet

Open the generated .mht file for more details, as shown in the following screenshot:

Cluster validation

As you can see, there are some warnings that should be investigated. However, as long as there are no errors, the configuration is ready for clustering and fully supported by Microsoft. Still, check the warnings to be sure you won't run into problems in the long run. After you have fixed potential errors and warnings listed in the Cluster Validation Report, you can finally create the cluster as follows:

New-Cluster -Name CN=ElanityClu1,OU=Servers,DC=cloud,DC=local -Node ElanityHV01, ElanityHV02 -StaticAddress 192.168.1.49

This will create a new cluster named ElanityClu1, consisting of the nodes ElanityHV01 and ElanityHV02, and using the cluster IP address 192.168.1.49. The cmdlet creates the cluster and the corresponding Active Directory object in the specified OU. Moving the cluster object to a different OU later on is no problem at all; even renaming is possible when done the right way. After creating the cluster, open the Failover Cluster Manager console and you should be able to connect to your cluster:

Failover Cluster Manager

You will see that all your cluster nodes and Cluster Core Resources are online. Rerun the Validation Report and copy the generated .mht files to a secure location if you need them for support queries. Keep in mind that you have to rerun this wizard if any hardware or configuration changes occur to the cluster components, including any of its nodes.
The initial cluster setup is now complete and we can continue with post-creation tasks.

Summary

With the knowledge from this article, you are now able to design and implement Hyper-V Failover Clusters as well as guest clusters. You are aware of the basic concepts of High Availability and the storage and networking options necessary to achieve it, and you have seen real-world proven configurations that ensure a stable operating environment.

Resources for Article:

Further resources on this subject:
- Planning Desktop Virtualization [Article]
- Backups in the VMware View Infrastructure [Article]
- Virtual Machine Design [Article]
Packt
26 Nov 2014
26 min read

Ansible – An Introduction

In this article by Madhurranjan Mohan and Ramesh Raithatha, the authors of the book Learning Ansible, we give an overview of the basic features of Ansible, from installation through to deployment.

(For more resources related to this topic, see here.)

What is Ansible?

Ansible is an IT orchestration engine that can be used for several use cases, such as configuration management, orchestration, provisioning, and deployment. Compared to other automation tools, Ansible gives you an easy way to configure your orchestration engine without the overhead of a client or central server setup. That's right: no central server! It comes preloaded with a wide range of modules that make your life simpler.

Ansible is an open source tool (with enterprise editions available) developed in Python that runs on Windows, Mac, and Unix-like systems. You can use it for configuration management, orchestration, provisioning, and deployments, which covers many of the problems solved under the broad umbrella of DevOps. We won't be talking about culture here, as that's a book by itself! You could refer to the book Continuous Delivery and DevOps – A Quickstart Guide by Packt Publishing for more information at https://www.packtpub.com/virtualization-and-cloud/continuous-delivery-and-devops-quickstart-guide.

Let's try to answer some questions that you may have right away.

Can I use Ansible if I am starting afresh, have no automation in my system, and would like to introduce that (and, as a result, increase my bonus for the next year)? The short answer is that Ansible is probably perfect for you. The learning curve with Ansible is much shorter than that of most other tools currently on the market.

I have other tools in my environment. Can I still use Ansible? Yes, again! If you already have other tools in your environment, you can still augment them with Ansible, as it solves many problems in an elegant way.
A case in point is a Puppet shop that uses Ansible for orchestration and provisioning of new systems but continues to use Puppet for configuration management.

I don't have Python in my environment and introducing Ansible would bring in Python. What do I do? Remember that, on most Linux systems, a version of Python is present out of the box, so you don't have to explicitly install it. You should still go ahead with Ansible if it solves particular problems for you. Always question which problems you are trying to solve, and then check whether a tool such as Ansible would address that use case.

I have no configuration management at present. Can I start today? The answer is yes!

In many of the conferences we presented at, the preceding four questions popped up most frequently. Now that they are answered, let's dig deeper.

The architecture of Ansible is agentless. Yes, you heard that right: you don't have to install any client-side software. It works purely over SSH connections, so you could consider SSH as your agent, which makes our earlier statement that Ansible is agentless not entirely accurate. However, since SSH can be assumed to be running on virtually every server, no separate agent needs to be installed; hence, Ansible is called agentless. So, if you have a well-oiled SSH setup, you're ready to roll Ansible into your environment in no time. This also means that you can install it on just one system (either a Linux or Mac machine) and control your entire infrastructure from that machine. Yes, we understand that you must be wondering what happens if this machine goes down; you would probably have multiple such machines in production, but this was just an example to illustrate the simplicity of Ansible. As Ansible works over SSH connections, it can be slower by default; to speed up the SSH connections, you can always enable ControlPersist and pipelining, which makes Ansible faster and more secure.
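For example, both of these SSH optimizations can be switched on in Ansible's configuration file. The following is a minimal ansible.cfg sketch (the values shown are illustrative, not a recommendation from the original text):

```ini
; ansible.cfg -- place next to your playbooks or in /etc/ansible/
[ssh_connection]
; Reuse SSH connections instead of opening a new one per task
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
; Send modules over the existing SSH session (fewer round trips);
; note that pipelining requires 'requiretty' to be disabled in
; sudoers on the managed machines
pipelining = True
```

With connection reuse and pipelining enabled, each task needs far fewer SSH round trips, which is where most of the speedup comes from.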
Ansible works like any other Unix command and doesn't require a daemon process to be running all the time. You can run it via cron, on demand from a single node, or at startup. Ansible can be push or pull based, and you can use whichever suits you.

When you start with something new, the first aspect to pay attention to is the nomenclature; the faster you pick up the terms associated with a tool, the faster you become comfortable with it. So, to deploy, let's say, a package on one or more machines with Ansible, you would write a playbook with a single task, which in turn uses the package module; the module then installs the package on the machines listed in an inventory file. If you feel overwhelmed by the nomenclature, don't worry, you'll soon get used to it. Similar to the package module, Ansible comes loaded with more than 200 modules, written purely in Python. We will talk about modules in detail later.

It is now time to install Ansible and start trying out various fun examples.

Installing Ansible

Installing Ansible is quick and simple. You can use the source code directly by cloning it from the GitHub project (https://github.com/ansible/ansible), install it using your system's package manager, or use Python's package management tool (pip). You can use Ansible on any Windows, Mac, or Unix-like system. Ansible doesn't require any database and doesn't need any daemons running, which makes it easier to maintain Ansible versions and to upgrade without breakage.

We'd like to call the machine where we install Ansible our command center; many people also refer to it as the Ansible workstation. Note that, as Ansible is developed using Python, you need Python version 2.4 or higher installed. Python is preinstalled, as specified earlier, on the majority of operating systems.
If this is not the case for you, refer to https://wiki.python.org/moin/BeginnersGuide/Download to download or upgrade Python.

Installing Ansible from source

Installing from source is as easy as cloning a repository, and you don't need any root permissions. Let's clone the repository and activate virtualenv, which is an isolated environment in Python where you can install packages without interfering with the system's Python packages. The commands are as follows:

$ git clone git://github.com/ansible/ansible.git
$ cd ansible/
$ source ./hacking/env-setup

Ansible needs a couple of Python packages, which you can install using pip. If you don't have pip installed on your system, install it using the following command:

$ sudo easy_install pip

Once you have installed pip, install the paramiko, PyYAML, jinja2, and httplib2 packages using the following command:

$ sudo pip install paramiko PyYAML jinja2 httplib2

By default, Ansible will run against the development branch, so you might want to check out the latest stable branch. Check what the latest stable version is using the following command:

$ git branch -a

Version 1.7.1 was the latest available at the time of writing. Check out the version you would like to use with the following command:

$ git checkout release1.7.1

You now have a working setup of Ansible. One of the benefits of running Ansible from source is that you can enjoy new features immediately, without waiting for your package manager to make them available to you.

Installing Ansible using the system's package manager

Ansible can also be installed using the system's package manager. We will look at installing Ansible via Yum, Apt, Homebrew, and pip.

Installing via Yum

If you are running a Fedora system, you can install Ansible directly.
For CentOS- or RHEL-based systems, you should add the EPEL repository first, and then install Ansible as follows:

$ sudo yum install ansible

On CentOS 6 or RHEL 6, you have to run the command rpm -Uvh. Refer to http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm for instructions on how to install EPEL.

You can also install Ansible from an RPM file. You need to use the make rpm command against the git checkout of Ansible, as follows:

$ git clone git://github.com/ansible/ansible.git
$ cd ./ansible
$ make rpm
$ sudo rpm -Uvh ~/rpmbuild/ansible-*.noarch.rpm

You should have rpm-build, make, and python2-devel installed on your system to build an RPM.

Installing via Apt

Ansible is available for Ubuntu in a Personal Package Archive (PPA). To configure the PPA and install Ansible on your Ubuntu system, use the following commands:

$ sudo apt-get install apt-add-repository
$ sudo apt-add-repository ppa:rquillo/ansible
$ sudo apt-get update
$ sudo apt-get install ansible

You can also compile a deb file for Debian and Ubuntu systems, using the following command:

$ make deb

Installing via Homebrew

You can install Ansible on Mac OS X using Homebrew, as follows:

$ brew update
$ brew install ansible

Installing via pip

You can install Ansible via Python's package manager, pip. If you don't have pip installed on your system, install it first. You can use pip to install Ansible on Windows too, using the following command:

$ sudo easy_install pip

You can now install Ansible using pip, as follows:

$ sudo pip install ansible

Once you're done installing Ansible, run ansible --version to verify the installation:

$ ansible --version

You will get the following output:

ansible 1.7.1

Hello Ansible

Let's start by checking whether two remote machines are reachable; in other words, let's ping two machines, after which we'll echo hello ansible on both of them.
The following are the steps that need to be performed:

1. Create an Ansible inventory file. This can contain one or more groups, each defined within square brackets. This example has one group called servers:

$ cat inventory
[servers]
machine1
machine2

2. Now, we have to ping the two machines. In order to do that, first run ansible --help to view the available options (only the subset we need for this example is shown):

ansible --help
Usage: ansible <host-pattern> [options]

Options:
  -a MODULE_ARGS, --args=MODULE_ARGS
        module arguments
  -i INVENTORY, --inventory-file=INVENTORY
        specify inventory host file
        (default=/etc/ansible/hosts)
  -m MODULE_NAME, --module-name=MODULE_NAME
        module name to execute
        (default=command)

3. We'll now ping the two servers using the Ansible command line, as shown in the following screenshot.

4. Now that we can ping these two servers, let's echo hello ansible!, using the command line shown in the following screenshot.

Consider the following command:

$ ansible servers -i inventory -a '/bin/echo hello ansible!'

The preceding command line is the same as the following one:

$ ansible servers -i inventory -m command -a '/bin/echo hello ansible!'

If you move the inventory file to /etc/ansible/hosts, the Ansible command becomes even simpler:

$ ansible servers -a '/bin/echo hello ansible!'

There you go. The 'Hello Ansible' program works! Time to tweet!

You can also specify the inventory file by exporting it in a variable named ANSIBLE_HOSTS; the preceding command, without the -i option, will work in that situation too.

Developing a playbook

In Ansible, except for ad hoc tasks run using the ansible command, we need to make sure we have playbooks for every other repeatable task.
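As a preview of what such a playbook looks like, here is a minimal sketch (not from the original text) that turns the ad hoc echo example above into a playbook task; the group name matches the inventory shown earlier:

```yaml
# hello.yml -- run with: ansible-playbook -i inventory hello.yml
---
- hosts: servers
  tasks:
    - name: say hello from every machine
      command: /bin/echo hello ansible!
```

Unlike the ad hoc command, this file can be checked into version control and rerun any time the task needs repeating.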
In order to do that, it is important to have a local development environment, especially when a larger team is involved, where people can develop their playbooks and test them before checking them into Git.

A very popular tool that currently fits this bill is Vagrant. Vagrant's aim is to help users create and configure lightweight, reproducible, and portable development environments. By default, Vagrant works on VirtualBox, which can run on a local laptop or desktop. To elaborate further, it can be used for the following use cases:

- Vagrant can be used when creating development environments to constantly check new builds of software, especially when you have several other dependent components. For example, if I am developing service A and it depends on two other services, B and C, and also a database, then the fastest way to test the service locally is to make sure the dependent services and the database are set up (especially if you're testing multiple versions), and every time you compile the service locally, you deploy the module against these services and test them out.
- Testing teams can use Vagrant to deploy the versions of code they want to test and work with them, and each person in the testing team can have local environments on their laptop or desktop rather than common environments that are shared between teams.
- If your software is developed for cloud-based platforms and needs to be deployed on AWS and Rackspace (for example), apart from testing it locally on VMware Fusion or VirtualBox, Vagrant is perfect for this purpose. In Vagrant's terms, you can deploy your software on multiple providers with a configuration file that differs only for a provider.

For example, the following screenshot shows the VirtualBox configuration for a simple machine:
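The screenshot itself is not reproduced here; a VirtualBox configuration of that shape might look like the following hedged sketch (the box name, IP address, and memory size are illustrative, not values from the original image):

```ruby
# Vagrantfile -- hedged sketch of a VirtualBox provider configuration
Vagrant.configure("2") do |config|
  config.vm.box = "precise64"
  # Private IP is VirtualBox-specific; other providers may ignore it
  config.vm.network :private_network, ip: "192.168.33.10"
  config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--memory", "1024"]
  end
end
```

Switching to the AWS provider mostly means swapping out the provider block while the rest of the file stays the same.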
(Private IP is VirtualBox-specific, but it is ignored when run using the AWS provider.)

Vagrant also provides provisioners, which give users multiple options to configure new machines as they come up. They support shell scripts, tools such as Chef, Puppet, Salt, and Docker, as well as Ansible.

By using Ansible with Vagrant, you can develop your Ansible scripts locally, deploy and redeploy them as many times as needed to get them right, and then check them in. The advantage, from an infrastructure point of view, is that the same code can also be used by other developers and testers when they spawn off their Vagrant environments for testing (the software would be configured to work in the expected manner by the Ansible playbooks). The checked-in Ansible code will then flow like the rest of your software, through testing and stage environments, before it is finally deployed into production.

Roles

When you start thinking about your infrastructure, you will soon look at the purposes each node in your infrastructure serves, and you will be able to categorize the nodes. You will also start to abstract out information regarding nodes and start thinking at a higher level. For example, if you're running a web application, you'll be able to categorize the nodes broadly as db_servers, app_servers, web_servers, and load_balancers. If you then talk to your provisioning team, they will tell you which base packages need to be installed on each machine, either for the sake of compliance, to manage the machines remotely after choosing the OS distribution, or for security purposes. Simple examples can be packages such as bind, ntp, collectd, psacct, and so on. Soon you will add all these packages under a category named common or base. As you dig deeper, you might find further dependencies that exist. For example, if your application is written in Java, having some version of JDK is a dependency.
So, for what we've discussed so far, we have the following categories:

- db_servers
- app_servers
- web_servers
- load_balancers
- common
- jdk

We've taken a top-down approach to come up with the categories listed. Now, depending on the size of your infrastructure, you will slowly start identifying reusable components, and these can be as simple as ntp or collectd. These categories, in Ansible's terms, are called Roles. If you're familiar with Chef, the concept is very similar.

Callback plugins

One of the features that Ansible provides is a callback mechanism. You can configure as many callback plugins as required. These plugins can intercept events and trigger certain actions. Let's see a simple example where we just print the run results at the end of the playbook run as part of a callback, and then take a brief look at how to configure a callback.

We first grep for the location of callback_plugins in ansible.cfg, as follows:

$ grep callback ansible.cfg
callback_plugins   = /usr/share/ansible_plugins/callback_plugins

We then create our callback plugin in this location.
$ ls -l /usr/share/ansible_plugins/callback_plugins
callback_sample.py

Let's now look at the contents of the callback_sample file:

$ cat /usr/share/ansible_plugins/callback_plugins/callback_sample.py

class CallbackModule(object):

    def on_any(self, *args, **kwargs):
        pass

    def runner_on_failed(self, host, res, ignore_errors=False):
        pass

    def runner_on_ok(self, host, res):
        pass

    def runner_on_error(self, host, msg):
        pass

    def runner_on_skipped(self, host, item=None):
        pass

    def runner_on_unreachable(self, host, res):
        pass

    def runner_on_no_hosts(self):
        pass

    def runner_on_async_poll(self, host, res, jid, clock):
        pass

    def runner_on_async_ok(self, host, res, jid):
        pass

    def runner_on_async_failed(self, host, res, jid):
        pass

    def playbook_on_start(self):
        pass

    def playbook_on_notify(self, host, handler):
        pass

    def playbook_on_no_hosts_matched(self):
        pass

    def playbook_on_no_hosts_remaining(self):
        pass

    def playbook_on_task_start(self, name, is_conditional):
        pass

    def playbook_on_vars_prompt(self, varname, private=True, prompt=None,
                                encrypt=None, confirm=False, salt_size=None,
                                salt=None, default=None):
        pass

    def playbook_on_setup(self):
        pass

    def playbook_on_import_for_host(self, host, imported_file):
        pass

    def playbook_on_not_import_for_host(self, host, missing_file):
        pass

    def playbook_on_play_start(self, pattern):
        pass

    def playbook_on_stats(self, stats):
        results = dict([(h, stats.summarize(h)) for h in stats.processed])
        print "Run Results: %s" % results

As you can see, the callback class, CallbackModule, contains several methods. The methods of this class are called with the Ansible run parameters provided as arguments. Playbook activities can be intercepted by using these methods, and relevant actions can be taken based on that. Relevant methods are called based on the action; for example, we've used the playbook_on_stats method to display statistics regarding the playbook run.
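To make the playbook_on_stats hook concrete, here is a small, runnable Python 3 sketch of just that method. The FakeStats class is a stand-in for the stats object that Ansible passes in (only the processed and summarize members used by the plugin are mimicked), so the host names and numbers are illustrative:

```python
# A stand-in for Ansible's stats object: exposes the same two members
# (processed, summarize) that the callback plugin above relies on.
class FakeStats(object):
    def __init__(self):
        self._data = {
            "machine1": {"ok": 3, "changed": 1, "unreachable": 0, "failures": 0},
            "machine2": {"ok": 3, "changed": 0, "unreachable": 0, "failures": 1},
        }
        self.processed = self._data  # iterating yields host names

    def summarize(self, host):
        return self._data[host]

class CallbackModule(object):
    # Same body as the plugin's playbook_on_stats, with a Python 3 print
    def playbook_on_stats(self, stats):
        results = dict((h, stats.summarize(h)) for h in stats.processed)
        print("Run Results: %s" % results)

CallbackModule().playbook_on_stats(FakeStats())
```

Running this prints one Run Results line summarizing each host, which is exactly the shape of the line we will see at the end of the real playbook run below.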
Let's run a basic playbook with the callback plugin and view the output, as follows:

In the preceding screenshot, you can now see the Run Results line right at the end, consisting of a dictionary or hash of the actual results. This is due to our custom code. This is just an example of how you can intercept methods and use them to your advantage. You can utilize this information in any number of ways. Isn't this powerful? Are you getting any ideas about how you can utilize a feature such as this? Do write them down before reading further. If you're able to write out even two use cases that we've not covered here and that are relevant to your infrastructure, give yourself a pat on the back!

Modules

Ansible allows you to extend functionality using custom modules. Arguments, as you have seen, can be passed to modules. The arguments that you pass, provided they are in a key=value format, will be forwarded in a separate file along with the module. Ansible expects at least two variables in your module output: the result of the module run, that is, whether it passed or failed, and a message for the user; both have to be in JSON format. If you adhere to this simple rule, you can customize as much as you want, and the module can be written in any language.

Using Bash modules

Bash modules in Ansible are no different from any other bash scripts, except in the way they print data on stdout. Bash modules can be as simple as checking if a process is running on the remote host, or they can run more complex commands. We recommend that you use bash over other languages, such as Python and Ruby, only when you're performing simple tasks. In other cases, you should use languages that provide better error handling. Let's see an example of a bash module, as follows:

The preceding bash module will take the service_name argument and forcefully kill all of the Java processes that belong to that service. As you know, Ansible passes the argument file to the module.
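Since the bash example lives in a screenshot that is not reproduced here, the same contract can be shown as a runnable Python 3 sketch: read key=value pairs from the arguments file that Ansible hands over, do the work, and print a JSON result with failed and msg fields. The module name and the kill step are illustrative (a real module would actually terminate the matching processes):

```python
import json
import sys

def parse_args(path):
    # Ansible writes the module arguments as key=value tokens into a file
    args = {}
    with open(path) as f:
        for token in f.read().split():
            if "=" in token:
                key, value = token.split("=", 1)
                args[key] = value
    return args

def run_module(path):
    args = parse_args(path)
    service = args.get("service_name", "unknown")
    # A real module would find and kill the matching Java processes here
    return {"failed": False, "msg": "killed java processes for %s" % service}

if __name__ == "__main__":
    # Ansible invokes the module with the path to the arguments file
    print(json.dumps(run_module(sys.argv[1])))
```

Whatever the language, only the final JSON line on stdout matters to Ansible.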
We then source the arguments file using source $1. This will actually set an environment variable with the name service_name. We then access this variable using $service_name, as follows:

We then check to see if we obtained any PIDs for the service, and run a loop over them to forcefully kill all of the Java processes that match service_name. Once they're killed, we exit the module with failed=False and a message, using an exit code of 0, as shown in the following screenshot, because terminating the Ansible run might not make sense:

Provisioning a machine in the cloud

With that, let's jump to the first topic. Teams managing infrastructures have a lot of choices today to run their builds, tests, and deployments. Providers such as Amazon, Rackspace, and DigitalOcean primarily provide Infrastructure as a Service (IaaS). They expose an API via SDKs, which you can invoke in order to create new machines, or use their GUI to set them up. We're more interested in using their SDKs, as they will play an important part in our automation effort.

Setting up new servers and provisioning them is interesting at first, but at some stage it can become boring, as it's quite repetitive in nature. Each provisioning effort involves several similar steps to get the machines up and running. Imagine one fine morning you receive an e-mail asking for three new customer setups, where each customer setup has three to four instances and a bunch of services and dependencies. This might be an easy task for you, but it would require running the same set of repetitive commands multiple times, followed by monitoring the servers once they come up to confirm that everything just went fine. In addition, anything you do manually has a chance of introducing bugs. What if two of the customer setups come up correctly but, due to fatigue, you miss out a step for the third customer and hence introduce a bug? To deal with such situations, there exists automation.
Cloud provisioning automation makes it easy for an engineer to bring up a new server as quickly as possible, allowing him/her to concentrate on other priorities. Using Ansible, you can easily perform these actions and automate cloud provisioning with minimal effort. Ansible provides you with the power to automate various cloud platforms, such as Amazon, DigitalOcean, Google Cloud, Rackspace, and so on, with modules for different services available in the Ansible core.

Docker provisioning

Docker is perhaps the most popular open source tool that has been released in the last year. The following quote can be seen on the Docker website:

"Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications, whether on laptops, data center VMs, or the cloud."

Increasingly, more and more individuals and companies are adopting Docker. The tagline for Docker is "Containerization is the new virtualization". At a high level, all Docker allows you to do is prepare lightweight containers using instructions from a Dockerfile and run the containers. The same container can be shared or shipped across environments, thereby making sure you run the exact same image and reducing the chance of errors. The Docker image that you build is cached by default; thus, the next time you have similar instructions, the time taken to bring up a similar container is reduced to almost nothing.

Let's now look at how Ansible can be used with Docker to make this a powerful working combination. You can use Ansible with Docker to perform the following:

- Installing Docker on hosts
- Deploying new Docker images
- Building or provisioning new Docker images

Deployment strategies with RPM

In most cases, we already have a certain version of the application that has been deployed, and now, either to introduce a new feature or fix bugs, a new version has to be deployed. We'd like to discuss this scenario in greater detail.
At a high level, whenever we deploy an application, there are three kinds of changes to take care of:

- Code changes
- Config changes
- Database changes

The first two types are ideally handled by the RPM, unless you have very specific values that need to be set up during runtime. Files with passwords can be checked in, but they should be encrypted with Ansible Vault or dropped into files as templates during runtime, just as we did with database.yml. With templates, if the configs are just name-value pairs that can be handled in a Jinja template, you should be good; but if you have other lines in the configuration that do not change, then it's better that those configuration files are checked in and appear as part of the RPM. Many teams we've worked with check in environment-specific folders that have all the configuration parameters; while starting the application, we provide the environment in which the application should be started. Another way is to deploy the RPM with default values for all configuration properties, while writing a different folder with name-value pairs that override the parameters in the default configuration that is part of the RPM.

The database changes should be handled carefully. Ideally, you want them to be idempotent for a particular version of the RPM so that, even if someone tries to push database changes multiple times, the database is not really affected. For example, in the preceding case, we run rake db:migrate, which is idempotent in nature; even if you run the same command from multiple machines, you shouldn't really face issues. The Rails framework does it by storing the database migration version in a separate table.

Having looked at the three types of changes, we can now examine how to perform RPM deployments for each release. When you're pushing new changes, the current version or service is already running. It's recommended that you take the server out of service before performing upgrades.
For example, if it's part of a load balancer, make sure it's out of the load balancer and not serving any traffic before performing the upgrades. Primarily, there are the following two ways:

- Deploying newer versions of the rpm in the same directory
- Deploying the rpm into a version-specific directory

Canary deployment

The name Canary is used with reference to the canaries that were often sent into coal mines to alert miners about toxic gases reaching dangerous levels. Canary deployment is a technique used to reduce the risk of releasing a new version of software by first releasing it to a small subset of users (segmented demographically, by location, and so on), gauging their reaction, and then slowly releasing it to the rest of the users. Whenever possible, keep the first set of users internal, since this reduces the impact on actual customers. Canary deployment is especially useful when you introduce a new feature that you want to test with a small subset of users before releasing it to the rest of your customer base. If you're running, let's say, an application across multiple data centers and you're sure that certain users would only contact specific data centers when they access your site, you could definitely run a Canary deployment.

Deploying Ansible pull

The last topic we'd like to cover in this section is Ansible pull. If you have a large number of hosts that you'd like to release software on simultaneously, you will be limited by the number of parallel SSH connections that can be run. At scale, the pull model is preferred over the push model. Ansible supports this with what is called ansible-pull. Ansible pull works individually on each node. The prerequisite is that it points to a repository from where it can invoke a special file called localhost.yml or <hostname>.yml. Typically, the ansible-pull option is run either as a cron job or is triggered remotely by some other means.
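As a cron job, that wiring might look like the following hedged sketch (the schedule, user, and repository URL are placeholders, not values from the text; the options match the ones explained below):

```
# /etc/cron.d/ansible-pull -- run ansible-pull every 30 minutes
*/30 * * * * root ansible-pull -o -C master -U https://example.com/ansible-config.git -i localhost >> /var/log/ansible-pull.log 2>&1
```

With -o in place, each run is a cheap no-op unless the repository has actually changed.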
We're going to use our tomcat example again, with the only difference being that the structure of the repository has been changed slightly. Let's look at the structure of the git repository that will work for Ansible pull, as follows:

As you can see, localhost.yml is present at the top level, and the roles folder consists of the tomcat folder, under which is the tasks folder with the main.yml task file. Let's now run the playbook using ansible-pull, as follows:

Let's look at the preceding run in detail:

- The ansible-pull command: We invoke ansible-pull with the following options:
  - -o: This option means that the Ansible run takes place only if the remote repository has changes.
  - -C master: This option indicates which branch to refer to in the git repository.
  - -U < >: This option indicates the repository that needs to be checked out.
  - -i localhost: This option indicates the inventory that needs to be considered. In this case, since we're only concerned about one tomcat group, we use -i localhost. However, when there are many more inventory groups, make sure you use an inventory file with the -i option.
- The localhost | success JSON: This output shows that the check on whether the repository has changed succeeded, and lists the latest commit.
- The actual Ansible playbook run: The Ansible playbook run is just like before. At the end of the run, we will have the WAR file up and running.

Summary

In this article, we got an overview of Ansible: we looked at what Ansible is, how to install it in various ways, wrote our very first playbook, learned ways to use the Ansible command line, how to debug playbooks, and how to develop a playbook on our own. We also looked into various other aspects of Ansible, such as roles, callback plugins, modules and the bash module, how to provision a machine in the cloud, Docker provisioning, deployment strategies with RPM, Canary deployment, and deployment using Ansible pull.
Packt
26 Nov 2014
25 min read

Concurrency in Practice

This article, written by Aleksandar Prokopec, the author of Learning Concurrent Programming in Scala, helps you develop the skills that are necessary to write correct and efficient concurrent programs. It teaches you about concurrency in Scala through a sequence of programs. (For more resources related to this topic, see here.)

"The best theory is inspired by practice."

                                         -Donald Knuth

We have studied a plethora of different concurrency facilities in this article. By now, you will have learned about dozens of different ways of starting concurrent computations and accessing shared data. Knowing how to use different styles of concurrency is useful, but it might not yet be obvious when to use which. The goal of this article is to introduce the big picture of concurrent programming. We will study the use cases for various concurrency abstractions, see how to debug concurrent programs, and see how to integrate different concurrency libraries in larger applications. In this article, we perform the following tasks:

- Investigate how to deal with various kinds of bugs appearing in concurrent applications
- Learn how to identify and resolve performance bottlenecks
- Apply the previous knowledge about concurrency to implement a larger concurrent application, namely, a remote file browser

We start with an overview of the important concurrency frameworks that we learned about in this article, and a summary of when to use each of them.

Choosing the right tools for the job

In this section, we present an overview of the different concurrency libraries that we learned about. We take a step back and look at the differences between these libraries, and what they have in common. This summary will give us an insight into what different concurrency abstractions are useful for.
A concurrency framework usually needs to address several concerns:

- It must provide a way to declare data that is shared between concurrent executions
- It must provide constructs for reading and modifying program data
- It must be able to express conditional execution, triggered when a certain set of conditions are fulfilled
- It must define a way to start concurrent executions

Some of the frameworks from this article address all of these concerns; others address only a subset, and transfer part of the responsibility to another framework. Typically, in a concurrent programming model, we express concurrently shared data differently from data intended to be accessed only from a single thread. This allows the JVM runtime to optimize sequential parts of the program more effectively. So far, we've seen a lot of different ways to express concurrently shared data, ranging from the low-level facilities to advanced high-level abstractions. We summarize different data abstractions in the following table:

- Volatile variables (JDK): @volatile. Ensure visibility and the happens-before relationship on class fields and local variables that are captured in closures.
- Atomic variables (JDK): AtomicReference[T], AtomicInteger, AtomicLong. Provide basic composite atomic operations, such as compareAndSet and incrementAndGet.
- Futures and promises (scala.concurrent): Future[T], Promise[T]. Sometimes called single-assignment variables, these express values that might not be computed yet, but will eventually become available.
- Observables and subjects (Rx): Observable[T], Subject[T]. Also known as first-class event streams, these describe many different values that arrive one after another in time.
- Transactional references (Scala Software Transactional Memory (STM)): Ref[T]. These describe memory locations that can only be accessed from within memory transactions. Their modifications only become visible after the transaction successfully commits.
The next important concern is providing access to shared data, which includes reading and modifying shared memory locations. Usually, a concurrent program uses special constructs to express such accesses. We summarize the different data access constructs in the following table:

- Arbitrary data (JDK): synchronized. Uses intrinsic object locks to exclude access to arbitrary shared data.
- Atomic variables and classes (JDK): compareAndSet. Atomically exchanges the value of a single memory location. It allows implementing lock-free programs.
- Futures and promises (scala.concurrent): value, tryComplete. Used to assign a value to a promise, or to check the value of the corresponding future. The value method is not a preferred way to interact with a future.
- Transactional references (ScalaSTM): atomic, orAtomic, single. Atomically modify the values of a set of memory locations. These reduce the risk of deadlocks, but disallow side effects inside the transactional block.

Concurrent data access is not the only concern of a concurrency framework. Concurrent computations sometimes need to proceed only after a certain condition is met. In the following table, we summarize different constructs that enable this:

- JVM concurrency: wait, notify, notifyAll. Used to suspend the execution of a thread until some other thread notifies that the conditions are met.
- Futures and promises: onComplete, Await.ready. Conditionally schedules an asynchronous computation. The Await.ready method suspends the thread until the future completes.
- Reactive extensions: subscribe. Asynchronously or synchronously executes a computation when an event arrives.
- Software transactional memory: retry, retryFor, withRetryTimeout. Retries the current memory transaction when some of the relevant memory locations change.
- Actors: receive. Executes the actor's receive block when a message arrives.
Finally, a concurrency model must define a way to start a concurrent execution. We summarize different concurrency constructs in the following table:

- JVM concurrency: Thread.start. Starts a new thread of execution.
- Execution contexts: execute. Schedules a block of code for execution on a thread pool.
- Futures and promises: Future.apply. Schedules a block of code for execution, and returns the future value with the result of the execution.
- Parallel collections: par. Allows invoking data-parallel versions of collection methods.
- Reactive extensions: Observable.create, observeOn. The create method defines an event source. The observeOn method schedules the handling of events on different threads.
- Actors: actorOf. Schedules a new actor object for execution.

This breakdown shows us that different concurrency libraries focus on different tasks. For example, parallel collections do not have conditional waiting constructs, because a data-parallel operation proceeds on separate elements independently. Similarly, software transactional memory does not come with a construct to express concurrent computations, and focuses only on protecting access to shared data. Actors do not have special constructs for modeling shared data and protecting access to it, because data is encapsulated within separate actors and accessed serially only by the actor that owns it.

Having classified concurrency libraries according to how they model shared data and express concurrency, we present a summary of what different concurrency libraries are good for:

- The classical JVM concurrency model uses threads, the synchronized statement, volatile variables, and atomic primitives for low-level tasks. Uses include implementing a custom concurrency utility, a concurrent data structure, or a concurrency framework optimized for specific tasks.
- Futures and promises are best suited for referring to concurrent computations that produce a single result value.
Futures model latency in the program, and allow composing values that become available later during the execution of the program. Uses include performing remote network requests and waiting for replies, referring to the result of an asynchronous long-running computation, or reacting to the completion of an I/O operation. Futures are usually the glue of a concurrent application, binding the different parts of a concurrent program together. We often use futures to convert single-event callback APIs into a standardized representation based on the Future type.

- Parallel collections are best suited for efficiently executing data-parallel operations on large datasets. Uses include file searching, text processing, linear algebra applications, numerical computations, and simulations. Long-running Scala collection operations are usually good candidates for parallelization.
- Reactive extensions are used to express asynchronous event-based programs. Unlike parallel collections, in reactive extensions, data elements are not available when the operation starts, but arrive while the application is running. Uses include converting callback-based APIs, modeling events in user interfaces, modeling events external to the application, manipulating program events with collection-style combinators, streaming data from input devices or remote locations, or incrementally propagating changes in the data model throughout the program.
- Use STM to protect program data from getting corrupted by concurrent accesses. An STM allows building complex data models and accessing them with a reduced risk of deadlocks and race conditions. A typical use is to protect concurrently accessible data, while retaining good scalability between threads whose accesses to data do not overlap.
- Actors are suitable for encapsulating concurrently accessible data, and seamlessly building distributed systems.
Actor frameworks provide a natural way to express concurrent tasks that communicate by explicitly sending messages. Uses include serializing concurrent access to data to prevent corruption, expressing stateful concurrency units in the system, and building distributed applications like trading systems, P2P networks, communication hubs, or data mining frameworks.

Advocates of specific programming languages, libraries, or frameworks might try to convince you that their technology is the best for any task and any situation, often with the intent of selling it. Richard Stallman once said that computer science is the only industry more fashion-driven than women's fashion. As engineers, we need to know better than to succumb to programming fashion and marketing propaganda. Different frameworks are tailored towards specific use cases, and the correct way to choose a technology is to carefully weigh its advantages and disadvantages when applied to a specific situation. There is no one-size-fits-all technology. Use your own best judgment when deciding which concurrency framework to use for a specific programming task.

Sometimes, choosing the best-suited concurrency utility is easier said than done. It takes a great deal of experience to choose the correct technology. In many cases, we do not even know enough about the requirements of the system to make an informed decision. Regardless, a good rule of thumb is to apply several concurrency frameworks to different parts of the same application, each best suited for a specific task. Often, the real power of different concurrency frameworks becomes apparent when they are used together. This is the topic of the next section.

Putting it all together – a remote file browser

In this section, we use our knowledge about different concurrency frameworks to build a remote file browser. This larger application example illustrates how different concurrency libraries work together, and how to apply them to different situations.
We will name our remote file browser ScalaFTP. The ScalaFTP browser is divided into two main components: the server and the client process. The server process will run on the machine whose filesystem we want to manipulate. The client will run on our own computer, and comprise a graphical user interface used to navigate the remote filesystem. To keep things simple, the protocol that the client and the server will use to communicate will not really be FTP, but a custom communication protocol. By choosing the correct concurrency libraries to implement different parts of ScalaFTP, we will ensure that the complete ScalaFTP implementation fits inside just 500 lines of code. Specifically, the ScalaFTP browser will implement the following features:

- Displaying the names of the files and the directories in a remote filesystem, and allowing navigation through the directory structure
- Copying files between directories in a remote filesystem
- Deleting files in a remote filesystem

To implement separate pieces of this functionality, we will divide the ScalaFTP server and client programs into layers. The task of the server program is to answer incoming copy and delete requests, and to answer queries about the contents of specific directories. To make sure that its view of the filesystem is consistent, the server will cache the directory structure of the filesystem. We divide the server program into two layers: the filesystem API and the server interface. The filesystem API will expose the data model of the server program, and define useful utility methods to manipulate the filesystem. The server interface will receive requests and send responses back to the client. Since the server interface will require communicating with the remote client, we decide to use the Akka actor framework. Akka comes with remote communication facilities. The contents of the filesystem, that is, its state, will change over time. We are therefore interested in choosing proper constructs for data access.
In the filesystem API, we could use object monitors and locking to synchronize access to shared state, but we will avoid these due to the risk of deadlocks. We similarly avoid using atomic variables, because they are prone to race conditions. We could encapsulate the filesystem state within an actor, but note that this can lead to a scalability bottleneck: an actor would serialize all accesses to the filesystem state. Therefore, we decide to use the ScalaSTM framework to model the filesystem contents. An STM avoids the risk of deadlocks and race conditions, and ensures good horizontal scalability. The task of the client program will be to graphically present the contents of the remote filesystem, and communicate with the server. We divide the client program into three layers of functionality. The GUI layer will render the contents of the remote filesystem and register user requests such as button clicks. The client API will replicate the server interface on the client side and communicate with the server. We will use Akka to communicate with the server, but expose the results of remote operations as futures. Finally, the client logic will be a gluing layer, which binds the GUI and the client API together. The architecture of the ScalaFTP browser is illustrated in the following diagram, in which we indicate which concurrency libraries will be used by the separate layers. The dashed line represents the communication path between the client and the server:

We now start by implementing the ScalaFTP server, relying on the bottom-up design approach. In the next section, we will describe the internals of the filesystem API.

Modeling the filesystem

Earlier, we used atomic variables and concurrent collections to implement a non-blocking, thread-safe filesystem API, which allowed copying files and retrieving snapshots of the filesystem. In this section, we repeat this task using STM. We will see that it is much more intuitive and less error-prone to use an STM.
We start by defining the different states that a file can be in. The file can be currently created, in the idle state, being copied, or being deleted. We model this with a sealed State trait and its four cases:

```scala
sealed trait State
case object Created extends State
case object Idle extends State
case class Copying(n: Int) extends State
case object Deleted extends State
```

A file can only be deleted if it is in the idle state, and it can only be copied if it is in the idle state or already being copied. Since a file can be copied to multiple destinations at a time, the Copying state encodes how many copies are currently under way. We add the methods inc and dec to the State trait, which return a new state with one more or one fewer copy, respectively. For example, the implementation of inc and dec for the Copying state is as follows:

```scala
def inc: State = Copying(n + 1)
def dec: State = if (n > 1) Copying(n - 1) else Idle
```

Similar to the File class in the java.io package, we represent both files and directories with the same entity, and refer to them more generally as files. Each file is represented by a FileInfo class, which encodes the path to the file, its name, its parent directory, the date of the last modification, a Boolean value denoting whether the file is a directory, the size of the file, and its State object. The FileInfo class is immutable, so updating the state of a file requires creating a fresh FileInfo object:

```scala
case class FileInfo(path: String, name: String,
  parent: String, modified: String, isDir: Boolean,
  size: Long, state: State)
```

We separately define the factory methods apply and creating that take a File object and return a FileInfo object in the Idle or Created state, respectively. Depending on where the server is started, the root of the ScalaFTP directory structure is a different subdirectory in the actual filesystem.
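To see the state machine in isolation, here is a small, self-contained sketch of the full State hierarchy. Note that the text only shows inc and dec for the Copying case; the implementations for the other states below are assumptions chosen to enforce the rules stated above (no copying of created or deleted files, no dec without a copy in progress):

```scala
sealed trait State {
  def inc: State
  def dec: State
}
// A freshly created file is not yet available for copying (assumed behavior).
case object Created extends State {
  def inc: State = sys.error("cannot copy a file that is being created")
  def dec: State = sys.error("no copy in progress")
}
// An idle file: starting a copy moves it to Copying(1).
case object Idle extends State {
  def inc: State = Copying(1)
  def dec: State = sys.error("no copy in progress")
}
// n concurrent copies under way; the last dec returns the file to Idle.
case class Copying(n: Int) extends State {
  def inc: State = Copying(n + 1)
  def dec: State = if (n > 1) Copying(n - 1) else Idle
}
// A deleted file permits no further transitions (assumed behavior).
case object Deleted extends State {
  def inc: State = sys.error("file is deleted")
  def dec: State = sys.error("file is deleted")
}
```

For example, Idle.inc.inc yields Copying(2), and two matching dec calls return the file to Idle.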
A FileSystem object tracks the files in the given rootpath directory, using a transactional map called files:

```scala
class FileSystem(val rootpath: String) {
  val files = TMap[String, FileInfo]()
}
```

We introduce a separate init method to initialize the FileSystem object. The init method starts a transaction, clears the contents of the files map, and traverses the files and directories under rootpath using the Apache Commons IO library. For each file and directory, the init method creates a FileInfo object and adds it to the files map, using its path as the key:

```scala
def init() = atomic { implicit txn =>
  files.clear()
  val rootDir = new File(rootpath)
  val all = TrueFileFilter.INSTANCE
  val fileIterator =
    FileUtils.iterateFilesAndDirs(rootDir, all, all).asScala
  for (file <- fileIterator) {
    val info = FileInfo(file)
    files(info.path) = info
  }
}
```

Recall that the ScalaFTP browser must display the contents of the remote filesystem. To enable directory queries, we first add the getFileList method to the FileSystem class, which retrieves the files in the specified dir directory. The getFileList method starts a transaction and filters the files whose direct parent is equal to dir:

```scala
def getFileList(dir: String): Map[String, FileInfo] =
  atomic { implicit txn =>
    files.filter(_._2.parent == dir)
  }
```

We implement the copying logic in the filesystem API with the copyFile method. This method takes a path to the src source file and the dest destination file, and starts a transaction. After checking whether the dest destination file exists, the copyFile method inspects the state of the source file entry, and fails unless the state is Idle or Copying. It then calls inc to create a new state with the increased copy count, and updates the source file entry in the files map with the new state. Similarly, the copyFile method creates a new entry for the destination file in the files map. Finally, the copyFile method registers an afterCommit handler to physically copy the file to disk after the transaction completes.
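The directory query boils down to filtering a path-keyed map by the parent field. A minimal, library-free sketch of the same idea, with a plain immutable Map standing in for the transactional TMap and a simplified Entry record standing in for FileInfo:

```scala
// Simplified stand-in for FileInfo: just the fields the query needs.
case class Entry(path: String, parent: String, isDir: Boolean)

object ListingDemo {
  val files = Map(
    "/root"       -> Entry("/root", "", isDir = true),
    "/root/a.txt" -> Entry("/root/a.txt", "/root", isDir = false),
    "/root/sub"   -> Entry("/root/sub", "/root", isDir = true),
    "/root/sub/b" -> Entry("/root/sub/b", "/root/sub", isDir = false)
  )

  // Same shape as getFileList: keep entries whose direct parent matches dir.
  def fileList(dir: String): Map[String, Entry] =
    files.filter(_._2.parent == dir)
}
```

Here ListingDemo.fileList("/root") keeps only /root/a.txt and /root/sub; the nested /root/sub/b is excluded because only the direct parent is compared.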
Recall that it is not legal to execute side-effecting operations from within the transaction body, so the private copyOnDisk method is called only after the transaction commits:

```scala
def copyFile(src: String, dest: String) = atomic { implicit txn =>
  val srcfile = new File(src)
  val destfile = new File(dest)
  val info = files(src)
  if (files.contains(dest)) sys.error(s"Destination exists.")
  info.state match {
    case Idle | Copying(_) =>
      files(src) = info.copy(state = info.state.inc)
      files(dest) = FileInfo.creating(destfile, info.size)
      Txn.afterCommit { _ => copyOnDisk(srcfile, destfile) }
      src
  }
}
```

The copyOnDisk method calls the copyFile method on the FileUtils class from the Apache Commons IO library. After the file transfer completes, the copyOnDisk method starts another transaction, in which it decreases the copy count of the source file and sets the state of the destination file to Idle:

```scala
private def copyOnDisk(srcfile: File, destfile: File) = {
  FileUtils.copyFile(srcfile, destfile)
  atomic { implicit txn =>
    val ninfo = files(srcfile.getPath)
    files(srcfile.getPath) = ninfo.copy(state = ninfo.state.dec)
    files(destfile.getPath) = FileInfo(destfile)
  }
}
```

The deleteFile method deletes a file in a similar way. It changes the file state to Deleted, deletes the file on disk after the transaction commits, and starts another transaction to remove the file entry:

```scala
def deleteFile(srcpath: String): String = atomic { implicit txn =>
  val info = files(srcpath)
  info.state match {
    case Idle =>
      files(srcpath) = info.copy(state = Deleted)
      Txn.afterCommit { _ =>
        FileUtils.forceDelete(info.toFile)
        files.single.remove(srcpath)
      }
      srcpath
  }
}
```

Modeling the server data model with the STM allows seamlessly adding different concurrent computations to the server program. In the next section, we will implement a server actor that uses the filesystem API to execute filesystem operations.

Use STM to model concurrently accessible data, as an STM works transparently with most concurrency frameworks.
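The reason side effects are deferred with afterCommit is that an STM may re-execute a conflicting transaction body several times, so an effect placed inside the body could run more than once. The following toy model (not the ScalaSTM API, just an illustration of the retry-versus-commit distinction) shows a "transaction" that re-runs its body while executing the queued after-commit actions exactly once:

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model: the body may be re-run (as an STM re-executes conflicting
// transactions), but queued after-commit actions run exactly once,
// after the final, successful attempt.
object ToyTxn {
  def atomic(attempts: Int)(body: ArrayBuffer[() => Unit] => Unit): Unit = {
    var committed = ArrayBuffer.empty[() => Unit]
    for (_ <- 1 to attempts) {
      // Effects queued by aborted runs are simply discarded.
      val queue = ArrayBuffer.empty[() => Unit]
      body(queue)
      committed = queue
    }
    committed.foreach(_.apply())
  }

  // Returns (number of body executions, number of side-effect executions).
  def demo(): (Int, Int) = {
    var bodyRuns = 0
    var effectRuns = 0
    atomic(attempts = 3) { afterCommit =>
      bodyRuns += 1
      afterCommit += (() => effectRuns += 1)
    }
    (bodyRuns, effectRuns)
  }
}
```

Running ToyTxn.demo() executes the body three times but the queued effect only once, which is exactly the guarantee that makes copying the file on disk safe to schedule with Txn.afterCommit.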
Having completed the filesystem API, we now proceed to the server interface layer of the ScalaFTP browser.

The Server interface

The server interface comprises a single actor called FTPServerActor. This actor receives client requests and responds to them serially. If it turns out that the server actor is the sequential bottleneck of the system, we can simply add additional server interface actors to improve horizontal scalability. We start by defining the different types of messages that the server actor can receive. We follow the convention of defining them inside the companion object of the FTPServerActor class:

```scala
object FTPServerActor {
  sealed trait Command
  case class GetFileList(dir: String) extends Command
  case class CopyFile(src: String, dest: String) extends Command
  case class DeleteFile(path: String) extends Command

  def apply(fs: FileSystem) = Props(classOf[FTPServerActor], fs)
}
```

The actor template of the server actor takes a FileSystem object as a parameter. It reacts to the GetFileList, CopyFile, and DeleteFile messages by calling the appropriate methods from the filesystem API:

```scala
class FTPServerActor(fileSystem: FileSystem) extends Actor {
  val log = Logging(context.system, this)

  def receive = {
    case GetFileList(dir) =>
      val filesMap = fileSystem.getFileList(dir)
      val files = filesMap.map(_._2).to[Seq]
      sender ! files
    case CopyFile(srcpath, destpath) =>
      Future {
        Try(fileSystem.copyFile(srcpath, destpath))
      } pipeTo sender
    case DeleteFile(path) =>
      Future {
        Try(fileSystem.deleteFile(path))
      } pipeTo sender
  }
}
```

When the server receives a GetFileList message, it calls the getFileList method with the specified dir directory, and sends a sequence collection with the FileInfo objects back to the client. Since FileInfo is a case class, it extends the Serializable interface, and its instances can be sent over the network. When the server receives a CopyFile or DeleteFile message, it calls the appropriate filesystem method asynchronously.
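The Try wrapping used in the receive method above can be seen in miniature with just the standard library. The deleteFile below is a hypothetical stand-in for the filesystem API (which we assume throws on bad input); the point is that wrapping the call turns an exception into an ordinary value that can be sent as a message:

```scala
import scala.util.{Try, Success, Failure}

object TryDemo {
  // Hypothetical stand-in for a filesystem API call that throws on error.
  def deleteFile(path: String): String =
    if (path.isEmpty) sys.error("no such file") else path

  // The receiver pattern-matches on the Try instead of catching exceptions,
  // just as the client will do with the piped-back result.
  def describe(result: Try[String]): String = result match {
    case Success(p) => s"deleted $p"
    case Failure(e) => s"failed: ${e.getMessage}"
  }
}
```

For example, TryDemo.describe(Try(TryDemo.deleteFile("/tmp/a.txt"))) yields "deleted /tmp/a.txt", while an empty path yields "failed: no such file" instead of an exception escaping the actor.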
The methods in the filesystem API throw exceptions when something goes wrong, so we need to wrap calls to them in Try objects. After the asynchronous file operations complete, the resulting Try objects are piped back as messages to the sender actor, using the Akka pipeTo method. To start the ScalaFTP server, we need to instantiate and initialize a FileSystem object, and start the server actor. We parse the network port command-line argument, and use it to create an actor system that is capable of remote communication. For this, we use the remotingSystem factory method that we introduced earlier. The remoting actor system then creates an instance of the FTPServerActor. This is shown in the following program:

```scala
object FTPServer extends App {
  val fileSystem = new FileSystem(".")
  fileSystem.init()
  val port = args(0).toInt
  val actorSystem = ch8.remotingSystem("FTPServerSystem", port)
  actorSystem.actorOf(FTPServerActor(fileSystem), "server")
}
```

The ScalaFTP server actor can run inside the same process as the client application, in another process on the same machine, or on a different machine connected by a network. The advantage of the actor model is that we usually need not worry about where the actor runs until we integrate it into the entire application. When you need to implement a distributed application that runs on different machines, use an actor framework.

Our server program is now complete, and we can run it with the run command from SBT. We set the actor system to use the port 12345:

```
run 12345
```

In the next section, we will implement the file navigation API for the ScalaFTP client, which will communicate with the server interface over the network.

Client navigation API

The client API exposes the server interface to the client program through asynchronous methods that return future objects. Unlike the server's filesystem API, which runs locally, the client API methods execute remote network requests.
Futures are a natural way to model latency in the client API methods, and to avoid blocking during the network requests. Internally, the client API maintains an actor instance that communicates with the server actor. The client actor does not know the actor reference of the server actor when it is created. For this reason, the client actor starts in an unconnected state. When it receives the Start message with the URL of the server actor system, the client constructs an actor path to the server actor, sends out an Identify message, and switches to the connecting state. If the actor system is able to find the server actor, the client actor eventually receives the ActorIdentity message with the server actor reference. In this case, the client actor switches to the connected state, and is able to forward commands to the server. Otherwise, the connection fails and the client actor reverts to the unconnected state. The state diagram of the client actor is shown in the following figure:

We define the Start message in the client actor's companion object:

```scala
object FTPClientActor {
  case class Start(host: String)
}
```

We then define the FTPClientActor class and give it an implicit Timeout parameter. The Timeout parameter will be used later in the Akka ask pattern, when forwarding client requests to the server actor. The stub of the FTPClientActor class is as follows:

```scala
class FTPClientActor(implicit val timeout: Timeout)
extends Actor
```

Before defining the receive method, we define behaviors corresponding to the different actor states. Once the client actor in the unconnected state receives the Start message with the host string, it constructs an actor path to the server, and creates an actor selection object. The client actor then sends the Identify message to the actor selection, and switches its behavior to connecting.
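Before looking at the real Akka implementation, the behavior-switching idea just described can be sketched without any actor library: each state is a handler function, and switching behavior simply replaces the current handler. The message types and string results below are simplified stand-ins invented for this sketch, not the real Akka or ScalaFTP API:

```scala
sealed trait Msg
case class Start(host: String)      extends Msg
case class Identity(found: Boolean) extends Msg
case class Command(text: String)    extends Msg

class ToyClient {
  // The current behavior, analogous to the actor's active receive handler.
  private var behavior: PartialFunction[Msg, String] = unconnected
  private def become(b: PartialFunction[Msg, String]): Unit = behavior = b

  private def unconnected: PartialFunction[Msg, String] = {
    case Start(host) => become(connecting); s"identifying $host"
  }
  private def connecting: PartialFunction[Msg, String] = {
    case Identity(true)  => become(connected); "connected"
    case Identity(false) => become(unconnected); "failed"
  }
  private def connected: PartialFunction[Msg, String] = {
    case Command(text) => s"forwarding $text"
  }

  def receive(m: Msg): String = behavior(m)
}
```

With that model in mind, we return to the actual Akka implementation, where context.become plays the role of our become method.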
This is shown in the following behavior method, named unconnected:

```scala
def unconnected: Actor.Receive = {
  case Start(host) =>
    val serverActorPath =
      s"akka.tcp://FTPServerSystem@$host/user/server"
    val serverActorSel = context.actorSelection(serverActorPath)
    serverActorSel ! Identify(())
    context.become(connecting(sender))
}
```

The connecting method creates a behavior given an actor reference to the sender of the Start message. We call this actor reference clientApp, because the ScalaFTP client application will send the Start message to the client actor. Once the client actor receives an ActorIdentity message with the ref reference to the server actor, it can send true back to the clientApp reference, indicating that the connection was successful. In this case, the client actor switches to the connected behavior. Otherwise, if the client actor receives an ActorIdentity message without the server reference, the client actor sends false back to the application, and reverts to the unconnected state:

```scala
def connecting(clientApp: ActorRef): Actor.Receive = {
  case ActorIdentity(_, Some(ref)) =>
    clientApp ! true
    context.become(connected(ref))
  case ActorIdentity(_, None) =>
    clientApp ! false
    context.become(unconnected)
}
```

The connected state uses the serverActor server actor reference to forward Command messages. To do so, the client actor uses the Akka ask pattern, which returns a future object with the server's response. The contents of the future are piped back to the original sender of the Command message. In this way, the client actor serves as an intermediary between the application, which is the sender, and the server actor. The connected method is shown in the following code snippet:

```scala
def connected(serverActor: ActorRef): Actor.Receive = {
  case command: Command =>
    (serverActor ? command).pipeTo(sender)
}
```

Finally, the receive method returns the unconnected behavior, in which the client actor is created:

```scala
def receive = unconnected
```

Having implemented the client actor, we can proceed to the client API layer. We model it as a trait with a connected value, the concrete methods getFileList, copyFile, and deleteFile, and an abstract host method. The client API creates a private remoting actor system and a client actor. It then instantiates the connected future, which computes the connection status by sending a Start message to the client actor. The methods getFileList, copyFile, and deleteFile are similar. They use the ask pattern on the client actor to obtain a future with the response. Recall that actor messages are not typed, and the ask pattern returns a Future[Any] object. For this reason, each method in the client API uses the mapTo future combinator to restore the type of the message:

```scala
trait FTPClientApi {
  implicit val timeout: Timeout = Timeout(4 seconds)
  private val props = Props(classOf[FTPClientActor], timeout)
  private val system = ch8.remotingSystem("FTPClientSystem", 0)
  private val clientActor = system.actorOf(props)

  def host: String

  val connected: Future[Boolean] = {
    val f = clientActor ? FTPClientActor.Start(host)
    f.mapTo[Boolean]
  }

  def getFileList(d: String): Future[(String, Seq[FileInfo])] = {
    val f = clientActor ? FTPServerActor.GetFileList(d)
    f.mapTo[Seq[FileInfo]].map(fs => (d, fs))
  }

  def copyFile(src: String, dest: String): Future[String] = {
    val f = clientActor ? FTPServerActor.CopyFile(src, dest)
    f.mapTo[Try[String]].map(_.get)
  }

  def deleteFile(srcpath: String): Future[String] = {
    val f = clientActor ? FTPServerActor.DeleteFile(srcpath)
    f.mapTo[Try[String]].map(_.get)
  }
}
```

Note that the client API does not expose the fact that it uses actors for remote communication. Moreover, the client API is similar to the server API, but the return types of the methods are futures instead of normal values.
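The role of mapTo — restoring a static type from an untyped Future[Any] — can be shown with just the Scala standard library, no Akka required. The file names below are made up for the example:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object MapToDemo {
  // An untyped reply, of the kind the ask pattern produces:
  val reply: Future[Any] = Future { Seq("build.sbt", "src") }

  // mapTo restores the static type; if the value has a different runtime
  // class, the resulting future fails instead of crashing the caller.
  val typed: Future[Seq[String]] = reply.mapTo[Seq[String]]

  def names: String = Await.result(typed, 2.seconds).mkString(",")
}
```

The blocking Await here is only for the demonstration; the client API itself never blocks, it composes the typed futures with map, as getFileList does above.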
Futures encode the latency of a method without exposing the cause of the latency, so we often find them at the boundaries between different APIs. We could internally replace the actor communication between the client and the server with remote Observable objects, but that would not change the client API. In a concurrent application, use futures at the boundaries of the layers to express latency.

Now that we can programmatically communicate with the remote ScalaFTP server, we turn our attention to the user interface of the client program.

Summary

This article showed how to choose the correct concurrent abstraction to solve a given problem, and how to combine the different concurrency libraries introduced earlier when designing larger concurrent applications.

Resources for Article:

Further resources on this subject:

- Creating Java EE Applications [Article]
- Differences in style between Java and Scala code [Article]
- Integrating Scala, Groovy, and Flex Development with Apache Maven [Article]
Packt
26 Nov 2014
15 min read
Modernizing our Spring Boot app

In this article by Greg L. Turnquist, the author of the book Learning Spring Boot, we will discuss modernizing our Spring Boot app with JavaScript and adding production-ready support features. (For more resources related to this topic, see here.)

Modernizing our app with JavaScript

We just saw that, with a single @Grab statement, Spring Boot automatically configured the Thymeleaf template engine and some specialized view resolvers. We took advantage of Spring MVC's ability to pass attributes to the template through ModelAndView. Instead of figuring out the details of view resolvers, we instead channeled our efforts into building a handy template to render data fetched from the server. We didn't have to dig through reference docs, Google, and Stack Overflow to figure out how to configure and integrate Spring MVC with Thymeleaf. We let Spring Boot do the heavy lifting. But that's not enough, right? Any real application is going to also have some JavaScript. Love it or hate it, JavaScript is the engine of frontend web development. See how the following code lets us make things more modern by creating modern.groovy:

```groovy
@Grab("org.webjars:jquery:2.1.1")
@Grab("thymeleaf-spring4")
@Controller
class ModernApp {
    def chapters = [
        "Quick Start With Groovy",
        "Quick Start With Java",
        "Debugging and Managing Your App",
        "Data Access with Spring Boot",
        "Securing Your App"
    ]

    @RequestMapping("/")
    def home(@RequestParam(value="name", defaultValue="World") String n) {
        new ModelAndView("modern")
            .addObject("name", n)
            .addObject("chapters", chapters)
    }
}
```

A single @Grab statement pulls in jQuery 2.1.1. The rest of our server-side Groovy code is the same as before. There are multiple ways to use JavaScript libraries. For Java developers, it's especially convenient to use the WebJars project (http://webjars.org), where lots of handy JavaScript libraries are wrapped up with Maven coordinates. Every library is found on the /webjars/<library>/<version>/<module> path.
To top it off, Spring Boot comes with prebuilt support. Perhaps you noticed this buried in earlier console outputs:

```
...
2014-05-20 08:33:09.062 ... : Mapped URL path [/webjars/**] onto handler of [...
...
```

With jQuery added to our application, we can amp up our template (templates/modern.html) like this:

```html
<html>
<head>
    <title>Learning Spring Boot - Chapter 1</title>
    <script src="webjars/jquery/2.1.1/jquery.min.js"></script>
    <script>
        $(document).ready(function() {
            $('p').animate({
                fontSize: '48px',
            }, "slow");
        });
    </script>
</head>
<body>
    <p th:text="'Hello, ' + ${name}"></p>
    <ol>
        <li th:each="chapter : ${chapters}"
            th:text="${chapter}"></li>
    </ol>
</body>
</html>
```

What's different between this template and the previous one? It has a couple of extra <script> tags in the head section:

- The first one loads jQuery from /webjars/jquery/2.1.1/jquery.min.js (implying that we can also grab jquery.js if we want to debug jQuery)
- The second script looks for the <p> element containing our Hello, world! message and then performs an animation that increases the font size to 48 pixels after the DOM is fully loaded into the browser

If we run spring run modern.groovy and visit http://localhost:8080, then we can see this simple but stylish animation. It shows us that all of jQuery is available for us to work with in our application.

Using Bower instead of WebJars

WebJars isn't the only option when it comes to adding JavaScript to our app. More sophisticated UI developers might use Bower (http://bower.io), a popular JavaScript library management tool. WebJars are useful for Java developers, but not every library has been bundled as a WebJar. There is also a huge community of frontend developers more familiar with Bower and NodeJS who will probably prefer using their standard tool chain to do their jobs. We'll see how to plug that into our app. First, it's important to know some basic options.
Spring Boot supports serving up static web resources from the following paths:

- /META-INF/resources/
- /resources/
- /static/
- /public/

To craft a Bower-based app with Spring Boot, we first need to create a .bowerrc file in the same folder where we plan to create our Spring Boot CLI application. Let's pick public/ as the folder of choice for JavaScript modules and put it in this file, as shown in the following code:

```
{"directory": "public/"}
```

Do I have to use public? No. Again, you can pick any of the folders listed previously and Spring Boot will serve up the code. It's a matter of taste and semantics. Our first step towards a Bower-based app is to define our project by answering a series of questions (this only has to be done once):

```
$ bower init
[?] name: app_with_bower
[?] version: 0.1.0
[?] description: Learning Spring Boot - bower sample
[?] main file:
[?] what types of modules does this package expose? amd
[?] keywords:
[?] authors: Greg Turnquist <gturnquist@pivotal.io>
[?] license: ASL
[?] homepage: http://blog.greglturnquist.com/category/learning-springboot
[?] set currently installed components as dependencies? No
[?] add commonly ignored files to ignore list? Yes
[?] would you like to mark this package as private which prevents it from
    being accidentally published to the registry? Yes
...
[?] Looks good? Yes
```

Now that we have set up our project, let's do something simple such as installing jQuery with the following command:

```
$ bower install jquery --save
bower jquery#*    cached git://github.com/jquery/jquery.git#2.1.1
bower jquery#*    validate 2.1.1 against git://github.com/jquery/jquery.git#*
```

These two commands will have created the following bower.json file:

```json
{
  "name": "app_with_bower",
  "version": "0.1.0",
  "authors": [
    "Greg Turnquist <gturnquist@pivotal.io>"
  ],
  "description": "Learning Spring Boot - bower sample",
  "license": "ASL",
  "homepage": "http://blog.greglturnquist.com/category/learningspring-boot",
  "private": true,
  "ignore": [
    "**/.*",
    "node_modules",
    "bower_components",
    "public/",
    "test",
    "tests"
  ],
  "dependencies": {
    "jquery": "~2.1.1"
  }
}
```

It will also have installed jQuery 2.1.1 into our app with the following directory structure:

```
public
└── jquery
    ├── MIT-LICENSE.txt
    ├── bower.json
    └── dist
        ├── jquery.js
        └── jquery.min.js
```

We must include --save (two dashes) whenever we install a module. This ensures that our bower.json file is updated at the same time, allowing us to rebuild things if needed. The altered version of our app with WebJars removed should now look like this:

```groovy
@Grab("thymeleaf-spring4")
@Controller
class ModernApp {
    def chapters = [
        "Quick Start With Groovy",
        "Quick Start With Java",
        "Debugging and Managing Your App",
        "Data Access with Spring Boot",
        "Securing Your App"
    ]

    @RequestMapping("/")
    def home(@RequestParam(value="name", defaultValue="World") String n) {
        new ModelAndView("modern_with_bower")
            .addObject("name", n)
            .addObject("chapters", chapters)
    }
}
```

The view name has been changed to modern_with_bower, so it doesn't collide with the previous template if found in the same folder.
This version of the template, templates/modern_with_bower.html, should look like this:

```html
<html>
<head>
    <title>Learning Spring Boot - Chapter 1</title>
    <script src="jquery/dist/jquery.min.js"></script>
    <script>
        $(document).ready(function() {
            $('p').animate({
                fontSize: '48px',
            }, "slow");
        });
    </script>
</head>
<body>
    <p th:text="'Hello, ' + ${name}"></p>
    <ol>
        <li th:each="chapter : ${chapters}"
            th:text="${chapter}"></li>
    </ol>
</body>
</html>
```

The path to jQuery is now jquery/dist/jquery.min.js. The rest is the same as the WebJars example. We just launch the app with spring run modern_with_bower.groovy and navigate to http://localhost:8080. (We might need to refresh the page to ensure loading of the latest HTML.) The animation should work just the same. The options shown in this section quickly give us a taste of how easy it is to use popular JavaScript tools with Spring Boot. We don't have to fiddle with messy tool chains to achieve a smooth integration. Instead, we can use them the way they are meant to be used.

What about an app that is all frontend with no backend? Perhaps we're building an app that gets all its data from a remote backend. In this age of RESTful backends, it's not uncommon to build a single-page frontend that is fed data updates via AJAX. Spring Boot's Groovy support provides the perfect and arguably smallest way to get started. We do so by creating pure_javascript.groovy, as shown in the following code:

```groovy
@Controller
class JsApp { }
```

That doesn't look like much, but it accomplishes a lot. Let's see what this tiny fragment of code actually does for us:

- The @Controller annotation, like @RestController, causes Spring Boot to auto-configure Spring MVC
- Spring Boot will launch an embedded Apache Tomcat server
- Spring Boot will serve up static content from resources, static, and public
- Since there are no Spring MVC routes in this tiny fragment of code, requests will fall through to resource resolution
Next, we can create a static/index.html page as follows:

```html
<html>
Greetings from pure HTML which can, in turn, load JavaScript!
</html>
```

Run spring run pure_javascript.groovy and navigate to http://localhost:8080. We will see the preceding plain text shown in our browser as expected. There is nothing here but pure HTML being served up by our embedded Apache Tomcat server. This is arguably the lightest way to serve up static content. Use spring jar and it's possible to easily bundle up our client-side app to be installed anywhere. Spring Boot's support for static HTML, JavaScript, and CSS opens the door to many options. We can add WebJar annotations to JsApp or use Bower to introduce third-party JavaScript libraries in addition to any custom client-side code. We might just manually download the JavaScript and CSS. No matter which option we choose, Spring Boot CLI certainly provides a super simple way to add rich-client power for app development. To top it off, RESTful backends that are decoupled from the frontend can have different iteration cycles as well as different development teams. You might need to configure CORS (http://spring.io/understanding/CORS) to properly handle remote calls that don't go back to the original server.

Adding production-ready support features

So far, we have created a Spring MVC app with minimal code. We added views and JavaScript. We are on the verge of a production release. Before deploying our rapidly built and modernized web application, we might want to think about potential issues that might arise in production:

- What do we do when the system administrator wants to configure his monitoring software to ping our app to see if it's up?
- What happens when our manager wants to know the metrics of people hitting our app?
- What are we going to do when the Ops center supervisor calls us at 2:00 a.m. and we have to figure out what went wrong?
The last feature we are going to introduce in this article is Spring Boot's Actuator module and CRaSH remote shell support (http://www.crashub.org). These two modules provide some super slick, Ops-oriented features that are incredibly valuable in a production environment. We first need to update our previous code (we'll call it ops.groovy), as shown in the following code:

```groovy
@Grab("spring-boot-actuator")
@Grab("spring-boot-starter-remote-shell")
@Grab("org.webjars:jquery:2.1.1")
@Grab("thymeleaf-spring4")
@Controller
class OpsReadyApp {
    @RequestMapping("/")
    def home(@RequestParam(value="name", defaultValue="World") String n) {
        new ModelAndView("modern").addObject("name", n)
    }
}
```

This app is exactly like the WebJars example with two key differences: it adds @Grab("spring-boot-actuator") and @Grab("spring-boot-starter-remote-shell"). When you run this version of our app, the same business functionality is available that we saw earlier, but there are additional HTTP endpoints available:

| Actuator endpoint | Description |
|---|---|
| /autoconfig | This reports what Spring Boot did and didn't auto-configure and why |
| /beans | This reports all the beans configured in the application context (including ours as well as the ones auto-configured by Boot) |
| /configprops | This exposes all configuration properties |
| /dump | This creates a thread dump report |
| /env | This reports on the current system environment |
| /health | This is a simple endpoint to check life of the app |
| /info | This serves up custom content from the app |
| /metrics | This shows counters and gauges on web usage |
| /mappings | This gives us details about all Spring MVC routes |
| /trace | This shows details about past requests |

Pinging our app for general health

Each of these endpoints can be visited using our browser or using other tools such as curl. For example, let's assume we ran spring run ops.groovy and then opened up another shell.
From the second shell, let's run the following curl command:

```
$ curl localhost:8080/health
{"status":"UP"}
```

This immediately solves our first need listed previously. We can inform the system administrator that he or she can write a management script to interrogate our app's health.

Gathering metrics

Be warned that each of these endpoints serves up a compact JSON document. Generally speaking, command-line curl probably isn't the best option. While it's convenient on *nix and Mac systems, the content is dense and hard to read. It's more practical to have:

- A JSON plugin installed in our browser (such as JSONView at http://jsonview.com)
- A script that uses a JSON parsing library if we're writing a management script (such as Groovy's JsonSlurper at http://groovy.codehaus.org/gapi/groovy/json/JsonSlurper.html or JSONPath at https://code.google.com/p/json-path)

Assuming we have JSONView installed, the following screenshot shows a listing of metrics:

It lists counters for each HTTP endpoint. According to this, /metrics has been visited four times with a successful 200 status code. Someone tried to access /foo, but it failed with a 404 error code. The report also lists gauges for each endpoint, reporting the last response time. In this case, /metrics took 2 milliseconds. Also included are some memory stats as well as the total CPUs available. It's important to realize that the metrics start at 0. To generate some numbers, you might want to first click on some links before visiting /metrics. The following screenshot shows a trace report:

It shows the entire web request and response for curl localhost:8080/health. This provides a basic framework of metrics to satisfy our manager's needs. It's important to understand that metrics gathered by Spring Boot Actuator aren't persistent across application restarts. So to gather long-term data, we have to collect them and then write them elsewhere.
With these options, we can perform the following:

- Write a script that gathers metrics every hour and appends them to a running spreadsheet somewhere else in the filesystem, such as a shared drive. This might be simple, but probably also crude.
- To step it up, we can dump the data into a Hadoop filesystem for raw collection and configure Spring XD (http://projects.spring.io/spring-xd/) to consume it.

Spring XD stands for Spring eXtreme Data. It is an open source product that makes it incredibly easy to chain together sources and sinks composed of many components, such as HTTP endpoints, Hadoop filesystems, Redis metrics, and RabbitMQ messaging. Unfortunately, there is no space to dive into this subject here.

With any monitoring, it's important to check that we aren't taxing the system too heavily. The same container responding to business-related web requests is also serving metrics data, so it is wise to engage profilers periodically to ensure that the whole system is performing as expected.

Detailed management with CRaSH

So what can we do when we receive that 2:00 a.m. phone call from the Ops center? After either coming in or logging in remotely, we can access the convenient CRaSH shell we configured. Every time the app launches, it generates a random password for SSH access and prints it to the local console:

2014-06-11 23:00:18.822 ... : Configuring property ssh.port=2000 from properties
2014-06-11 23:00:18.823 ... : Configuring property ssh.authtimeout=600000 fro...
2014-06-11 23:00:18.824 ... : Configuring property ssh.idletimeout=600000 fro...
2014-06-11 23:00:18.824 ... : Configuring property auth=simple from properties
2014-06-11 23:00:18.824 ... : Configuring property auth.simple.username=user f...
2014-06-11 23:00:18.824 ... : Configuring property auth.simple.password=bdbe4a...

We can easily see that there's SSH access on port 2000 via a user if we use this information to log in:

$ ssh -p 2000 user@localhost
Password authentication
Password:
  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::  (v1.1.6.RELEASE)

on retina>

There's a fistful of commands:

- help: This gets a listing of available commands
- dashboard: This gets a graphic, text-based display of all the threads, environment properties, memory, and other things
- autoconfig: This prints out a report of which Spring Boot auto-configuration rules were applied and which were skipped (and why)

All of the previous commands have man pages:

> man autoconfig
NAME
autoconfig - Display auto configuration report from ApplicationContext
SYNOPSIS
autoconfig [-h | --help]
STREAM
autoconfig <java.lang.Void, java.lang.Object>
PARAMETERS
[-h | --help]
Display this help message
...

There are many commands available to help manage our application. More details are available at http://www.crashub.org/1.3/reference.html.

Summary

In this article, we learned about modernizing our Spring Boot app with JavaScript and adding production-ready support features. We plugged in Spring Boot's Actuator module as well as the CRaSH remote shell, configuring it with metrics, health, and management features so that we can monitor it in production by merely adding two lines of extra code.

Resources for Article:

Further resources on this subject:
- Getting Started with Spring Security [Article]
- Spring Roo 1.1: Working with Roo-generated Web Applications [Article]
- Spring Security 3: Tips and Tricks [Article]
Packt
26 Nov 2014
5 min read

Creating CSS via the Stylus preprocessor

Instead of manually typing out each line of CSS you're going to require for your Ghost theme, we're going to get you set up to become highly efficient in your development through use of the CSS preprocessor named Stylus. Stylus can be described as a way of making CSS smart. It gives you the ability to define variables, create blocks of code that can be easily reused, perform mathematical calculations, and more. After Stylus code is written, it is compiled into a regular CSS file that is then linked into your design in the usual fashion. It is an extremely powerful tool with many capabilities, so we won't go into them all here; however, we will cover some of the essential features that will feature heavily in our theme development process. This article by Kezz Bracey, David Balderston, and Andy Boutte, authors of the book Getting Started with Ghost, covers how to create CSS via the Stylus preprocessor. (For more resources related to this topic, see here.)

Variables

Stylus has the ability to create variables to hold any piece of information, from color codes to numerical values, for use in your layout.
For example, you could map out the color scheme of your design like this:

default_background_color = #F2F2F2
default_foreground_color = #333
default_highlight_color = #77b6f9

You could then use these variables all throughout your code instead of having to type them out multiple times:

body {
  background-color: default_background_color;
}
a {
  color: default_highlight_color;
}
hr {
  border-color: default_foreground_color;
}
.post {
  border-color: default_highlight_color;
  color: default_foreground_color;
}

After the preceding Stylus code is compiled into CSS, it looks like this:

body {
  background-color: #F2F2F2;
}
a {
  color: #77b6f9;
}
hr {
  border-color: #333;
}
.post {
  border-color: #77b6f9;
  color: #333;
}

So not only have you been saved the trouble of typing out these color code values repeatedly, which in a real style sheet means a lot of work, but you can also now easily update the color scheme of your site simply by changing the value of the variables you created. Variables come in very handy for many purposes, as you'll see when we get started on theme creation.

Stylus syntax

Stylus code uses a syntax that reads very much like CSS, but with the ability to take shortcuts in order to code faster and more smoothly. With Stylus, you don't need to include curly braces, colons, or semicolons. Instead, you use tab indentations, spaces, and new lines.
For example, the code I used in the last section could actually be written like this in Stylus:

body
  background-color default_background_color

a
  color default_highlight_color

hr
  border-color default_foreground_color

.post
  border-color default_highlight_color
  color default_foreground_color

You may think at first glance that this code is more difficult to read than regular CSS; however, shortly we'll be getting you running with a syntax highlighting package that will make your code look like this:

With the syntax highlighting package in place, you don't need punctuation to make your code readable, as the colors and emphasis allow you to easily differentiate between one thing and another. The chances are very high that you'll find coding in this manner much faster and easier than regular CSS syntax. However, if you're not comfortable, you can still choose to include the curly braces, colons, and semicolons you're used to, and your code will still compile just fine.

The golden rules of writing in Stylus syntax are as follows:

- After a class, ID, or element declaration, use a new line and then a tab indentation instead of curly braces
- Ensure each line of a style is also subsequently tab indented
- After a property, use a space instead of a colon
- At the end of a line, after a value, use a new line instead of a semicolon

Mixins

Mixins are a very useful way of preventing yourself from having to repeat code, and also allow you to keep your code well organized and compartmentalized. The best way to understand what a mixin is, is to see one in action. For example, you may want to apply the same font-family, font-weight, and color to each of your heading tags.
So instead of writing the same thing out manually for each H tag level, you could create a mixin as follows:

header_settings()
  font-family Georgia
  font-weight 700
  color #454545

You could then call that mixin into the styles for your heading tags:

h1
  header_settings()
  font-size 3em

h2
  header_settings()
  font-size 2.25em

h3
  header_settings()
  font-size 1.5em

When compiled, you would get the following CSS:

h1 {
  font-family: Georgia;
  font-weight: 700;
  color: #454545;
  font-size: 3em;
}

h2 {
  font-family: Georgia;
  font-weight: 700;
  color: #454545;
  font-size: 2.25em;
}

h3 {
  font-family: Georgia;
  font-weight: 700;
  color: #454545;
  font-size: 1.5em;
}

As we move through the Ghost theme development process, you'll see just how useful and powerful Stylus is, and you'll never want to go back to hand-coding CSS again!

Summary

You now have everything in place and ready to begin your Ghost theme development process. You understand the essentials of Stylus, the means by which we'll be creating your theme's CSS.

Resources for Article:

Further resources on this subject:
- Advanced SOQL Statements [Article]
- Enabling your new theme in Magento [Article]
- Introduction to a WordPress application's frontend [Article]
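Before moving on, the two features covered above, variables and mixins, can be summed up as "substitute a value, expand a block". Here is a deliberately tiny Python sketch of that substitution idea, using the same names as the examples. It is a conceptual illustration only, not how the real Stylus compiler works (Stylus builds a proper syntax tree rather than doing text replacement):

```python
# Toy illustration of preprocessor substitution: variables map names to values,
# mixins map a call to a block of properties. Names mirror the article's examples.
variables = {
    "default_highlight_color": "#77b6f9",
    "default_foreground_color": "#333",
}

mixins = {
    "header_settings()": "font-family: Georgia; font-weight: 700; color: #454545;",
}

def compile_toy(source):
    # Expand mixin calls first, then substitute variable values.
    for name, body in mixins.items():
        source = source.replace(name, body)
    for name, value in variables.items():
        source = source.replace(name, value)
    return source

src = "h1 { header_settings() font-size: 3em; } a { color: default_highlight_color; }"
print(compile_toy(src))
# -> h1 { font-family: Georgia; font-weight: 700; color: #454545; font-size: 3em; } a { color: #77b6f9; }
```

Change one entry in the variables dictionary and every rule that references it changes on the next compile, which is exactly the color-scheme benefit described in the Variables section.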
Packt
25 Nov 2014
19 min read

Audio Processing and Generation in Max/MSP

This article, by Patrik Lechner, the author of Multimedia Programming Using Max/MSP and TouchDesigner, focuses on audio-specific examples. We will take a look at the following audio processing and generation techniques:

- Additive synthesis
- Subtractive synthesis
- Sampling
- Wave shaping

Nearly every example provided here might be understood very intuitively or taken apart in hours of math and calculation. It's up to you how deep you want to go, but in order to develop some intuition, we'll have to use some amount of Digital Signal Processing (DSP) theory. We will cover the DSP theory only briefly, so it is highly recommended that you study its fundamentals more deeply in case you are not already familiar with this scientific topic. (For more resources related to this topic, see here.)

Basic audio principles

We already saw and stated that it's important to know, see, and hear what's happening along a signal path. If we work in the realm of audio, there are four particularly important ways to measure a signal, which are conceptually quite different and together offer a very broad perspective on audio signals if we always keep all of them in the back of our heads. These are:

- Numbers (actual sample values)
- Levels (such as RMS, LUFS, and dB FS)
- Transversal waves (waveform displays, so oscilloscopes)
- Spectra (an analysis of frequency components)

There are many more ways to think about audio or signals in general, but these are the most common and important ones. Let's use them inside Max right away to observe their different behavior. We'll feed some very basic signals into them: DC offset, a sinusoid, and noise. The one that might surprise you the most and get you thinking is the constant signal or DC offset (if it's digital-analog converted). In the following screenshot, you can see how the different displays react:

In general, one might think we don't want any constant signals at all; we don't want any DC offset.
However, we will use audio signals a lot to control things later, say, an LFO or sequencers that should run with great timing accuracy. Also, sometimes we just add a DC offset to our audio streams by accident. You can see in the preceding screenshot that a very slowly moving or constant signal is best observed by looking at its value directly, for example, using the [number~] object. In a level display, the [meter~] or [levelmeter~] objects will seem to imply that the incoming signal is very loud; in fact, it sits at -6 dB Full Scale (FS). Although it is very loud, we just can't hear anything, since the frequency is infinitely low. This is reflected by the spectrum display too; we see a very low frequency at -6 dB. In theory, we should just see an infinitely thin spike at 0 Hz, so everything else can be considered an (inevitable but reducible) measuring error.

Audio synthesis

Awareness of these possibilities of viewing a signal and their constraints, and knowing how they actually work, will greatly increase our productivity. So let's get to actually synthesizing some waveforms. A good example of different views of a signal operation is Amplitude Modulation (AM); we will also try to formulate some other general principles using the example of AM.

Amplitude modulation

Amplitude modulation means the multiplication of a signal with an oscillator. This provides a method of generating sidebands, that is, partials, in a very easy, intuitive, and CPU-efficient way. Amplitude modulation may seem like a term with a very broad meaning, applying as soon as we change a signal's amplitude by another signal. While this might be true, in the context of audio synthesis it very specifically means the multiplication of two (most often sine) oscillators. Moreover, there is a distinction between AM and Ring Modulation.
But before we get to this distinction, let's look at the following simple multiplication of two sine waves, and we are first going to look at the result in an oscilloscope as a wave: So in the preceding screenshot, we can see the two sine waves and their product. If we imagine every pair of samples being multiplied, the operation seems pretty intuitive as the result is what we would expect. But what does this resulting wave really mean besides looking like a product of two sine waves? What does it sound like? The wave seems to have stayed in there certainly, right? Well, viewing the product as a wave and looking at the whole process in the time domain rather than the frequency domain is helpful but slightly misleading. So let's jump over to the following frequency domain and look what's happening with the spectrum: So we can observe here that if we multiply a sine wave a with a sine wave b, a having a frequency of 1000 Hz and b a frequency of 100 Hz, we end up with two sine waves, one at 900 Hz and another at 1100 Hz. The original sine waves have disappeared. In general, we can say that the result of multiplying a and b is equal to adding and subtracting the frequencies. This is shown in the Equivalence to Sum and difference subpatcher (in the following screenshot, the two inlets to the spectrum display overlap completely, which might be hard to see): So in the preceding screenshot, you see a basic AM patcher that produces sidebands that we can predict quite easily. Multiplication is commutative; you will say, 1000 + 100 = 1100, 1000 - 100 = 900; that's alright, but what about 100 - 1000 and 100 + 1000? We get -900 and 1100 once again? It still works out, and the fact that it does has to do with negative frequencies, or the symmetry of a real frequency spectrum around 0. So you can see that the two ways of looking at our signal and thinking about AM lend different opportunities and pitfalls. 
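Both views can be checked numerically. The following Python sketch (not Max code) verifies the product-to-sum identity sin(a)·sin(b) = ½[cos(a − b) − cos(a + b)] at a few sample points, and computes the sum-and-difference sideband frequencies, taking the absolute value of a negative difference to reflect it back into the positive range, as the symmetry around 0 described above implies:

```python
import math

def am_sidebands(f_carrier, f_mod):
    """Sum and difference frequencies produced by multiplying two sine waves.
    A negative difference reflects to its absolute value (spectrum symmetry around 0 Hz)."""
    return (abs(f_carrier - f_mod), f_carrier + f_mod)

print(am_sidebands(1000, 100))   # -> (900, 1100)
print(am_sidebands(100, 1000))   # -> (900, 1100); multiplication is commutative

# Numeric check of sin(a)*sin(b) == 0.5*(cos(a-b) - cos(a+b))
for t in (0.1, 0.25, 1.7):
    a = 2 * math.pi * 1000 * t   # 1000 Hz carrier phase at time t
    b = 2 * math.pi * 100 * t    # 100 Hz modulator phase at time t
    lhs = math.sin(a) * math.sin(b)
    rhs = 0.5 * (math.cos(a - b) - math.cos(a + b))
    assert abs(lhs - rhs) < 1e-9
```

The identity is why the original carrier and modulator frequencies vanish from the product: only the 900 Hz and 1100 Hz cosine components remain.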
Here is another way to think about AM: it's the convolution of the two spectra. We haven't talked about convolution yet; we will at a later point. But keep it in mind or do a little research on your own; this aspect of AM is yet another interesting one.

Ring modulation versus amplitude modulation

The difference between ring modulation and what we call AM in this context is that the former uses a bipolar modulator and the latter uses a unipolar one. So actually, this is just about scaling and offsetting one of the factors. The difference in the outcome is nevertheless a big one; if we keep one oscillator unipolar, the other one will be present in the outcome. If we do so, it starts making sense to call one oscillator the carrier and the other (unipolar) one the modulator. It also introduces a modulation depth that controls the amplitude of the sidebands. In the following screenshot, you can see the resulting spectrum; we have the original signal, so the carrier plus two sidebands, which are the original signal shifted up and down:

Therefore, you can see that AM gives us a way to roughen up our spectrum, which means we can use it to let through an original spectrum and add sidebands.

Tremolo

Tremolo (from the Latin word tremare, to shake or tremble) is a musical term, which means to change a sound's amplitude in regular short intervals. Many people confuse it with vibrato, which is modulating pitch at regular intervals. AM is tremolo and FM is vibrato, and as a simple reminder, think that the V of vibrato is closer to the F of FM than to the A of AM. So multiplying the two oscillators results in a different spectrum. But of course, we can also use multiplication to scale a signal and to change its amplitude.
If we wanted to have a sine wave that has a tremolo, that is, an oscillating variation in amplitude, with, say, a frequency of 1 Hertz, we would again multiply two sine waves, one with 1000 Hz for example and another with a frequency of 0.5 Hz. Why 0.5 Hz? Think about a sine wave; it has two peaks per cycle, a positive one and a negative one. We can visualize all that very well if we think about it in the time domain, looking at the result in an oscilloscope. But what about our view of the frequency domain? Well, let's go through it; when we multiply a sine with 1000 Hz and one with 0.5 Hz, we actually get two sine waves, one with 999.5 Hz and one with 1000.5 Hz. Frequencies that close create beatings, since once in a while, their positive and negative peaks overlap, canceling each other out. In general, the frequency of the beating is defined by the difference in frequency, which is 1 Hz in this case. So we see, if we look at it this way, we come to the same result again of course, but this time, we actually think of two frequencies instead of one being attenuated. Lastly, we could have looked up trigonometric identities to anticipate what happens if we multiply two sine waves. We find the following:

sin(φ) · sin(θ) = ½[cos(φ − θ) − cos(φ + θ)]

Here, φ and θ are the two angular frequencies multiplied by the time in seconds, for example:

φ = 2π · 1000 · t

This is the equation for the 1000 Hz sine wave.

Feedback

Feedback always brings the complexity of a system to the next level. It can be used to stabilize a system, but can also easily make a given system unstable. In a strict sense, in the context of DSP, stability means that for a finite input to a system, we get finite output. Obviously, feedback can give us infinite output for a finite input. We can use attenuated feedback, for example, not only to make our AM patches recursive, adding more and more sidebands, but also to achieve some surprising results, as we will see in a minute. Before we look at this application, let's quickly talk about feedback in general.
In the digital domain, feedback always demands some amount of delay. This is because the evaluation of the chain of operations would otherwise resemble an infinite amount of operations on one sample. This is true for both the Max message domain (we get a stack overflow error if we use feedback without delaying or breaking the chain of events) and the MSP domain; audio will just stop working if we try it. So the minimum network for a feedback chain as a block diagram looks something like this: In the preceding screenshot, X is the input signal and x[n] is the current input sample; Y is the output signal and y[n] is the current output sample. In the block marked Z-m, i is a delay of m samples (m being a constant). Denoting a delay with Z-m comes from a mathematical construct named the Z-transform. The a term is also a constant used to attenuate the feedback circle. If no feedback is involved, it's sometimes helpful to think about block diagrams as processing whole signals. For example, if you think of a block diagram that consists only of multiplication with a constant, it would make a lot of sense to think of its output signal as a scaled version of the input signal. We wouldn't think of the network's processing or its output sample by sample. However, as soon as feedback is involved, without calculation or testing, this is the way we should think about the network. Before we look at the Max version of things, let's look at the difference equation of the network to get a better feeling of the notation. Try to find it yourself before looking at it too closely! In Max, or rather in MSP, we can introduce feedback as soon as we use a [tapin~] [tapout~] pair that introduces a delay. The minimum delay possible is the signal vector size. Another way is to simply use a [send~] and [receive~] pair in our loop. The [send~] and [receive~] pair will automatically introduce this minimum amount of delay if needed, so the delay will be introduced only if there is a feedback loop. 
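To make the sample-by-sample view concrete, here is a Python sketch of the standard attenuated-feedback form of this network, y[n] = x[n] + a·y[n − m]: the current input plus the output from m samples ago, scaled by a. (The book's figure is not reproduced here, so take this equation as the common textbook form of the network described, with delay m and attenuation a.)

```python
def feedback(x, a=0.5, m=2):
    """Difference equation y[n] = x[n] + a * y[n - m]; y is taken as zero for n < 0."""
    y = []
    for n, sample in enumerate(x):
        fb = a * y[n - m] if n >= m else 0.0   # attenuated output from m samples ago
        y.append(sample + fb)
    return y

# Impulse response: the impulse echoes every m samples, decaying by a each time.
impulse = [1.0] + [0.0] * 7
print(feedback(impulse))   # -> [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.125, 0.0]
```

With |a| < 1 the echoes decay geometrically, so a finite input yields finite output; with |a| >= 1 they do not, which is exactly the instability the text warns about.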
If we need shorter delays and feedback, we have to go into the wonderful world of gen~. Here, our shortest delay time is one sample, and it can be introduced via the [history] object. In the Fbdiagram.maxpat patcher, you can find a Max version, an MSP version, and a [gen~] version of our diagram. For the time being, let's just pretend that the gen domain is just another subpatcher/abstraction system that allows shorter delays with feedback and has a more limited set of objects that more or less work the same as the MSP ones. In the following screenshot, you can see the difference between the output of the MSP and the [gen~] domain. Obviously, the length of the delay time has quite an impact on the output. Also, don't forget that the MSP version's output will vary greatly depending on our vector size settings.

Let's return to AM now. Feedback can, for example, be used to duplicate and shift our spectrum again and again. In the following screenshot, you can see a 1000 Hz sine wave that has been processed by a recursive AM to be duplicated and shifted up and down with a 100 Hz spacing:

A maybe surprising result we can achieve with this technique is this: if the modulating oscillator and the carrier have the same frequency, we end up with something that almost sounds like a sawtooth wave.

Frequency modulation

Frequency modulation, or FM, is a technique that allows us to create a lot of frequency components out of just two oscillators, which is why it was used a lot back in the days when oscillators were a rare, expensive good, or CPU performance was low. Still, especially when dealing with real-time synthesis, efficiency is a crucial factor, and the huge variety of sounds that can be achieved with just two oscillators and very few parameters can be very useful for live performance and so on. The idea of FM is of course to modulate an oscillator's frequency.
The basic, admittedly useless form is depicted in the following screenshot: While trying to visualize what happens with the output in the time domain, we can imagine it as shown in the following screenshot. In the preceding screenshot, you can see the signal that is controlling the frequency. It is a sine wave with a frequency of 50 Hz, scaled and offset to range from -1000 to 5000, so the center or carrier frequency is 2000 Hz, which is modulated to an amount of 3000 Hz. You can see the output of the modulated oscillator in the following screenshot: If we extend the upper patch slightly, we end up with this: Although you can't see it in the screenshot, the sidebands are appearing with a 100 Hz spacing here, that is, with a spacing equal to the modulator's frequency. Pretty similar to AM right? But depending on the modulation amount, we get more and more sidebands. Controlling FM If the ratio between F(c) and F(m) is an integer, we end up with a harmonic spectrum, therefore, it may be more useful to rather control F(m) indirectly via a ratio parameter as it's done inside the SimpleRatioAndIndex subpatcher. Also, an Index parameter is typically introduced to make an FM patch even more controllable. The modulation index is defined as follows: Here, I is the index, Am is the amplitude of the modulation, what we called amount before, and fm is the modulator's frequency. So finally, after adding these two controls, we might arrive here: FM offers a wide range of possibilities, for example, the fact that we have a simple control for how harmonic/inharmonic our spectrum is can be useful to synthesize the mostly noisy attack phase of many instruments if we drive the ratio and index with an envelope as it's done in the SimpleEnvelopeDriven subpatcher. However, it's also very easy to synthesize very artificial, strange sounds. 
This basically has the following two reasons:

- Firstly, the partials appearing have amplitudes governed by Bessel functions that may seem quite unpredictable; the partials sometimes seem to have random amplitudes.
- Secondly, negative frequencies and foldback. If we generate partials with frequencies below 0 Hz, it is equivalent to creating the same positive frequency. For frequencies greater than the sample rate/2 (sample rate/2 is what's called the Nyquist rate), the frequencies reflect back into the spectrum that can be described by our sampling rate (an effect also called aliasing).

So at a sampling rate of 44,100 Hz, a partial with a frequency of -100 Hz will appear at 100 Hz, and a partial with a frequency of 43,100 Hz will appear at 1,000 Hz, as shown in the following screenshot:

So, for frequencies between the Nyquist frequency and the sampling frequency, what we hear is described by this:

f0 = fs − fi

Here, fs is the sampling rate, f0 is the frequency we hear, and fi is the frequency we are trying to synthesize. Since FM leads to many partials, this effect can easily come up, and it can both be used in an artistically interesting manner and sometimes appear as an unwanted error. In theory, an FM signal's partials extend even to infinity, but the amplitudes become negligibly small. If we want to reduce this behavior, the [poly~] object can be used to oversample the process, generating a bit more headroom for high frequencies. The phenomenon of aliasing can be understood by thinking of a real (in contrast to imaginary) digital signal as having a symmetrical and periodical spectrum; let's not go into too much detail here and look at it in the time domain: In the previous screenshot, we again tried to synthesize a sine wave with 43,100 Hz (the dotted line) at a sampling rate of 44,100 Hz. What we actually get is the straight black line, a sine with 1,000 Hz.
Each big black dot represents an actual sample, and there is only one single band-limited signal connecting them: the 1000 Hz wave that is only partly visible here (about half its wavelength). Feedback It is very common to use feedback with FM. We can even frequency modulate one oscillator with itself, making the algorithm even cheaper since we have only one table lookup. The idea of feedback FM quickly leads us to the idea of making networks of oscillators that can be modulated by each other, including feedback paths, but let's keep it simple for now. One might think that modulating one oscillator with itself should produce chaos; FM being a technique that is not the easiest to control, one shouldn't care for playing around with single operator feedback FM. But the opposite is the case. Single operator FM yields very predictable partials, as shown in the following screenshot, and in the Single OP FBFM subpatcher: Again, we are using a gen~ patch, since we want to create a feedback loop and are heading for a short delay in the loop. Note that we are using the [param] object to pass a message into the gen~ object. What should catch your attention is that although the carrier frequency has been adjusted to 1000 Hz, the fundamental frequency in the spectrum is around 600 Hz. What can help us here is switching to phase modulation. Phase modulation If you look at the gen~ patch in the previous screenshot, you see that we are driving our sine oscillator with a phasor. The cycle object's phase inlet assumes an input that ranges from 0 to 1 instead of from 0 to 2π, as one might think. To drive a sine wave through one full cycle in math, we can use a variable ranging from 0 to 2π, so in the following formula, you can imagine t being provided by a phasor, which is the running phase. The 2π multiplication isn't necessary in Max since if we are using [cycle~], we are reading out a wavetable actually instead of really computing the sine or cosine of the input. 
This is the most common form of denoting a running sinusoid with frequency f0 and phase φ. Try to come up with a formula that describes frequency modulation! Simplifying the phases by setting it to zero, we can denote FM as follows: This can be shown to be nearly identical to the following formula: Here, f0 is the frequency of the carrier, fm is the frequency of the modulator, and A is the modulation amount. Welcome to phase modulation. If you compare it, the previous formula actually just inserts a scaled sine wave where the phase φ used to be. So phase modulation is nearly identical to frequency modulation. Phase modulation has some advantages though, such as providing us with an easy method of synchronizing multiple oscillators. But let's go back to the Max side of things and look at a feedback phase modulation patch right away (ignoring simple phase modulation, since it really is so similar to FM): This gen~ patcher resides inside the One OP FBPM subpatcher and implements phase modulation using one oscillator and feedback. Interestingly, the spectrum is very similar to the one of a sawtooth wave, with the feedback amount having a similar effect of a low-pass filter, controlling the amount of partials. If you take a look at the subpatcher, you'll find the following three sound sources: Our feedback FM gen~ patcher A [saw~] object for comparison A poly~ object We have already mentioned the problem of aliasing and the [poly~] object has already been proposed to treat the problem. However, it allows us to define the quality of parts of patches in general, so let's talk about the object a bit before moving on since we will make great use of it. Before moving on, I would like to tell you that you can double-click on it to see what is loaded inside, and you will see that the subpatcher we just discussed contains a [poly~] object that contains yet another version of our gen~ patcher. Summary In this article, we've finally come to talking about audio. 
We've introduced some very common techniques and thought about refining them and getting things done properly and efficiently (think about poly~). By now, you should feel quite comfortable building synths that mix techniques such as FM, subtractive synthesis, and feature modulation, as well as using matrices for routing both audio and modulation signals where you need them. Further resources on this subject: Moodle for Online Communities [Article] Techniques for Creating a Multimedia Database [Article] Moodle 2.0 Multimedia: Working with 2D and 3D Maps [Article]
Packt
25 Nov 2014
14 min read

Components

This article by Timothy Moran, author of Mastering KnockoutJS, teaches you how to use the new Knockout components feature. (For more resources related to this topic, see here.)

In Version 3.2, Knockout added components, using the combination of a template (view) with a viewmodel to create reusable, behavior-driven DOM objects. Knockout components are inspired by web components, a new (and experimental, at the time of writing this) set of standards that allow developers to define custom HTML elements paired with JavaScript that create packaged controls. Like web components, Knockout allows the developer to use custom HTML tags to represent these components in the DOM. Knockout also allows components to be instantiated with a binding handler on standard HTML elements. Knockout binds components by injecting an HTML template, which is bound to its own viewmodel. This is probably the single largest feature Knockout has ever added to the core library. The reason we started with RequireJS is that components can optionally be loaded and defined with module loaders, including their HTML templates! This means that our entire application (even the HTML) can be defined in independent modules, instead of as a single hierarchy, and loaded asynchronously.

The basic component registration

Unlike extenders and binding handlers, which are created by just adding an object to Knockout, components are created by calling the ko.components.register function:

ko.components.register('contact-list', {
  viewModel: function(params) { },
  template: //template string or object
});

This will create a new component named contact-list, which uses the object created by the viewModel function as a binding context, and the template as its view. It is recommended that you use lowercase, dash-separated names for components so that they can easily be used as custom elements in your HTML. To use this newly created component, you can use a custom element or the component binding.
All of the following three tags produce equivalent results:

```html
<contact-list params="data: contacts"></contact-list>

<div data-bind="component: { name: 'contact-list', params: { data: contacts } }"></div>

<!-- ko component: { name: 'contact-list', params: { data: contacts } } --><!-- /ko -->
```

Obviously, the custom element syntax is much cleaner and easier to read. It is important to note that custom elements cannot be self-closing tags. This is a restriction of the HTML parser and cannot be controlled by Knockout.

There is one advantage of using the component binding: the name of the component can be an observable. If the name of the component changes, the previous component will be disposed (just like it would if a control flow binding removed it) and the new component will be initialized.

The params attribute of custom elements works in a manner that is similar to the data-bind attribute. Comma-separated key/value pairs are parsed to create a property bag, which is given to the component. The values can contain JavaScript literals, observable properties, or expressions. It is also possible to register a component without a viewmodel, in which case, the object created by params is directly used as the binding context.
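The property-bag idea can be sketched with a deliberately naive parser. Knockout's real parser evaluates full binding expressions; this toy version, with the invented parseParams helper, only splits simple "key: value" pairs and would break on values containing commas:

```javascript
// Naive sketch of turning a params attribute string into a property bag.
// Illustration only: real Knockout evaluates the values as expressions.
function parseParams(attr) {
  var bag = {};
  attr.split(',').forEach(function (pair) {
    var idx = pair.indexOf(':');
    bag[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  });
  return bag;
}

var params = parseParams("data: contacts, mode: 'edit'");
console.log(params.data); // "contacts"
console.log(params.mode); // "'edit'"
```

The sketch keeps the values as raw strings; in Knockout, each value is instead evaluated against the binding context, which is why literals, observables, and expressions all work.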
To see this, we'll convert the list of contacts into a component:

```html
<contact-list params="contacts: displayContacts,
  edit: editContact,
  delete: deleteContact">
</contact-list>
```

The HTML code for the list is replaced with a custom element with parameters for the list as well as callbacks for the two buttons, which are edit and delete:

```js
ko.components.register('contact-list', {
  template:
  '<ul class="list-unstyled" data-bind="foreach: contacts">'
    +'<li>'
      +'<h3>'
        +'<span data-bind="text: displayName"></span> <small data-bind="text: phoneNumber"></small> '
        +'<button class="btn btn-sm btn-default" data-bind="click: $parent.edit">Edit</button> '
        +'<button class="btn btn-sm btn-danger" data-bind="click: $parent.delete">Delete</button>'
      +'</h3>'
    +'</li>'
  +'</ul>'
});
```

This component registration uses an inline template. Everything still looks and works the same, but the resulting HTML now includes our custom element.

Custom elements in IE 8 and higher

IE 9 and later versions, as well as all other major browsers, have no issue with seeing custom elements in the DOM before they have been registered. However, older versions of IE will remove the element if it hasn't been registered. The registration can be done either with Knockout, with ko.components.register('component-name'), or with the standard document.createElement('component-name') expression statement. One of these must come before the custom element, either by the script containing them being first in the DOM, or by the custom element being added at runtime. When using RequireJS, being in the DOM first won't help as the loading is asynchronous.
If you need to support older IE versions, it is recommended that you include a separate script to register the custom element names at the top of the body tag or in the head tag:

```html
<!DOCTYPE html>
<html>
<body>
  <script>
    document.createElement('my-custom-element');
  </script>
  <script src='require.js' data-main='app/startup'></script>
  <my-custom-element></my-custom-element>
</body>
</html>
```

Once this has been done, components will work in IE 6 and higher even with custom elements.

Template registration

The template property of the configuration sent to register can take any of the following formats:

```js
ko.components.register('component-name', {
  template: [OPTION]
});
```

The element ID

Consider the following code statement:

```js
template: { element: 'component-template' }
```

If you specify the ID of an element in the DOM, the contents of that element will be used as the template for the component. Although it isn't supported in IE yet, the template element is a good candidate, as browsers do not visually render the contents of template elements.

The element instance

Consider the following code statement:

```js
template: { element: instance }
```

You can pass a real DOM element to the template to be used. This might be useful in a scenario where the template was constructed programmatically.
Like the element ID method, only the contents of the element will be used as the template:

```js
var template = document.getElementById('contact-list-template');
ko.components.register('contact-list', {
  template: { element: template }
});
```

An array of DOM nodes

Consider the following code statement:

```js
template: [nodes]
```

If you pass an array of DOM nodes to the template configuration, then the entire array will be used as a template and not just the descendants:

```js
var template = document.getElementById('contact-list-template'),
    nodes = Array.prototype.slice.call(template.content.childNodes);
ko.components.register('contact-list', {
  template: nodes
});
```

Document fragments

Consider the following code statement:

```js
template: documentFragmentInstance
```

If you pass a document fragment, the entire fragment will be used as a template instead of just the descendants:

```js
var template = document.getElementById('contact-list-template');
ko.components.register('contact-list', {
  template: template.content
});
```

This example works because template elements wrap their contents in a document fragment in order to stop the normal rendering. Using the content is the same method that Knockout uses internally when a template element is supplied.

HTML strings

We already saw an example of an HTML string in the previous section. While using the value inline is probably uncommon, supplying a string would be an easy thing to do if your build system provided it for you.

Registering templates using an AMD module

Consider the following code statement:

```js
template: { require: 'module/path' }
```

If a require property is passed to the configuration object of a template, the default module loader will load the module and use it as the template. The module can return any of the preceding formats.
This is especially useful for the RequireJS text plugin:

```js
ko.components.register('contact-list', {
  template: { require: 'text!contact-list.html' }
});
```

Using this method, we can extract the HTML template into its own file, drastically improving its organization. By itself, this is a huge benefit to development.

The viewmodel registration

Like template registration, viewmodels can be registered using several different formats. To demonstrate this, we'll use a simple viewmodel of our contact list component:

```js
function ListViewmodel(params) {
  this.contacts = params.contacts;
  this.edit = params.edit;
  this.delete = function(contact) {
    console.log('Mock Deleting Contact', ko.toJS(contact));
  };
}
```

To verify that things are getting wired up properly, you'll want something interactive; hence, we use the fake delete function.

The constructor function

Consider the following code statement:

```js
viewModel: Constructor
```

If you supply a function to the viewModel property, it will be treated as a constructor. When the component is instantiated, new will be called on the function, with the params object as its first parameter:

```js
ko.components.register('contact-list', {
  template: { require: 'text!contact-list.html' },
  viewModel: ListViewmodel // defined above
});
```

A singleton object

Consider the following code statement:

```js
viewModel: { instance: singleton }
```

If you want all your component instances to be backed by a shared object (though this is not recommended), you can pass it as the instance property of a configuration object. Because the object is shared, parameters cannot be passed to the viewmodel using this method.

The factory function

Consider the following code statement:

```js
viewModel: { createViewModel: function(params, componentInfo) {} }
```

This method is useful because it supplies the container element of the component to the second parameter on componentInfo.element.
It also provides you with the opportunity to perform any other setup, such as modifying or extending the constructor parameters. The createViewModel function should return an instance of a viewmodel component:

```js
ko.components.register('contact-list', {
  template: { require: 'text!contact-list.html' },
  viewModel: {
    createViewModel: function(params, componentInfo) {
      console.log('Initializing component for', componentInfo.element);
      return new ListViewmodel(params);
    }
  }
});
```

Registering viewmodels using an AMD module

Consider the following code statement:

```js
viewModel: { require: 'module-path' }
```

Just like templates, viewmodels can be registered with an AMD module that returns any of the preceding formats.

Registering AMD

In addition to registering the template and the viewmodel as AMD modules individually, you can register the entire component with a require call:

```js
ko.components.register('contact-list', { require: 'contact-list' });
```

The AMD module will return the entire component configuration:

```js
define(['knockout', 'text!contact-list.html'],
  function(ko, templateString) {

  function ListViewmodel(params) {
    this.contacts = params.contacts;
    this.edit = params.edit;
    this.delete = function(contact) {
      console.log('Mock Deleting Contact', ko.toJS(contact));
    };
  }

  return { template: templateString, viewModel: ListViewmodel };
});
```

As the Knockout documentation points out, this method has several benefits:

- The registration call is just a require path, which is easy to manage.
- The component is composed of two parts: a JavaScript module and an HTML module. This provides both simple organization and clean separation.
- The RequireJS optimizer, which is r.js, can use the text dependency on the HTML module to bundle the HTML code with the bundled output. This means your entire application, including the HTML templates, can be a single file in production (or a collection of bundles if you want to take advantage of lazy loading).
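The factory pattern behind createViewModel can also be sketched in isolation. The ListViewmodel constructor and the fake componentInfo object below are illustrative stand-ins for this sketch, not real Knockout API usage:

```javascript
// Sketch of the createViewModel factory idea: the factory receives the raw
// params plus extra context and decides how to construct the viewmodel.
function ListViewmodel(params) {
  this.contacts = params.contacts || [];
}

var config = {
  createViewModel: function (params, componentInfo) {
    // In Knockout, componentInfo.element is the component's host DOM node;
    // a factory can inspect it, or modify params, before construction.
    return new ListViewmodel(params);
  }
};

// A fake componentInfo stands in for the DOM node Knockout would supply.
var vm = config.createViewModel({ contacts: ['Tim'] }, { element: 'fake-element' });
console.log(vm.contacts); // [ 'Tim' ]
```

The useful property of this shape is that construction is entirely under your control: the factory can return a subclass, a cached instance, or a decorated viewmodel, as long as it returns something usable as a binding context.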
Observing changes in component parameters

Component parameters will be passed via the params object to the component's viewmodel in one of the following three ways:

No observable expression evaluation needs to occur, and the value is passed literally:

```html
<component params="name: 'Timothy Moran'"></component>
<component params="name: nonObservableProperty"></component>
<component params="name: observableProperty"></component>
<component params="name: viewModel.observableSubProperty"></component>
```

In all of these cases, the value is passed directly to the component on the params object. This means that changes to these values will change the property on the instantiating viewmodel, except for the first case (literal values). Observable values can be subscribed to normally.

An observable expression needs to be evaluated, so it is wrapped in a computed observable:

```html
<component params="name: name() + '!'"></component>
```

In this case, params.name is not the original property. Calling params.name() will evaluate the computed wrapper. Trying to modify the value will fail, as the computed value is not writable. The value can be subscribed to normally.

An observable expression evaluates an observable instance, so it is wrapped in an observable that unwraps the result of the expression:

```html
<component params="name: isFormal() ? firstName : lastName"></component>
```

In this example, firstName and lastName are both observable properties. If calling params.name() returned the observable, you would need to call params.name()() to get the actual value, which is rather ugly. Instead, Knockout automatically unwraps the expression so that calling params.name() returns the actual value of either firstName or lastName.

If you need to access the actual observable instances to, for example, write a value to them, trying to write to params.name will fail, as it is a computed observable. To get the unwrapped value, you can use the params.$raw object, which provides the unwrapped values.
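The computed-wrapper behavior of the second case can be imitated without Knockout, using a hand-rolled observable stand-in. The observable function below is an invented simplification for this sketch, not ko.observable:

```javascript
// Hand-rolled observable stand-in (NOT ko.observable): a callable that
// returns its current value, with a separate setter for this illustration.
function observable(value) {
  var accessor = function () { return value; };
  accessor.set = function (v) { value = v; };
  return accessor;
}

var name = observable('Tim');

// Knockout wraps an expression like "name() + '!'" in a computed so the
// component still receives a callable, read-only accessor:
var paramsName = function () { return name() + '!'; };

console.log(paramsName()); // "Tim!"
name.set('Ana');
console.log(paramsName()); // "Ana!": re-evaluated against the source observable
```

This is why params.name in the second case cannot be written to: the component only ever holds the wrapper function, not the underlying property, but every read still reflects the current source value.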
For this third case, you can update the name by calling params.$raw.name('New'). In general, this case should be avoided by removing the logic from the binding expression and placing it in a computed observable in the viewmodel.

The component's life cycle

When a component binding is applied, Knockout takes the following steps:

1. The component loader asynchronously creates the viewmodel factory and template. This result is cached so that it is only performed once per component.
2. The template is cloned and injected into the container (either the custom element or the element with the component binding).
3. If the component has a viewmodel, it is instantiated. This is done synchronously.
4. The component is bound to either the viewmodel or the params object.
5. The component is left active until it is disposed.
6. The component is disposed. If the viewmodel has a dispose method, it is called, and then the template is removed from the DOM.

The component's disposal

If the component is removed from the DOM by Knockout, either because of the name of the component binding changing or a control flow binding being changed (for example, if and foreach), the component will be disposed. If the component's viewmodel has a dispose function, it will be called. Normal Knockout bindings in the component's view will be automatically disposed, just as they would in a normal control flow situation. However, anything set up by the viewmodel needs to be manually cleaned up. Some examples of viewmodel cleanup include the following:

- setInterval callbacks can be removed with clearInterval.
- Computed observables can be removed by calling their dispose method. Pure computed observables don't need to be disposed. Computed observables that are only used by bindings or other viewmodel properties also do not need to be disposed, as garbage collection will catch them.
- Observable subscriptions can be disposed by calling their dispose method.
- Event handlers can be created by components in ways that are not part of a normal Knockout binding; these also need to be removed manually.

Combining components with data bindings

There is only one restriction on data-bind attributes that are used on custom elements with the component binding: the binding handlers cannot use controlsDescendantBindings. This isn't a new restriction; two bindings that control descendants cannot be on a single element, and since components control descendant bindings, they cannot be combined with a binding handler that also controls descendants. It is worth remembering, though, as you might be inclined to place an if or foreach binding on a component; doing this will cause an error. Instead, wrap the component with an element or a containerless binding:

```html
<ul data-bind='foreach: allProducts'>
  <product-details params='product: $data'></product-details>
</ul>
```

It's also worth noting that bindings such as text and html will replace the contents of the element they are on. When used with components, this will potentially result in the component being lost, so it's not a good idea.

Summary

In this article, we learned that the Knockout components feature gives you a powerful tool that will help you create reusable, behavior-driven DOM elements.

Resources for Article:

Further resources on this subject:

- Deploying a Vert.x application [Article]
- The Dialog Widget [Article]
- Top features of KnockoutJS [Article]

Creating an Apache JMeter™ test workbench

Packt
25 Nov 2014
7 min read
This article is written by Colin Henderson, the author of Mastering GeoServer. This article will give you a brief introduction to creating an Apache JMeter™ test workbench. (For more resources related to this topic, see here.)

Before we can get into the nitty-gritty of creating a test workbench for Apache JMeter™, we must download and install it. Apache JMeter™ is a 100 percent Java application, which means that it will run on any platform provided there is a Java 6 or higher runtime environment present. The binaries can be downloaded from http://jmeter.apache.org/download_jmeter.cgi, and at the time of writing, the latest version is 2.11. No installation is required; just download the ZIP file and decompress it to a location you can access from a command-line prompt or shell environment.

To launch JMeter on Linux, simply open a shell and enter the following commands:

```
$ cd <path_to_jmeter>/bin
$ ./jmeter
```

To launch JMeter on Windows, simply open a command prompt and enter the following commands:

```
C:> cd <path_to_jmeter>\bin
C:> jmeter
```

After a short time, the JMeter GUI should appear, where we can construct our test plan. For ease and convenience, consider setting your system's PATH environment variable to the location of the JMeter bin directory. In future, you will be able to launch JMeter from the command line without having to cd first.

The JMeter workbench will open with an empty configuration, ready for us to construct our test strategy. The first thing we need to do is give our test plan a name; for now, let's call it GeoServer Stress Test. We can also provide some comments, which is good practice as it will help us remember in future why we devised the test plan.

To demonstrate the use of JMeter, we will create a very simple test plan. In this test plan, we will simulate a certain number of users hitting our GeoServer concurrently and requesting maps. To set this up, we first need to add a Thread Group to our test plan.
In a JMeter test, a thread is equivalent to a user. In the left-hand side menu, we need to right-click on the GeoServer Stress Test node and choose the Add | Threads (Users) | Thread Group menu option. This will add a child node to the test plan that we right-clicked on. The right-hand side panel provides options that we can set for the thread group to control how the user requests are executed. For example, we can name it something meaningful, such as Web Map Requests.

In this test, we will simulate 30 users making map requests over a total duration of 10 minutes, with a 10-second delay between each user starting. The number of users is set by entering a value for Number of Threads; in this case, 30. The Ramp-Up Period option controls the delay in starting each user by specifying the duration in which all the threads must start. So, in our case, we enter a duration of 300 seconds, which means all 30 users will be started by the end of 300 seconds. This equates to a 10-second delay between starting threads (300 / 30 = 10). Finally, we will set a duration for the test to run over by ticking the box for Scheduler, and then specifying a value of 600 seconds for Duration. By specifying a duration value, we override the End Time setting.

Next, we need to provide some basic configuration elements for our test. First, we need to set the default parameters for all web requests. Right-click on the Web Map Requests thread group node that we just created, and then navigate to Add | Config Element | User Defined Variables. This will add a new node in which we can specify the default HTTP request parameters for our test.

In the right-hand side panel, we can specify any number of variables. We can use these as replacement tokens later when we configure the web requests that will be sent during our test run. In this panel, we specify all the standard WMS query parameters that we don't anticipate changing across requests.
Taking this approach is a good practice as it means that we can create a mix of tests using the same values, so if we change one, we don't have to change all the different test elements.

To execute requests, we need to add a Logic Controller. JMeter contains a lot of different logic controllers, but in this instance, we will use a Simple Controller to execute a request. To add the controller, right-click on the Web Map Requests node and navigate to Add | Logic Controller | Simple Controller. A simple controller does not require any configuration; it is merely a container for activities we want to execute.

In our case, we want the controller to read some data from our CSV file, and then execute an HTTP request to WMS. To do this, we need to add a CSV dataset configuration. Right-click on the Simple Controller node and navigate to Add | Config Element | CSV Data Set Config. The settings for the CSV data are pretty straightforward. The filename is set to the file that we generated previously, containing the random WMS request properties. The path can be specified as relative or absolute. The Variable Names property is where we specify the structure of the CSV file. The Recycle on EOF option is important as it means that the CSV file will be re-read when the end of the file is reached. Finally, we need to set Sharing mode to All threads to ensure the data can be used across threads.

Next, we need to add a delay to our requests to simulate user activity; in this case, we will introduce a small delay of 5 seconds to simulate a user performing a map-pan operation. Right-click on the Simple Controller node, and then navigate to Add | Timer | Constant Timer. Simply specify the value we want the thread to be paused for in milliseconds.

Finally, we need to add a JMeter sampler, which is the unit that will actually perform the HTTP request. Right-click on the Simple Controller node and navigate to Add | Sampler | HTTP Request.
This will add an HTTP Request sampler to the test plan. There is a lot of information that goes into this panel; however, all it does is construct an HTTP request that the thread will execute. We specify the server name or IP address along with the HTTP method to use. The important part of this panel is the Parameters tab, which is where we need to specify all the WMS request parameters. Notice that we use the tokens that we specified in the CSV Data Set Config and WMS Request Defaults configuration components. We use the ${token_name} syntax, and JMeter replaces each token with the appropriate value of the referenced variable.

We have configured our test plan, but before we execute it, we need to add some listeners to the plan. A JMeter listener is the component that will gather the information from all of the test runs that occur. We add listeners by right-clicking on the thread group node and then navigating to the Add | Listeners menu option. A list of available listeners is displayed, and we can select the one we want to add. For our purposes, we will add the Graph Results, Generate Summary Results, Summary Report, and Response Time Graph listeners. Each listener can have its output saved to a data file for later review. When completed, our test plan structure contains all of the nodes described above.

Before executing the plan, we should save it for use later.

Summary

In this article, we looked at how Apache JMeter™ can be used to construct and execute test plans to place loads on our servers so that we can analyze the results and gain an understanding of how well our servers perform.

Resources for Article:

Further resources on this subject:

- Geo-Spatial Data in Python: Working with Geometry [article]
- Working with Geo-Spatial Data in Python [article]
- Getting Started with GeoServer [article]

Detecting Beacons – Showing an Advert

Packt
25 Nov 2014
26 min read
In this article, by Craig Gilchrist, author of the book Learning iBeacon, we're going to expand our knowledge and get an in-depth understanding of the broadcasting triplet, and we'll expand on some of the important classes within the Core Location framework. (For more resources related to this topic, see here.)

To help demonstrate the more in-depth concepts, we'll build an app that shows different advertisements depending on the major and minor values of the beacon that it detects. We'll be using the context of an imaginary department store called Matey's. Matey's are currently undergoing iBeacon trials in their flagship London store and at the moment are giving offers on their different themed restaurants and also on their ladies' clothing to users who are using their branded app.

Uses of the UUID/major/minor broadcasting triplet

In the last article, we covered the reasons behind the broadcasting triplet; now we're going to use the triplet in a more realistic scenario. Let's go over the three values again in some more detail.

UUID – Universally Unique Identifier

The UUID is meant to be unique to your app. It can be spoofed, but generally, your app would be the only app looking for that UUID. The UUID identifies a region, which is the maximum broadcast range of a beacon from its center point. Think of a region as a circle of broadcast with the beacon in the middle. If lots of beacons with the same UUID have overlapping broadcasting ranges, then the region is represented by the broadcasting range of all the beacons combined; that is, the combined range of all the beacons with the same UUID becomes the region.

More specifically, the region is represented by an instance of the CLBeaconRegion class, which we'll cover in more detail later in this article.
The following code shows how to configure CLBeaconRegion:

```objc
NSString * uuidString = @"78BC6634-A424-4E05-A2AE-A59A25CAC4A9";

NSUUID * regionUUID;
regionUUID = [[NSUUID alloc] initWithUUIDString:uuidString];

CLBeaconRegion * region;
region = [[CLBeaconRegion alloc] initWithProximityUUID:regionUUID
                                            identifier:@"My Region"];
```

Generally, most apps will be monitoring only for one region. This is normally sufficient since the major and minor values are 16-bit unsigned integers, which means that each value can be a number up to 65,535, giving 4,294,836,225 unique beacon combinations per UUID.

Since the major and minor values are used to represent a subsection of the use case, there may be a time when 65,535 combinations of a major value are not enough, and so this would be the rare time that your app needs to monitor multiple regions with different UUIDs. Another more likely example is that your app has multiple use cases, which are more logically split by UUID. An example where an app has multiple use cases would be a loyalty app that has offers for many different retailers when the app is within the vicinity of the retail stores. Here you can have a different UUID for every retailer.

Major

The major value further identifies your use case. The major value should separate your use case along logical categories. This could be sections in a shopping mall or exhibits in a museum. In our example, a use case of the major value represents the different types of service within a department store. In some cases, you may wish to separate logical categories into more than one major value. This would only be if each category has more than 65,535 beacons.

Minor

The minor value ultimately identifies the beacon itself. If you consider the major value as the category, then the minor value is the beacon within that category.
Example of a use case

The example laid out in this article uses the following UUID/major/minor values to broadcast different adverts for Matey's:

| Department | Food | Women's clothing |
|---|---|---|
| UUID | 8F0C1DDC-11E5-4A07-8910-425941B072F9 | 8F0C1DDC-11E5-4A07-8910-425941B072F9 |
| Major | 1 | 2 |
| Minor = 1 | 30 percent off on sushi at The Japanese Kitchen | 50 percent off on all ladies' clothing |
| Minor = 2 | Buy one get one free at Tucci's Pizza | N/A |

Understanding Core Location

The Core Location framework lets you determine the current location or heading associated with the device. The framework has been around since 2008 and was present in iOS 2.0. Up until the release of iOS 7, the framework was only used for geolocation based on GPS coordinates and so was suitable only for outdoor location. The framework got a new set of classes, and new methods were added to the existing classes, to accommodate the beacon-based location functionality. Let's explore a few of these classes in more detail.

The CLBeaconRegion class

Geo-fencing (geofencing) is a feature in a software program that uses the global positioning system (GPS) or radio frequency identification (RFID) to define geographical boundaries. A geofence is a virtual barrier. The CLBeaconRegion class defines a geofenced boundary identified by a UUID and the collective range of all physical beacons with the same UUID. When a device matching the CLBeaconRegion UUID comes in range, the region triggers the delivery of an appropriate notification.

CLBeaconRegion inherits from CLRegion, which also serves as the superclass of CLCircularRegion. The CLCircularRegion class defines the location and boundaries for a circular geographic region. You can use instances of this class to define geofences for a specific location, but it shouldn't be confused with CLBeaconRegion. The CLCircularRegion class shares many of the same methods but is specifically related to a geographic location based on the GPS coordinates of the device. The following figure shows the CLRegion class and its descendants.
The CLRegion class hierarchy

The CLLocationManager class

The CLLocationManager class defines the interface for configuring the delivery of location- and heading-related events to your application. You use an instance of this class to establish the parameters that determine when location and heading events should be delivered, and to start and stop the actual delivery of those events. You can also use a location manager object to retrieve the most recent location and heading data.

Creating a CLLocationManager class

The CLLocationManager class is used to track both geolocation and proximity based on beacons. To start tracking beacon regions using the CLLocationManager class, we need to do the following:

1. Create an instance of CLLocationManager.
2. Assign an object conforming to the CLLocationManagerDelegate protocol to the delegate property.
3. Call the appropriate start method to begin the delivery of events.

All location- and heading-related updates are delivered to the associated delegate object, which is a custom object that you provide.

Defining a CLLocationManager class line by line

Consider the following steps to define a CLLocationManager class line by line:

Every class that needs to be notified about CLLocationManager events needs to first import the Core Location framework (usually in the header file) as shown:

```objc
#import <CoreLocation/CoreLocation.h>
```

Then, once the framework is imported, the class needs to declare itself as implementing the CLLocationManagerDelegate protocol, like the following view controller does:

```objc
@interface MyViewController :
  UIViewController<CLLocationManagerDelegate>
```

Next, you need to create an instance of CLLocationManager and set your class as the instance delegate of CLLocationManager as shown:

```objc
CLLocationManager * locationManager = [[CLLocationManager alloc] init];
locationManager.delegate = self;
```

You then need a region for your location manager to work with:

```objc
// Create a unique ID to identify our region.
NSUUID * regionId = [[NSUUID alloc]
  initWithUUIDString:@"AD32373E-9969-4889-9507-C89FCD44F94E"];

// Create a region to monitor.
CLBeaconRegion * beaconRegion =
  [[CLBeaconRegion alloc] initWithProximityUUID:regionId
                                     identifier:@"My Region"];
```

Finally, you need to call the appropriate start method using the beacon region. Each start method has a different purpose, which we'll explain shortly:

```objc
// Start monitoring and ranging beacons.
[locationManager startMonitoringForRegion:beaconRegion];
[locationManager startRangingBeaconsInRegion:beaconRegion];
```

Once the class is imported, you need to implement the methods of the CLLocationManagerDelegate protocol. Some of the most important delegate methods are explained shortly. This isn't an exhaustive list of the methods, but it does include all of the important methods we'll be using in this article.

locationManager:didEnterRegion

Whenever you enter a region that your location manager has been instructed to look for (by calling startMonitoringForRegion:), the locationManager:didEnterRegion delegate method is called. This method gives you an opportunity to do something with the region, such as start ranging specific beacons, shown as follows:

```objc
-(void)locationManager:(CLLocationManager *)manager
        didEnterRegion:(CLRegion *)region {
    // Do something when we enter a region.
}
```

locationManager:didExitRegion

Similarly, when you exit the region, the locationManager:didExitRegion delegate method is called. Here you can do things like stop ranging specific beacons, shown as follows:

```objc
-(void)locationManager:(CLLocationManager *)manager
         didExitRegion:(CLRegion *)region {
    // Do something when we exit a region.
}
```

When testing your region monitoring code on a device, realize that region events may not happen immediately after a region boundary is crossed. To prevent spurious notifications, iOS does not deliver region notifications until certain threshold conditions are met.
Specifically, the user's location must cross the region boundary, move away from that boundary by a minimum distance, and remain at that minimum distance for at least 20 seconds before the notifications are reported.

locationManager:didRangeBeacons:inRegion

The locationManager:didRangeBeacons:inRegion method is called whenever a beacon (or a number of beacons) changes distance from the device. For now, it's enough to know that each beacon returned in this array has a property called proximity, which returns a CLProximity enum value (CLProximityUnknown, CLProximityFar, CLProximityNear, and CLProximityImmediate), shown as follows:

-(void)locationManager:(CLLocationManager *)manager
  didRangeBeacons:(NSArray *)beacons inRegion:
  (CLBeaconRegion *)region {
   // Do something with the array of beacons.
}

locationManager:didChangeAuthorizationStatus

Finally, there's one more delegate method to cover. Whenever the user grants or denies authorization to use their location, locationManager:didChangeAuthorizationStatus is called. This method is passed a CLAuthorizationStatus enum value (kCLAuthorizationStatusNotDetermined, kCLAuthorizationStatusRestricted, kCLAuthorizationStatusDenied, and kCLAuthorizationStatusAuthorized), shown as follows:

-(void)locationManager:(CLLocationManager *)manager
  didChangeAuthorizationStatus:(CLAuthorizationStatus)status {
   // Do something with the new authorization status.
}

Understanding iBeacon permissions

It's important to understand that apps using the Core Location framework are essentially monitoring location, and therefore, they have to ask the user for permission. The authorization status of a given application is managed by the system and determined by several factors. Applications must be explicitly authorized to use location services by the user, and location services must themselves be enabled for the system.
A request for user authorization is displayed automatically when your application first attempts to use location services. Requesting the location can be a fine balancing act. Asking for permission at a point in the app where your user wouldn't think it was relevant makes it more likely that they will decline. It makes more sense to tell the users why you're requesting their location and why it benefits them before requesting it, so as not to scare away your more squeamish users. Building those kinds of information views isn't covered in this book, but to demonstrate the way a user is asked for permission, our app should show an alert like this:

Requesting location permission

If your user taps Don't Allow, then location can't be enabled through the app unless it's deleted and reinstalled. The only way to allow location after denying it is through the Settings app.

Location permissions in iOS 8

Since iOS 8.0, additional steps are required to obtain location permissions. In order to request location in iOS 8.0, you must now provide a friendly message in the app's plist by using the NSLocationAlwaysUsageDescription key, and also make a call to the CLLocationManager class' requestAlwaysAuthorization method. The NSLocationAlwaysUsageDescription key describes the reason the app accesses the user's location information. Include this key when your app uses location services in a potentially nonobvious way, while running in the foreground or the background.

There are two types of location permission requests as of iOS 8, as specified by the following plist keys:

NSLocationAlwaysUsageDescription: This plist key is required when you use the requestAlwaysAuthorization method of the CLLocationManager class to request authorization for location services. If this key is not present and you call the requestAlwaysAuthorization method, the system ignores your request and prevents your app from using location services.

NSLocationWhenInUseUsageDescription: This key is required when you use the requestWhenInUseAuthorization method of the CLLocationManager class to request authorization for location services. If this key is not present when you call the requestWhenInUseAuthorization method, the system ignores your request.

Since iBeacon requires location services in the background, we will only ever use the NSLocationAlwaysUsageDescription key with the call to the CLLocationManager class' requestAlwaysAuthorization method.

Enabling location after denying it

If a user denies enabling location services, you can follow the given steps to enable the service again on iOS 7:

Open the iOS device settings and tap on Privacy.
Go to the Location Services section.
Turn location services on for your app by flicking the switch next to your app name.

When your device is running iOS 8, you need to follow these steps:

Open the iOS device settings and tap on Privacy.
Go to your app in the Settings menu.
Tap on Privacy.
Tap on Location Services.
Set Allow Location Access to Always.

Building the tutorial app

To demonstrate the knowledge gained in this article, we're going to build an app for our imaginary department store Matey's. Matey's is trialing iBeacons with their app, Matey's offers. People with the app get special offers in store, as we explained earlier. For the app, we're going to start a single view application containing two controllers. The first is the default view controller, which will act as our CLLocationManagerDelegate; the second is a view controller that will be shown modally and shows the details of the offer relating to the beacon we've come into proximity with. The final thing to consider is that we'll only show each offer once in a session, and we can only show an offer if one isn't already showing. Shall we begin?

Creating the app

Let's start by firing up Xcode and choosing a new single view application, just as we did in the previous article.
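As a reference, the NSLocationAlwaysUsageDescription entry described in this section looks like this in the raw Info.plist source. This is a minimal, illustrative fragment; the description string is only an example and can be any user-facing message:

```xml
<!-- Info.plist fragment (illustrative): the key required before
     calling requestAlwaysAuthorization on iOS 8. -->
<key>NSLocationAlwaysUsageDescription</key>
<string>This app needs your location to give you wonderful offers.</string>
```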
Choose these values for the new project:

Product Name: Matey's Offers
Organization Name: Learning iBeacon
Company Identifier: com.learning-iBeacon
Class Prefix: LI
Devices: iPhone

Your project should now contain your LIAppDelegate and LIViewController classes. We're not going to touch the app delegate this time round, but we'll need to add some code to the LIViewController class, since this is where all of our CLLocationManager code will be running. For now though, let's leave it to come back to later.

Adding LIOfferViewController

Our offer view controller will be used as a modal view controller to show the offer relating to the beacon that we come in contact with. Each of our offers is going to be represented with a different background color, a title, and an image to demonstrate the offer. Be sure to download the code relating to this article and add the three images contained therein to your project by dragging the images from Finder into the project navigator:

ladiesclothing.jpg
pizza.jpg
sushi.jpg

Next, we need to create the view controller. Add a new file and be sure to choose the template Objective-C class from the iOS Cocoa Touch menu. When prompted, name this class LIOfferViewController and make it a subclass of UIViewController.

Setting location permission settings

We need to add our permission message to the application so that when we request permission for the location, our dialog appears:

Click on the project file in the project navigator to show the project settings.
Click the Info tab of the Matey's Offers target.
Under the Custom iOS Target Properties dictionary, add the NSLocationAlwaysUsageDescription key with the value: This app needs your location to give you wonderful offers.

Adding some controls

The offer view controller needs two controls to show the offer the view is representing: an image view and a label.
Consider the following steps to add some controls to the view controller:

Open the LIOfferViewController.h file and add the following properties to the header:

@property (nonatomic, strong) UILabel * offerLabel;
@property (nonatomic, strong) UIImageView * offerImageView;

Now, we need to create them. Open the LIOfferViewController.m file and first, let's synthesize the controls. Add the following code just below the @implementation LIOfferViewController line:

@synthesize offerLabel;
@synthesize offerImageView;

We've declared the controls; now, we need to actually create them. Within the viewDidLoad method, we need to create the label and image view. We don't need to set the actual values or images of our controls. This will be done by LIViewController when it encounters a beacon. Create the label by adding the following code below the call to [super viewDidLoad]. This will instantiate the label, making it 300 points wide and positioned 10 points from the left and top:

UILabel * label = [[UILabel alloc]
  initWithFrame:CGRectMake(10, 10, 300, 100)];

Now, we need to set some properties to style the label. We want our label to be center aligned, white in color, and with bold text. We also want it to auto wrap when it's too wide to fit the 300 point width. Add the following code:

[label setTextAlignment:NSTextAlignmentCenter];
[label setTextColor:[UIColor whiteColor]];
[label setFont:[UIFont boldSystemFontOfSize:22.f]];
label.numberOfLines = 0; // Allow the label to auto wrap.

Now, we need to add our new label to the view and assign it to our property:

[self.view addSubview:label];
self.offerLabel = label;

Next, we need to create an image view. Our image needs a nice border; so to do this, we need to add the QuartzCore framework. Add the QuartzCore framework like we did with CoreLocation in the previous article (and come to mention it, we'll need CoreLocation too, so add that as well). Once that's done, add #import <QuartzCore/QuartzCore.h> to the top of the LIOfferViewController.m file.
Now, add the following code to instantiate the image view and add it to our view:

UIImageView * imageView = [[UIImageView alloc]
  initWithFrame:CGRectMake(10, 120, 300, 300)];
[imageView.layer setBorderColor:[[UIColor
  whiteColor] CGColor]];
[imageView.layer setBorderWidth:2.f];
imageView.contentMode = UIViewContentModeScaleToFill;
[self.view addSubview:imageView];
self.offerImageView = imageView;

Setting up our root view controller

Let's jump to LIViewController now and start looking for beacons. We'll start by telling LIViewController that LIOfferViewController exists, and also that the view controller should act as a location manager delegate. Consider the following steps:

Open LIViewController.h and add the imports to the top of the file:

#import <CoreLocation/CoreLocation.h>
#import "LIOfferViewController.h"

Now, add the CLLocationManagerDelegate protocol to the declaration:

@interface LIViewController :
  UIViewController<CLLocationManagerDelegate>

LIViewController also needs three things to manage its role:

A reference to the current offer on display, so that we know to show only one offer at a time
An instance of CLLocationManager for monitoring beacons
A list of offers seen, so that we only show each offer once

Let's add these three things to the interface in the LIViewController.m file (as they're private instances). Change the LIViewController interface to look like this:

@interface LIViewController ()
   @property (nonatomic, strong) CLLocationManager *
      locationManager;
   @property (nonatomic, strong) NSMutableDictionary *
      offersSeen;
   @property (nonatomic, strong) LIOfferViewController *
      currentOffer;
@end

Configuring our location manager

Our location manager needs to be configured when the root view controller is first created, and also when the app becomes active. It makes sense, therefore, to put this logic into a method.
Our resetBeacons method needs to do the following things:

Create the location manager and set our LIViewController instance as its delegate
Request permission to use the user's location
Clear down our list of offers seen
Create a beacon region and tell CLLocationManager to start monitoring and ranging beacons

Let's add the code to do this now:

-(void)resetBeacons {
// Initialize the location manager.
self.locationManager = [[CLLocationManager alloc] init];
self.locationManager.delegate = self;

// Request permission.
[self.locationManager requestAlwaysAuthorization];

// Clear the offers seen.
self.offersSeen = [[NSMutableDictionary alloc]
  initWithCapacity:3];

// Create a region.
NSUUID * regionId = [[NSUUID alloc] initWithUUIDString:
  @"8F0C1DDC-11E5-4A07-8910-425941B072F9"];

CLBeaconRegion * beaconRegion = [[CLBeaconRegion alloc]
  initWithProximityUUID:regionId identifier:@"Mateys"];

// Start monitoring and ranging beacons.
[self.locationManager stopRangingBeaconsInRegion:beaconRegion];
[self.locationManager startMonitoringForRegion:beaconRegion];
[self.locationManager startRangingBeaconsInRegion:beaconRegion];
}

Now, add the two calls to resetBeacons to ensure that the location manager is reset when the app is first started, and then every time the app becomes active. Let's add this code now by changing the viewDidLoad method and adding the applicationDidBecomeActive method:

-(void)viewDidLoad {
   [super viewDidLoad];
   [self resetBeacons];
}

- (void)applicationDidBecomeActive:(UIApplication *)application {
   [self resetBeacons];
}

Wiring up CLLocationManagerDelegate

Now, we need to wire up the delegate methods of the CLLocationManagerDelegate protocol so that LIViewController can show the offer view when beacons come into proximity. The first thing we need to do is to set the background color of the view to show whether or not our app has been authorized to use the device location. If the authorization has not yet been determined, we'll use orange.
If the app has been authorized, we'll use green. Finally, if the app has been denied, we'll use red. We'll be using the locationManager:didChangeAuthorizationStatus delegate method to do this. Let's add the code now:

-(void)locationManager:(CLLocationManager *)manager
  didChangeAuthorizationStatus:(CLAuthorizationStatus)status {
   switch (status) {
       case kCLAuthorizationStatusNotDetermined:
       {
           // Set a lovely orange background.
           [self.view setBackgroundColor:[UIColor
              colorWithRed:255.f/255.f green:147.f/255.f
              blue:61.f/255.f alpha:1.f]];
           break;
       }
       case kCLAuthorizationStatusAuthorized:
       {
           // Set a lovely green background.
           [self.view setBackgroundColor:[UIColor
              colorWithRed:99.f/255.f green:185.f/255.f
              blue:89.f/255.f alpha:1.f]];
           break;
       }
       default:
       {
           // Set a dark red background.
           [self.view setBackgroundColor:[UIColor
              colorWithRed:188.f/255.f green:88.f/255.f
              blue:88.f/255.f alpha:1.f]];
           break;
       }
   }
}

The next thing we need to do is to save battery life by only ranging beacons while we're within the region (except for when the app first starts). We do this by calling the startRangingBeaconsInRegion method within the locationManager:didEnterRegion delegate method, and calling the stopRangingBeaconsInRegion method within the locationManager:didExitRegion delegate method.
Add the following code to do what we've just described:

-(void)locationManager:(CLLocationManager *)manager
  didEnterRegion:(CLRegion *)region {
   [self.locationManager
      startRangingBeaconsInRegion:(CLBeaconRegion*)region];
}

-(void)locationManager:(CLLocationManager *)manager
  didExitRegion:(CLRegion *)region {
   [self.locationManager
      stopRangingBeaconsInRegion:(CLBeaconRegion*)region];
}

Showing the advert

To actually show the advert, we need to capture when a beacon is ranged by adding the locationManager:didRangeBeacons:inRegion delegate method to LIViewController. This method will be called every time the distance changes from an already discovered beacon in our region, or when a new beacon is found for the region. The implementation is quite long, so I'm going to explain each part of the method as we write it. Start by creating the method implementation as follows:

-(void)locationManager:(CLLocationManager *)manager
  didRangeBeacons:(NSArray *)beacons inRegion:
  (CLBeaconRegion *)region {

}

We only want to show an offer associated with the beacon if we've not seen it before and there isn't a current offer being shown. We do this by checking the currentOffer property. If this property isn't nil, it means an offer is already being displayed, and so we need to return from the method. The locationManager:didRangeBeacons:inRegion method gets called by the location manager and gets passed the region instance and an array of beacons that are currently in range. We only want to see each advert once in a session, and so we need to loop through each of the beacons to determine if we've seen it before. Let's add a for loop to iterate through the beacons, and in the loop do an initial check to see if there's an offer already showing:

for (CLBeacon * beacon in beacons) {
   if (self.currentOffer) return;
}

Our offersSeen property is an NSMutableDictionary containing all the beacons (and subsequently offers) that we've already seen.
The key consists of the major and minor values of the beacon in the format {major|minor}. Let's create a string using the major and minor values, and check whether this string exists in our offersSeen property, by adding the following code to the loop:

NSString * majorMinorValue = [NSString stringWithFormat:
  @"%@|%@", beacon.major, beacon.minor];
if ([self.offersSeen objectForKey:majorMinorValue]) continue;

If offersSeen contains the key, then we continue looping. If the offer hasn't been seen, then we need to add it to the offers seen before presenting the offer. Let's start by adding the key to our offers seen dictionary and then preparing an instance of LIOfferViewController:

[self.offersSeen setObject:[NSNumber numberWithBool:YES]
  forKey:majorMinorValue];
LIOfferViewController * offerVc = [[LIOfferViewController alloc]
  init];
offerVc.modalPresentationStyle = UIModalPresentationFullScreen;

Now, we're going to prepare some variables to configure the offer view controller. Food offers show with a blue background, while clothing offers show with a red background. We use the major value of the beacon to determine the color, and then find the image and label based on the minor value:

UIColor * backgroundColor;
NSString * labelValue;
UIImage * productImage;

// Major value 1 is food, 2 is clothing.
if ([beacon.major intValue] == 1) {
   // Blue signifies food.
   backgroundColor = [UIColor colorWithRed:89.f/255.f
      green:159.f/255.f blue:208.f/255.f alpha:1.f];

   if ([beacon.minor intValue] == 1) {
       labelValue = @"30% off sushi at the Japanese Kitchen.";
       productImage = [UIImage imageNamed:@"sushi.jpg"];
   }
   else {
       labelValue = @"Buy one get one free at Tucci's Pizza.";
       productImage = [UIImage imageNamed:@"pizza.jpg"];
   }
} else {
   // Red signifies clothing.
   backgroundColor = [UIColor colorWithRed:188.f/255.f
      green:88.f/255.f blue:88.f/255.f alpha:1.f];
   labelValue = @"50% off all ladies clothing.";
   productImage = [UIImage imageNamed:@"ladiesclothing.jpg"];
}

Finally, we need to set these values on the view controller and present it modally. We also need to set our currentOffer property to the view controller, so that we don't show more than one offer at the same time:

[offerVc.view setBackgroundColor:backgroundColor];
[offerVc.offerLabel setText:labelValue];
[offerVc.offerImageView setImage:productImage];
[self presentViewController:offerVc animated:YES
  completion:nil];
self.currentOffer = offerVc;

Dismissing the offer

Since LIOfferViewController is a modal view, we're going to need a dismiss button; however, we also need some way of telling our root view controller (LIViewController) about the dismissal. Consider the following steps:

Add the following code to the LIViewController.h interface to declare a public method:

-(void)offerDismissed;

Now, add the implementation to LIViewController.m. This method simply clears the currentOffer property, as the actual dismissal is handled by the offer view controller:

-(void)offerDismissed {
   self.currentOffer = nil;
}

Now, let's jump back to LIOfferViewController. Add the following code to the end of the viewDidLoad method of LIOfferViewController to create a dismiss button:

UIButton * dismissButton = [[UIButton alloc]
  initWithFrame:CGRectMake(60.f, 440.f, 200.f, 44.f)];
[self.view addSubview:dismissButton];
[dismissButton setTitle:@"Dismiss"
  forState:UIControlStateNormal];
[dismissButton setTitleColor:[UIColor whiteColor]
  forState:UIControlStateNormal];
[dismissButton addTarget:self
  action:@selector(dismissTapped:)
  forControlEvents:UIControlEventTouchUpInside];

As you can see, the touch up event calls @selector(dismissTapped:), which doesn't exist yet. We can get a handle on LIViewController through the app delegate (which is an instance of LIAppDelegate).
In order to use this, we need to import it and LIViewController. Add the following imports to the top of LIOfferViewController.m:

#import "LIViewController.h"
#import "LIAppDelegate.h"

Finally, let's complete the tutorial by adding the dismissTapped method:

-(void)dismissTapped:(UIButton*)sender {
   [self dismissViewControllerAnimated:YES completion:^{
       LIAppDelegate * delegate =
          (LIAppDelegate*)[UIApplication
          sharedApplication].delegate;
       LIViewController * rootVc =
          (LIViewController*)delegate.window.rootViewController;
       [rootVc offerDismissed];
   }];
}

Now, let's run our app. You should be presented with the location permission request, as shown in the Requesting location permission figure from the Understanding iBeacon permissions section. Tap on OK and then fire up the companion app. Play around with the Chapter 2 beacon configurations by turning them on and off. What you should see is something like the following figure:

Our app working with the companion OS X app

Remember that your app should only show one offer at a time, and your beacon should only show each offer once per session.

Summary

Well done on completing your first real iBeacon-powered app, which actually differentiates between beacons. In this article, we covered the real usage of UUID, major, and minor values. We also got introduced to the Core Location framework, including the CLLocationManager class and its important delegate methods. We introduced the CLRegion class and discussed the permissions required when using CLLocationManager.

Resources for Article:

Further resources on this subject:

Interacting with the User [Article]
Physics with UIKit Dynamics [Article]
BSD Socket Library [Article]
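Before moving on, it can help to see the once-per-session offer logic from this article stripped of its Cocoa scaffolding. The following is a hedged, language-agnostic sketch in Python; the function name and tuple shapes are invented for illustration, and the real app uses the Objective-C delegate code shown above:

```python
def handle_ranged_beacons(beacons, offers_seen, current_offer):
    """Sketch of the logic in locationManager:didRangeBeacons:inRegion.

    beacons: list of (major, minor) tuples currently in range.
    offers_seen: dict used as the once-per-session record.
    current_offer: key of the offer on screen, or None.
    Returns the key of the offer now showing (or None).
    """
    for major, minor in beacons:
        # An offer is already on screen: show nothing else.
        if current_offer is not None:
            return current_offer
        key = "%s|%s" % (major, minor)  # the {major|minor} dedup key
        if key in offers_seen:          # already shown this session
            continue
        offers_seen[key] = True         # record it...
        current_offer = key             # ...and "present" the offer
    return current_offer
```

Ranging the same two beacons twice shows the first offer on the first pass and the second offer on the next pass after dismissal, mirroring the behavior we expect from the app.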
No to nodistinct

Packt
25 Nov 2014
4 min read
This article is written by Stephen Redmond, the author of Mastering QlikView. There is a great skill in creating the right expression to calculate the right answer. Being able to do this in all circumstances relies on having a good knowledge of creating advanced expressions. Of course, the best path to mastery in this subject is actually getting out and doing it, but there is a great argument here for regularly practicing with dummy or test datasets.

(For more resources related to this topic, see here.)

When presented with a problem that needs to be solved, even QlikView masters will not necessarily know immediately how to answer it. What they will have, though, is a very good idea of where to start: what to try and what not to try. This is what I hope to impart to you here. Knowing how to create many advanced expressions will arm you to know where to apply them, and where not to apply them. This is one area of QlikView that is alien to many people. For some reason, they fear the whole idea of concepts such as Aggr. However, the reality is that these concepts are actually very simple and supremely logical. Once you get your head around them, you will wonder what all the fuss was about.

No to nodistinct

The Aggr function has an optional clause: the possibility of stating that the aggregation will be either distinct or nodistinct. The default option is distinct, and as such, it is rarely ever stated. In this default operation, the aggregation will only produce distinct results for every combination of dimensions, just as you would expect from a normal chart or straight table.

The nodistinct option only makes sense within a chart, one that has more dimensions than are in the Aggr statement. In this case, the granularity of the chart is lower than the granularity of Aggr, and therefore, QlikView will only calculate that Aggr for the first occurrence of the lower granularity dimensions and will return null for the other rows.
If we specify nodistinct, the same result will be calculated across all of the lower granularity dimensions. This can be difficult to understand without seeing an example, so let's look at a common use case for this option. We will start with a dataset:

ProductSales:
Load * Inline [
Product, Territory, Year, Sales
Product A, Territory A, 2013, 100
Product B, Territory A, 2013, 110
Product A, Territory B, 2013, 120
Product B, Territory B, 2013, 130
Product A, Territory A, 2014, 140
Product B, Territory A, 2014, 150
Product A, Territory B, 2014, 160
Product B, Territory B, 2014, 170
];

We will build a report from this data using a pivot table:

Now, we want to bring the value in the Total column into a new column under each year, perhaps to calculate a percentage for each year. We might think that, because the total is the sum for each Product and Territory, we might use an Aggr in the following manner:

Sum(Aggr(Sum(Sales), Product, Territory))

However, as stated previously, because the chart includes an additional dimension (Year) that is not in the Aggr, the expression will only be calculated for the first occurrence of each of the lower granularity dimensions (in this case, for Year = 2013):

The commonly suggested fix for this is to use Aggr without Sum and with nodistinct, as shown:

Aggr(NoDistinct Sum(Sales), Product, Territory)

This will allow the Aggr expression to be calculated across all the Year dimension values, and at first, it will appear to solve the problem:

The problem occurs when we decide to have a total row on this chart:

As there is no aggregation function surrounding Aggr, it does not total correctly at the Product or Territory dimensions. We can't add an aggregation function, such as Sum, because it will break one of the other totals. However, there is something different that we can do; something that doesn't involve Aggr at all!
We can use our old friend Total:

Sum(Total<Product, Territory> Sales)

This will calculate correctly at all the levels:

There might be other use cases for using a nodistinct clause in Aggr, but they should be reviewed to see whether a simpler Total will work instead.

Summary

We discussed an important function, the Aggr function. We now know that the Aggr function is extremely useful, but we don't need to apply it in all circumstances where we have vertical calculations.

Resources for Article:

Further resources on this subject:

Common QlikView script errors [article]
Introducing QlikView elements [article]
Creating sheet objects and starting new list using QlikView 11 [article]
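As a footnote to the Total solution above, the arithmetic behind Sum(Total<Product, Territory> Sales) can be reproduced outside QlikView. The following Python sketch (an analogy only, not QlikView code) computes the per-Product/Territory total that every Year column of the pivot table should display:

```python
# The ProductSales rows from the inline load script.
rows = [
    ("Product A", "Territory A", 2013, 100),
    ("Product B", "Territory A", 2013, 110),
    ("Product A", "Territory B", 2013, 120),
    ("Product B", "Territory B", 2013, 130),
    ("Product A", "Territory A", 2014, 140),
    ("Product B", "Territory A", 2014, 150),
    ("Product A", "Territory B", 2014, 160),
    ("Product B", "Territory B", 2014, 170),
]

# Sum(Total<Product, Territory> Sales): group by (Product, Territory),
# ignoring Year, just as the Total qualifier ignores the chart's
# extra dimension.
totals = {}
for product, territory, year, sales in rows:
    key = (product, territory)
    totals[key] = totals.get(key, 0) + sales

# Every (Product, Territory, Year) cell shows its pair's two-year total.
per_cell = {(p, t, y): totals[(p, t)] for p, t, y, s in rows}
```

Because Year is excluded from the grouping, the value is identical under both year columns, and summing it at any level of the pivot table stays consistent, which is exactly why the Total qualifier totals correctly where nodistinct does not.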