Packt
10 Jun 2010
11 min read

Find and Install Add-Ons that Expand Plone Functionality

Background

It seems like every application platform uses a different name for its add-ons: modules, components, libraries, packages, extensions, plug-ins, and more. Add-on packages for the Zope web application server are generally called Products. A Zope product is a bundle of Zope or Plone functionality contained in one or more Python modules. Like Plone, add-on products are distributed as source code, so you can always read and examine them. Plone itself is actually a set of tightly connected Zope products and Python modules.

Plone add-on products may be divided into three major categories:

• Skins or themes that change Plone’s look and feel or add visual elements like portlets. These are typically the simplest of Plone products.
• Products that add new content types with specialized functionality. Some are simple extensions of built-in types; others have custom workflows and behaviours.
• Products that add to or change the behaviour of Plone itself.

Where to Find Products

Plone.org’s Products section at http://plone.org/products is the place to look for Plone products. At the time of this writing, Plone.org contains listings for 765 products and 1,901 product releases.

The Plone Products section is itself built with a Plone product, the Plone Software Center (often called the PSC), which adds content types for projects, software releases, project roadmaps, issue trackers, and project documentation.

Using the Plone Product Pages

Visiting the Plone product pages for the first time may be a bewildering experience because of the sheer number of available products. However, by specifying a product category and a target Plone version, you will quickly narrow the selection to the point where it’s worth reading descriptions and following the links to product pages.
Product pages typically contain product descriptions, software releases, and a list of available documentation, issue tracker, version-control repository, and contact resources. Each release will have release notes, a change log, and a list of Plone versions with which the release has been tested. If the release has a product package, it will be available there for download.

Some releases do not have associated software packages. This may be because the release is still in a planning stage and the listing mainly documents the product’s development roadmap, or because development is still at an early stage and the software is only available from a version-control repository.

The release notes commonly include a list of dependencies; make special note of these along with the compatible Plone versions. Many products require the installation of other, supporting products. Some require that your server or test workstation have particular system libraries or utilities.

Product pages may also have links to a variety of additional resources: product-specific documentation, other release pages, an issue tracker, a roadmap for future development, a contact form for the project, and a version-control repository.

Playing it Safe with Add-On Products

Plone 3 is probably one of the most rigorously tested open-source software packages in existence. While no software is defect-free, Plone’s core development team is on the leading edge of software development methodologies and works under a strong testing culture that requires them to prove their components work correctly before those components ever become part of Plone.

Plone’s library of add-on products is a very different story. Add-on products are contributed by a diverse community of developers. Some add-on products follow the same development and maintenance methodologies as Plone itself; others are haphazard experiments.
To complicate matters, today’s haphazard experiment may be, if it succeeds, next year’s rigorously developed and reliable product. (Much of the Plone core codebase began as add-on products.) And this year’s reliable standby may lose the devotion of its developers and not be upgraded to work with the next version of Plone.

If you’re new to the world of open-source software, this may seem dismaying. Don’t be discouraged. It is not hard to evaluate the status of a product, and the Plone community is happy to help. Be encouraged by the evidence of continual, exciting innovation. Most importantly, stop thinking of yourself as a consumer. Take an interest in the community process that produces good products. Test some early releases and file bug reports and feature requests. Participate in, or help document, test, and fund the development of, the products that are most important to you.

Product Choice Strategy

Trying out new Plone add-on products is great fun, but incorporating them into production websites requires planning and judgement if you want good long-run results.

New versions of Plone pose a particular challenge. Major new releases of Plone don’t just add features: with every major version of Plone, the application programming interface (API) and presentation templates change. This is not done arbitrarily, and there is usually a good deal of warning before a major change, but it means that add-on products often need to be updated before they will work with a major new version of Plone. Major versions are released roughly every 18 months, and minor version upgrades generally do not pose compatibility problems for the vast majority of add-on products.

This means that when a new major version of Plone appears on the scene, you won’t be able to migrate your Plone site to it until compatible versions are available for all the add-on products in use on the site.
If you’re using mainstream, well-supported products, this may happen very quickly. Many products are upgraded to work with new Plone versions during the beta and release-candidate stages of Plone development. Some products take longer, and some may not make the jump at all. The products least likely to be updated are often the ones made obsolete by new functionality.

This creates a somewhat ironic situation when a new version of Plone arrives: the quickest adopters are often those with the least history with the platform. The slowest adopters are sometimes the sites that are most heavily invested in the new features.

Consider, as a prime example, Plone.org: a very active, very large community site that must be conservatively managed and must stick with proven versions of add-on products. Plone.org often does not migrate to a new Plone version until many months after its release.

Is this a problem? Not really, unless you need both the newest features of the newest Plone version and the functionality of a more slowly developed add-on product. If that’s the case, prepare to invest time or money in supporting product development and possibly in writing some custom migration scripts.

If you want to be more conservative, try the following strategy:

• Enjoy testing many products and keeping up with new developments by trying them out on a test server.
• Learn the built-in Plone functionality well, and use it in preference to add-on products whenever possible.
• Make sure you have a good understanding of the maturity level and degree of developer support for add-on products.
• Incorporate the smallest reasonably possible number of add-on products into your production sites.
• Don’t be just a consumer: when you commit to a product, help support it by filing bug reports and feature requests, contributing translations, documentation, or code, and answering questions about it on the Plone mailing lists or the #plone IRC channel.
Evaluating a Product

Judging the maturity of a Plone product is generally easy. Start with the product’s project page on Plone.org. The page may offer you a "Current release" and one or more "Experimental releases". Anything marked as a current release should be stable on its tested Plone versions. If you need a release that works with an earlier version of Plone than the ones supported by the current release, follow the "List all releases..." link.

Releases in the "Experimental" list will be marked as "alpha", "beta", or "Release Candidate". These terms are well defined in practice:

• Alpha releases are truly experimental and are usually posted to get early feedback. Interfaces and implementations are likely still in flux. Download an alpha release only for testing in an experimental environment, and only to preview new features and give feedback to developers. Do not plan on keeping any content you create with an alpha release, as there may be no upgrade path to later releases.
• With a beta release, feature sets and programming interfaces should be stable or changing only incrementally. It’s reasonable to start testing the integration of the product with the platform and with other products. There will typically be an upgrade path to future releases. Bug reports are welcome and will help develop the product.
• Release candidates have a fixed feature set and no known major issues. Templates and messages should be complete, so translators can work on language files with some confidence that their work won’t be lost. If you encounter a bug in a release-candidate product, please file an issue report immediately.

Products may be re-released repeatedly at any release state. For alpha, beta, and RC releases, each additional release increments the release count but not the version number. So "PloneFormGen 1.2 (Beta release 6)" is the sixth beta release of version 1.2 of PloneFormGen.
Once a product release reaches current release status, each new maintenance release increments the version number by 0.0.1. "PloneFormGen 1.1.3" is thus the third maintenance release of version 1.1 of that product.

Don’t make too much of version numbers or release counts; release status is a better indicator of maturity. If your site is mission-critical, don’t use beta releases on it. However, if you test carefully before deploying, you may find that some products late in their beta cycle are ready for live use on sites where an occasional error or glitch would be tolerable.

Testing a Product

Conscientious Plone site administrators maintain an offline mirror of their production sites on a secondary server, or even a desktop computer, that they can use for testing. Always test a new product on a test server. Before deploying, test it on a server that has precisely the combination of products in use on your production server. Ideally, test with a copy of your live server’s database. Check the functionality not only of the new product but also of the products you’re already using. The latter is particularly important if you’re using products that alter the base functionality of Plone or Zope.

Looking to the Future

Evaluating product maturity and testing the product will help you judge its current status, but what about the future? What are the signs of a product that’s likely to be well maintained and available for future versions of Plone? There are no guarantees, but here are some signs that experienced Plone integrators look for:

• Developing in public. This is open-source software. Look to see whether the product is being developed with a public roadmap and a public version-control repository. Plone.org provides product authors with great tools for indicating release plans, and makes a Subversion (SVN) version-control repository available to all product authors. Look to see whether they’re using these facilities.
• Issue tracker status. Every released product should have a public issue (bug) tracker. Look for it. Look to see whether it’s being maintained and whether issues are actively responded to. No issue tracker, or lots of old, uncategorized issues, are bad signs.
• Support for multiple Plone versions. If a product has been around a while, look to see whether versions are available for at least a couple of Plone releases. This might be the previous and current releases, or the current and next releases.
• Internationalization. Excellent products attract translations.
• Good development methodologies. This is the hardest criterion for a non-developer to judge, but a forthcoming version of the Plone Software Center will ask developers to rate themselves on compliance with a set of community standards. My guess is that product developers will be pretty honest about these ratings.

Several of these criteria have something in common: they allow the Plone community to participate in product maintenance and development. The best projects belong to the community, not to any single author.

One of the best ways to get a quick read on the quality of an add-on product is to hop on the #plone IRC channel and ask. Chances are you’ll run into someone who can share their experiences and offer insight. You may even run into the product author!

Packt
16 Jun 2017
14 min read

Article: Movie Recommendation

In this article by Robert Layton, author of the book Learning Data Mining with Python - Second Edition, we look at movie recommendation with a technique known as Affinity Analysis. The second edition of Learning Data Mining with Python improves upon the first book with updated examples, more in-depth discussion, and exercises for your future development with data analytics.

Affinity Analysis

Affinity Analysis is the task of determining when objects are used in similar ways. This differs from asking whether the objects themselves are similar. The data for Affinity Analysis is often described in the form of a transaction. Intuitively, this comes from a transaction at a store: determining when objects are purchased together as a way to recommend products that users might purchase.

Other use cases for Affinity Analysis include:

• Fraud detection
• Customer segmentation
• Software optimization
• Product recommendations

Affinity Analysis is usually much more exploratory than classification. At the very least, we often simply rank the results and choose the top five recommendations (or some other number), rather than expect the algorithm to give us a specific answer.

Algorithms for Affinity Analysis

A brute-force solution that tests all possible combinations is not efficient enough for real-world use. We could expect even a small store to have hundreds of items for sale, while many online stores have thousands (or millions!). As we add more items, the time it takes to compute all rules grows far faster than the number of items. Specifically, with n items the total possible number of rules is 2^n - 1. Even drastic increases in computing power couldn't possibly keep up with the growth in the number of items sold online. Therefore, we need algorithms that work smarter, as opposed to computers that work harder.
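To make the 2^n - 1 growth concrete, the following Python sketch enumerates every non-empty itemset of a small catalog with itertools.combinations. That is the search space a brute-force approach would have to score. The product names are hypothetical placeholders.

```python
from itertools import combinations


def all_nonempty_itemsets(items):
    """Enumerate every non-empty subset of items: the brute-force search space."""
    itemsets = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemsets.append(frozenset(combo))
    return itemsets


# Hypothetical four-product catalog
catalog = ["bread", "milk", "eggs", "butter"]
print(len(all_nonempty_itemsets(catalog)))  # 15, i.e. 2**4 - 1

# Each extra item roughly doubles the search space, so it explodes quickly
for n in (10, 20, 30, 40):
    print(n, "items ->", 2 ** n - 1, "itemsets")
```

Even 40 items give more than a trillion itemsets. This is why Apriori prunes candidates instead of scoring them all.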
The Apriori algorithm addresses the exponential problem of creating sets of items that occur frequently within a database, called frequent itemsets. Once these frequent itemsets are discovered, creating association rules is straightforward.

The intuition behind Apriori is both simple and clever. First, we ensure that a rule has sufficient support within the dataset. Defining a minimum support level is the key parameter for Apriori. For an itemset (A, B) to have a support of at least 30, both A and B must each occur at least 30 times in the database. This property extends to larger sets as well: for an itemset (A, B, C, D) to be considered frequent, the set (A, B, C) must also be frequent (as must D). Apriori discovers larger frequent itemsets by building off smaller frequent itemsets.

The full process works as follows:

1. Create the initial frequent itemsets by placing each item in its own itemset, keeping only the items that meet the minimum support.
2. Create new candidate itemsets from the most recently discovered frequent itemsets by finding their supersets.
3. Test all candidate itemsets to see whether they are frequent, and discard those that are not.
4. Store the newly discovered frequent itemsets and go back to step 2. When no new frequent itemsets are found, return all the frequent itemsets discovered so far.

The Movie Recommendation Problem

Product recommendation is a big business. Online stores use it to up-sell to customers by recommending other products that they could buy. Making better recommendations leads to better sales. When online shopping reaches millions of customers every year, there is a lot of potential money to be made by selling more items to these customers.

GroupLens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. They have released several versions of a movie rating dataset in different sizes: one with 100,000 reviews, one with 1 million reviews, and one with 10 million reviews.

The datasets are available from http://grouplens.org/datasets/movielens/, and the dataset we are going to use in this article is the MovieLens 100K dataset (with 100,000 reviews). Download this dataset and unzip it in your data folder.
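Before loading the file with pandas, it can help to look at its raw layout. According to the dataset's README, each line of u.data holds a user ID, a movie (item) ID, a rating, and a Unix timestamp, separated by tabs. The sketch below parses a few made-up lines in that layout with Python's standard csv module. It only illustrates the format and does not replace the pandas code.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up lines in the u.data layout: user id, item id, rating, Unix timestamp (tab-separated)
sample = "1\t50\t5\t874965758\n1\t172\t2\t874965478\n2\t50\t4\t888552084\n"

ratings = []
for user_id, movie_id, rating, timestamp in csv.reader(io.StringIO(sample), delimiter="\t"):
    ratings.append({
        "UserID": int(user_id),
        "MovieID": int(movie_id),
        "Rating": int(rating),
        # The timestamp counts seconds since 1970-01-01 UTC
        "Datetime": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
    })

for row in ratings:
    print(row["UserID"], row["MovieID"], row["Rating"], row["Datetime"].date())
```

The tab delimiter and the seconds-based timestamp are the two details that differ from read_csv's defaults.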
Start a new Jupyter Notebook and type the following code:

```python
import os
import pandas as pd

data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
ratings_filename = os.path.join(data_folder, "u.data")
```

Ensure that ratings_filename points to the u.data file in the unzipped folder.

Loading with pandas

The MovieLens dataset is in good shape; however, we need to change some of the default options in pandas.read_csv. When loading the file, we set the delimiter parameter to the tab character, tell pandas not to read the first row as the header (with header=None), and set the column names to the given values. Let's look at the following code:

```python
# u.data is tab-separated and has no header row
all_ratings = pd.read_csv(ratings_filename, delimiter="\t", header=None,
                          names=["UserID", "MovieID", "Rating", "Datetime"])
```

While we won't use it in this article, you can properly parse the date timestamp using the following line. Dates for reviews can be an important feature in recommendation prediction, as movies that are rated together often have more similar rankings than movies rated separately. Accounting for this can improve models significantly.

```python
# The timestamps are Unix time in seconds
all_ratings["Datetime"] = pd.to_datetime(all_ratings["Datetime"], unit="s")
```

Understanding the Apriori algorithm and its implementation

The goal of this article is to produce rules of the following form: if a person recommends this set of movies, they will also recommend this movie. We will also discuss extensions where a person who recommends a set of movies is likely to recommend another particular movie.

To do this, we first need to determine whether a person recommends a movie. We can do this by creating a new feature, Favorable, which is True if the person gave the movie a favorable review (a rating above 3):

```python
all_ratings["Favorable"] = all_ratings["Rating"] > 3
```

We will sample our dataset to form a training set. This also helps reduce the size of the dataset that will be searched, making the Apriori algorithm run faster.
We obtain all reviews from the first 200 users:

```python
# MovieLens user IDs start at 1, so range(1, 201) selects the first 200 users
ratings = all_ratings[all_ratings["UserID"].isin(range(1, 201))]
```

Next, we can create a dataset of only the favorable reviews in our sample:

```python
favorable_ratings = ratings[ratings["Favorable"]]
```

We will be searching the users' favorable reviews for our itemsets. So the next thing we need is the set of movies that each user has given a favorable rating. We can compute this by grouping the dataset by UserID and iterating over the movies in each group:

```python
favorable_reviews_by_users = dict((k, frozenset(v.values))
                                  for k, v in favorable_ratings.groupby("UserID")["MovieID"])
```

In the preceding code, we stored the values as a frozenset, allowing us to quickly check whether a movie has been rated by a user. Sets are much faster than lists for this type of operation, and we will use them in later code.

Finally, we can create a DataFrame that tells us how frequently each movie has been given a favorable review:

```python
num_favorable_by_movie = ratings[["MovieID", "Favorable"]].groupby("MovieID").sum()
```

We can see the top five movies by running the following code:

```python
num_favorable_by_movie.sort_values(by="Favorable", ascending=False).head()
```

Implementing the Apriori algorithm

On the first iteration of Apriori, the newly discovered itemsets will have a length of 2, as they will be supersets of the initial itemsets created in the first step. On the second iteration (after applying the fourth step and going back to step 2), the newly discovered itemsets will have a length of 3. This allows us to quickly identify the newly discovered itemsets, as needed in the second step.

We can store our discovered frequent itemsets in a dictionary, where the key is the length of the itemsets. This allows us to quickly access the itemsets of a given length, and therefore the most recently discovered frequent itemsets:

```python
frequent_itemsets = {}
```

We also need to define the minimum support needed for an itemset to be considered frequent.
This value is chosen based on the dataset, but try different values to see how they affect the result. I recommend changing it by only 10 percent at a time, though, as the time the algorithm takes to run can change significantly! Let's set a minimum support value:

```python
min_support = 50
```

To implement the first step of the Apriori algorithm, we create an itemset with each movie individually and test whether the itemset is frequent. We use frozenset because frozensets allow us to perform faster set-based operations later on, and they can also be used as keys in our counting dictionary (normal sets cannot). Let's look at the following example of frozenset code:

```python
# Step 1: one-movie itemsets with at least min_support favorable reviews
frequent_itemsets[1] = dict((frozenset((movie_id,)), row["Favorable"])
                            for movie_id, row in num_favorable_by_movie.iterrows()
                            if row["Favorable"] >= min_support)
```

We implement the second and third steps together for efficiency by creating a function that takes the newly discovered frequent itemsets, creates the supersets, and then tests whether they are frequent. First, we set up the function to perform these steps:

```python
from collections import defaultdict

def find_frequent_itemsets(favorable_reviews_by_users, k_1_itemsets, min_support):
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        # Gather this user's candidate supersets in a set so that each one is
        # counted at most once per user, even if several (k-1)-itemsets build it
        user_candidates = set()
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    user_candidates.add(current_superset)
        for current_superset in user_candidates:
            counts[current_superset] += 1
    # Keep only the candidates that reach the minimum support
    return dict([(itemset, frequency) for itemset, frequency in counts.items()
                 if frequency >= min_support])
```

In keeping with our rule of thumb of reading through the data as little as possible, we iterate over the dataset once per call to this function. While this doesn't matter much in this implementation (our dataset is small relative to what an average computer can handle), a single pass is a good practice to get into for larger applications.

Let's have a look at the core of this function in detail.
We iterate through each user and each of the previously discovered itemsets (stored in k_1_itemsets; note that here, k_1 means k-1), and check whether the itemset is a subset of the user's current set of reviews. If it is, the user has reviewed each movie in the itemset. This is done by the itemset.issubset(reviews) line.

We can then go through each individual movie that the user has reviewed (that is not already in the itemset), create a superset by combining the itemset with the new movie, and record that we saw this superset in our counting dictionary. These are the candidate frequent itemsets for this value of k.

We end our function by testing which of the candidate itemsets have enough support to be considered frequent, and return only those whose support is at least our min_support value.

This function forms the heart of our Apriori implementation. We now create a loop that iterates over the steps of the larger algorithm, storing the new itemsets as we increase k from 1 to a maximum value. In this loop, k represents the length of the soon-to-be-discovered frequent itemsets, allowing us to access the most recently discovered ones by looking in our frequent_itemsets dictionary using the key k - 1. We create the frequent itemsets and store them in our dictionary by their length. Let's look at the code:

```python
import sys

for k in range(2, 20):
    # Generate candidates of length k, using the frequent itemsets of length k-1
    # Only store the frequent itemsets
    cur_frequent_itemsets = find_frequent_itemsets(favorable_reviews_by_users,
                                                   frequent_itemsets[k-1], min_support)
    if len(cur_frequent_itemsets) == 0:
        print("Did not find any frequent itemsets of length {}".format(k))
        sys.stdout.flush()
        break
    else:
        print("I found {} frequent itemsets of length {}".format(len(cur_frequent_itemsets), k))
        sys.stdout.flush()
        frequent_itemsets[k] = cur_frequent_itemsets
```

Extracting association rules

After the Apriori algorithm has completed, we have a list of frequent itemsets.
These aren't exactly association rules, but they can easily be converted into rules. For each itemset, we can generate a number of association rules by setting each movie in turn as the conclusion and the remaining movies as the premise:

```python
candidate_rules = []
for itemset_length, itemset_counts in frequent_itemsets.items():
    # Single-movie itemsets would give rules with an empty premise, so skip them
    if itemset_length < 2:
        continue
    for itemset in itemset_counts.keys():
        for conclusion in itemset:
            premise = itemset - set((conclusion,))
            candidate_rules.append((premise, conclusion))
```

Each candidate rule is a (premise, conclusion) pair. The first part is the set of movies in the premise, and the number after it is the conclusion. For example, a rule with premise {79} and conclusion 258 says that if a reviewer recommends movie 79, they are also likely to recommend movie 258.

The process of computing confidence starts by creating dictionaries to store how many times we see the premise leading to the conclusion (a correct example of the rule) and how many times it doesn't (an incorrect example). We then iterate over all reviews and rules, working out whether the premise of the rule applies and, if it does, whether the conclusion is accurate.
```python
correct_counts = defaultdict(int)
incorrect_counts = defaultdict(int)
for user, reviews in favorable_reviews_by_users.items():
    for candidate_rule in candidate_rules:
        premise, conclusion = candidate_rule
        if premise.issubset(reviews):
            if conclusion in reviews:
                correct_counts[candidate_rule] += 1
            else:
                incorrect_counts[candidate_rule] += 1
```

We then compute the confidence for each rule by dividing the correct count by the total number of times the rule's premise was seen:

```python
rule_confidence = {candidate_rule:
                   (correct_counts[candidate_rule] /
                    float(correct_counts[candidate_rule] + incorrect_counts[candidate_rule]))
                   for candidate_rule in candidate_rules}
```

Now we can print the top five rules by sorting this confidence dictionary and printing the results:

```python
from operator import itemgetter

sorted_confidence = sorted(rule_confidence.items(), key=itemgetter(1), reverse=True)
for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    print("Rule: If a person recommends {0} they will also recommend {1}".format(premise, conclusion))
    print(" - Confidence: {0:.3f}".format(rule_confidence[(premise, conclusion)]))
    print("")
```

The resulting printout shows only the movie IDs, which isn't very helpful without the names of the movies. The dataset came with a file called u.item, which stores the movie names and their corresponding MovieID (as well as other information, such as the genre). We can load the titles from this file using pandas. Additional information about the file and categories is available in the README file that came with the dataset. The data in the file is in CSV format, but with fields separated by the | symbol; it has no header, and the encoding is important to set. The column names were found in the README file.
```python
movie_name_filename = os.path.join(data_folder, "u.item")
movie_name_data = pd.read_csv(movie_name_filename, delimiter="|", header=None,
                              encoding="mac-roman")
movie_name_data.columns = ["MovieID", "Title", "Release Date", "Video Release", "IMDB",
                           "<UNK>", "Action", "Adventure", "Animation", "Children's",
                           "Comedy", "Crime", "Documentary", "Drama", "Fantasy",
                           "Film-Noir", "Horror", "Musical", "Mystery", "Romance",
                           "Sci-Fi", "Thriller", "War", "Western"]
```

Let's also create a helper function for finding the name of a movie by its ID:

```python
def get_movie_name(movie_id):
    # Select the Title of the row whose MovieID matches
    title_object = movie_name_data[movie_name_data["MovieID"] == movie_id]["Title"]
    title = title_object.values[0]
    return title
```

We can now adjust our previous code for printing out the top rules to also include the titles:

```python
for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    premise_names = ", ".join(get_movie_name(idx) for idx in premise)
    conclusion_name = get_movie_name(conclusion)
    print("Rule: If a person recommends {0} they will also recommend {1}".format(premise_names, conclusion_name))
    print(" - Confidence: {0:.3f}".format(rule_confidence[(premise, conclusion)]))
    print("")
```

The results give movie recommendations based on the movies a person previously liked. Give it a shot and see if they match your expectations!

Learning Data Mining with Python

In this short section of Learning Data Mining with Python, Second Edition, we performed Affinity Analysis in order to recommend movies based on a large set of reviewers. We did this in two stages. First, we found frequent itemsets in the data using the Apriori algorithm. Then, we created association rules from those itemsets.

We performed training on a subset of our data in order to find the association rules, and then tested those rules on the rest of the data (a testing set). We could extend this concept to use cross-fold validation to better evaluate the rules. This would lead to a more robust evaluation of the quality of each rule.

The book covers topics such as classification, clustering, text analysis, image recognition, TensorFlow, and Big Data. Each section comes with a practical real-world example, steps through the code in detail, and provides suggestions for you to continue your (machine) learning.

Summary

In this article, we looked at movie recommendation with a technique known as Affinity Analysis, using the Apriori algorithm to find frequent itemsets and then turning those itemsets into association rules.
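The whole pipeline can be checked end to end on data small enough to verify by hand. The following is a self-contained sketch in plain Python, not the book's code. It follows the same ideas:

- frozenset itemsets
- growing (k-1)-itemsets by one movie
- confidence as correct / (correct + incorrect)

It runs on made-up user and movie IDs and holds out a few users as a test set, as the closing section describes. The function names `apriori`, `candidate_rules`, and `measure_confidence` are illustrative. Candidates are collected per user in a set before counting, so an itemset's support equals the number of users whose favorable reviews contain it.

```python
from collections import defaultdict
from operator import itemgetter


def find_frequent_itemsets(reviews_by_user, k_1_itemsets, min_support):
    """Grow each (k-1)-itemset by one movie and keep candidates with enough support."""
    counts = defaultdict(int)
    for reviews in reviews_by_user.values():
        # Collect candidates in a set so each one is counted at most once per user
        user_candidates = set()
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other in reviews - itemset:
                    user_candidates.add(itemset | frozenset((other,)))
        for candidate in user_candidates:
            counts[candidate] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def apriori(reviews_by_user, min_support):
    # Step 1: single-movie itemsets that meet the minimum support
    single_counts = defaultdict(int)
    for reviews in reviews_by_user.values():
        for movie in reviews:
            single_counts[frozenset((movie,))] += 1
    frequent = {1: {s: c for s, c in single_counts.items() if c >= min_support}}
    # Steps 2-4: extend the newest itemsets until no new frequent ones appear
    k = 2
    while True:
        current = find_frequent_itemsets(reviews_by_user, frequent[k - 1], min_support)
        if not current:
            return frequent
        frequent[k] = current
        k += 1


def candidate_rules(frequent):
    rules = []
    for length, itemsets in frequent.items():
        if length < 2:
            continue  # a single movie would leave an empty premise
        for itemset in itemsets:
            for conclusion in itemset:
                rules.append((itemset - frozenset((conclusion,)), conclusion))
    return rules


def measure_confidence(reviews_by_user, rules):
    """Fraction of users matching each premise who also like the conclusion."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for reviews in reviews_by_user.values():
        for premise, conclusion in rules:
            if premise.issubset(reviews):
                seen[(premise, conclusion)] += 1
                if conclusion in reviews:
                    correct[(premise, conclusion)] += 1
    # Skip rules whose premise never occurs in this data (avoids dividing by zero)
    return {rule: correct[rule] / seen[rule] for rule in rules if seen[rule]}


# Made-up favorable reviews (user -> movie IDs), split into training and test users
train = {
    1: frozenset({1, 2, 3}),
    2: frozenset({1, 2, 3}),
    3: frozenset({1, 2}),
    4: frozenset({2, 3}),
    5: frozenset({1, 4}),
}
test = {
    6: frozenset({1, 2}),
    7: frozenset({1, 3}),
    8: frozenset({2, 3}),
}

frequent = apriori(train, min_support=2)
rules = candidate_rules(frequent)
train_conf = measure_confidence(train, rules)
test_conf = measure_confidence(test, rules)

for rule, conf in sorted(train_conf.items(), key=itemgetter(1), reverse=True):
    premise, conclusion = rule
    print(sorted(premise), "->", conclusion,
          "train: {0:.3f}".format(conf), "test:", test_conf.get(rule))
```

On this toy data, a rule such as {3} -> 2 holds for every training user but only half of the test users. That gap is what evaluating on held-out users is meant to expose.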

Packt
06 Jul 2011
16 min read

An overview of web services in Sakai

Connecting to Sakai is straightforward, and simple tasks, such as automatic course creation, take only a few lines of programming effort.

There are significant advantages to having web services in the enterprise. If a developer writes an application that calls a number of web services, the application does not need to know the hidden details behind the services; it just needs to agree on what data to send. This loosely couples the application to the services. Later, if you replace one web service with another, programmers do not need to change the code on the application side.

SOAP works well with most organizations' firewalls, as SOAP uses the same protocol as web browsers. System administrators tend to protect an organization's network by closing unused ports to the outside world, but the port used for web traffic normally stays open. This means that most of the time no extra network configuration is required to enable web services.

Another simplifying factor is that a programmer does not need to know the details of SOAP or REST, as there are libraries and frameworks that hide the underlying magic. In the Sakai implementation of SOAP, adding a new service is as simple as writing a small amount of Java code in a text file, which is then compiled automatically and run the first time the service is called. This is great for rapid application development and deployment, as the system administrator does not need to restart Sakai for each change. Just as importantly, the Sakai services use the well-known libraries from the Apache Axis project.

SOAP is an XML message-passing protocol that, in the case of Sakai sites, sits on top of the Hypertext Transfer Protocol (HTTP). HTTP is the protocol web browsers use to obtain web pages from a server. The client sends messages in XML format to a service, including the information that the service needs. Then the service returns a message with the results or an error message.
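To make the message format concrete, here is a minimal Python sketch that builds a SOAP 1.1 request envelope with the standard xml.etree.ElementTree module. The namespace, the operation name (login), and the parameters are hypothetical placeholders, not Sakai's actual service definitions. In practice, Apache Axis on the server and a SOAP library on the client generate and parse these envelopes for you. The client then sends the envelope as the body of an HTTP POST.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soapenv", SOAP_ENV)


def build_soap_request(service_ns, operation, params):
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical namespace, operation and parameters, for illustration only
request_xml = build_soap_request(
    "http://example.org/hypothetical-login", "login", {"id": "admin", "pw": "secret"}
)
print(request_xml)
```

The service's reply is another envelope of the same shape, carrying either the result or a fault element describing the error.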
The architects introduced SOAP-based web services to Sakai first, adding RESTful services later. Unlike SOAP, which sends XML via HTTP POST requests to one URL that points to a service, REST sends requests to a URL that includes information about the entity, such as a user, with which the client wishes to interact. For example, a REST URL for viewing an address book item could look similar to http://host/direct/addressbook_item/15. Applying URLs in this way makes for understandable, human-readable address spaces, and this more intuitive approach simplifies coding. Further, SOAP requires both the client and the server to parse XML, and at times the parsing effort is expensive in CPU cycles and response times.

The Entity Broker is an internal service that makes life easier for programmers and helps them manipulate entities. Entities in Sakai are managed pieces of data, such as representations of courses, users, grade books, and so on. In the newer versions of Sakai, the Entity Broker has the power to expose entities as RESTful services. In contrast, if you want a new SOAP service, you need to write it yourself. Over time, the Entity Broker exposes more and more entities RESTfully, delivering more ready-made hooks for integrating with other enterprise systems. Both SOAP and REST services sit on top of the HTTP protocol.

Protocols

This section explains how web browsers talk to servers in order to gather web pages. It explains how to use the telnet command and a visual tool called TCPMON (http://ws.apache.org/commons/tcpmon/tcpmontutorial.html) to gain insight into how web services and Web 2.0 technologies work.

Playing with Telnet

It turns out that message passing occurs via text commands between the browser and the server. Web browsers use HTTP to get web pages and the embedded content from the server and to send form information to the server. HTTP talks between the client and server via text (7-bit ASCII) commands.
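Because HTTP is plain text, a program can hold the same conversation that a person types into telnet. The following Go sketch opens a TCP connection, writes a minimal HTTP/1.0 GET request by hand, and reads the raw reply. It talks to a throwaway local test server from net/http/httptest standing in for Sakai; the helper names rawGet and demo, and the choice of HTTP/1.0 with a single Host header, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// rawGet does by hand what a telnet session does: it opens a TCP
// connection, writes a plain-text HTTP/1.0 request, and reads everything
// the server sends back until the server closes the connection.
func rawGet(addr, path string) (string, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	// Request line, one header, then an empty line that ends the headers.
	req := "GET " + path + " HTTP/1.0\r\nHost: " + addr + "\r\n\r\n"
	if _, err := io.WriteString(conn, req); err != nil {
		return "", err
	}
	// Without keep-alive, the server closes an HTTP/1.0 connection after
	// replying, so reading to EOF captures status line, headers and body.
	resp, err := io.ReadAll(conn)
	return string(resp), err
}

// demo starts a local server standing in for Sakai and returns the
// status line of its raw response.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "login page for "+r.URL.Path)
	}))
	defer srv.Close()

	addr := strings.TrimPrefix(srv.URL, "http://")
	resp, err := rawGet(addr, "/portal/login")
	if err != nil {
		return "", err
	}
	statusLine, _, _ := strings.Cut(resp, "\r\n")
	return statusLine, nil
}

func main() {
	line, err := demo()
	fmt.Println(line, err)
}
```

Printing the whole string returned by rawGet instead of just the status line shows the headers, the empty separator line, and the body exactly as telnet would display them.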
When humans talk with each other, they have a wide vocabulary. HTTP, however, uses fewer than twenty words. You can experiment with HTTP directly by using a Telnet client to send commands to a web server. For example, if your demonstration Sakai instance is running on port 8080, the following commands will get you the login page:

telnet localhost 8080
GET /portal/login HTTP/1.0

Most servers expect the protocol version at the end of the request line, and the request is complete only after you press Enter on an empty line.

The GET command does what it sounds like and gets a web page. Forms can use the GET verb to send data at the end of the URL. For example, GET /portal/login?name=alan&age=15 sends the variables name=alan and age=15 to the server.

Installing TCPMON

You can use the TCPMON tool to view requests and responses from a web browser such as Firefox. TCPMON can act as an invisible man in the middle, recording the messages between the web browser and the server. Once it is set up, the requests sent from the browser go to TCPMON, which passes them on to the server. The server passes back a response, and TCPMON, acting as a transparent proxy, returns the response to the web browser. This allows us to look at all requests and responses graphically.

First, you set up TCPMON to listen on a given port number (by convention, normally port 8888), and then you configure your web browser to send its requests through the proxy. When you type the address of a page into the web browser, instead of going directly to the relevant server, the browser sends the request to the proxy, which passes it on and relays the response back. TCPMON displays both the request and the response in a window.

You can download TCPMON from the Apache TCPMON project site. After downloading and unpacking, you can run, from within the build directory, either tcpmon.bat for the Windows environment or tcpmon.sh for the UNIX/Linux environment. To configure a proxy, click on the Admin tab, set the Listen Port to 8888, and select the Proxy radio button.
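TCPMON's proxy mode can be imitated in a few dozen lines, which makes the "man in the middle" idea concrete. The following Go sketch is a toy forward proxy that logs each request line and status code and then relays the response; it is not how TCPMON is implemented, it ignores HTTPS (CONNECT) and hop-by-hop headers, and a local test server stands in for Sakai. The type loggingProxy and the helper demo are hypothetical names, and the client's Proxy setting plays the role of the browser's proxy configuration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"sync"
)

// loggingProxy is a toy forward proxy in the spirit of TCPMON's proxy mode:
// it records each request line and status code, then relays the response.
// It ignores HTTPS (CONNECT), hop-by-hop headers and streaming concerns.
type loggingProxy struct {
	mu  sync.Mutex
	log []string
}

func (p *loggingProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// A client that uses a proxy sends the absolute URL, so r.URL already
	// holds the scheme and host of the real server.
	out := r.Clone(r.Context())
	out.RequestURI = "" // must be empty on outgoing client requests

	resp, err := http.DefaultTransport.RoundTrip(out)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	p.mu.Lock()
	p.log = append(p.log, fmt.Sprintf("%s %s -> %d", r.Method, r.URL, resp.StatusCode))
	p.mu.Unlock()

	// Relay the server's headers, status code and body to the client.
	for k, vs := range resp.Header {
		for _, v := range vs {
			w.Header().Add(k, v)
		}
	}
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body)
}

// demo sends one request through the proxy to a stand-in server and
// returns the body the client received plus the proxy's log.
func demo() (string, []string, error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "hello from the server")
	}))
	defer backend.Close()

	proxy := &loggingProxy{}
	proxySrv := httptest.NewServer(proxy)
	defer proxySrv.Close()

	proxyURL, err := url.Parse(proxySrv.URL)
	if err != nil {
		return "", nil, err
	}
	// The equivalent of pointing the browser's HTTP proxy setting at the proxy.
	client := &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)}}
	resp, err := client.Get(backend.URL + "/portal/login")
	if err != nil {
		return "", nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)

	proxy.mu.Lock()
	defer proxy.mu.Unlock()
	return string(body), append([]string(nil), proxy.log...), err
}

func main() {
	body, log, err := demo()
	fmt.Println(body, log, err)
}
```

As with TCPMON, the client is unaware of the proxy beyond its configuration: the response body arrives unchanged, while the proxy keeps its own record of the exchange.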
After that, clicking on Add creates a new tab, where the requests and responses will be displayed. Your favorite web browser now has to use the newly set-up proxy. For Firefox 3, you can do this by selecting the menu option Edit | Preferences, and then choosing the Advanced tab and the Network tab, as shown in the next screenshot. Set the HTTP proxy to 127.0.0.1 and the port number to 8888, and make sure that the No proxies text input is blank; otherwise, requests to localhost may bypass the proxy. Clicking on the OK button enables the new settings.

To use the proxy from within Internet Explorer 7 for a Local Area Network (LAN), edit the dialog box found under Tools | Internet Options | Connections | LAN settings. Once the proxy is working, typing http://localhost:8080/portal/login in the address bar will seamlessly return the login page of your local Sakai instance. Otherwise, you will see an error message similar to Proxy Server Refused Connection in Firefox or Internet Explorer cannot display the webpage in Internet Explorer. To turn off the proxy settings, select the No Proxies radio button and click on OK in Firefox 3; in Internet Explorer 7, clear the Use a proxy server for the LAN tick box and click on OK.

Requests and returned status codes

When TCPMON is running a proxy on port 8888, it allows you to view the requests from the browser and the responses in an extra tab, as shown in the following screenshot. Notice the extra information that the browser sends as part of the request. HTTP/1.1 defines the protocol and version level, and the lines below GET are the header variables. The User-Agent header identifies which client sends the request, the Accept headers tell the server which content types, languages, and encodings the browser can handle, and the Cookie header sends back the values of cookies that the browser has stored for this server. In principle, HTTP is stateless: each response is based only on the current request.
However, to get around this, persistent information can be stored in cookies. Web browsers normally store a cookie as a little text file or in a small database on the end user's computer. Sakai relies on the session support of its servlet container, such as Tomcat, to maintain state: a cookie stores a session ID, and when the server sees the session ID, it can look up the server-side state for that request. This state contains information such as whether the user is logged in, or what he or she has ordered. The web browser deletes this cookie when it closes; a cookie that is deleted when the web browser closes is known as a session cookie.

The server response starts with the protocol followed by a status code. HTTP/1.1 200 OK tells the web browser that the server is using HTTP version 1.1 and was able to return the requested web page successfully. 2xx status codes indicate success. 3xx status codes indicate some form of redirection and tell the web browser where to try to pick up the requested resource. 4xx status codes are for client errors, such as malformed requests or lack of permission to obtain the resource; 4xx entries in log files are fertile ground for security managers looking for attempted break-ins. 5xx status codes indicate a failure of the server itself and are mostly of interest to system administrators and programmers during the debugging cycle. In most cases, 5xx status codes are caused either by high server load or by a broken piece of code. Sakai is changing rapidly, and even with the most rigorous testing, there are bound to be occasional hiccups. You will find accurate details of the full range of status codes at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html. Another important part of the response is Content-Type, which tells the web browser which type of material the response is returning, so the browser knows how to handle it.
For example, the web browser may want to run a plug-in for video types and display text natively. Content-Length, which gives the size of the body in bytes, is normally also present. After the header information is finished, there is an empty line followed by the content itself. Web browsers follow any redirects that are returned by sending extra requests. Web browsers also interpret any HTML pages and make further requests for resources such as JavaScript files and images. Modern browsers do not wait until the server has returned all the responses, but render the HTML page progressively as the parts arrive.

The GET verb is not well suited to sending a large amount of data, as browsers and servers limit the length of a URL (Internet Explorer, for example, to around 2,000 characters). Further, the end user can see the form data, and the browser percent-encodes characters such as spaces, which makes the URL hard to read. There is also a security aspect: if you type passwords into forms that use GET, others may see your password or other details. This is not a good idea, especially in Internet cafés, where the next user who logs on can see the password in the browsing history. The POST verb is a better choice. Let us take as an example the Sakai demonstration login page (http://localhost:8080/portal/login). The login page itself contains a form tag that points to the relogin page with the POST method:

<form method="post" action="http://localhost:8080/portal/relogin" enctype="application/x-www-form-urlencoded">

Note that the enctype attribute of the form tag also defines the content type of the submitted data.
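What the browser does when this form is submitted can be sketched in Go with the standard net/url and net/http packages. The field names eid, pw and submit and the relogin URL come from the Sakai login example; the helpers loginForm and newLoginRequest are illustrative names, and the request is only built, not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// loginForm encodes the login fields as application/x-www-form-urlencoded.
// The field names eid, pw and submit follow the Sakai login example.
func loginForm(user, password string) string {
	form := url.Values{}
	form.Set("eid", user)
	form.Set("pw", password)
	form.Set("submit", "Login")
	// Encode sorts by key and escapes values: a space becomes "+" and
	// "&" becomes "%26", so the name=value pairs stay unambiguous.
	return form.Encode()
}

// newLoginRequest builds, but does not send, the POST the form submits.
func newLoginRequest(action, user, password string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, action, strings.NewReader(loginForm(user, password)))
	if err != nil {
		return nil, err
	}
	// Mirrors the enctype attribute of the form tag.
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	req, err := newLoginRequest("http://localhost:8080/portal/relogin", "admin", "admin")
	if err != nil {
		panic(err)
	}
	// NewRequest derives ContentLength (in bytes) from the strings.Reader.
	fmt.Println(req.Method, req.URL, req.ContentLength)
}
```

For user admin and password admin, the encoded body is eid=admin&pw=admin&submit=Login, which is 31 bytes long; passing the request to an http.Client would send it.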
Key features of the POST request compared to GET are that the form values are stored as content after the header values, that an empty line separates the end of the header from the data, and that the request states the amount of data in the Content-Length header. The essential POST values for a login form with user admin (eid=admin) and password admin (pw=admin) will look like:

POST http://localhost:8080/portal/relogin HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Content-Length: 31

eid=admin&pw=admin&submit=Login

POST requests can contain much more information than GET requests, and they keep the values out of the address bar of the web browser. This is not security, however: the request body travels over the network just as visibly as the URL, so POST values are neither hidden nor secure. The only viable solution is for your web browser to encrypt your transactions using SSL/TLS (http://www.ietf.org/rfc/rfc2246.txt), which happens every time you connect to a server using an HTTPS URL.

SOAP

Sakai uses the Apache Axis framework, which the developers have configured to accept SOAP calls via POST. SOAP sends messages in a specific XML format with a dedicated Content-Type, otherwise known as MIME type: text/xml for SOAP 1.1, the version whose namespaces the example below uses, and application/soap+xml for SOAP 1.2. A programmer does not need to know more than that, as the client libraries take care of the majority of the excruciating low-level details. An example SOAP message generated by the Perl module SOAP::Lite (http://www.soaplite.com/) for creating a login session in Sakai will look like the following POST data:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <soap:Body>
    <login>
      <c-gensym3 xsi:type="xsd:string">admin</c-gensym3>
      <c-gensym5 xsi:type="xsd:string">admin</c-gensym5>
    </login>
  </soap:Body>
</soap:Envelope>

There is an envelope with a body containing data for the service to consume.
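Client libraries such as SOAP::Lite hide this plumbing, but a hand-rolled call shows what actually travels over the wire. The following Go sketch builds a SOAP 1.1 envelope, POSTs it with a text/xml content type and a SOAPAction header, and pulls a return value out of the reply. The operation and element names (login, id, pw, loginResponse, loginReturn) are assumptions rather than Sakai's published WSDL, and a local stub server stands in for the real endpoint.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// loginEnvelope builds a SOAP 1.1 envelope for a hypothetical login(id, pw)
// operation; the element names are assumptions, not Sakai's WSDL.
func loginEnvelope(user, password string) (string, error) {
	var id, pw bytes.Buffer
	// Escape the values so characters such as < or & cannot break the XML.
	if err := xml.EscapeText(&id, []byte(user)); err != nil {
		return "", err
	}
	if err := xml.EscapeText(&pw, []byte(password)); err != nil {
		return "", err
	}
	return xml.Header +
		`<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body>` +
		`<login><id>` + id.String() + `</id><pw>` + pw.String() + `</pw></login>` +
		`</soap:Body></soap:Envelope>`, nil
}

// loginResult picks the return value out of the response envelope. The
// path matches local element names, whatever their namespace prefix.
type loginResult struct {
	SessionID string `xml:"Body>loginResponse>loginReturn"`
}

// callLogin POSTs the envelope and decodes the session ID from the reply.
func callLogin(endpoint, user, password string) (string, error) {
	env, err := loginEnvelope(user, password)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(env))
	if err != nil {
		return "", err
	}
	// SOAP 1.1 travels as text/xml with a SOAPAction header.
	req.Header.Set("Content-Type", "text/xml; charset=utf-8")
	req.Header.Set("SOAPAction", `""`)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("SOAP call failed: %s", resp.Status)
	}
	var res loginResult
	if err := xml.NewDecoder(resp.Body).Decode(&res); err != nil {
		return "", err
	}
	return res.SessionID, nil
}

// demo runs callLogin against a stub service that only checks the request.
func demo() (string, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		if r.Method != http.MethodPost || !strings.Contains(string(body), "<id>admin</id>") {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "text/xml; charset=utf-8")
		io.WriteString(w, `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body>`+
			`<loginResponse><loginReturn>stub-session-id</loginReturn></loginResponse></soap:Body></soap:Envelope>`)
	}))
	defer stub.Close()
	return callLogin(stub.URL, "admin", "admin")
}

func main() {
	fmt.Println(demo())
}
```

Both sides here must agree on the element names, which is exactly the schema agreement that a WSDL file formalizes.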
The important point to remember is that both the client and the server have to be able to parse the specific XML schema. SOAP messages can include extra security features, but Sakai does not require these; the architects expect organizations to encrypt web services using SSL/TLS. The last extra SOAP-related complexity is the Web Services Description Language (WSDL, http://www.w3.org/TR/wsdl). Web services may change location or exist in multiple locations for redundancy, so the service writer can describe the location of the services, and the data types involved with those services, in a separate file in XML format.

JSON

Also worth mentioning is JavaScript Object Notation (JSON), another popular format passed over HTTP. When web developers realized that they could make browsers load a web page in parts, one part at a time, it significantly improved the quality of the web browsing experience for the end user. This asynchronous loading enables all kinds of whiz-bang features, such as typing a search term and choosing from a set of suggested completions before pressing the Submit button. Asynchronous loading delivers more responsive and richer web pages that feel more like traditional desktop applications than a plain old web page. JSON is one of the formats of choice for passing asynchronous requests and responses. The asynchronous communication normally occurs through HTTP GET or POST, but with a specific content structure that is designed to be human readable and friendly to scripting-language parsers. JSON calls often have the file extension .json as part of the URL.
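Because a JSON document maps directly onto typed data, consuming it takes little code. The following Go sketch decodes the RFC 4627 image object that follows (embedded here as a string constant) into structs with encoding/json; the type and function names are illustrative. Note that in that example the thumbnail's Width is a quoted string while Height is a bare number, so the struct fields must follow suit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// thumbnail mirrors the nested object. Width is a string because the
// example quotes it ("100"), while Height is a bare number.
type thumbnail struct {
	URL    string `json:"Url"`
	Height int    `json:"Height"`
	Width  string `json:"Width"`
}

type image struct {
	Width     int       `json:"Width"`
	Height    int       `json:"Height"`
	Title     string    `json:"Title"`
	Thumbnail thumbnail `json:"Thumbnail"`
	IDs       []int     `json:"IDs"`
}

// document is the top-level object, which wraps everything in "Image".
type document struct {
	Image image `json:"Image"`
}

const sample = `{
  "Image": {
    "Width": 800, "Height": 600, "Title": "View from 15th Floor",
    "Thumbnail": {"Url": "http://www.example.com/image/481989943", "Height": 125, "Width": "100"},
    "IDs": [116, 943, 234, 38793]
  }
}`

// decodeImage parses a JSON document and returns its Image member.
func decodeImage(data []byte) (image, error) {
	var doc document
	err := json.Unmarshal(data, &doc)
	return doc.Image, err
}

func main() {
	img, err := decodeImage([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(img.Title, img.Width, img.Height, img.Thumbnail.Width, img.IDs)
	// Re-encoding shows the same structure going back over the wire.
	out, _ := json.Marshal(img.Thumbnail)
	fmt.Println(string(out))
}
```

If Width were declared as an int, Unmarshal would report a type error for the quoted "100", which is one reason consistent typing in JSON payloads matters.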
As mentioned in RFC 4627, an example image object communicated in JSON looks like:

{
  "Image": {
    "Width": 800,
    "Height": 600,
    "Title": "View from 15th Floor",
    "Thumbnail": {
      "Url": "http://www.example.com/image/481989943",
      "Height": 125,
      "Width": "100"
    },
    "IDs": [116, 943, 234, 38793]
  }
}

By blurring the boundaries between client and server, this approach moves a lot of the presentation and business logic to the client side, into scripting languages such as JavaScript. The scripting language orchestrates the loading of parts of pages and the generation of widget sets. Frameworks such as jQuery (http://jquery.com/) and MyFaces (http://myfaces.apache.org/) significantly ease the client-side programming burden.

REST

To understand REST, you need to understand the other verbs in HTTP (http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html). The full HTTP/1.1 set is OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, and CONNECT. The HEAD verb returns only the headers of the response, without the content, and is useful for clients that want to see whether the content has changed since the last request. PUT requests that the content in the request be stored at the location given in the request. DELETE is for deleting the entity. REST uses the URL of the request to route to the resource, and the HTTP verb to say what to do with it: in general, POST creates an item, GET returns information on the item, PUT updates it, and DELETE deletes it. In SOAP, the client points directly at the service it calls, or indirectly via the web service description. In REST, however, part of the URL describes the resource or resources you wish to work with.
For example, a hypothetical address book application that lists all e-mail addresses in HTML format would use a request similar to the following:

GET /email

To list the addresses in XML or JSON format:

GET /email.xml
GET /email.json

To get the first e-mail address in the list:

GET /email/1

To create a new e-mail address, remembering to send the rest of the e-mail details in the body of the POST request:

POST /email

To delete address 5 from the list:

DELETE /email/5

To obtain address 5 in other formats such as JSON or XML, use file extensions at the end of the URL, for example:

GET /email/5.json
GET /email/5.xml

RESTful services are intuitively more descriptive than SOAP services, and they make it easy to switch the format from HTML to JSON to fuel the dynamic and asynchronous loading of websites. Because REST uses the HTTP verbs directly, it also fits well with the most common application type: CRUD (Create, Read, Update, and Delete) applications, such as the site or user tools within Sakai. Now that we have discussed the theory, in the next section we shall discuss which Sakai-related SOAP services already exist.
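To make the verb-plus-URL routing concrete, here is a minimal Go sketch of the hypothetical address book above, using only net/http. It keeps addresses in memory, supports GET and POST on /email and GET and DELETE on /email/{id}, and treats a .json suffix as a request for JSON. To stay short it returns plain text where a real application would return HTML, and it leaves out XML output and PUT; the type emailStore and the helpers write and do are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
	"sync"
)

// emailStore is an in-memory stand-in for the hypothetical address book.
type emailStore struct {
	mu     sync.Mutex
	nextID int
	emails map[int]string
}

func newEmailStore() *emailStore { return &emailStore{nextID: 1, emails: map[int]string{}} }

// ServeHTTP routes on the path (which resource) and the verb (what to do).
// A ".json" suffix switches the representation from plain text to JSON.
func (s *emailStore) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	asJSON := strings.HasSuffix(r.URL.Path, ".json")
	p := strings.TrimSuffix(r.URL.Path, ".json")
	s.mu.Lock()
	defer s.mu.Unlock()
	if p == "/email" {
		switch r.Method {
		case http.MethodGet: // read the collection, in ID order
			list := []string{}
			for id := 1; id < s.nextID; id++ {
				if e, ok := s.emails[id]; ok {
					list = append(list, e)
				}
			}
			write(w, asJSON, list)
		case http.MethodPost: // create: the new address is the request body
			b, err := io.ReadAll(r.Body)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			id := s.nextID
			s.nextID++
			s.emails[id] = strings.TrimSpace(string(b))
			w.Header().Set("Location", "/email/"+strconv.Itoa(id))
			w.WriteHeader(http.StatusCreated)
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
		return
	}
	// Paths that are not /email/{id} for an existing id get a 404.
	id, err := strconv.Atoi(strings.TrimPrefix(p, "/email/"))
	addr, ok := s.emails[id]
	if err != nil || !ok {
		http.NotFound(w, r)
		return
	}
	switch r.Method {
	case http.MethodGet: // read one item
		write(w, asJSON, addr)
	case http.MethodDelete: // delete one item
		delete(s.emails, id)
		w.WriteHeader(http.StatusNoContent)
	default:
		w.WriteHeader(http.StatusMethodNotAllowed)
	}
}

// write renders v as JSON or, standing in for HTML, as plain text.
func write(w http.ResponseWriter, asJSON bool, v interface{}) {
	if asJSON {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(v)
		return
	}
	fmt.Fprintln(w, v)
}

// do sends one request straight to a handler and returns status and body.
func do(h http.Handler, method, target, body string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, strings.NewReader(body)))
	return rec.Code, rec.Body.String()
}

func main() {
	s := newEmailStore()
	fmt.Println(do(s, http.MethodPost, "/email", "alan@example.com"))
	fmt.Println(do(s, http.MethodGet, "/email/1.json", ""))
	fmt.Println(do(s, http.MethodDelete, "/email/1", ""))
	fmt.Println(do(s, http.MethodGet, "/email/1", ""))
}
```

The same URL, /email/1, answers differently depending on the verb, which is the CRUD mapping described above; a real service would also need validation and persistent storage.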

Packt
22 Oct 2009
5 min read

Customizing the Default Expression Engine (v1.6.7) website template

Introduction

The blogging revolution has changed the nature of the internet. As more and more individuals and organizations choose the blog format (that is, content-driven websites which are regularly updated by one or more users) as their preferred web solution, the question of which blogging application to use becomes an increasingly important one. I have tested all the popular blogging solutions, and the one I have found to be the most impressive for designers with a little CSS and XHTML know-how is Expression Engine.

After spending a little time on the Expression Engine support forums, I have noticed that a high number of the support questions relate to solving CSS problems within EE. This suggests that many designers who choose EE are used to working in Adobe Dreamweaver (or other WYSIWYG applications), and although there are no compatibility issues between the two systems, it is clear that they are making the transition from graphics-based web design to CSS-based web design.

When you install Expression Engine 1.6.7 for the first time, you are asked to choose a theme from a drop-down menu, which you might expect to offer a satisfactory selection of pre-installed themes. Sadly, this is not the case in Expression Engine (EE) version 1.6.7: if you have not downloaded a theme and manually saved it inside the right folder within the EE system file structure, you will be given only one theme to choose from, the dreaded "default weblog theme". Once you complete the installation, no other themes can be imported or installed, so you can either restart the installation, select the default weblog theme, or start from a blank canvas (sorry about that). The good news is that the next release of EE (v2.0) ships with a very nice default template; the bad news is that you will have to wait a few months longer to get a copy, and you will need to pay a fee to renew your license in order to upgrade.
Even then, you will probably want to modify the new and improved default template in EE 2.0, or, depending on your needs, you may choose another EE template altogether so that your website does not look like a thousand other websites based on the same template (not good). This article will demonstrate how to use Cascading Style Sheets (CSS) to improve the layout of the default EE website/blog template, and how to keep the structure (XHTML) and the presentation (CSS) entirely separate, which is best practice in web development.

This article is intended as a practical guide for intermediate-level web designers who want to work with CSS more effectively within Expression Engine and to create better websites that adhere to current W3C web standards. It will not attempt to teach you the fundamentals of CSS and XHTML, or how to install, use, or develop a website with EE; instead, it demonstrates how to use CSS to take control of the appearance of any EE template. If you are new to EE, it is recommended that you consult the book "Building a Website with Expression Engine 1.6.7", published by Packt Publishing, and visit www.expressionengine.com to consult the EE user guide. If you get stuck at any time when using Expression Engine, you can visit the EE support forums via the main EE site to get help with EE, XHTML, and CSS issues, and for regular updates, the EllisLab EE feed within EE is an excellent source of news from the EE community.

The trouble with templates

Let's open up EE's default "weblog" template in Firefox. I have the very useful "developers toolbar" add-on installed; you can see the many options that are available lined up across the bottom of Firefox's toolbar. When you select "Outline > Outline Current Element", Firefox draws an outline around the selected element's block and, by default, displays the name of the element.
This add-on features many other timesaving and task-facilitating functions, which range from DOM tools for JavaScript development to nifty layout tools like displaying a ruler inside the Firefox window.

The default template is a useful guide to some basic EE tags embedded into the XHTML, but its CSS could be cleaner and simpler to customize, so let's make some much-needed changes. I will not be looking into the EE tags in this article, because they are a very powerful and large subject that is beyond the scope of this article.

Inside the template module, create a new template group and call it "site_extended". Template groups in EE organize your templates into virtual folders. We will make copies of the existing template group and templates so that all the changes we make are non-destructive. Choose "Do not duplicate any existing template groups", but select "Make the index template in this group your site's home page?" and press Submit. Easy.

Next, create a new template called "site_extended_css" and duplicate the site/site_css template. This powerful feature instructs EE to clone an existing template with a new name and location. Now let's create a copy of the default site weblog template and call it "index_extended": select "Duplicate an existing template" and choose "site/index" from the drop-down list. In "site/index", "site" is the template group and "index" is the actual template. Now your template management tab should look like the following screenshot. Notice that the index template has a red star next to it.

Packt
14 Oct 2009
8 min read

Managing Posts with WordPress Plugin

Programming the Manage panel

The Manage Posts screen can be changed to show extra columns or to remove unwanted columns from the listing. Let's say that we want to show the post type: Normal, Photo, or Link. Remember the custom field post-type that we added to our posts? We can use it now to differentiate post types.

Time for action – Add post type column in the Manage panel

We want to add a new column to the Manage panel, and we will call it Type. The value of the column will represent the post type: Normal, Photo, or Link.

Expand the admin_menu() function to load the function that handles the Manage page hooks:

    add_submenu_page('post-new.php', __('Add URL', $this->plugin_domain),
        __('URL', $this->plugin_domain), 1, 'add-url', array(&$this, 'display_form'));

    // handle Manage page hooks
    add_action('load-edit.php', array(&$this, 'handle_load_edit'));
}

Add the hooks to the columns on the Manage screen:

// Manage page hooks
function handle_load_edit()
{
    // handle Manage screen functions
    add_filter('manage_posts_columns', array(&$this, 'handle_posts_columns'));
    add_action('manage_posts_custom_column', array(&$this, 'handle_posts_custom_column'), 10, 2);
}

Then implement the function that adds the new column (removing the Author column and replacing the date format are covered later):

// Handle Column header
function handle_posts_columns($columns)
{
    // add 'type' column
    $columns['type'] = __('Type', $this->plugin_domain);
    return $columns;
}

For the date key replacement, we need an extra function:

function array_change_key_name($orig, $new, &$array)
{
    $return = array(); // so that an empty input still yields an array
    foreach ($array as $k => $v)
        $return[($k === $orig) ? $new : $k] = $v;
    return $return;
}

And finally, insert a function to handle the display of information in that column:

// Handle Type column display
function handle_posts_custom_column($column_name, $id)
{
    // 'type' column handling based on post type
    if ($column_name == 'type') {
        $type = get_post_meta($id, 'post-type', true);
        echo $type ? $type : __('Normal', $this->plugin_domain);
    }
}

Don't forget to add the Manage page to the list of localized pages:

// pages where our plugin needs translation
$local_pages = array('plugins.php', 'post-new.php', 'edit.php');
if (in_array($pagenow, $local_pages))

As a result, we now have a new column that displays the post type using information from a post custom field.

What just happened?

We have used the load-edit.php action to specify that we want our hooks to be assigned only on the Manage Posts page (edit.php). This is similar to the optimization we did when we loaded the localization files. handle_posts_columns is hooked to the manage_posts_columns filter, which passes the columns as a parameter and allows you to insert a new column:

function handle_posts_columns($columns)
{
    $columns['type'] = __('Type', $this->plugin_domain);
    return $columns;
}

You are also able to remove a column. This example would remove the Author column:

unset($columns['author']);

To handle the display of information in that column, we hook handle_posts_custom_column to the manage_posts_custom_column action. The action is called for each entry (post) whenever an unknown column is encountered. WordPress passes the name of the column and the current post ID as parameters. That allows us to extract the post type from a custom field:

function handle_posts_custom_column($column_name, $id)
{
    if ($column_name == 'type') {
        $type = get_post_meta($id, 'post-type', true);

It also allows us to print it out:

        echo $type ? $type : __('Normal', $this->plugin_domain);
    }
}

Modifying an existing column

We can also modify an existing column. Let's say we want to change the way Date is displayed.
Here are the changes we would make to the code:

// Handle Column header
function handle_posts_columns($columns)
{
    // add 'type' column
    $columns['type'] = __('Type', $this->plugin_domain);
    // remove 'author' column
    //unset($columns['author']);
    // change 'date' column
    $columns = $this->array_change_key_name('date', 'date_new', $columns);
    return $columns;
}

// Handle Type column display
function handle_posts_custom_column($column_name, $id)
{
    // 'type' column handling based on post type
    if ($column_name == 'type') {
        $type = get_post_meta($id, 'post-type', true);
        echo $type ? $type : __('Normal', $this->plugin_domain);
    }
    // new date column handling; the letters of the <br /> tag are
    // escaped because date formats treat r as a format character
    if ($column_name == 'date_new') {
        the_time('Y-m-d <\b\r /> g:i:s a');
    }
}

function array_change_key_name($orig, $new, &$array)
{
    $return = array();
    foreach ($array as $k => $v)
        $return[($k === $orig) ? $new : $k] = $v;
    return $return;
}

The example replaces the date column with our own date_new column and uses it to display the date with our preferred formatting.

Manage screen search filter

WordPress allows us to show all the posts by date and category, but what if we want to show all the posts depending on post type? No problem! We can add a new filter select box straight to the Manage panel.

Time for action – Add a search filter box

Let's start by adding two more hooks to the handle_load_edit() function. The restrict_manage_posts action lets us draw the select box, and the posts_where filter alters the database query to select only the posts of the type we want to show.
// Manage page hooks
function handle_load_edit()
{
    // handle Manage screen functions
    add_filter('manage_posts_columns', array(&$this, 'handle_posts_columns'));
    add_action('manage_posts_custom_column', array(&$this, 'handle_posts_custom_column'), 10, 2);
    // handle search box filter
    add_filter('posts_where', array(&$this, 'handle_posts_where'));
    add_action('restrict_manage_posts', array(&$this, 'handle_restrict_manage_posts'));
}

Let's write the corresponding function to draw the select box:

// Handle select box for Manage page
function handle_restrict_manage_posts()
{
    ?>
    <select name="post_type" id="post_type" class="postform">
        <option value="0">View all types</option>
        <option value="normal" <?php if ($_GET['post_type'] == 'normal') echo 'selected="selected"' ?>><?php _e('Normal', $this->plugin_domain); ?></option>
        <option value="photo" <?php if ($_GET['post_type'] == 'photo') echo 'selected="selected"' ?>><?php _e('Photo', $this->plugin_domain); ?></option>
        <option value="link" <?php if ($_GET['post_type'] == 'link') echo 'selected="selected"' ?>><?php _e('Link', $this->plugin_domain); ?></option>
    </select>
    <?php
}

And finally, we need a function that will change the query to retrieve only the posts of the selected type (the postmeta column holding the value is meta_value):

// Handle query for Manage page
function handle_posts_where($where)
{
    global $wpdb;
    if ($_GET['post_type'] == 'photo') {
        $where .= " AND ID IN (SELECT post_id FROM {$wpdb->postmeta} WHERE meta_key='post-type' AND meta_value='" . __('Photo', $this->plugin_domain) . "')";
    } else if ($_GET['post_type'] == 'link') {
        $where .= " AND ID IN (SELECT post_id FROM {$wpdb->postmeta} WHERE meta_key='post-type' AND meta_value='" . __('Link', $this->plugin_domain) . "')";
    } else if ($_GET['post_type'] == 'normal') {
        $where .= " AND ID NOT IN (SELECT post_id FROM {$wpdb->postmeta} WHERE meta_key='post-type')";
    }
    return $where;
}

What just happened?

We have added a new select box to the header of the Manage panel. It allows us to filter the post types we want to show.
We added the box using the restrict_manage_posts action, which is triggered at the end of the Manage panel header and allows us to insert HTML code; we used it to draw a select box. To actually perform the filtering, we use the posts_where filter, which is run when a query is made to fetch the posts from the database:

if ($_GET['post_type'] == 'photo') {
    $where .= " AND ID IN (SELECT post_id FROM {$wpdb->postmeta} WHERE meta_key='post-type' AND meta_value='" . __('Photo', $this->plugin_domain) . "')";

If Photo is selected, we inspect the WordPress postmeta table and select the posts that have the post-type key with the value Photo.

At this point, we have a functional plugin. To improve it further, we could add user permission checks, so that only users who are allowed to write posts and upload files can use it.

Quick reference

manage_posts_columns($columns): This acts as a filter for adding or removing columns in the Manage Posts panel. Similarly, the manage_pages_columns filter does the same for the Manage Pages panel.
manage_posts_custom_column($column, $post_id): This acts as an action to display information for the given column and post. Alternatively, use manage_pages_custom_column for the Manage Pages panel.
posts_where($where): This acts as a filter for the where clause in the query that gets the posts.
restrict_manage_posts: This acts as an action that runs at the end of the Manage panel header and allows you to insert HTML.

Packt
23 Jan 2015
15 min read

Adding Authentication

This article is written by Mat Ryer, the author of Go Programming Blueprints. The chat application we have built so far focuses on high-performance transmission of messages from the clients to the server and back again, but our users have no way of knowing who they are talking to. One solution to this problem is to build some kind of signup and login functionality and let our users create accounts and authenticate themselves before they can open the chat page.

Whenever we are about to build something from scratch, we must ask ourselves how others have solved this problem before (it is extremely rare to encounter genuinely original problems), and whether any open solutions or standards already exist that we can make use of. Authorization and authentication are hardly new problems, especially in the world of the Web, and there are many different protocols to choose from. So how do we decide the best option to pursue? As always, we must look at this question from the point of view of the user.

A lot of websites these days allow you to sign in using accounts that you already have on a variety of social media or community websites. This saves users the tedious job of entering all their account information over and over again as they decide to try out different products and services. It also has a positive effect on the conversion rates for new sites.

In this article, we will enhance our chat codebase to add authentication, which will allow our users to sign in using Google, Facebook, or GitHub, and you'll see how easy it is to add other sign-in portals too. In order to join the chat, users must first sign in. Following this, we will use the authorized data to augment our user experience so that everyone knows who is in the room and who said what.
In this article, you will learn to: use the decorator pattern to wrap http.Handler types and add functionality to handlers; serve HTTP endpoints with dynamic paths; use the Gomniauth open source project to access authentication services; get and set cookies using the http package; encode objects as Base64 and back again; send and receive JSON data over a web socket; give different types of data to templates; and work with channels of your own types.
Handlers all the way down
For our chat application, we implemented our own http.Handler type in order to easily compile, execute, and deliver HTML content to browsers. Since this is a very simple but powerful interface, we are going to continue to use it wherever possible when adding functionality to our HTTP processing.
In order to determine whether a user is authenticated, we will create an authentication wrapper handler that performs the check and passes execution on to the inner handler only if the user is authenticated. Our wrapper handler will satisfy the same http.Handler interface as the object inside it, allowing us to wrap any valid handler. In fact, even the authentication handler we are about to write could later be encapsulated inside a similar wrapper if needed.
[Figure: Diagram of a chaining pattern when applied to HTTP handlers]
The preceding figure shows how this pattern could be applied in a more complicated HTTP handler scenario. Each object implements the http.Handler interface, which means that the object could be passed to the http.Handle function to handle a request directly, or it could be given to another object that adds some kind of extra functionality. The Logging handler might write to a log file before and after the ServeHTTP method is called on the inner handler. Because the inner handler is just another http.Handler, any other handler can be wrapped in (or decorated with) the Logging handler. It is also common for an object to contain logic that decides which inner handler should be executed.
For example, our authentication handler will either pass the execution to the wrapped handler or handle the request itself by issuing a redirect to the browser.
That's plenty of theory for now; let's write some code. Create a new file called auth.go in the chat folder:
package main

import (
	"net/http"
)

type authHandler struct {
	next http.Handler
}

func (h *authHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if _, err := r.Cookie("auth"); err == http.ErrNoCookie {
		// not authenticated
		w.Header().Set("Location", "/login")
		w.WriteHeader(http.StatusTemporaryRedirect)
	} else if err != nil {
		// some other error
		panic(err.Error())
	} else {
		// success - call the next handler
		h.next.ServeHTTP(w, r)
	}
}

func MustAuth(handler http.Handler) http.Handler {
	return &authHandler{next: handler}
}
The authHandler type not only implements the ServeHTTP method (which satisfies the http.Handler interface) but also stores (wraps) an http.Handler in the next field. Our MustAuth helper function simply creates an authHandler that wraps any other http.Handler. This pattern allows us to easily add authentication to our code in main.go. Let us tweak the following root mapping line:
http.Handle("/", &templateHandler{filename: "chat.html"})
Let us change the first argument to make it explicit that this is the page meant for chatting. Next, let's use the MustAuth function to wrap templateHandler for the second argument:
http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"}))
Wrapping templateHandler with the MustAuth function will cause execution to run first through our authHandler, and only on to templateHandler if the request is authenticated. The ServeHTTP method in our authHandler will look for a special cookie called auth, and use the Header and WriteHeader methods on http.ResponseWriter to redirect the user to a login page if the cookie is missing.
Build and run the chat application and try to hit http://localhost:8080/chat:
go build -o chat
./chat -host=":8080"
You need to delete your cookies to clear out previous auth tokens, or any other cookies that might be left over from other development projects served through localhost.
If you look in the address bar of your browser, you will notice that you are immediately redirected to the /login page. Since we cannot handle that path yet, you'll just get a 404 page not found error.
Making a pretty social sign-in page
There is no excuse for building ugly apps, and so we will build a social sign-in page that is as pretty as it is functional. Bootstrap is a frontend framework used to develop responsive projects on the Web. It provides CSS and JavaScript code that solves many user-interface problems in a consistent and good-looking way. While sites built using Bootstrap all tend to look the same (although there are plenty of ways in which the UI can be customized), it is a great choice for early versions of apps, or for developers who don't have access to designers. If you build your application using the semantic standards set forth by Bootstrap, it becomes easy to make a Bootstrap theme for your site or application, and you know it will slot right into your code.
We will use the version of Bootstrap hosted on a CDN so we don't have to worry about downloading and serving our own version through our chat application. This means that in order to render our pages properly, we will need an active Internet connection, even during development. If you prefer to download and host your own copy of Bootstrap, you can do so.
Keep the files in an assets folder and add the following call to your main function (it uses http.Handle to serve the assets via your application):
http.Handle("/assets/", http.StripPrefix("/assets", http.FileServer(http.Dir("/path/to/assets/"))))
Notice how the http.StripPrefix and http.FileServer functions return objects that satisfy the http.Handler interface, as per the decorator pattern that we implement with our MustAuth helper function.
In main.go, let's add an endpoint for the login page:
http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"}))
http.Handle("/login", &templateHandler{filename: "login.html"})
http.Handle("/room", r)
Obviously, we do not want to use the MustAuth function for our login page, because it would cause an infinite redirection loop.
Create a new file called login.html inside our templates folder, and insert the following HTML code:
<html>
  <head>
    <title>Login</title>
    <link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css">
  </head>
  <body>
    <div class="container">
      <div class="page-header">
        <h1>Sign in</h1>
      </div>
      <div class="panel panel-danger">
        <div class="panel-heading">
          <h3 class="panel-title">In order to chat, you must be signed in</h3>
        </div>
        <div class="panel-body">
          <p>Select the service you would like to sign in with:</p>
          <ul>
            <li>
              <a href="/auth/login/facebook">Facebook</a>
            </li>
            <li>
              <a href="/auth/login/github">GitHub</a>
            </li>
            <li>
              <a href="/auth/login/google">Google</a>
            </li>
          </ul>
        </div>
      </div>
    </div>
  </body>
</html>
Restart the web server and navigate to http://localhost:8080/login. You will notice that it now displays our sign-in page.
Endpoints with dynamic paths
Pattern matching for the http package in the Go standard library isn't the most comprehensive and fully featured implementation out there.
For example, Ruby on Rails makes it much easier to have dynamic segments inside the path:
"auth/:action/:provider_name"
Rails then provides a data map (or dictionary) containing the values that it automatically extracted from the matched path. So if you visit auth/login/google, then params[:provider_name] would equal google, and params[:action] would equal login.
The most the http package lets us specify by default is a path prefix, which we can do by leaving a trailing slash at the end of the pattern:
"auth/"
We would then have to manually parse the remaining segments to extract the appropriate data. This is acceptable for relatively simple cases, which suits our needs for the time being, since we only need to handle a few different paths, such as:
/auth/login/google
/auth/login/facebook
/auth/callback/google
/auth/callback/facebook
If you need to handle more advanced routing situations, you might want to consider using dedicated packages such as Goweb, Pat, Routes, or mux. For extremely simple cases such as ours, the built-in capabilities will do.
We are going to create a new handler that powers our login process. Since it uses the fmt, log, and strings packages, first update the import block at the top of auth.go:
import (
	"fmt"
	"log"
	"net/http"
	"strings"
)
Then, in auth.go, add the following loginHandler code:
// loginHandler handles the third-party login process.
// format: /auth/{action}/{provider}
func loginHandler(w http.ResponseWriter, r *http.Request) {
	segs := strings.Split(r.URL.Path, "/")
	action := segs[2]
	provider := segs[3]
	switch action {
	case "login":
		log.Println("TODO handle login for", provider)
	default:
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprintf(w, "Auth action %s not supported", action)
	}
}
In the preceding code, we break the path into segments using strings.Split before pulling out the values for action and provider. If the action value is known, we run the specific code for it; otherwise, we write out an error message and return an http.StatusNotFound status code (HTTP status code 404).
We will not bullet-proof our code right now, but it's worth noticing that if someone hits loginHandler with too few segments, our code will panic because it expects segs[2] and segs[3] to exist. For extra credit, see whether you can protect against this and return a nice error message instead of a panic if someone hits /auth/nonsense.
Our loginHandler is only a function and not an object that implements the http.Handler interface. This is because, unlike other handlers, we don't need it to store any state. The Go standard library supports this, so we can use the http.HandleFunc function to map it in a way similar to how we used http.Handle earlier. In main.go, update the handlers:
http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"}))
http.Handle("/login", &templateHandler{filename: "login.html"})
http.HandleFunc("/auth/", loginHandler)
http.Handle("/room", r)
Rebuild and run the chat application:
go build -o chat
./chat -host=":8080"
Hit the following URLs and notice the output logged in the terminal:
http://localhost:8080/auth/login/google outputs TODO handle login for google
http://localhost:8080/auth/login/facebook outputs TODO handle login for facebook
We have successfully implemented a dynamic path-matching mechanism that so far just prints out TODO messages; we need to integrate with authentication services in order to make our login process work.
OAuth2
OAuth2 is an open authentication and authorization standard designed to allow resource owners to give clients delegated access to private data (such as wall posts or tweets) via an access token exchange handshake. Even if you do not wish to access the private data, OAuth2 is a great option that allows people to sign in using their existing credentials, without exposing those credentials to a third-party site. In this case, we are the third party, and we want to allow our users to sign in using services that support OAuth2.
From a user's point of view, the OAuth2 flow is:
1. The user selects the provider with whom they wish to sign in to the client app.
2. The user is redirected to the provider's website (with a URL that includes the client app ID), where they are asked to give permission to the client app.
3. The user signs in with the OAuth2 service provider and accepts the permissions requested by the third-party application.
4. The user is redirected back to the client app with an authorization code.
5. In the background, the client app sends the authorization code to the provider, who sends back an access token.
6. The client app uses the access token to make authorized requests to the provider, such as to get user information or wall posts.
To avoid reinventing the wheel, we will look at a few open source projects that have already solved this problem for us.
Open source OAuth2 packages
Andrew Gerrand has been working on the core Go team since February 2010, that is, two years before Go 1.0 was officially released. His goauth2 package (see https://code.google.com/p/goauth2/) is an elegant implementation of the OAuth2 protocol written entirely in Go.
Andrew's project inspired Gomniauth (see https://github.com/stretchr/gomniauth). An open source Go alternative to Ruby's omniauth project, Gomniauth provides a unified solution to access different OAuth2 services. In the future, when OAuth3 (or whatever the next-generation authentication protocol turns out to be) comes out, Gomniauth could, in theory, take on the pain of implementing the details, leaving the user code untouched.
For our application, we will use Gomniauth to access OAuth services provided by Google, Facebook, and GitHub, so make sure you have it installed by running the following command:
go get github.com/stretchr/gomniauth
Some of Gomniauth's project dependencies are kept in Bazaar repositories, so you'll need to install Bazaar (see http://wiki.bazaar.canonical.com) before go get can download them.
Tell the authentication providers about your app
Before we ask an authentication provider to help our users sign in, we must tell them about our application. Most providers have some kind of web tool or console where you can create applications to kick off this process; Google's Developer Console is one example.
In order to identify the client application, we need to create a client ID and secret. Despite the fact that OAuth2 is an open standard, each provider has its own language and mechanism to set things up, so you will most likely have to play around with the user interface or the documentation to figure it out in each case. At the time of writing, in Google Developer Console, you navigate to APIs & auth | Credentials and click on the Create new Client ID button.
In most cases, for added security, you have to be explicit about the host URLs from which requests will come. For now, since we're hosting our app locally on localhost:8080, you should use that. You will also be asked for a redirect URI: the endpoint in our chat application to which the user will be redirected after successfully signing in. The callback will be another action on our loginHandler, so the redirection URL for the Google client will be http://localhost:8080/auth/callback/google.
Once you finish registering the app with the providers you want to support, you will be given a client ID and secret for each provider. Make a note of these, because we will need them when we set up the providers in our chat application.
If we host our application on a real domain, we have to create new client IDs and secrets, or update the appropriate URL fields on our authentication providers to ensure that they point to the right place. Either way, it's good practice to keep separate sets of development and production keys for security.
Summary
This article showed how to begin adding OAuth to our chat application so that we can keep track of who is saying what, while letting users log in using Google, Facebook, or GitHub. We also learned how to wrap handlers using the decorator pattern, and how to make a pretty social sign-in page.
Packt
10 Aug 2016
11 min read

Consistency Conflicts

In this article by Robert Strickland, author of the book Cassandra 3.x High Availability - Second Edition, we will discuss how, for any given call, it is possible to achieve either strong consistency or eventual consistency. In the former case, we can know for certain that the copy of the data that Cassandra returns will be the latest. In the case of eventual consistency, the data returned may or may not be the latest, or there may be no data returned at all if the node is unaware of newly inserted data. Under eventual consistency, it is also possible to see deleted data if the node you're reading from has not yet received the delete request.
Depending on the read_repair_chance setting and the consistency level chosen for the read operation, Cassandra might block the client and resolve the conflict immediately, or the resolution might occur asynchronously. If data in conflict is never requested, the system will resolve the conflict the next time nodetool repair is run.
How does Cassandra know there is a conflict? Every column has three parts: key, value, and timestamp. Cassandra follows last-write-wins semantics, which means that the column with the latest timestamp always takes precedence.
Now, let's discuss one of the most important knobs a developer can turn to determine the consistency characteristics of their reads and writes.
Consistency levels
On every read and write operation, the caller must specify a consistency level, which lets Cassandra know what level of consistency to guarantee for that one call. The following table details the various consistency levels and their effects on both read and write operations:
Consistency level | Reads | Writes
ANY | Not supported for reads. | Data must be written to at least one node, but writes via hinted handoff are permitted. Effectively allows a write to any node, even if all nodes containing the replica are down. A subsequent read might be impossible if all replica nodes are down.
ONE | The replica from the closest node will be returned. | Data must be written to at least one replica node (both commit log and memtable). Unlike ANY, hinted handoff writes are not sufficient.
TWO | The replicas from the two closest nodes will be returned. | The same as ONE, except two replicas must be written.
THREE | The replicas from the three closest nodes will be returned. | The same as ONE, except three replicas must be written.
QUORUM | Replicas from a quorum of nodes will be compared, and the replica with the latest timestamp will be returned. | Data must be written to a quorum of replica nodes (both commit log and memtable) in the entire cluster, including all data centers.
SERIAL | Permits reading uncommitted data as long as it represents the current state. Any uncommitted transactions will be committed as part of the read. | Similar to QUORUM, except that writes are conditional, based on the support for lightweight transactions.
LOCAL_ONE | Similar to ONE, except that the read will be returned by the closest replica in the local data center. | Similar to ONE, except that the write must be acknowledged by at least one node in the local data center.
LOCAL_QUORUM | Similar to QUORUM, except that only replicas in the local data center are compared. | Similar to QUORUM, except the quorum must only be met using the local data center.
LOCAL_SERIAL | Similar to SERIAL, except only local replicas are used. | Similar to SERIAL, except only writes to local replicas must be acknowledged.
EACH_QUORUM | The opposite of LOCAL_QUORUM; requires each data center to produce a quorum of replicas, then returns the replica with the latest timestamp. | The opposite of LOCAL_QUORUM; requires a quorum of replicas to be written in each data center.
ALL | Replicas from all nodes in the entire cluster (including all data centers) will be compared, and the replica with the latest timestamp will be returned. | Data must be written to all replica nodes (both commit log and memtable) in the entire cluster, including all data centers.
As you can see, there are numerous combinations of read and write consistency levels, all with different ultimate consistency guarantees. To illustrate this point, let's assume that you would like to guarantee absolute consistency for all read operations. On the surface, it might seem as if you would have to read with a consistency level of ALL, thus sacrificing availability in the case of node failure. But there are alternatives depending on your use case. There are actually two additional ways to achieve strong read consistency:
Write with a consistency level of ALL: This has the advantage of allowing the read operation to be performed using ONE, which lowers the latency for that operation. On the other hand, it means the write operation will result in an UnavailableException if one of the replica nodes goes offline.
Read and write with QUORUM or LOCAL_QUORUM: Since QUORUM and LOCAL_QUORUM both require a majority of replicas, using this level for both the write and the read will result in a full consistency guarantee (within the same data center when using LOCAL_QUORUM), while still maintaining availability during a node failure.
You should carefully consider each use case to determine what guarantees you actually require. For example, there might be cases where a lost write is acceptable, or occasions where a read need not be absolutely current. At times, it might be sufficient to write with a level of QUORUM, then read with ONE to achieve maximum read performance, knowing you might occasionally and temporarily return stale data. Cassandra gives you this flexibility, but it's up to you to determine how best to employ it for your specific data requirements.
A good rule of thumb for attaining strong consistency is that the number of replicas read plus the number of replicas written should be greater than the replication factor (R + W > RF). This guarantees that every read overlaps at least one replica holding the latest write.
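The rule of thumb about read and write consistency levels versus the replication factor can be checked mechanically. In the sketch below, R and W are the numbers of replicas that a read and a write must reach. The sketch assumes a single data center, where QUORUM means a majority of the RF replicas (RF/2 + 1, rounded down). ANY and the SERIAL levels are deliberately left out, because a hinted write under ANY may not land on any replica. The function names are our own.

```go
package main

import "fmt"

// replicasFor returns how many replicas must respond for a consistency level
// in a single data center with replication factor rf.
func replicasFor(level string, rf int) (int, error) {
	switch level {
	case "ONE", "LOCAL_ONE":
		return 1, nil
	case "TWO":
		return 2, nil
	case "THREE":
		return 3, nil
	case "QUORUM", "LOCAL_QUORUM":
		return rf/2 + 1, nil // a majority of the replicas
	case "ALL":
		return rf, nil
	default:
		return 0, fmt.Errorf("consistency level %q not modelled", level)
	}
}

// stronglyConsistent applies R + W > RF: when the replicas read and the
// replicas written must overlap, every read sees the latest write.
func stronglyConsistent(writeCL, readCL string, rf int) (bool, error) {
	w, err := replicasFor(writeCL, rf)
	if err != nil {
		return false, err
	}
	r, err := replicasFor(readCL, rf)
	if err != nil {
		return false, err
	}
	if w > rf || r > rf {
		return false, fmt.Errorf("level needs more replicas than RF=%d provides", rf)
	}
	return r+w > rf, nil
}

func main() {
	combos := []struct {
		rf          int
		write, read string
	}{
		{3, "QUORUM", "QUORUM"},
		{3, "QUORUM", "ONE"},
		{3, "ALL", "ONE"},
		{2, "ONE", "ONE"},
	}
	for _, c := range combos {
		ok, err := stronglyConsistent(c.write, c.read, c.rf)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("RF=%d write=%s read=%s strongly consistent: %v\n", c.rf, c.write, c.read, ok)
	}
}
```

With RF=3, QUORUM writes plus QUORUM reads touch 2 + 2 = 4 replicas, which exceeds 3, so the sets must overlap. QUORUM writes with ONE reads give 2 + 1 = 3, which does not exceed 3, so stale reads remain possible.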
If you are unsure about which consistency levels to use for your specific use case, it's typically safe to start with LOCAL_QUORUM (or QUORUM for a single data center) reads and writes. This configuration offers strong consistency guarantees and good performance while allowing for the inevitable replica failure. It is important to understand that even if you choose levels that provide less stringent consistency guarantees, Cassandra will still perform anti-entropy operations asynchronously in an attempt to keep replicas up to date.
Repairing data
Cassandra employs a multifaceted anti-entropy mechanism that keeps replicas in sync. Data repair operations generally fall into three categories:
Synchronous read repair: When a read operation requires comparing multiple replicas, Cassandra will initially request a checksum (digest) from the other nodes. If the checksums don't match, the full replica is sent and compared with the local version. The replica with the latest timestamp will be returned, and the old replica will be updated. This means that in normal operations, old data is repaired when it is requested.
Asynchronous read repair: Each table in Cassandra has a setting called read_repair_chance (as well as its related setting, dclocal_read_repair_chance), which determines how the system treats replicas that are not compared during a read. A setting of 0.1 (the default for dclocal_read_repair_chance) means that 10 percent of the time, Cassandra will also repair the remaining replicas during read operations.
Manually running repair: A full repair (using nodetool repair) should be run regularly to clean up any data that has been missed as part of the previous two operations. At a minimum, it should be run once every gc_grace_seconds, which is set in the table schema and defaults to 10 days.
One might ask what the consequence would be of failing to run a repair operation within the window specified by gc_grace_seconds. The answer relates to Cassandra's mechanism to handle deletes.
As you might be aware, all modifications (or mutations) are immutable, so a delete is really just a marker telling the system not to return that record to any clients. This marker is called a tombstone. When compaction runs, Cassandra garbage-collects tombstones that are older than gc_grace_seconds, along with the data they mark. If a replica missed the delete and no repair ran within that window, the replica still holds the old data while the tombstone that would have suppressed it is gone, so the deleted data can reappear unexpectedly. In general, deletes should be avoided when possible, as the unchecked buildup of tombstones can cause significant issues.
In the course of normal operations, Cassandra will repair old replicas when their records are requested. Thus, it can be said that read repair operations are lazy: they only occur when required.
With all these options for replication and consistency, it can seem daunting to choose the right combination for a given use case. Let's take a closer look at this balance to help bring some additional clarity to the topic.
Balancing the replication factor with consistency
There are many considerations when choosing a replication factor, including availability, performance, and consistency. Since our topic is high availability, let's presume your desire is to maintain data availability in the case of node failure.
It's important to understand exactly what your failure tolerance is, and this will likely be different depending on the nature of the data. The definition of failure is probably going to vary among use cases as well, as one case might consider data loss a failure, whereas another accepts data loss as long as all queries return. Achieving the desired availability, consistency, and performance targets requires coordinating your replication factor with your application's consistency level configurations.
In order to assist you in your efforts to achieve this balance, let's consider a single data center cluster of 10 nodes and examine the impact of various configuration combinations (where RF corresponds to the replication factor and CL to the consistency level):
RF | Write CL | Read CL | Consistency | Availability | Use cases
1 | ONE, QUORUM, or ALL | ONE, QUORUM, or ALL | Consistent | Doesn't tolerate any replica loss | Data can be lost and availability is not critical, such as analysis clusters
2 | ONE | ONE | Eventual | Tolerates loss of one replica | Maximum read performance and low write latencies are required, and sometimes returning stale data is acceptable
2 | QUORUM or ALL | ONE | Consistent | Tolerates loss of one replica on reads, but none on writes | Read-heavy workloads where some downtime for data ingest is acceptable (improves read latencies)
2 | ONE | QUORUM or ALL | Consistent | Tolerates loss of one replica on writes, but none on reads | Write-heavy workloads where read consistency is more important than availability
3 | ONE | ONE | Eventual | Tolerates loss of two replicas | Maximum read and write performance are required, and sometimes returning stale data is acceptable
3 | QUORUM | ONE | Eventual | Tolerates loss of one replica on writes and two on reads | Read throughput and availability are paramount, while write performance is less important, and sometimes returning stale data is acceptable
3 | ONE | QUORUM | Eventual | Tolerates loss of two replicas on writes and one on reads | Low write latencies and availability are paramount, while read performance is less important, and sometimes returning stale data is acceptable
3 | QUORUM | QUORUM | Consistent | Tolerates loss of one replica | Consistency is paramount, while striking a balance between availability and read/write performance
3 | ALL | ONE | Consistent | Tolerates loss of two replicas on reads, but none on writes | Additional fault tolerance and consistency on reads is paramount at the expense of write performance and availability
3 | ONE | ALL | Consistent | Tolerates loss of two replicas on writes, but none on reads | Low write latencies and availability are paramount, but read consistency must be guaranteed at the expense of performance and availability
3 | ANY | ONE | Eventual | Tolerates loss of all replicas on writes and two on reads | Maximum write and read performance and availability are paramount, and often returning stale data is acceptable (note that hinted writes are less reliable than the guarantees offered at CL ONE)
3 | ANY | QUORUM | Eventual | Tolerates loss of all replicas on writes and one on reads | Maximum write performance and availability are paramount, and sometimes returning stale data is acceptable
3 | ANY | ALL | Consistent | Tolerates loss of all replicas on writes, but none on reads | Write throughput and availability are paramount, and clients must all see the same data, even though they might not see all writes immediately
There are also two additional consistency levels, SERIAL and LOCAL_SERIAL, which can be used to read the latest value, even if it is part of an uncommitted transaction. Otherwise, they follow the semantics of QUORUM and LOCAL_QUORUM, respectively.
As you can see, there are numerous possibilities to consider when choosing these values, especially in a scenario involving multiple data centers. This discussion should give you greater confidence as you design your applications to achieve the desired balance.
Summary
In this article, we introduced the foundational concept of consistency. In our discussion, we outlined the importance of the relationship between replication factor and consistency level, and their impact on performance, data consistency, and availability.

Packt
27 Nov 2009
3 min read

Hibernate Types

Hibernate allows transparent persistence, which means the application is fully isolated from the underlying database storage format. Three players in the Hibernate scene implement this feature: Hibernate dialect, Hibernate types, and HQL. The Hibernate dialect allows us to use a range of different databases, supporting different, proprietary variants of SQL and column types. In addition, HQL allows us to query persisted objects, regardless of their relational persisted form in the database.
Hibernate types are a representation of database SQL types. They provide an abstraction of the underlying database types and prevent the application from getting involved with the actual database column types. They allow us to develop the application without worrying about the target database and the column types that the database supports. Instead, we get involved with mapping Java types to Hibernate types. The database dialect, as part of Hibernate, is responsible for transforming Java types to SQL types, based on the target database. This gives us the flexibility to change to a database that supports different column types or SQL without changing the application code.
Built-in types
Hibernate includes a rich and powerful range of built-in types. These types satisfy most needs of a typical application, providing a bridge between basic Java types and common SQL types. The Java types mapped with these types range from basic, simple types, such as long and int, to large and complex types, such as Blob and Clob.
The following table categorizes the Hibernate built-in types with their corresponding Java and SQL types:
Java type | Hibernate type name | SQL type
Primitives
boolean or Boolean | boolean | BIT
boolean or Boolean | true_false | CHAR(1) ('T' or 'F')
boolean or Boolean | yes_no | CHAR(1) ('Y' or 'N')
byte or Byte | byte | TINYINT
char or Character | character | CHAR
double or Double | double | DOUBLE
float or Float | float | FLOAT
int or Integer | integer | INTEGER
long or Long | long | BIGINT
short or Short | short | SMALLINT
String
java.lang.String | string | VARCHAR
java.lang.String | character | CHAR(1)
java.lang.String | text | CLOB
Arbitrary precision numeric
java.math.BigDecimal | big_decimal | NUMERIC
Byte array
byte[] or Byte[] | binary | VARBINARY
Time and date
java.util.Date | date | DATE
java.util.Date | time | TIME
java.util.Date | timestamp | TIMESTAMP
java.util.Calendar | calendar | TIMESTAMP
java.util.Calendar | calendar_date | DATE
java.sql.Date | date | DATE
java.sql.Time | time | TIME
java.sql.Timestamp | timestamp | TIMESTAMP
Localization
java.util.Locale | locale | VARCHAR
java.util.TimeZone | timezone | VARCHAR
java.util.Currency | currency | VARCHAR
Class names
java.lang.Class | class | VARCHAR
Any serializable object
java.io.Serializable | serializable | VARBINARY
JDBC large objects
java.sql.Blob | blob | BLOB
java.sql.Clob | clob | CLOB

Packt
17 Feb 2010
1 min read

Author Podcast - Aleksander Seovic Talks About Oracle Coherence 3.5

Aleksander Seovic is the author of Oracle Coherence 3.5, which will help you to design and build scalable, reliable, high-performance applications using software of the same name. The book is due out in March, but you can get a flavour of it in his interview with Cameron Purdy. For more information on Aleksander's book, visit: http://www.packtpub.com/oracle-coherence-3-5/book.

Packt
28 Sep 2016
11 min read

Unsupervised Learning

In this article by Bastiaan Sjardin, Luca Massaron, and Alberto Boschetti, the authors of the book Large Scale Machine Learning with Python, we will try to create new features and variables at scale in the observation matrix. We will introduce unsupervised methods and illustrate principal component analysis (PCA), an effective way to reduce the number of features.

Unsupervised methods

Unsupervised learning is a branch of machine learning whose algorithms draw inferences from data without an explicit label (unlabeled data). The goal of such techniques is to extract hidden patterns and group similar data. In these algorithms, the unknown parameters of interest of each observation (the group membership and topic composition, for instance) are often modeled as latent variables (a series of hidden variables): they cannot be observed directly in the system of observed variables, but can only be deduced from the past and present outputs of the system. Typically, the output of the system contains noise, which makes this operation harder. In common problems, unsupervised methods are used in two main situations:

With labeled datasets, to extract additional features to be processed by the classifier or regressor further down the processing chain. Enhanced by the additional features, these learners may perform better.
With labeled or unlabeled datasets, to extract some information about the structure of the data. This class of algorithms is commonly used during the Exploratory Data Analysis (EDA) phase of modeling.
First of all, before starting with our illustration, let's import the modules that will be needed throughout the article in our notebook:

In:
```python
import matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import pylab
# Jupyter magic: render plots inside the notebook
%matplotlib inline
import matplotlib.cm as cm
import copy
import tempfile
import os
```

Feature decomposition – PCA

PCA is an algorithm commonly used to decompose the dimensions of an input signal and keep just the principal ones. From a mathematical perspective, PCA performs an orthogonal transformation of the observation matrix, outputting a set of linearly uncorrelated variables, named principal components. The output variables form a basis set, where each component is orthonormal to the others. It is also possible to rank the output components (in order to use just the principal ones): the first component is the one containing the largest possible variance of the input dataset, the second is orthogonal to the first (by definition) and contains the largest possible variance of the residual signal, the third is orthogonal to the first two and contains the largest possible variance of the residual signal, and so on.

A generic transformation with PCA can be expressed as a projection onto a space. If just the principal components are taken from the transformation basis, the output space will have a smaller dimensionality than the input one. Mathematically, it can be expressed as follows:

$$\hat{X} = X \cdot T$$

Here, $X$ is a generic point of the training set of dimension N, $T$ is the transformation matrix coming from PCA, and $\hat{X}$ is the output vector. Note that the symbol $\cdot$ indicates a dot product in this matrix equation. From a practical perspective, also note that all the features of $X$ must be zero-centered before doing this operation.

Let's now start with a practical example; later, we will explain the math of PCA in depth.
In this example, we will create a dummy dataset composed of two blobs of points, one centered in (-5, 0) and the other one in (5, 5). Let's use PCA to transform the dataset and plot the output compared to the input. In this simple example, we will use all the features, that is, we will not perform feature reduction:

In:
```python
from sklearn.datasets import make_blobs  # older releases: sklearn.datasets.samples_generator
from sklearn.decomposition import PCA

X, y = make_blobs(n_samples=1000, random_state=101,
                  centers=[[-5, 0], [5, 5]])

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
pca_comp = pca.components_.T  # columns are the principal components

# A 1 x 2 array; recent scikit-learn versions reject np.matrix inputs
test_point = np.array([[5, -2]])
test_point_pca = pca.transform(test_point)

plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='none')
# One arrow per principal component, both starting at the origin
plt.quiver([0, 0], [0, 0], pca.components_[:, 0], pca.components_[:, 1],
           width=0.02, scale=5, color='orange')
plt.plot(test_point[0, 0], test_point[0, 1], 'o')
plt.title('Input dataset')

plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, edgecolors='none')
plt.plot(test_point_pca[0, 0], test_point_pca[0, 1], 'o')
plt.title('After "lossless" PCA')
plt.show()
```

As you can see, the output is more organized than the original feature space and, if the next task is a classification, it would require just one feature of the dataset, saving almost 50% of the space and computation needed. In the image, you can clearly see the core of PCA: it's just a projection of the input dataset onto the transformation basis, drawn in orange in the image on the left. Are you unsure about this? Let's test it:

In:
```python
print("The blue point is in", test_point)
print("After the transformation is in", test_point_pca[0, :])
print("Since (X-MEAN) * PCA_MATRIX =", np.dot(test_point - pca.mean_, pca_comp))
```
Out:
```
The blue point is in [[ 5 -2]]
After the transformation is in [-2.34969911 -6.2575445 ]
Since (X-MEAN) * PCA_MATRIX = [[-2.34969911 -6.2575445 ]]
```

Now, let's dig into the core problem: how is it possible to generate T from the training set?
It should contain orthonormal vectors, and the vectors should be ranked according to the quantity of variance (that is, the energy or information carried by the observation matrix) that they can explain. Many solutions have been implemented, but the most common implementation is based on Singular Value Decomposition (SVD). SVD is a technique that decomposes any matrix M into three matrices ($U$, $\Sigma$, and $W$) with special properties and whose multiplication gives back M again:

$$M = U \Sigma W^T$$

Specifically, given M, a matrix of m rows and n columns, the resulting elements of the equivalence are as follows:

U is an m x m matrix (square matrix); it's unitary, and its columns form an orthonormal basis. They're named left singular vectors, or input singular vectors, and they're the eigenvectors of the matrix product $M M^T$.
$\Sigma$ is an m x n matrix whose only non-zero elements lie on its diagonal. These values are named singular values; they are all non-negative, and their squares are the eigenvalues of both $M M^T$ and $M^T M$.
W is an n x n unitary matrix (square matrix); its columns form an orthonormal basis, and they're named right (or output) singular vectors. They are the eigenvectors of the matrix product $M^T M$.

Why is this needed? The answer is pretty simple: the goal of PCA is to estimate the directions along which the variance of the input dataset is largest. For this, we first need to remove the mean from each feature and then operate on the covariance matrix, which for zero-centered data is $C = \frac{1}{m-1} X^T X$. Given that, by decomposing the matrix $X$ with SVD, we get the columns of the matrix $W$, which are the principal components of the covariance (that is, the matrix $T$ we are looking for); the diagonal of $\Sigma$, whose squared values divided by $m-1$ are the variances explained by the principal components; and the columns of $U$, which, scaled by the singular values, give the coordinates of the observations in the new space ($U \Sigma = X W$). This is why PCA is usually computed with SVD. Let's see it now on a real example.
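The recipe just described (center the features, take the SVD, keep the first right singular vectors as T) can be sketched directly in NumPy and checked against scikit-learn. This is a minimal sketch under stated assumptions: the synthetic data, its mixing matrix, and the helper name pca_via_svd are illustrative choices, not taken from the book. Principal components are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_via_svd(X, n_components):
    """Minimal PCA sketch: center, take the thin SVD, keep the top components."""
    X_centered = X - X.mean(axis=0)          # features must be zero-centered
    U, s, Wt = np.linalg.svd(X_centered, full_matrices=False)
    T = Wt[:n_components].T                  # columns of W = principal directions
    X_proj = X_centered @ T                  # same as U[:, :k] * s[:k]
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return X_proj, T, explained_variance


# Hypothetical correlated data: 200 observations, 3 features
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])

X_proj, T, var = pca_via_svd(X, n_components=2)
sk = PCA(n_components=2, svd_solver='full').fit(X)

# Signs of components may differ between implementations
print(np.allclose(np.abs(T.T), np.abs(sk.components_)))
print(np.allclose(var, sk.explained_variance_))
print(np.allclose(np.abs(X_proj), np.abs(sk.transform(X))))
```

Using full_matrices=False (the thin SVD) avoids building the full m x m matrix U, which matters as soon as the number of observations m grows.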
Let's test it on the Iris dataset, extracting the first two principal components (that is, going from a dataset of four features to one of two):

In:
```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target
print("Iris dataset contains", X.shape[1], "features")

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("After PCA, it contains", X_pca.shape[1], "features")
print("The variance is [% of original]:", sum(pca.explained_variance_ratio_))

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, edgecolors='none')
plt.title('First 2 principal components of Iris dataset')
plt.show()
```
Out:
```
Iris dataset contains 4 features
After PCA, it contains 2 features
The variance is [% of original]: 0.977631775025
```

This is the analysis of the outputs of the process:

The explained variance is almost 98% of the original variance from the input.
The number of features has been halved, but only 2% of the information is not in the output, hopefully just noise.
From a visual inspection, it seems that the different classes composing the Iris dataset are separated from each other. This means that a classifier working on such a reduced set is likely to have comparable accuracy, but will be faster to train and to run predictions.
As a proof of the second point, let's now try to train and test two classifiers, one using the original dataset and another using the reduced set, and print their accuracy:

In:
```python
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation


def test_classification_accuracy(X_in, y_in):
    X_train, X_test, y_train, y_test = train_test_split(
        X_in, y_in, random_state=101, train_size=0.50)
    # Logistic-regression loss; it was named 'log' in older scikit-learn releases
    clf = SGDClassifier(loss='log_loss', random_state=101)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))


print("SGDClassifier accuracy on Iris set:", test_classification_accuracy(X, y))
print("SGDClassifier accuracy on Iris set after PCA (2 components):",
      test_classification_accuracy(X_pca, y))
```
Out:
```
SGDClassifier accuracy on Iris set: 0.586666666667
SGDClassifier accuracy on Iris set after PCA (2 components): 0.72
```

As you can see, in this case the technique not only reduces the complexity and space of the learner further down the chain, but also appears to help generalization (much as Ridge or Lasso regularization would). Now, if you are unsure how many components should be in the output, a typical rule of thumb is to choose the minimum number that explains at least 90% (or 95%) of the input variance. Empirically, such a choice usually ensures that only the noise is cut off.

So far, everything seems perfect: we found a great solution to reduce the number of features, building some with very high predictive power, and we also have a rule of thumb to guess the right number of them. Let's now check how scalable this solution is: we're investigating how it scales when the number of observations and features increases. The first thing to note is that the SVD algorithm, the core piece of PCA, is not stochastic; therefore, it needs the whole matrix in order to be able to extract its principal components.
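The 90%/95% rule of thumb for the number of components can be applied mechanically from the cumulative explained-variance ratio of a full decomposition. A minimal sketch on the Iris data, assuming the rule means "the smallest k whose cumulative ratio reaches the threshold"; the helper name min_components_for_variance is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA


def min_components_for_variance(X, threshold=0.90):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `threshold` (rule of thumb)."""
    ratios = PCA().fit(X).explained_variance_ratio_
    cumulative = np.cumsum(ratios)
    # First index where the cumulative ratio reaches the threshold, +1 for a count
    return int(np.searchsorted(cumulative, threshold) + 1)


X = load_iris().data
for threshold in (0.90, 0.95):
    print(threshold, min_components_for_variance(X, threshold))
```

On Iris, the first component alone already explains more than 90% of the variance, and the first two exceed 95%. scikit-learn's PCA can also do this selection itself when n_components is given as a float between 0 and 1 (with svd_solver='full').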
Now, let's see how scalable PCA is in practice on some synthetic datasets with an increasing number of features and observations. We will perform a full (lossless) decomposition (the argument passed when instantiating the PCA object is None), as asking for a lower number of features doesn't impact the performance (it's just a matter of slicing the output matrices of SVD). In the following code, we first create matrices with 10 thousand points and 20, 50, 100, 250, 500, 1,000, and 2,500 features to be processed by PCA. Then, we create matrices with 100 features and 1, 5, 10, 25, 50, and 100 thousand observations to be processed with PCA:

In:
```python
import time


def check_scalability(test_pca):
    pylab.rcParams['figure.figsize'] = (10, 4)

    # FEATURES: fixed number of points, growing number of features
    n_points = 10000
    n_features = [20, 50, 100, 250, 500, 1000, 2500]
    time_results = []
    for n_feature in n_features:
        X, _ = make_blobs(n_points, n_features=n_feature, random_state=101)
        pca = copy.deepcopy(test_pca)
        tik = time.time()
        pca.fit(X)
        time_results.append(time.time() - tik)
    plt.subplot(1, 2, 1)
    plt.plot(n_features, time_results, 'o--')
    plt.title('Feature scalability')
    plt.xlabel('Num. of features')
    plt.ylabel('Training time [s]')

    # OBSERVATIONS: fixed number of features, growing number of points
    n_features = 100
    n_observations = [1000, 5000, 10000, 25000, 50000, 100000]
    time_results = []
    for n_points in n_observations:
        X, _ = make_blobs(n_points, n_features=n_features, random_state=101)
        pca = copy.deepcopy(test_pca)
        tik = time.time()
        pca.fit(X)
        time_results.append(time.time() - tik)
    plt.subplot(1, 2, 2)
    plt.plot(n_observations, time_results, 'o--')
    plt.title('Observations scalability')
    plt.xlabel('Num. of training observations')
    plt.ylabel('Training time [s]')
    plt.show()


# svd_solver='full' forces an exact SVD; recent scikit-learn versions may
# otherwise pick a different solver automatically
check_scalability(PCA(None, svd_solver='full'))
```
Out:
[Figure: two panels, "Feature scalability" and "Observations scalability", plotting training time [s] against the number of features and the number of training observations]

As you can clearly see, PCA based on SVD is not scalable: as the number of features increases linearly, the time needed to train the algorithm grows much faster than linearly (a full SVD of an m x n matrix with m > n costs on the order of m·n² operations).
Also, the training time grows quickly with the number of observations, and (not shown in the image) the memory consumption makes the problem unfeasible for a domestic computer (with 16 GB of RAM or less) once the matrix gets large. It seems clear that a PCA based on SVD is not the solution for big data: fortunately, in recent years, many workarounds have been introduced.

Summary

In this article, we've introduced PCA, a popular unsupervised learner, and looked at how it scales when coping with big data. PCA is able to reduce the number of features by creating ones containing the majority of variance (that is, the principal ones).

You can also refer to the following books on similar topics:

R Machine Learning Essentials: https://www.packtpub.com/big-data-and-business-intelligence/r-machine-learning-essentials
R Machine Learning By Example: https://www.packtpub.com/big-data-and-business-intelligence/r-machine-learning-example
Machine Learning with R - Second Edition: https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-r-second-edition

Resources for Article:

Further resources on this subject:

Machine Learning Tasks [article]
Introduction to Clustering and Unsupervised Learning [article]
Clustering and Other Unsupervised Learning Methods [article]
Packt
03 Jan 2017
7 min read

Notes from the field

In this article, Donabel Santos, author of the book Tableau 10 Business Intelligence Cookbook, offers a personal and perhaps not-so-conventional way to introduce Tableau. I’d like to highlight a few key concepts and tricks that I think will be useful to you as you go along. These are certainly points I highlight on the board whenever I do training on Tableau. If you feel like we are jumping too far ahead, please go ahead and start with the following section, Tableau Primer, and come back to this section when you are ready for the tips and tricks.

Instead of thinking of Tableau as a software tool with a steep learning curve, it is useful to think of it as a blank slate. You will draw on it, keep adding things, and remove things until something makes sense or something insightful pops out. After you work with Tableau for a while and get more comfortable with its functionalities, it might even feel like an extension of your brain to some degree. When you get access to data, you might automatically open Tableau to try and understand what’s in that data.

Undo is your best friend

Do not be afraid to make mistakes, and do not be afraid to explore in Tableau. Do not come in with strict preconceptions, for example, thinking that you can only use a time series graph when you have a measure and a date field. The best way to learn how powerful Tableau is, is to try anything and everything. It’s one of the best tools to experiment with. If you make a mistake, or if you don’t like what you see, no sweat. Just click the friendly Undo button and you are back to your previous view. If you are more of a shortcut person, it is Ctrl + Z on a PC or Command + Z on a Mac.

It doesn’t change your original data

This is another common concern that comes up in my training sessions or whenever I talk to people about Tableau. No, Tableau does not write back to your data source.
All the changes you make, such as creating calculated fields, changing data types, and editing aliases, are stored in your Tableau workbook or data source.

Drag and drop

Tableau is highly drag-and-drop software. Although you can use the menu or a right-click instead of a drag and drop for the same tasks, dragging and dropping is often faster. It also flows with your train of thought.

Look for visual cues

Tableau leverages its visual culture in your design area, so when you create views in Tableau, some of the visual cues and icons can help you along the way. A number of the visual cues have been discussed in this section. However, there may be some lesser-known (or less noticeable) visual cues:

Italicized field names mean they are Tableau-generated fields.
Dual axis charts create fused pills. Notice the area where the two pills touch: the edges are straight instead of curved.
When you zoom in to maps, or when you search for a place, your map gets pinned (or fixed to this place) until you unpin it.

Know the difference between blue (discrete) and green (continuous)

Knowing the difference between blue and green will take you far in the Tableau world. The data type icons beside your field names in the sidebar are colored either blue or green. When you drag fields onto shelves and cards, the pills are also colored blue or green. Simply speaking, blue means discrete and green means continuous. Discrete means individual, separate, countable, and finite. Continuous means a range, and technically, there is an infinite number of values within this range. What’s more important is how these manifest in Tableau. A blue discrete field will produce a header, and a green continuous field will produce an axis. If dropped onto the Color shelf, for example, a blue discrete field will use individual, finite colors, while a green continuous field will use a range (gradient) of colors.
Some confusion also arises because, by default, Tableau places numeric fields under Measures, colored green, and categorical information under Dimensions, colored blue. This won’t always be the case. We can have numeric values that are discrete, for example, an Order Number. We can also see non-numerical, discrete fields under Measures.

Learn a few key shortcuts

Shortcuts are great: it’s typically faster to work when you know a few of them. Here are some of my favorite shortcuts:

| Shortcut | What it does |
| --- | --- |
| Right click + Drag | Opens the Drop Field menu, which allows you to specify exactly which variation of the field you want to use |
| Double click | Adds the field to the view. I particularly like this when creating text tables: after you place your first measure in Text, you can add more measures to your text table by double-clicking on the succeeding measures |
| Ctrl + Arrow | Adjusts the height/width of the rows/columns in the view |
| Ctrl + H | Presentation mode |

You can find the complete list of shortcuts here: http://bit.ly/tableau-shortcuts

Unpackage option

The .twbx file is a Tableau packaged workbook, which means it packages local files with your Tableau workbook. When you right-click a .twbx file on a machine that has Tableau Desktop installed, you will see an option called Unpackage. When you unpack a .twbx file, you will get the .twb file and another folder that contains all the local files that were used in the original workbook. Just keep in mind that data (at least the file-based data sources and extracts) gets packaged with your .twbx files. This is an important security and data governance consideration when you are deciding how to share your workbooks with others.

Table calculations are calculations on your table. How you structure or lay out your table (or view) will affect your table calculations.
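As a rough analogy outside Tableau, the pandas sketch below shows why the same numbers give different Percent of Total results depending on the direction of the calculation and on what has been filtered out. It is not how Tableau computes table calculations internally; the sales figures are hypothetical, and "across"/"down" only loosely mirror Tableau's Table (Across) and Table (Down) options.

```python
import pandas as pd

# Hypothetical sales figures: rows are regions, columns are years
sales = pd.DataFrame(
    {"2015": [100, 300], "2016": [200, 400]},
    index=["East", "West"],
)

# Analogous to "Table (Across)": each value as a share of its row total
pct_across = sales.div(sales.sum(axis=1), axis=0)

# Analogous to "Table (Down)": each value as a share of its column total
pct_down = sales.div(sales.sum(axis=0), axis=1)

# Filtering a region out changes the denominator, and therefore the result
west_only = sales.loc[["West"]]
pct_down_filtered = west_only.div(west_only.sum(axis=0), axis=1)

print(pct_across.round(3))
print(pct_down.round(3))
print(pct_down_filtered)
```

East's 2015 value is a third of its row total but only a quarter of its column total, and once East is filtered out, West becomes 100% of every column: the same underlying data, three different answers.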
Table calculations are highly influenced by:

Layout
Filters
Scope and direction

Let’s say, for example, you are calculating Percent of Total in your view. If you swap the fields in your Rows and Columns (that is, change the layout), your numbers will change. If you filter some of the products out, your numbers will change. If you decide to compute Pane Down instead of Table Across, your numbers will change.

If you’re looking for the common use cases for table calculations, check out the Tableau article entitled Top 10 Tableau Table Calculations, which can be found here: http://bit.ly/top10tablecalcs

LODs rock

Many of the tasks that required complex table calculations or data blending have been greatly simplified by LODs (Level of Detail expressions). LODs allow us to have multiple levels of detail within a single view, and this increases the possibilities in Tableau. To learn more about Level of Detail expressions, I encourage you to check out the following:

Understanding Level of Detail Expressions: http://bit.ly/UnderstandingLOD
Top 15 LOD Expressions: http://bit.ly/top15LOD

It is possible …

Another common question that comes up is "Can I do <this>?" or "Is it possible to do <this>?" The answer to many of these questions is yes, and many of the solutions will include calculations and/or parameters. However, not all solutions will be quick and straightforward. Some may require multiple calculated fields, table calculations, LOD expressions, regular expressions, R scripts, and so on.

Summary

In this article, we have seen that, instead of thinking of Tableau as a software tool with a steep learning curve, it is useful to think of it as a blank slate. You will draw on it, keep adding things, and remove things until something makes sense or something insightful pops out. After you work with Tableau for a while and get more comfortable with its functionalities, it might even feel like an extension of your brain to some degree.
When you get access to data, you might automatically open Tableau to try and understand what’s in that data. Resources for Article: Further resources on this subject: Say Hi to Tableau [article] Getting Started with Tableau Public [article] R and its Diverse Possibilities [article]

Packt
28 Oct 2014
23 min read

Execution of Test Plans

In this article by Bayo Erinle, author of JMeter Cookbook, we will cover the following recipes:

Using the View Results Tree listener
Using the Aggregate Report listener
Debugging with Debug Sampler
Using Constant Throughput Timer
Using the JSR223 postprocessor
Analyzing Response Times Over Time
Analyzing transactions per second

One of the critical aspects of performance testing is knowing the right tools to use to attain your desired targets. Even when you settle on a tool, it is helpful to understand its features, component sets, and extensions, and to apply them appropriately when needed. In this article, we will go over some helpful components that will aid you in recording robust and realistic test plans while effectively analyzing reported results. We will also cover some components to help you debug test plans.

Using the View Results Tree listener

One of the most often used listeners in JMeter is the View Results Tree listener. This listener shows a tree of all sample responses, giving you quick navigation of any sample's response time, response codes, response content, and so on. The component offers several ways to view the response data, some of which allow you to debug CSS/jQuery, regular expressions, and XPath queries, among other things. In addition, the component offers the ability to save responses to file, in case you need to store them for offline viewing or run some other processes on them. Along with the various bundled testers, the component provides a search function that allows you to quickly search the responses for relevant items.

How to do it…

In this recipe, we will cover how to add the View Results Tree listener to a test plan and then use its built-in testers to test the response and derive expressions that we can use in postprocessor components. Perform the following steps:

1. Launch JMeter.
2. Add Thread Group to the test plan by navigating to Test Plan | Add | Threads (Users) | Thread Group.
3. Add HTTP Request to the thread group by navigating to Thread Group | Add | Sampler | HTTP Request.
4. Fill in the following details:
   Server Name or IP: dailyjs.com
5. Add the View Results Tree listener to the test plan by navigating to Test Plan | Add | Listener | View Results Tree.
6. Save and run the test plan.
7. Once done, navigate to the View Results Tree component and click on the Response Data tab. Observe some of the built-in renderers.
8. Switch to the HTML render view by clicking on the dropdown, and use the search textbox to search for any word on the page.
9. Switch to the HTML (download resources) render view by clicking on the dropdown.
10. Switch to the XML render view by clicking on the dropdown. Notice that the entire HTML DOM structure is presented as XML node elements.
11. Switch to the RegExp Tester render view by clicking on the dropdown and try out some regular expression queries.
12. Switch to the XPath Query Tester render view and try out some XPath queries.
13. Switch to the CSS/jQuery Tester render view and try out some jQuery queries, for example, selecting all links inside divs marked with the class preview (Selector: div.preview a, Attribute: href, CSS/jQuery Implementation: JSOUP).

How it works…

As your test plans execute, the View Results Tree listener reports each sampler in your test plans individually. The Sampler Result tab of the component gives you a summarized view of the request and response, including information such as load time, latency, response headers, body content sizes, response code and messages, response header content, and so on. The Request tab shows the actual request that was fulfilled by the sampler, which could be any of the requests the server accepts (for example, GET, POST, PUT, DELETE, and so on), along with details of the request headers.
Finally, the Response Data tab gives the rendered view of the response received back from the server. The component includes several built-in renderers along with tester components (CSS/jQuery, RegExp, and XPath) that allow us to test and come up with the right expressions or queries to use in postprocessor components within our test plans. This is a huge time saver, as it means we don't have to exercise the same tests repeatedly to nail down such expressions.

There's more…

As with most things bundled with JMeter, additional view renderers can be added to the View Results Tree component. The defaults included are Document, HTML, HTML (download resources), JSON, Text, and XML. Should any of these not suit your needs, you can create additional ones by implementing the org.apache.jmeter.visualizers.ResultRenderer interface and/or extending the org.apache.jmeter.visualizers.SamplerResultTab abstract class, bundling up the compiled classes as a JAR file, and placing it in the $JMETER_HOME/lib/ext directory to make them available to JMeter.

The View Results Tree listener consumes a lot of memory and CPU resources and should not be used during load testing. Use it only to debug and validate test plans.

See also

The Debugging with Debug Sampler recipe
The detailed component reference for the View Results Tree listener can be found at http://jmeter.apache.org/usermanual/component_reference.html#View_Results_Tree

Using the Aggregate Report listener

Another often used listener in JMeter is the Aggregate Report listener. This listener creates a row for each uniquely named request in the test plan. Each row gives a summarized view of useful information, including Request Count, Average, Median, Min, Max, 90% Line, Error Rate, Throughput, Requests/second, and KB/sec. The 90% Line column is particularly worth paying close attention to as you execute your tests. This figure is the time within which 90% of the samples of a particular request completed; the remaining 10% took at least that long.
It is measured in milliseconds. Higher numbers here indicate slow requests and/or components within the application under test. Equally important is the Error % column, which reports the failure rate of each sampled request. It is reasonable to have some level of failure when exercising test runs, but too high a number indicates errors in either the scripts or certain components of the application under test. Finally, of interest to stakeholders might be the number of requests per second, which the Throughput column reports. The throughput values are approximate and let you know just how many requests per second the server is able to handle.

How to do it…

In this recipe, we will cover how to add an Aggregate Report listener to a test plan and then see the summarized view of our execution:

1. Launch JMeter.
2. Open the ch7_shoutbox.jmx script bundled with the code samples. Alternatively, you can download it from https://github.com/jmeter-cookbook/bundled-code/scripts/ch7/ch7_shoutbox.jmx.
3. Add the Aggregate Report listener to Thread Group by navigating to Thread Group | Add | Listener | Aggregate Report.
4. Save and run the test plan.
5. Observe the real-time summary of results in the listener as the test proceeds.

How it works…

As your test plans execute, the Aggregate Report listener reports each sampler in your test plan on a separate row. Each row is packed with useful information. The Label column reflects the sample name, # Samples gives a count of each sampler, and Average, Median, Min, and Max all give you the respective times of each sampler. As mentioned earlier, you should pay close attention to the 90% Line and Error % columns. They can help you quickly pinpoint problematic components within the application under test and/or scripts. The Throughput column gives an idea of the responsiveness of the application under test and/or server. It can also be indicative of the capacity of the underlying server that the application under test runs on.
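To make the columns concrete, the sketch below computes Aggregate-Report-style figures for one label from a list of (elapsed time in ms, success) samples. It is an approximation for illustration only: the nearest-rank percentile, the fixed test duration used for throughput, and the helper name aggregate_row are assumptions, and JMeter's own calculations may differ in detail (JMeter measures the throughput window from the start of the first sample to the end of the last one).

```python
import statistics


def aggregate_row(samples, duration_s):
    """Summarize (elapsed_ms, success) samples for one label, in the spirit of
    the Aggregate Report columns. Approximation only, not JMeter's code."""
    elapsed = sorted(ms for ms, _ in samples)
    n = len(elapsed)
    # Nearest-rank 90th percentile: 90% of samples took no more than this
    rank = (90 * n + 99) // 100              # integer ceiling of 0.9 * n
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "# Samples": n,
        "Average": statistics.mean(elapsed),
        "Median": statistics.median(elapsed),
        "90% Line": elapsed[rank - 1],
        "Min": elapsed[0],
        "Max": elapsed[-1],
        "Error %": 100.0 * errors / n,
        "Throughput (/s)": n / duration_s,
    }


# Hypothetical measurements: ten requests collected over 5 seconds
samples = [(120, True), (95, True), (210, True), (180, False), (130, True),
           (110, True), (450, True), (140, True), (100, True), (160, True)]
row = aggregate_row(samples, duration_s=5.0)
for column, value in row.items():
    print(f"{column}: {value}")
```

Note how the single 450 ms outlier pulls the Average (169.5 ms) well above the Median (135 ms) while leaving the 90% Line at 210 ms; this is one reason the percentile column is often more telling than the mean.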
This entire process is demonstrated in the following screenshot: Using the Aggregate Report listener See also http://jmeter.apache.org/usermanual/component_reference.html#Summary_Report Debugging with Debug Sampler Often, in the process of recording a new test plan or modifying an existing one, you will need to debug the scripts to finally get your desired results. Without such capabilities, the process will be a mix of trial and error and will become a time-consuming exercise. Debug Sampler is a nifty little component that generates a sample containing the values of all JMeter variables and properties. The generated values can then be seen in the Response Data tab of the View Results Tree listener. As such, to use this component, you need to have a View Results Tree listener added to your test plan. This component is especially useful when dealing with postprocessor components as it helps to verify the correct or expected values that were extracted during the test run. How to do it… In this recipe, we will see how we can use Debug Sampler to debug a postprocessor in our test plans. Perform the following steps: Launch JMeter. Open the prerecorded script ch7_debug_sampler.jmx bundled with the book. Alternatively, you can download it from http://git.io/debug_sampler. Add Debug Sampler to the test Thread Group by navigating to Thread Group | Add | Sampler | Debug Sampler. Save and run the test. Navigate to the View Results Tree listener component. Switch to RegExp Tester by clicking on the dropdown. Observe the response data of the Get All Requests sampler. What we want is a regular expression that will help us extract the ID of entries within this response. After a few attempts, we settle at "id":(d+). Enable all the currently disabled samplers, that is, Request/Create Holiday Request, Modify Holiday, Get All Requests, and Delete Holiday Request. You can achieve this by selecting all the disabled components, right-clicking on them, and clicking on Enable. 
9. Add the Regular Expression Extractor postprocessor to the Request/Create Holiday Request sampler by navigating to Request/Create Holiday Request | Add | Post Processors | Regular Expression Extractor. Fill in the following details:
   Reference Name: id
   Regular Expression: "id":(\d+)
   Template: $1$
   Match No.: 0
   Default Value: NOT_FOUND
10. Save and rerun the test.
11. Observe the ID of the newly created holiday request and check whether it was correctly extracted and reported in Debug Sampler.

How it works…

Our goal was to test a REST API endpoint that allows us to list, modify, and delete existing resources or create new ones. When we create a new resource, the server autogenerates its identifier (ID). To perform any other operations on the newly created resource, we need to grab its autogenerated ID, store it in a JMeter variable, and use it further down the execution chain. In step 7, we observed the format of the server response when we executed the Get All Requests sampler. With the aid of RegExp Tester, we were able to nail down the right regular expression to extract the ID of a resource, that is, "id":(\d+). Armed with this information, we added a Regular Expression Extractor postprocessor to the Request/Create Holiday Request sampler and used the derived expression to get the ID of the newly created resource. We then used the ID stored in the id variable to modify and delete the resource further down the execution chain. After the test completed, Debug Sampler let us verify that the Regular Expression Extractor had extracted the resource ID properly and stored it as the id JMeter variable.

Using Constant Throughput Timer

While running test simulations, you sometimes need to specify the throughput in terms of the number of requests per minute. This is the function of Constant Throughput Timer.
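The extractor configuration above can be tried outside JMeter with a rough Python imitation. This is handy for checking an expression before wiring it into a test plan. It is not JMeter's implementation, and the response bodies are invented. Match-number handling follows the JMeter convention:

- 0 picks a match at random.
- A positive N picks the Nth match.
- The default value is returned when nothing matches.

Negative values, which JMeter uses with the ForEach Controller, are not modelled. Note the \d: without the backslash, (d+) would only match the literal letter d.

```python
import random
import re

def extract(body, regex, group=1, match_no=1, default="NOT_FOUND"):
    """Rough imitation of JMeter's Regular Expression Extractor with a $N$ template."""
    matches = list(re.finditer(regex, body))
    if not matches:
        return default
    if match_no == 0:
        return random.choice(matches).group(group)   # JMeter: 0 means "pick one at random"
    if 1 <= match_no <= len(matches):
        return matches[match_no - 1].group(group)    # JMeter: N means "the Nth match"
    return default                                   # out of range (negative: not modelled)

# Invented bodies shaped like the holiday-request API responses.
created = '{"id":42,"reason":"Vacation","days":5}'
listing = '[{"id":7,"days":2},{"id":42,"days":5}]'

print(extract(created, r'"id":(\d+)', match_no=0))  # single match, so always 42
print(extract(listing, r'"id":(\d+)', match_no=2))  # second match: 42
print(extract(created, r'"id":(d+)'))               # missing backslash: NOT_FOUND
```

The create response contains a single ID, so a random pick (Match No. 0) always returns that ID.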
This component introduces pauses into the test plan in such a way as to keep the throughput as close as possible to the specified target value. Though the name implies that throughput is constant, various factors affect the timer's behavior, such as server capacity, other timers or time-consuming elements in the test plan, and so on. As a result, the achieved throughput could be lower than the target.

How to do it…

In this recipe, we will add Constant Throughput Timer to our test plan and see how we can specify the expected throughput with it. Perform the following steps:

1. Launch JMeter.
2. Open the prerecorded script ch7_constant_throughput.jmx bundled with the book. Alternatively, you can download it from http://git.io/constant_throughput.
3. Add Constant Throughput Timer to Thread Group by navigating to Thread Group | Add | Timer | Constant Throughput Timer.
4. Fill in the following details:
   Target throughput (in samples per minute): 200
   Calculate Throughput based on: this thread only
5. Save and run the test plan.
6. Allow the test to run for about 5 minutes.
7. Observe the result in the Aggregate Report listener as the test is going on.
8. Stop the test manually, as it is currently set to run forever.

How it works…

The goal of the Constant Throughput Timer component is to keep your test plan samples as close as possible to a specified throughput. It achieves this by introducing variable pauses into the test plan. That said, throughput will be lower if the server resources of the system under test can't handle the load. Other elements within the test plan (for example, other timers, the number of specified threads, and so on) can also prevent you from reaching the desired throughput. In our recipe, we specified that the throughput rate be calculated for a single thread. Constant Throughput Timer can also calculate throughput based on all active threads, or on all active threads in the current thread group.
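The arithmetic behind the 200 samples per minute target can be sketched in Python. This is a simplified model for intuition, not the timer's source code. It assumes each thread spaces its samples evenly and that response times are constant, and it only covers the "this thread only" and "all active threads" modes. It also shows why the mode matters: with "this thread only", every thread aims for the full target, so the total scales with the thread count.

```python
def interval_ms(target_per_minute, mode="this thread only", active_threads=1):
    """Time each thread should leave between sample starts to hit the target."""
    per_thread_interval = 60_000 / target_per_minute
    if mode == "this thread only":
        return per_thread_interval
    if mode == "all active threads":
        # The target is shared, so each thread must run proportionally less often.
        return per_thread_interval * active_threads
    raise ValueError(f"mode not modelled in this sketch: {mode}")

def pause_ms(interval, response_time_ms):
    """Pause inserted after a sample; zero if the server is already slower than the target."""
    return max(0.0, interval - response_time_ms)

def achieved_per_minute(interval, response_time_ms, threads):
    """Throughput actually reached across all threads, given a fixed response time."""
    cycle = max(interval, response_time_ms)  # a thread can't go faster than the server responds
    return threads * 60_000 / cycle

per_thread = interval_ms(200)                       # 300 ms between samples per thread
print(pause_ms(per_thread, 120))                    # 180.0 ms pause after a 120 ms response
print(achieved_per_minute(per_thread, 120, 5))      # 5 threads x 200/min = 1000.0
shared = interval_ms(200, "all active threads", 5)  # 1500 ms per thread
print(achieved_per_minute(shared, 120, 5))          # 200.0 in total
print(achieved_per_minute(per_thread, 500, 1))      # slow server caps it at 120.0
```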
Each of these settings changes how the desired throughput is shared among threads. As a rule of thumb, avoid using other timers alongside Constant Throughput Timer, since their extra delays will keep you from achieving the desired throughput.

See also

The Using Throughput Shaping Timer recipe
http://jmeter.apache.org/usermanual/component_reference.html#timers

Using the JSR223 postprocessor

The JSR223 postprocessor allows you to use precompiled scripts within test plans. Because the scripts are compiled before they are actually used, they run significantly faster than other postprocessors. The JSR223 postprocessor also supports a variety of programming languages, including Java, Groovy, BeanShell, JEXL, and so on, so we can harness the powerful features of those languages within our test plans. JSR223 versions of preprocessors, postprocessors, and samplers are available. They give us more control over, for example, how elements are extracted from responses and stored as JMeter variables.

How to do it…

In this recipe, we will see how to use a JSR223 postprocessor within our test plan. We have chosen Groovy (http://groovy.codehaus.org/) as our scripting language, but any of the other supported languages will do:

1. Download the standard set of plugins from http://jmeter-plugins.org/.
2. Install the plugins by doing the following:
   Extract the ZIP archive to the location of your chosen directory.
   Copy the lib folder in the extracted directory into the $JMETER_HOME directory.
3. Download the groovy-all JAR file from http://devbucket-afriq.s3.amazonaws.com/jmeter-cookbook/groovy-all-2.3.3.jar and add it to the $JMETER_HOME/lib directory.
4. Launch JMeter.
5. Add Thread Group by navigating to Test Plan | Add | Threads(Users) | Thread Group.
6. Add Dummy Sampler to Thread Group by navigating to Thread Group | Add | Sampler | jp@gc - Dummy Sampler.
7. In the Response Data text area, add the following content:
<records>
  <car name='HSV Maloo' make='Holden' year='2006'>
    <country>Australia</country>
    <record type='speed'>Production Pickup Truck with speed of 271kph</record>
  </car>
  <car name='P50' make='Peel' year='1962'>
    <country>Isle of Man</country>
    <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
  </car>
  <car name='Royale' make='Bugatti' year='1931'>
    <country>France</country>
    <record type='price'>Most Valuable Car at $15 million</record>
  </car>
</records>
8. Download the Groovy script file from http://git.io/8jCXMg to any location of your choice. Alternatively, you can get it from the code sample bundle accompanying the book (ch7_jsr223.groovy).
9. Add JSR223 PostProcessor as a child of Dummy Sampler by navigating to jp@gc - Dummy Sampler | Add | Post Processors | JSR223 PostProcessor.
10. Select Groovy as the language of choice in the Language drop-down box.
11. In the File Name textbox, enter the absolute path to the Groovy script file, for example, /tmp/scripts/ch7/ch7_jsr223.groovy.
12. Add the View Results Tree listener to the test plan by navigating to Test Plan | Add | Listener | View Results Tree.
13. Add Debug Sampler to Thread Group by navigating to Thread Group | Add | Sampler | Debug Sampler.
14. Save and run the test.
15. Observe the Response Data tab of Debug Sampler. It now shows the JMeter variables car_0, car_1, and car_2, which our JSR223 postprocessor extracted from the Dummy Sampler's response data.

How it works…

JMeter exposes certain variables to the JSR223 component. Through them, the script can get hold of sample details, perform logic operations, and store the results as JMeter variables. The exposed attributes include log, Label, FileName, Parameters, args[], ctx, vars, props, prev, sampler, and OUT.
Each of these gives access to useful information for postprocessing sampler responses:

- log gives access to Logger (an Apache Commons Logging log instance; see http://bit.ly/1xt5dmd), which writes log statements to the logfile.
- Label and FileName give access to the sample label and the script file name, respectively.
- Parameters and args[] give access to the parameters sent to the script.
- ctx gives access to the current thread's JMeter context (http://bit.ly/1lM31MC).
- vars gives read and write access to JMeter variables (http://bit.ly/1o5DDBr), making values available to the rest of the test plan.
- props gives access to JMeterProperties.
- sampler gives access to the current sampler.
- OUT writes to the standard output, that is, System.out.
- prev gives access to the previous sample result (http://bit.ly/1rKn8Cs), including the response data, headers, assertion results, and so on.

In our script, we made use of the prev and vars attributes. With prev, we were able to get hold of the XML response from the sample. Using Groovy's XmlSlurper (http://bit.ly/1AoRMnb), we effortlessly processed the XML response, extracted the interesting bits, and stored them as JMeter variables using the vars attribute.

This technique lets us accomplish tasks that would be cumbersome with the other postprocessor elements we have seen in other recipes, because we can take full advantage of the language features of the chosen scripting language. In our case, we used Groovy, but any other supported scripting language you are comfortable with will do as well.
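For readers who want to experiment without a Groovy setup, the same idea can be sketched in Python, with xml.etree.ElementTree standing in for XmlSlurper and a plain dict standing in for vars. In the real postprocessor, the text would come from prev.getResponseDataAsString() and the values would be written with vars.put(name, value). The downloaded ch7_jsr223.groovy script is not reproduced here. The summary format stored in each car_N variable is therefore a hypothetical choice for this sketch.

```python
import xml.etree.ElementTree as ET

# The Dummy Sampler response data from the recipe.
RESPONSE = """<records>
  <car name='HSV Maloo' make='Holden' year='2006'>
    <country>Australia</country>
    <record type='speed'>Production Pickup Truck with speed of 271kph</record>
  </car>
  <car name='P50' make='Peel' year='1962'>
    <country>Isle of Man</country>
    <record type='size'>Smallest Street-Legal Car at 99cm wide and 59 kg in weight</record>
  </car>
  <car name='Royale' make='Bugatti' year='1931'>
    <country>France</country>
    <record type='price'>Most Valuable Car at $15 million</record>
  </car>
</records>"""

def extract_cars(response_text, variables):
    """Store one summary per car as car_0, car_1, ... (hypothetical value format)."""
    root = ET.fromstring(response_text)
    for index, car in enumerate(root.findall("car")):
        record = car.find("record")
        variables[f"car_{index}"] = (
            f"{car.get('make')} {car.get('name')} ({car.get('year')}), "
            f"{car.findtext('country')}: {record.get('type')} record"
        )
    return variables

vars_ = extract_cars(RESPONSE, {})  # the dict plays the role of JMeter's vars
for name, value in vars_.items():
    print(name, "=", value)
```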
See also

http://jmeter.apache.org/api
http://jmeter.apache.org/usermanual/component_reference.html#BSF_PostProcessor
http://jmeter.apache.org/api/org/apache/jmeter/threads/JMeterContext.html
http://jmeter.apache.org/api/org/apache/jmeter/threads/JMeterVariables.html
http://jmeter.apache.org/api/org/apache/jmeter/samplers/SampleResult.html

Analyzing Response Times Over Time

Response times are an important aspect of performance testing. It is therefore often useful to see how the response times of the application under test change over the course of a test run. Out of the box, JMeter comes with the Response Time Graph listener for this purpose, but it is limited and lacks some features. Missing features include:

- Focusing on a particular sample when viewing chart results.
- Controlling the granularity of timeline values.
- Choosing which samples appear in the resulting chart.
- Controlling whether to use relative graphs.

To address all these and more, the Response Times Over Time listener extension from the JMeter plugins project comes to the rescue. It shines in areas where the Response Time Graph falls short.

How to do it…

In this recipe, we will see how to use the Response Times Over Time listener extension in our test plan to get the response times of our samples over time. Perform the following steps:

1. Download the standard set of plugins from http://jmeter-plugins.org/.
2. Install the plugins by doing the following:
   Extract the ZIP archive to the location of your chosen directory.
   Copy the lib folder in the extracted directory into the $JMETER_HOME directory.
3. Launch JMeter.
4. Open any of your existing prerecorded scripts or record a new one. Alternatively, you can open the ch7_response_times_over_time.jmx script accompanying the book or download it from http://git.io/response_times_over_time.
5. Add the Response Times Over Time listener to the test plan by navigating to Test Plan | Add | Listener | jp@gc - Response Times Over Time.
6. Save and execute the test plan.
7. View the resulting chart in the tab by clicking on the Response Times Over Time component.
8. Observe the time elapsed on the x axis and the response time in milliseconds on the y axis for all samples contained in the test plan.
9. Navigate to the Rows tab and exclude some of the samples from the chart by unchecking the selection boxes next to the samples.
10. Switch back to the Chart tab and observe that the chart now reflects your changes, allowing you to focus on the samples of interest.
11. Switch to the Settings tab and see all the available configuration options. Change some options and repeat the test execution.

The following screenshot shows this:

(Screenshot: Analyzing Response Times Over Time)

How it works…

As its name implies, the Response Times Over Time listener extension displays the average response time in milliseconds for each sampler in the test plan over the course of the test. It comes with various configuration options that allow you to customize the resulting graph to your heart's content. More importantly, it allows you to focus on specific samples in your test plan, helping you pinpoint potential bottlenecks or problematic modules within the application under test. To make graphs more meaningful, give samples sensible descriptive names. If you have long-running tests, also increase the granularity of the elapsed time in the Settings tab. After test execution, you can also export the data of any chart to a CSV file for further analysis. Keep in mind that any listener that charts results has some impact on performance and shouldn't be used during high-volume load testing.
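The listener's core idea can be sketched in Python: group samples into time buckets of a configurable granularity and average the response time per sampler label in each bucket. The samples are invented and the 500 ms granularity is chosen for the example. The include parameter loosely mimics unchecking rows in the Rows tab. This illustrates the concept; it is not the plugin's code.

```python
from collections import defaultdict

# Invented samples: (label, offset_ms from test start, elapsed_ms).
samples = [
    ("Home", 100, 200), ("Home", 400, 240), ("Login", 450, 600),
    ("Home", 700, 180), ("Login", 900, 700), ("Home", 1200, 300),
]

def response_times_over_time(samples, granularity_ms=500, include=None):
    """Average elapsed time per (label, bucket start); include mimics the Rows tab filter."""
    sums = defaultdict(lambda: [0, 0])  # (label, bucket_start) -> [total_elapsed, count]
    for label, offset, elapsed in samples:
        if include is not None and label not in include:
            continue
        bucket = (offset // granularity_ms) * granularity_ms
        sums[(label, bucket)][0] += elapsed
        sums[(label, bucket)][1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}

for (label, bucket), average in response_times_over_time(samples).items():
    print(f"{label} @ {bucket} ms: {average:.0f} ms")
```

A coarser granularity (for example, 1000) averages more samples into each point. The chart becomes smoother and less noisy on long tests, which is why the text suggests raising it for long runs.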
Analyzing transactions per second

Sometimes we are tasked with testing backend services, application programming interfaces (APIs), or other components that, unlike a classic web application, may not have a graphical user interface (GUI). At such times, a key measure of a module's responsiveness is how many transactions per second it can withstand before slowness is observed. Transactions Per Second (TPS) is particularly useful for stakeholders who provide services that third-party components or other services consume. Good examples include the Google search engine, which third parties can consume, and the Twitter and Facebook APIs, which allow developers to integrate their applications with Twitter and Facebook, respectively. The Transactions Per Second listener extension from the JMeter plugins project measures the transactions per second and plots them over the elapsed duration of the test.

How to do it…

In this recipe, we will see how to use the Transactions Per Second listener extension in our test plan to get the transactions per second for a test API service:

1. Download the standard set of plugins from http://jmeter-plugins.org/.
2. Install the plugins by doing the following:
   Extract the ZIP archive to the location of your chosen directory.
   Copy the lib folder in the extracted directory into the $JMETER_HOME directory.
3. Launch JMeter.
4. Open the ch7_transaction_per_sec.jmx script accompanying the book or download it from http://git.io/trans_per_sec.
5. Add the Transactions Per Second listener to the test plan by navigating to Test Plan | Add | Listener | jp@gc - Transactions per Second.
6. Save and execute the test plan.
7. View the resulting chart in the tab by clicking on the Transactions Per Second component.
8. Observe the time elapsed on the x axis and the transactions/sec on the y axis for all samples contained in the test plan.
9. Navigate to the Rows tab and exclude some of the samples from the chart by unchecking the selection boxes next to the samples.
10. Switch back to the Chart tab and observe that the chart now reflects your changes, allowing you to focus on the samples of interest.
11. Switch to the Settings tab and see all the available configuration options. Change some options and repeat the test execution.

How it works…

The Transactions Per Second listener extension displays the transactions per second for each sample in the test plan by counting the number of successfully completed transactions each second. Its configuration options let you customize the resulting graph and focus on specific samples of interest in your test plan. This helps you spot emerging bottlenecks within the application under test. Give your samples sensible descriptive names to make better sense of the resulting graphs and data points. The following screenshot shows this:

(Screenshot: Analyzing Transactions per Second)

Summary

In this article, you learned how to build test plans using the steps in the recipes. You also saw how to debug a test plan and analyze its results after building it.

Resources for Article:

Further resources on this subject:
Functional Testing with JMeter [article]
Performance Testing Fundamentals [article]
Common performance issues [article]
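Returning to the Transactions Per Second recipe in the JMeter article above, the counting the listener performs can be sketched in Python. Each transaction is assigned to the whole second in which it finished, and successes and failures are counted separately. Grouping by end time and the sample data are assumptions made for this illustration; this is not the plugin's code.

```python
from collections import Counter

# Invented samples: (end_time_ms since test start, success flag).
samples = [(120, True), (480, True), (950, False), (1010, True),
           (1500, True), (1999, True), (2300, True), (2950, False)]

def transactions_per_second(samples):
    """Return {second: (successes, failures)} based on when each transaction completed."""
    ok, failed = Counter(), Counter()
    for end_ms, success in samples:
        second = end_ms // 1000
        (ok if success else failed)[second] += 1
    seconds = sorted(set(ok) | set(failed))
    return {s: (ok[s], failed[s]) for s in seconds}

for second, (passed, errors) in transactions_per_second(samples).items():
    print(f"second {second}: {passed} ok, {errors} failed")
```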

Packt
23 Oct 2013
7 min read

Taking Control of Reactivity, Inputs, and Outputs

Showing and hiding elements of the UI

We'll start easy with a simple function that you are certainly going to need if you build even a moderately complex application. Those of you who have been doing extra credit exercises and/or experimenting with your own applications have probably already wished for this or, indeed, already found it. conditionalPanel() allows you to show/hide UI elements based on other selections within the UI. The function takes a condition and a UI element, and displays the element only when the condition is true. The condition is written in JavaScript, but its form and syntax will be familiar from many languages. I use this function a couple of times in the advanced GA application, and indeed in every application of even moderate complexity that I have written. The following is a simpler example (from ui.R, of course, in the first section, within sidebarPanel()), which allows users who request a smoothing line to decide what type they want:

conditionalPanel(condition = "input.smoother == true",
  selectInput("linearModel", "Linear or smoothed",
    list("lm", "loess")))

As you can see, the condition looks very R/Shiny-like, except that it uses the "." operator familiar to JavaScript users in place of "$", and "true" is in lowercase. This is a very simple but powerful way of making sure that your UI is not cluttered with irrelevant material.

Giving names to tabPanel elements

To further streamline the UI, we're going to hide the hour selector when the monthly graph is displayed, and the date selector when the hourly graph is displayed. The following screenshot illustrates the difference with side-by-side pictures, the hourly figures UI on the left-hand side and the monthly figures UI on the right-hand side.

To do this, we first have to give names to the tabs of the tabbed output.
This is done as follows (the new code is the id argument of tabsetPanel() and the value argument of each tabPanel()):

tabsetPanel(id = "theTabs",
  tabPanel("Summary", textOutput("textDisplay"),
    value = "summary"),
  tabPanel("Monthly figures", plotOutput("monthGraph"),
    value = "monthly"),
  tabPanel("Hourly figures", plotOutput("hourGraph"),
    value = "hourly"))

As you can see, the whole panel is given an ID (theTabs), and then each tabPanel is also given a name (summary, monthly, and hourly). In the server.R file, the value of the currently selected tab is then simply input$theTabs.

Let's have a quick look at a chunk of code in server.R that references the tab names. This code makes sure that we subset by date only when the date selector is actually visible, and by hour only when the hour selector is actually visible. Our function to calculate and pass data now looks like the following (the new code is the two if() conditions that test input$theTabs):

passData <- reactive({
  if(input$theTabs != "hourly"){
    analytics <- analytics[analytics$Date %in%
      seq.Date(input$dateRange[1], input$dateRange[2],
        by = "days"),]
  }
  if(input$theTabs != "monthly"){
    analytics <- analytics[analytics$Hour %in%
      as.numeric(input$minimumTime) :
      as.numeric(input$maximumTime),]
  }
  analytics <- analytics[analytics$Domain %in%
    unlist(input$domainShow),]
  analytics
})

As you can see, subsetting by date is carried out only when the date selector is visible (that is, when the hourly tab is not shown), and vice versa.
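The tab logic in passData() is easy to lose among the R indexing, so here is the same rule restated as a small Python function. It is a translation for illustration only, not Shiny code. Rows are plain dicts, the column names mirror the analytics data frame, the tab values match those given to tabPanel() above, and the data is invented.

```python
from datetime import date

def pass_data(rows, active_tab, date_range, min_hour, max_hour, domains):
    """Python restatement of passData(): filter only on the inputs visible for the active tab."""
    if active_tab != "hourly":      # date selector is shown on the summary and monthly tabs
        start, end = date_range
        rows = [r for r in rows if start <= r["Date"] <= end]
    if active_tab != "monthly":     # hour sliders are shown on the summary and hourly tabs
        rows = [r for r in rows if min_hour <= r["Hour"] <= max_hour]
    return [r for r in rows if r["Domain"] in domains]

# Invented rows using the column names from the R code.
rows = [
    {"Date": date(2013, 4, 1), "Hour": 9, "Domain": "NHS"},
    {"Date": date(2013, 5, 1), "Hour": 22, "Domain": "Other"},
    {"Date": date(2013, 6, 1), "Hour": 14, "Domain": "NHS"},
]
april_to_mid_may = (date(2013, 4, 1), date(2013, 5, 15))
everything = {"NHS", "Other"}

for tab in ("summary", "monthly", "hourly"):
    kept = pass_data(rows, tab, april_to_mid_may, 8, 18, everything)
    print(tab, [(r["Date"].isoformat(), r["Hour"]) for r in kept])
```

On the monthly tab, the 22:00 row survives because the hour sliders are hidden. On the hourly tab, the June row survives because the date range is hidden.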
Finally, we can make our changes to ui.R to remove parts of the UI based on tab selection:

conditionalPanel(condition = "input.theTabs != 'hourly'",
  dateRangeInput(inputId = "dateRange",
    label = "Date range",
    start = "2013-04-01",
    max = Sys.Date())
),
conditionalPanel(condition = "input.theTabs != 'monthly'",
  sliderInput(inputId = "minimumTime",
    label = "Hours of interest- minimum",
    min = 0, max = 23, value = 0, step = 1),
  sliderInput(inputId = "maximumTime",
    label = "Hours of interest- maximum",
    min = 0, max = 23, value = 23, step = 1)
)

Note that the second conditionalPanel() call wraps two UI elements. Grouping related elements in a single call like this helps keep your code clean and easy to debug.

Reactive user interfaces

Another trick you will definitely want up your sleeve at some point is a reactive user interface. This enables you to change your UI (for example, the number or content of radio buttons) based on reactive functions. For example, consider an application that I wrote for survey responses across a broad range of health services in different areas. The services are related to each other in quite a complex hierarchy. Over time, different areas and services respond (or cease to exist, or merge, or change their name...). As a result, each time period the user might be interested in has a totally different set of areas and services. The only sensible solution is to have the user tell you which area and date range they are interested in, and then give them back the list of services that have survey responses within that area and date range. The example we're going to look at is a little simpler than this, to keep from getting bogged down in detail. The principle is exactly the same, though, and you should not find it too difficult to adapt to your own UI.
We are going to imagine that your users are interested in the individual domains from which people are accessing the site, rather than having them lumped together as the NHS domain and all others. To this end, we will have a combo box listing each individual domain. Across the whole time range, this combo box is likely to contain a very large number of domains. We will therefore let users constrain the data by date and return only the domains that appear in that range. This is not the most realistic example, but it illustrates the principle for our purposes.

Reactive user interface example – server.R

The big difference is that instead of writing your UI definition in your ui.R file, you place it in server.R and wrap it in renderUI(). Then all you do is point to it from your ui.R file. Let's have a look at the relevant bit of the server.R file:

output$reacDomains <- renderUI({
  domainList = unique(as.character(passData()$networkDomain))
  selectInput("subDomains", "Choose subdomain", domainList)
})

The first line inside renderUI() takes the reactive dataset, which contains only the data between the dates selected by the user, and returns all the unique domain values within it. The second line uses a widget type we have not used yet, which generates a combo box. The usual id and label arguments are given, followed by the values that the combo box can take, which come from the variable defined in the first line.

Reactive user interface example – ui.R

The ui.R file merely needs to point to the reactive definition, as shown in the following line of code (just add it to the list of widgets within sidebarPanel()):

uiOutput("reacDomains")

You can now refer to the value of the widget in the usual way, as input$subDomains. Note that you do not use the name of the output created with renderUI(), that is, reacDomains. Instead, you use the name defined within it, that is, subDomains.
Summary It's a relatively small but powerful toolbox with which you can build a vast array of useful and intuitive applications with comparatively little effort. This article looked at fine-tuning the UI using conditionalPanel() and observe(), and changing our UI reactively. Resources for Article: Further resources on this subject: Fine Tune the View layer of your Fusion Web Application [Article] Building tiny Web-applications in Ruby using Sinatra [Article] Spring Roo 1.1: Working with Roo-generated Web Applications [Article]
Packt
02 Aug 2013
6 min read

So, what is MongoDB?

What is a document?

The definition varies among document-oriented databases, but as far as MongoDB is concerned, a document is a BSON document, where BSON stands for Binary JSON. JSON (JavaScript Object Notation) is an open standard developed for human-readable data exchange. You don't need a thorough knowledge of JSON to understand MongoDB, but keen readers can find its RFC at http://tools.ietf.org/html/rfc4627. The BSON specification can be found at http://bsonspec.org/. Since MongoDB stores data as BSON documents, it is a document-oriented database.

What does a document look like?

Consider the following example where we represent a person using JSON:

{
  "firstName": "Jack",
  "secondName": "Jones",
  "age": 30,
  "phoneNumbers": [{"fixedLine": "1234"}, {"mobile": "5678"}],
  "residentialAddress": {
    "lineOne": "…",
    "lineTwo": "…",
    "city": "…",
    "state": "…",
    "zip": "…",
    "country": "…"
  }
}

As we can see, a JSON document always starts and ends with curly braces, with all the content inside them. Multiple fields and values are separated by commas. A field name is always a string. A value can be a string, a number, an array, another JSON document, and so on, and BSON adds further types such as dates. For example, in "firstName":"Jack", firstName is the name of the field, whereas Jack is the value of the field.

Need for MongoDB

Many of you are probably wondering why we need another database when we already have good old relational databases. Let us look at a few of the drivers behind MongoDB's introduction back in 2009. Relational databases are extremely rich in features. But these features don't come for free: the price is paid in scalability and flexibility. Let us look at these one by one.

Scalability

Scalability is a measure of how easily a system can accommodate a growing amount of work or data.
There are two ways in which you can scale your system: scaling up (also known as scaling vertically) or scaling out (also known as scaling horizontally). Vertical scalability can be summed up as an approach where we say "Need more processing capabilities? Upgrade to a bigger machine with more cores and memory". Unfortunately, this approach hits a wall: it is expensive, and technically we cannot upgrade the hardware beyond a certain level. You are then left with the option of optimizing your application, which might not be feasible for systems that have been running in production for years. On the other hand, horizontal scalability can be described as an approach where we say "Need more processing capabilities? Simple, just add more servers and multiply the processing capabilities". In theory, this approach gives us unlimited processing power, but in practice it brings more challenges. For many machines to work together, there is a communication overhead between them, and the probability that any one of these machines is down at a given point in time is much higher. MongoDB makes horizontal scaling easy, and at the same time addresses the problems related to horizontal scaling to a great extent. The end result is that MongoDB scales with increasing data much more easily than relational databases do.

Ease of development

MongoDB doesn't require you to create a schema up front, as relational databases do. Documents like the one we just saw can have an arbitrary structure when we store them in the database. This makes it very easy for us to model and store unstructured or complex data, which is difficult to model in a relational database. An example is the product catalogue of an e-commerce application, where each item has different attributes. Also, JSON is more natural to use in application development than the tables of the relational world. OK, it looks good, but what is the catch?
Where not to use MongoDB?

To scale out easily, MongoDB had to do away with features like joins and multi-document/distributed transactions. Now, you may be wondering whether that makes it pretty useless, as we have taken away two of the most important features of the relational database. However, mitigating the lack of joins is one of the reasons why MongoDB is document oriented. In the preceding JSON document for the person, the address and the phone numbers are part of the document. In a relational database, these would have been in separate tables and retrieved by joining these tables together. Distributed/multi-document transactions would prevent MongoDB from scaling out. Hence they are not supported, and there is no way to work around this. MongoDB is still atomic, but atomicity for inserts and updates is guaranteed at the document level and not across multiple documents. Hence, MongoDB is not a good fit for scenarios that need complex transactions, such as OLTP banking applications. In this area, the good old relational database still rules.

To conclude, let us take a look at the following image. This graph is pretty interesting; Dwight Merriman, Founder and CEO of 10gen, the MongoDB company, presented it in one of his online courses. On one side of the graph are products like Memcached, which are very low on functionality but high on scalability and performance. On the other side is the RDBMS (Relational Database Management System), which is very rich in features but not as scalable. According to the research done while developing MongoDB, this graph is not linear. Beyond a certain point, adding more features to the product makes scalability and performance fall steeply. MongoDB sits at this point, offering the maximum possible features without compromising too much on scalability and performance.
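The embedding argument can be illustrated with plain Python data structures, since a JSON document maps directly onto a Python dict. This sketch uses no MongoDB driver and does not reflect how MongoDB stores data internally (that is BSON). It simply contrasts two layouts:

- The person document, where the phone numbers travel inside the document.
- A hypothetical relational layout, where the phone numbers sit in a separate table and must be joined back in.

The address values are invented.

```python
import json

# Document model: the person document with embedded phone numbers and address.
person_doc = json.loads("""
{
  "firstName": "Jack", "secondName": "Jones", "age": 30,
  "phoneNumbers": [{"fixedLine": "1234"}, {"mobile": "5678"}],
  "residentialAddress": {"city": "London", "country": "UK"}
}
""")

# Hypothetical relational model: the same data split across two "tables".
persons = [{"id": 1, "firstName": "Jack", "secondName": "Jones", "age": 30}]
phone_numbers = [
    {"person_id": 1, "type": "fixedLine", "number": "1234"},
    {"person_id": 1, "type": "mobile", "number": "5678"},
]

def person_with_phones(person_id):
    """Relational style: find the person row, then join in the matching phone rows."""
    person = next(p for p in persons if p["id"] == person_id)
    phones = [{row["type"]: row["number"]} for row in phone_numbers
              if row["person_id"] == person_id]
    return {**person, "phoneNumbers": phones}

# Document style needs a single lookup; relational style needs the join above.
print(person_doc["phoneNumbers"])
print(person_with_phones(1)["phoneNumbers"])
```

Both lines print the same phone numbers. The difference is that the document already holds them, while the relational version has to assemble them. A multi-table transaction becomes unnecessary once the related data lives in one document.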
Summary

In this article, we saw the features MongoDB offers, what a document looks like, and how MongoDB compares with relational databases.

Resources for Article:

Further resources on this subject:
Building a Chat Application [Article]
Ruby with MongoDB for Web Development [Article]
Comparative Study of NoSQL Products [Article]

Packt
18 Mar 2014
4 min read

Processing the Case

Changing the time zone

Correct use of the Time Zone feature is of the utmost importance in computer forensics. A wrong setting displays the wrong MAC times for the files contained in the evidence, which could lead a professional to use wrong information in an investigation report. For this reason, you must configure the time zone to reflect the location where the evidence was acquired. For example, suppose you acquired a computer located in Los Angeles, US, and brought the evidence to Sao Paulo, Brazil, where your lab is situated. You should then set the time zone to Los Angeles, so that the MAC times of files reflect the actual moment of their modification, access, or creation.

FTK lets you make this time zone change at the same time that you add new evidence to the case. In the Time Zone field, select from the drop-down list the time zone of the location where the evidence was seized. This is required to add evidence to the case. Take a look at the following screenshot.

You can also change the value of Time Zone after adding the evidence. In the menu toolbar, click on View and then click on Time Zone Display.

Mounting compound files

To locate important information during your investigation, you should expand individual compound file types. This lets you see the child files that are contained within a container, such as ZIP or RAR files. You can access this feature from the case manager's new case wizard, or from the Add Evidence or Additional Analysis dialogs. The following are some of the compound files that you can mount:

E-mail files: PST, NSF, DBX, and MSG
Compressed files: ZIP, RAR, GZIP, TAR, BZIP, and 7-ZIP
System files: Windows thumbnails, registry, PKCS7, MS Office, and EVT

If you don't mount compound files, keyword searches and filters will not find the child files.
To expand compound files, perform the following steps:

1. Do one of the following:
   For new cases, click on the Custom button in the New Case Options dialog.
   For existing cases, go to Evidence | Additional Analysis.
2. Select Expand Compound Files.
3. Click on Expansion Options….
4. In the Compound File Expansions Options dialog, select the types of files that you want to mount.
5. Click on OK.

File and folder export

You may need to export some files or folders to work with them outside the FTK platform, or simply to present the evidence. To export files or folders, perform the following steps:

1. Select one or more files that you would like to export.
2. Right-click on the selection and select Export.
3. In the dialog that opens, configure the following settings before exporting:
   File Options: advanced options for exporting files and folders. The default options are sufficient for a simple export.
   Items to Include: the files and folders to export. You can export the checked, listed, or highlighted items, or all items.
   Destination base path: the folder where the exported files are saved.

Take a look at the following screenshot:

Column settings

Columns display the properties or metadata of the evidence items. By default, FTK shows the most commonly used columns, but you can add or remove columns to help you find relevant information quickly.

To manage columns, right-click on a column header in the File List view and select Column Settings…. FTK offers a large number of columns. You can add or remove the ones you need; to add a column, select its type and click on the Add button.

FTK also includes column settings templates. To access them, click on Manage and navigate to Columns | Manage Columns. You can use the ready-made templates, edit them, or create your own.
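To make the idea of a column template concrete, here is a minimal Python sketch under stated assumptions: it models a template as a named, ordered list of metadata properties and shows how applying one selects and orders what the file list displays. FTK's internal representation is not described in this article, so the `EvidenceItem` fields, the template names, and the `add_column` and `render` helpers are all hypothetical.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class EvidenceItem:
    """Metadata for one file; field names are hypothetical, not FTK's schema."""
    name: str
    path: str
    size: int
    created: datetime
    modified: datetime
    accessed: datetime


# Hypothetical column templates: each is a named, ordered list of properties.
TEMPLATES: dict[str, list[str]] = {
    "Quick View": ["name", "size", "modified"],
    "Timeline": ["name", "created", "modified", "accessed"],
}


def add_column(template: list[str], column: str) -> None:
    """Append a column to a template unless it is already present."""
    if column not in template:
        template.append(column)


def render(items: list[EvidenceItem], columns: list[str]) -> list[list[str]]:
    """Return a header row followed by one row per item, in column order."""
    rows = [list(columns)]
    for item in items:
        record = asdict(item)
        rows.append([str(record[col]) for col in columns])
    return rows


if __name__ == "__main__":
    item = EvidenceItem("plan.txt", "/docs/plan.txt", 31,
                        datetime(2014, 3, 1, 9, 0),
                        datetime(2014, 3, 18, 10, 30),
                        datetime(2014, 3, 18, 11, 0))
    quick = TEMPLATES["Quick View"]
    add_column(quick, "path")  # edit a ready-made template
    for row in render([item], quick):
        print(" | ".join(row))
```

Keeping a template as plain data separates what to show from how to show it, which is roughly what the Manage Columns feature offers: reusing, editing, or creating a template changes only the list, not the evidence.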