Visualization Dashboard Design

Packt
10 Jan 2017
18 min read
In this article by David Baldwin, the author of the book Mastering Tableau, we will cover how to create effective dashboards. Since that fateful week in Manhattan, I've read Edward Tufte, Stephen Few, and other thought leaders in the data visualization space. This knowledge has been very fruitful. For instance, quite recently a colleague told me that one of his clients thought a particular dashboard had too many bar charts and he wanted some variation. I shared the following two quotes:

Show data variation, not design variation. – Edward Tufte in The Visual Display of Quantitative Information

Variety might be the spice of life, but, if it is introduced on a dashboard for its own sake, the display suffers. – Stephen Few in Information Dashboard Design

Those quotes proved helpful for my colleague. Hopefully, the following information will prove helpful to you. Additionally, I would like to draw attention to Alberto Cairo, a relatively new voice providing new insight. Each of these authors should be considered a must-read for anyone working in data visualization. This article covers the following topics:

- Visualization design theory
- Dashboard design
- Sheet selection

Visualization design theory

Any discussion on designing dashboards should begin with information about constructing well-designed content. The quality of the dashboard layout and the utilization of technical tips and tricks do not matter if the content is subpar. In other words, we should consider the worksheets displayed on dashboards and ensure that those worksheets are well designed. Therefore, our discussion will begin with a consideration of visualization design principles. Regarding these principles, it's tempting to declare a set of rules such as:

- To plot change over time, use a line graph
- To show breakdowns of the whole, use a treemap
- To compare discrete elements, use a bar chart
- To visualize correlation, use a scatter plot

But of course, even a cursory review of the preceding list brings to mind many variations and alternatives! Thus, we will consider various rules while always keeping in mind that rules (at least rules such as these) are meant to be broken.

Formatting rules

The following formatting rules encompass fonts, lines, and bands. Fonts are, of course, an obvious formatting consideration. Lines and bands, however, may not be something you typically think of when formatting, especially when considering formatting from the perspective of Microsoft Word. But if we broaden formatting considerations to think of Adobe Illustrator, InDesign, and other graphic design tools, then lines and bands are certainly considered. This illustrates that data visualization is closely related to graphic design and that formatting considers much more than just textual layout.

Rule – keep the font choice simple

Typically, using one or two fonts on a dashboard is advisable. More fonts can create a confusing environment and interfere with readability. Fonts chosen for titles should be thick and solid, while the body fonts should be easy to read. As of Tableau 10.0, choosing appropriate fonts is simple because of the new Tableau Font Family. Go to Format | Font to display the Format Font window to see and choose these new fonts. Assuming your dashboard is primarily intended for the screen, sans serif fonts are best. On the rare occasions a dashboard is primarily intended for print, you may consider serif fonts, particularly if the print resolution is high.
Rule – Trend line > Fever line > Reference line > Drop line > Zero line > Grid line

The preceding pseudo formula is intended to communicate line visibility. For example, trend line visibility should be greater than fever line visibility. Visibility is usually enhanced by increasing line thickness but may be enhanced via color saturation or by choosing a dotted or dashed line over a solid line. The trend line, if present, is usually the most visible line on the graph. Trend lines are displayed via the Analytics pane and can be adjusted via Format | Lines. The fever line (for example, the line used on a time-series chart) should not be so heavy as to obscure twists and turns in the data. Although a fever line may be displayed as dotted or dashed by utilizing the Pages shelf, this is usually not advisable because it may obscure visibility. The thickness of a fever line can be adjusted by clicking on the Size shelf in the Marks View card. Reference lines are usually less prevalent than either fever or trend lines and can be formatted by going to Format | Reference lines. Drop lines are not frequently used. To deploy drop lines, right-click in a blank portion of the view and go to Drop lines | Show drop lines. Next, click on a point in the view to display a drop line. To format drop lines, go to Format | Droplines. Drop lines are relevant only if at least one axis is utilized in the visualization. Zero lines (sometimes referred to as base lines) display only if zero or negative values are included in the view or positive numerical values are relatively close to zero. Format zero lines by going to Format | Lines. Grid lines should be the most muted lines on the view and may be dispensed with altogether. Format grid lines by going to Format | Lines.

Rule – band in groups of three to five

Visualizations composed of a tall table of text or horizontal bars should segment dimension members in groups of three to five.

Exercise – banding

1. Navigate to https://public.tableau.com/profile/david.baldwin#!/ to locate and download the workbook.
2. Navigate to the worksheet titled Banding.
3. Select the Superstore data source and place Product Name on the Rows shelf.
4. Double-click on Discount, Profit, Quantity, and Sales.
5. Navigate to Format | Shading and set Band Size under Row Banding so that three to five lines of text are encompassed by each band. Be sure to set an appropriate color for both Pane and Header.

Note that after completing the preceding five steps, Tableau defaulted to banding every other row. This default formatting is fine for a short table but is quite busy for a tall table. The band-in-groups-of-three-to-five rule is influenced by Dona W. Wong, who, in her book The Wall Street Journal Guide to Information Graphics, recommends separating long tables or bar charts with thin rules into groups of three to five to help the readers read across.

Color rules

It seems slightly ironic to discuss color rules in a black-and-white publication such as Mastering Tableau. Nonetheless, even in a monochromatic setting, a discussion of color is relevant. For example, exclusive use of black text communicates differently than using variations of gray. The following survey of color rules should be helpful to ensure that you use colors effectively in a variety of settings.

Rule – keep colors simple and limited

Stick to the basic hues and provide only a few (perhaps three to five) hue variations.
Alberto Cairo, in his book The Functional Art: An Introduction to Information Graphics and Visualization, provides insights into why this is important. The limited capacity of our visual working memory helps explain why it's not advisable to use more than four or five colors or pictograms to identify different phenomena on maps and charts. Rule – respect the psychological implication of colors In Western society, there is a color vocabulary so pervasive, it's second nature. Exit signs marking stairwell locations are red. Traffic cones are orange. Baby boys are traditionally dressed in blue while baby girls wear pink. Similarly, in Tableau reds and oranges should usually be associated with negative performance while blues and greens should be associated with positive performance. Using colors counterintuitively can cause confusion. Rule – be colorblind-friendly Colorblindness is usually manifested as an inability to distinguish red and green or blue and yellow. Red/green and blue/yellow are on opposite sides of the color wheel. Consequently, the challenges these color combinations present for colorblind individuals can be easily recreated with image editing software such as Photoshop. If you are not colorblind, convert an image with these color combinations to grayscale and observe. The challenge presented to the 8.0% of the males and 0.5% of the females who are color blind becomes immediately obvious! Rule – use pure colors sparingly The resulting colors from the following exercise should be a very vibrant red, green, and blue. Depending on the monitor, you may even find it difficult to stare directly at the colors. These are known as pure colors and should be used sparingly; perhaps only to highlight particularly important items. Exercise – using pure colors Open the workbook and navigate to the worksheet entitled Pure Colors. Select the Superstore data source and place Category on both the Rows shelf and the Color shelf. Set the Fit to Entire View. Click on the Color shelf and choose Edit Colors…. In the Edit Colors dialog box, double-click on the color icons to the left of each dimension member; that is, Furniture, Office Supplies, and Technology: Within the resulting dialog box, set furniture to an HTML value of #0000ff, Office Supplies to #ff0000, and Technology to #00ff00. Rule – color variations over symbol variation Deciphering different symbols takes more mental energy for the end user than distinguishing color. Therefore color variation should be used over symbol variation. This rule can actually be observed in Tableau defaults. Create a scatter plot and place a dimension with many members on the Color shelf and Shape shelf respectively. Note that by default, the view will display 20 unique colors but only 10 unique shapes. Older versions of Tableau (such as Tableau 9.0) display warnings that include text such as “…the recommended maximum for this shelf is 10”: Visualization type rules We won't spend time here to delve into a lengthy list of visualization type rules. However, it does seem appropriate to review at least a couple of rules. In the following exercise, we will consider keeping shapes simple and effectively using pie charts. Rule – keep shapes simple Too many shape details impede comprehension. This is because shape details draw the user's focus away from the data. Consider the following exercise on using two different shopping cart images. Exercise – shapes Open the workbook associated and navigate to the worksheet entitled Simple Shopping Cart. 
Note that the visualization is a scatterplot showing the top 10 selling Sub-Categories in terms of total sales and profits. On your computer, navigate to the Shapes directory located in the My Tableau Repository. On my computer, the path is C:\Users\David Baldwin\Documents\My Tableau Repository\Shapes. Within the Shapes directory, create a folder named My Shapes. Reference the link included in the comment section of the worksheet to download the assets. In the downloaded material, find the images titled Shopping_Cart and Shopping_Cart_3D and copy those images into the My Shapes directory created previously. Within Tableau, access the Simple Shopping Cart worksheet. Click on the Shape shelf and then select More Shapes. Within the Edit Shape dialog box, click on the Reload Shapes button. Select the My Shapes palette and set the shape to the simple shopping cart. After closing the dialog box, click on the Size shelf and adjust as desired. Also adjust other aspects of the visualization as desired. Navigate to the 3D Shopping Cart worksheet and then repeat steps 8 to 11. Instead of using the simple shopping cart, use the 3D shopping cart.

Compare the two visualizations. Which version of the shopping cart is more attractive? Likely the cart with the 3D look was your choice. Why not choose the more attractive image? Making visualizations attractive is only of secondary concern. The primary goal is to display the data as clearly and efficiently as possible. A simple shape is grasped more quickly and intuitively than a complex shape. Besides, the cuteness of the 3D image will quickly wear off.

Rule – use pie charts sparingly

Edward Tufte makes an acrid (and somewhat humorous) comment against the use of pie charts in his book The Visual Display of Quantitative Information: A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them. Given their low density and failure to order numbers along a visual dimension, pie charts should never be used. The present sentiment in data visualization circles is largely sympathetic to Tufte's criticism. There may, however, be some exceptions; that is, some circumstances where a pie chart is optimal. Consider the following visualization: which of the four visualizations best demonstrates that A accounts for 25% of the whole? Clearly it is the pie chart! Therefore, perhaps it is fairer to refer to pie charts as limited and to use them sparingly as opposed to considering them inherently evil.

Compromises

In this section, we will transition from more or less strict rules to compromises. Often, building visualizations is a balancing act. It's common to encounter contradictory directions from books, blogs, consultants, and within organizations. One person may insist on utilizing every pixel of space while another urges simplicity and whitespace. One counsels a guided approach while another recommends building wide-open dashboards that allow end users to discover their own path. The avant-garde may crave esoteric visualizations while those of a more conservative bent prefer to stay with the conventional. We now explore a few of the more common competing requests and suggest compromises.

Make the dashboard simple versus make the dashboard robust

Recently a colleague showed me a complex dashboard he had just completed.
Although he was pleased that he had managed to get it working well, he felt the need to apologize by saying, “I know it's dense and complex but it's what the client wanted.” Occam's Razor encourages the simplest possible solution for any problem. For my colleague's dashboard, the simplest solution was rather complex. This is OK! Complexity in Tableau dashboarding need not be shunned. But a clear understanding of some basic guidelines can help the author intelligently determine how to compromise between demands for simplicity and demands for robustness. More frequent data updates necessitate simpler design. Some Tableau dashboards may be near-real-time. Third-party technology may be utilized to force a browser displaying a dashboard via Tableau Server to refresh every few minutes to ensure the absolute latest data displays. In such cases, the design should be quite simple. The end user must be able to see at a glance all pertinent data and should not use that dashboard for extensive analysis. Conversely, a dashboard that is refreshed monthly can support high complexity and thus may be used for deep exploration. Greater end user expertise supports greater dashboard complexity. Know thy users. If they want easy, at-a-glance visualizations, keep the dashboards simple. If they like deep dives, design accordingly. Smaller audiences require more precise design. If only a few people monitor a given dashboard, it may require a highly customized approach. In such cases, specifications may be detailed, complex, and difficult to execute and maintain because the small user base has expectations that may not be natively easy to produce in Tableau. Screen resolution and visualization complexity are proportional. Users with low-resolution devices will need to interact fairly simply with a dashboard. Thus the design of such a dashboard will likely be correspondingly uncomplicated. Conversely, high-resolution devices support greater complexity. Greater distance from the screen requires larger dashboard elements. If the dashboard is designed for conference room viewing, the elements on the dashboard may need to be fairly large to meet the viewing needs of those far from the screen. Thus the dashboard will likely be relatively simple. Conversely, a dashboard to be viewed primarily on end users desktops can be more complex. Although these points are all about simple versus complex, do not equate simple with easy. A simple and elegantly designed dashboard can be more difficult to create than a complex dashboard. In the words of Steve Jobs: Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it's worth it in the end because once you get there, you can move mountains. Present dense information versus present sparse information Normally, a line graph should have a maximum of four to five lines. However, there are times when you may wish to display many lines. A compromise can be achieved by presenting many lines and empowering the end user to highlight as desired. The following line graph displays the percentage of Internet usage by country from 2000 to 2012. Those countries with the largest increases have been highlighted. Assuming that Highlight Selected Items has been activated within the Color legend, the end user can select items (countries in this case) from the legend to highlight as desired. 
Or, even better, a worksheet can be created listing all countries and used in conjunction with a highlight action on a dashboard to focus attention on selected items on the line graph: Tell a story versus allow a story to be discovered Albert Cairo, in his excellent book The Functional Art: An Introduction to Information Graphics and Visualization, includes a section where he interviews prominent data visualization and information graphics professionals. Two of these interviews are remarkable for their opposing views. I… feel that many visualization designers try to transform the user into an editor.  They create these amazing interactive tools with tons of bubbles, lines, bars, filters, and scrubber bars, and expect readers to figure the story out by themselves, and draw conclusions from the data. That's not an approach to information graphics I like. – Jim Grimwade The most fascinating thing about the rise of data visualization is exactly that anyone can explore all those large data sets without anyone telling us what the key insight is. – Moritz Stefaner Fortunately, the compromise position can be found in the Jim Grimwade interview: [The New York Times presents] complex sets of data, and they let you go really deep into the figures and their connections. But beforehand, they give you some context, some pointers as to what you can do with those data. If you don't do this… you will end up with a visualization that may look really beautiful and intricate, but that will leave readers wondering, What has this thing really told me? What is this useful for? – Jim Grimwade Although the case scenarios considered in the preceding quotes are likely quite different from the Tableau work you are involved in, the underlying principles remain the same. You can choose to tell a story or build a platform that allows the discovery of numerous stories. Your choice will differ depending on the given dataset and audience. If you choose to create a platform for story discovery, be sure to take the New York Times approach suggested by Grimwade. Provide hints, pointers, and good documentation to lead your end user to successfully interact with the story you wish to tell or successfully discover their own story. Document, Document, Document! But don't use any space! Immediately above we considered the suggestion Provide hints, pointers, and good documentation… but there's an issue. These things take space. Dashboard space is precious. Often Tableau authors are asked to squeeze more and more stuff on a dashboard and are hence looking for ways to conserve space. Here are some suggestions for maximizing documentation on a dashboard while minimally impacting screen real estate. Craft titles for clear communication Titles are expected. Not just a title for a dashboard and worksheets on the dashboard, but also titles for legends, filters and other objects. These titles can be used for effective and efficient documentation. For instance a filter should not just read Market. Instead it should say something like Select a Market. Notice the imperative statement. The user is being told to do something and this is a helpful hint. Adding a couple of words to a title will usually not impact dashboard space. Use subtitles to relay instructions A subtitle will take some extra space but it does not have to be much. A small, italicized font immediately underneath a title is an obvious place a user will look at for guidance. Consider an example: red represents loss. 
This short sentence could be used as a subtitle that may eliminate the need for a legend and thus actually save space.

Use intuitive icons

Consider a use case of navigating from one dashboard to another. Of course, you could associate an action with some hyperlinked text stating Click here to navigate to another dashboard. But this seems quite unnecessary when an action can be associated with a small, innocuous arrow, such as is natively used in PowerPoint, to communicate the same thing.

Store more extensive documentation in a tooltip associated with a help icon

A small question mark in the top-right corner of an application is common. This clearly communicates where to go if additional help is required. As shown in the following exercise, it's easy to create a similar feature on a Tableau dashboard.

Summary

In this article, we studied how to create effective dashboards. We surveyed visualization design theory, including formatting, color, and visualization type rules, considered how to compromise between competing demands such as simplicity and robustness, and looked at ways to document a dashboard without giving up valuable screen space.


Elastic Stack Overview

Packt
10 Jan 2017
9 min read
In this article by Ravi Kumar Gupta and Yuvraj Gupta, from the book Mastering Elastic Stack, we will have an overview of the Elastic Stack. It's very easy to read a log file of a few MBs, or even a few hundred, and it's just as easy to keep data of this size in databases or files and still get sense out of it. But then a day comes when this data takes terabytes and petabytes, and even Notepad++ would refuse to open a data file of a few hundred MBs. Then we start to look for something for huge log management, or something that can index the data properly and make sense out of it. If you Google this, you will stumble upon the ELK Stack. Elasticsearch manages your data, Logstash reads the data from different sources, and Kibana makes a fine visualization of it. Recently, the ELK Stack has evolved into the Elastic Stack. We will get to know about it in this article. The following are the points that will be covered in this article:

- Introduction to ELK Stack
- The birth of Elastic Stack
- Who uses the Stack

Introduction to ELK Stack

It all began with Shay Banon, who started Elasticsearch as an open source project, the successor of Compass, which gained popularity to become one of the top open source database engines. Later, based on the distributed model of working, Kibana was introduced to visualize the data present in Elasticsearch. Earlier, to put data into Elasticsearch we had rivers, which provided us with a specific input via which we inserted data into Elasticsearch. However, with growing popularity, this setup required a tool via which we could insert data into Elasticsearch, have the flexibility to perform various transformations on the data to make unstructured data structured, and have full control over how to process the data. Based on this premise, Logstash was born, which was then incorporated into the Stack, and together these three tools, Elasticsearch, Logstash, and Kibana, were named the ELK Stack. The following diagram is a simple data pipeline using the ELK Stack.

As we can see from the preceding figure, data is read using Logstash and indexed to Elasticsearch. Later we can use Kibana to read the indices from Elasticsearch and visualize it using charts and lists. Let's understand these components separately and the role they play in the making of the Stack.

Logstash

As noted earlier, rivers were initially used to put data into Elasticsearch before the ELK Stack. For the ELK Stack, Logstash is the entry point for all types of data. Logstash has many input plugins to read data from a number of sources and many output plugins to submit data to a variety of destinations; one of those is the Elasticsearch output plugin, which helps to send data to Elasticsearch. After Logstash became popular, rivers eventually got deprecated, as they made the cluster unstable and also caused performance issues. Logstash does not simply ship data from one end to another; it helps us with collecting raw data and modifying/filtering it to convert it to something meaningful, formatted, and organized. The updated data is then sent to Elasticsearch. If there is no plugin available to support reading data from a specific source, or writing the data to a location, or modifying it in your way, Logstash is flexible enough to allow you to write your own plugins. Simply put, Logstash is open source, highly flexible, rich with plugins, can read your data from your choice of location, normalizes it as per your defined configurations, and sends it to a particular destination as per your requirements.
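The Logstash configuration language itself is beyond the scope of this overview, but the read-parse-ship flow it automates can be illustrated in a few lines of plain Python. The sketch below is not part of the original article and is not Logstash; it simply reads a hypothetical access.log file, parses each line with a regular expression, and posts the result to a local Elasticsearch node assumed to be listening at http://localhost:9200, using a made-up index name of logs-demo:

import re
import requests  # assumes: pip install requests

# A (hypothetical) Apache-style access log line pattern.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse(line):
    """Turn a raw log line into a structured document (Logstash's 'filter' role)."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def ship(doc, index="logs-demo"):
    """Index the document into Elasticsearch (Logstash's 'output' role)."""
    resp = requests.post(f"http://localhost:9200/{index}/_doc", json=doc)
    resp.raise_for_status()

if __name__ == "__main__":
    with open("access.log") as f:  # Logstash's 'input' role
        for line in f:
            doc = parse(line)
            if doc is not None:
                ship(doc)

A real Logstash pipeline does the same three jobs through its input, filter, and output plugins, with far more robustness and far less custom code.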
Elasticsearch

All of the data read by Logstash is sent to Elasticsearch for indexing. There is a lot more to it than just indexing: Elasticsearch is not only used to index data, but is also a full-text search engine, highly scalable and distributed, and offers many more things. Elasticsearch manages and maintains your data in the form of indices and lets you query, access, and aggregate the data using its APIs. Elasticsearch is based on Lucene, thus providing you with all of the features that Lucene does.

Kibana

Kibana uses Elasticsearch APIs to read/query data from Elasticsearch indices to visualize and analyze it in the form of charts, graphs, and tables. Kibana is a web application, providing you with a highly configurable user interface that lets you query the data, create a number of charts to visualize it, and make actual sense out of the data stored.

After a robust ELK Stack was in place, as time passed, a few important and complex demands arose, such as authentication, security, notifications, and so on. These demands led to a few other tools, such as Watcher (providing alerting and notification based on changes in data), Shield (authentication and authorization for securing clusters), Marvel (monitoring statistics of the cluster), ES-Hadoop, Curator, and Graph, as the requirements arose.

The birth of Elastic Stack

All of the jobs of reading data were done using Logstash, but that is resource-consuming. Since Logstash runs on the JVM, it consumes a good amount of memory. The community realized the need for improvement and for making the pipelining process resource-friendly and lightweight. Back in 2015, Packetbeat was born, a project which was an effort to make a network packet analyzer that could read from different protocols, parse the data, and ship it to Elasticsearch. Being lightweight in nature did the trick, and a new concept of Beats was formed. Beats are written in the Go programming language. The project evolved a lot, and the ELK Stack was now more than just Elasticsearch, Logstash, and Kibana: Beats also became a significant component. The pipeline now looks as follows.

Beat

A Beat reads data, parses it, and can ship it to either Elasticsearch or Logstash. The difference is that Beats are lightweight, serve a specific purpose, and are installed as agents. There are a few Beats available, such as Topbeat, Filebeat, Packetbeat, and so on, which are supported and provided by Elastic.co, and a good number of Beats have already been written by the community. If you have a specific requirement, you can write your own Beat using the libbeat library. In simple words, Beats can be treated as very lightweight agents to ship data to either Logstash or Elasticsearch, and they offer you an infrastructure, using the libbeat library, to create your own Beats.

Together, Elasticsearch, Logstash, Kibana, and Beats become the Elastic Stack, formerly known as the ELK Stack. The Elastic Stack did not just add Beats to the team; all of the components will now always use the same version. The starting version of the Elastic Stack will be 5.0.0, and the same version will apply to all the components. This version and release method is not only for the Elastic Stack, but for other tools of the Elastic family as well. Previously, with so many tools, there was a problem of unification, wherein each tool had its own version and the versions were not always compatible with each other, which led to problems. To solve this, all of the tools will now be built, tested, and released together. All of these components play a significant role in creating a pipeline.
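The query and aggregation APIs mentioned above are worth a small illustration. The following snippet is not from the original article; it is a hedged Python sketch that uses the plain REST _search endpoint (assuming an Elasticsearch 7.x or later node at localhost:9200 and the hypothetical logs-demo index from the earlier sketch) to run a full-text query and a terms aggregation, which is essentially what Kibana does on your behalf when it renders a chart:

import requests  # assumes: pip install requests

search_body = {
    "size": 5,                                   # return at most five matching documents
    "query": {"match": {"path": "index.html"}},  # full-text match on the 'path' field
    "aggs": {                                    # bucket the matches by HTTP status code
        "by_status": {"terms": {"field": "status.keyword"}}
    },
}

resp = requests.post("http://localhost:9200/logs-demo/_search", json=search_body)
resp.raise_for_status()
result = resp.json()

print("total hits:", result["hits"]["total"]["value"])  # hits.total is an object in 7.x+
for bucket in result["aggregations"]["by_status"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])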
While Beats and Logstash are used to collect the data, parse it, and ship it, Elasticsearch creates indices, which are finally used by Kibana to make visualizations. While Elastic Stack helps with a pipeline, other tools add security, notifications, monitoring, and other such capabilities to the setup.

Who uses Elastic Stack?

In the past few years, implementations of Elastic Stack have been increasing very rapidly. In this section, we will consider a few case studies to understand how Elastic Stack has helped these organizations.

Salesforce

Salesforce developed a new plugin named ELF (Event Log Files) to collect Salesforce logged data to enable auditing of user activities. The purpose was to analyze the data to understand user behavior and trends in Salesforce. The plugin is available on GitHub at https://github.com/developerforce/elf_elk_docker. This plugin simplifies the Stack configuration and allows us to download event log files, get them indexed, and finally obtain sensible data that can be visualized using Kibana. This implementation utilizes Elasticsearch, Logstash, and Kibana.

CERN

There is not just one use case in which Elastic Stack helped CERN (the European Organization for Nuclear Research), but five. At CERN, Elastic Stack is used for the following:

- Messaging
- Data monitoring
- Cloud benchmarking
- Infrastructure monitoring
- Job monitoring

Multiple Kibana dashboards are used by CERN for a number of visualizations.

Green Man Gaming

This is an online gaming platform where game providers publish their games. The website wanted to make a difference by providing better gameplay. They started using Elastic Stack to do log analysis, search, and analysis of gameplay data. They began with setting up Kibana dashboards to gain insights about the counts of gamers by country and the currency used by gamers. That helped them understand and streamline support in order to provide an improved response.

Apart from these case studies, Elastic Stack is used by a number of other companies to gain insights from the data they own. Sometimes, not all of the components are used; that is, not every deployment uses a Beat or configures Logstash. Sometimes, only an Elasticsearch and Kibana combination is used. If we look at the users within an organization, all of the roles that are expected to do big data analysis, business intelligence, data visualization, log analysis, and so on can utilize Elastic Stack for their technical forte. A few of these roles are data scientists, DevOps engineers, and so on.

Stack competitors

Well, it would be wrong to speak of Elastic Stack competitors, because Elastic Stack has itself emerged as a strong competitor to many other tools in the market in recent years and is growing rapidly.
A few of these are:

Open source:
- Graylog: Visit https://www.graylog.org/ for more information
- InfluxDB: Visit https://influxdata.com/ for more information

Others:
- Logscape: Visit http://logscape.com/ for more information
- Logsene: Visit http://sematext.com/logsene/ for more information
- Splunk: Visit http://www.splunk.com/ for more information
- Sumo Logic: Visit https://www.sumologic.com/ for more information

Kibana competitors:
- Grafana: Visit http://grafana.org/ for more information
- Graphite: Visit https://graphiteapp.org/ for more information

Elasticsearch competitors:
- Lucene/Solr: Visit http://lucene.apache.org/solr/ or https://lucene.apache.org/ for more information
- Sphinx: Visit http://sphinxsearch.com/ for more information

Most of these compare with respect to log management, while Elastic Stack is much more than that. It offers you the ability to analyze any type of data, and not just logs.


Deep learning and regression analysis

Packt
09 Jan 2017
6 min read
In this article by Richard M. Reese and Jennifer L. Reese, authors of the book Java for Data Science, we will discuss how neural networks can be used to perform regression analysis. However, other techniques may offer a more effective solution. With regression analysis, we want to predict a result based on several input variables.

We can perform regression analysis using an output layer that consists of a single neuron that sums the weighted input plus bias of the previous hidden layer. Thus, the result is a single value representing the regression.

Preparing the data

We will use a car evaluation database to demonstrate how to predict the acceptability of a car based on a series of attributes. The file containing the data we will be using can be downloaded from: http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data. It consists of car data such as price, number of passengers, and safety information, and an assessment of its overall quality. It is this latter element that we will try to predict. The comma-delimited values in each attribute are shown next, along with substitutions. The substitutions are needed because the model expects numeric data:

Attribute | Original value | Substituted value
Buying price | vhigh, high, med, low | 3, 2, 1, 0
Maintenance price | vhigh, high, med, low | 3, 2, 1, 0
Number of doors | 2, 3, 4, 5-more | 2, 3, 4, 5
Seating | 2, 4, more | 2, 4, 5
Cargo space | small, med, big | 0, 1, 2
Safety | low, med, high | 0, 1, 2

There are 1,728 instances in the file. The cars are marked with four classes:

Class | Number of instances | Percentage of instances | Original value | Substituted value
Unacceptable | 1210 | 70.023% | unacc | 0
Acceptable | 384 | 22.222% | acc | 1
Good | 69 | 3.99% | good | 2
Very good | 65 | 3.76% | v-good | 3

Setting up the class

We start with the definition of a CarRegressionExample class, as shown next:

public class CarRegressionExample {

    public CarRegressionExample() {
        try {
            ...
        } catch (IOException | InterruptedException ex) {
            // Handle exceptions
        }
    }

    public static void main(String[] args) {
        new CarRegressionExample();
    }
}

Reading and preparing the data

The first task is to read in the data. We will use the CSVRecordReader class to get the data:

RecordReader recordReader = new CSVRecordReader(0, ",");
recordReader.initialize(new FileSplit(new File("car.txt")));
DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, 1728, 6, 4);

With this dataset, we will split the data into two sets. Sixty-five percent of the data is used for training and the rest for testing:

DataSet dataset = iterator.next();
dataset.shuffle();
SplitTestAndTrain testAndTrain = dataset.splitTestAndTrain(0.65);
DataSet trainingData = testAndTrain.getTrain();
DataSet testData = testAndTrain.getTest();

The data now needs to be normalized:

DataNormalization normalizer = new NormalizerStandardize();
normalizer.fit(trainingData);
normalizer.transform(trainingData);
normalizer.transform(testData);

We are now ready to build the model.

Building the model

A MultiLayerConfiguration instance is created using a series of NeuralNetConfiguration.Builder methods. The following is the code used. We will discuss the individual methods following the code. Note that this configuration uses two layers.
The last layer uses the softmax activation function, which is used for regression analysis:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .iterations(1000)
    .activation("relu")
    .weightInit(WeightInit.XAVIER)
    .learningRate(0.4)
    .list()
    .layer(0, new DenseLayer.Builder()
        .nIn(6).nOut(3)
        .build())
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation("softmax")
        .nIn(3).nOut(4).build())
    .backprop(true).pretrain(false)
    .build();

Two layers are created. The first is the input layer. The DenseLayer.Builder class is used to create this layer. The DenseLayer class is a feed-forward and fully connected layer. The created layer uses the six car attributes as input. The output consists of three neurons that are fed into the output layer. The layer definition is duplicated here for your convenience:

.layer(0, new DenseLayer.Builder()
    .nIn(6).nOut(3)
    .build())

The second layer is the output layer created with the OutputLayer.Builder class. It uses a loss function as the argument of its constructor. The softmax activation function is used since we are performing regression, as shown here:

.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .activation("softmax")
    .nIn(3).nOut(4).build())

Next, a MultiLayerNetwork instance is created using the configuration. The model is initialized, its listeners are set, and then the fit method is invoked to perform the actual training. The ScoreIterationListener instance will display information as the model trains, which we will see shortly in the output of this example. Its constructor argument specifies the frequency at which information is displayed:

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(100));
model.fit(trainingData);

We are now ready to evaluate the model.

Evaluating the model

In the next sequence of code, we evaluate the model against the test dataset. An Evaluation instance is created using an argument specifying that there are four classes. The test data is fed into the model using the output method. The eval method takes the output of the model and compares it against the test data classes to generate statistics. The getLabels method returns the expected values:

Evaluation evaluation = new Evaluation(4);
INDArray output = model.output(testData.getFeatureMatrix());
evaluation.eval(testData.getLabels(), output);
out.println(evaluation.stats());

The output of the training follows, which is produced by the ScoreIterationListener class. However, the values you get may differ due to how the data is selected and analyzed.
Notice that the score improves with the iterations but levels out after about 500 iterations:

12:43:35.685 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 0 is 1.443480901811554
12:43:36.094 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 100 is 0.3259061845624861
12:43:36.390 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 200 is 0.2630572026049783
12:43:36.676 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 300 is 0.24061281470878784
12:43:36.977 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 400 is 0.22955121170274934
12:43:37.292 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 500 is 0.22249920540161677
12:43:37.575 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 600 is 0.2169898450109222
12:43:37.872 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 700 is 0.21271599814600958
12:43:38.161 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 800 is 0.2075677126088741
12:43:38.451 [main] INFO o.d.o.l.ScoreIterationListener - Score at iteration 900 is 0.20047317735870715

This is followed by the results of the stats method as shown next. The first part reports on how examples are classified and the second part displays various statistics:

Examples labeled as 0 classified by model as 0: 397 times
Examples labeled as 0 classified by model as 1: 10 times
Examples labeled as 0 classified by model as 2: 1 times
Examples labeled as 1 classified by model as 0: 8 times
Examples labeled as 1 classified by model as 1: 113 times
Examples labeled as 1 classified by model as 2: 1 times
Examples labeled as 1 classified by model as 3: 1 times
Examples labeled as 2 classified by model as 1: 7 times
Examples labeled as 2 classified by model as 2: 21 times
Examples labeled as 2 classified by model as 3: 14 times
Examples labeled as 3 classified by model as 1: 2 times
Examples labeled as 3 classified by model as 3: 30 times

==========================Scores========================================
 Accuracy:  0.9273
 Precision: 0.854
 Recall:    0.8323
 F1 Score:  0.843
========================================================================

The regression model does a reasonable job with this dataset.

Summary

In this article, we examined deep learning and regression analysis. We showed how to prepare the data and class, build the model, and evaluate the model. We used sample data and displayed output statistics to demonstrate the relative effectiveness of our model.


Exploring Structure from Motion Using OpenCV

Packt
09 Jan 2017
20 min read
In this article by Roy Shilkrot, coauthor of the book Mastering OpenCV 3, we will discuss the notion of Structure from Motion (SfM), or better put, extracting geometric structures from images taken with a camera under motion, using OpenCV's API to help us. First, let's constrain the otherwise very broad approach to SfM using a single camera, usually called a monocular approach, and a discrete and sparse set of frames rather than a continuous video stream. These two constraints will greatly simplify the system we will sketch out in the coming pages and help us understand the fundamentals of any SfM method. In this article, we will cover the following:

- Structure from Motion concepts
- Estimating the camera motion from a pair of images

Throughout the article, we assume the use of a calibrated camera—one that was calibrated beforehand. Calibration is a ubiquitous operation in computer vision, fully supported in OpenCV using command-line tools. We, therefore, assume the existence of the camera's intrinsic parameters embodied in the K matrix and the distortion coefficients vector—the outputs from the calibration process. To make things clear in terms of language, from this point on, we will refer to a camera as a single view of the scene rather than to the optics and hardware taking the image. A camera has a position in space and a direction of view. Between two cameras, there is a translation element (movement through space) and a rotation of the direction of view. We will also unify the terms for the point in the scene, world, real, or 3D to be the same thing, a point that exists in our real world. The same goes for points in the image or 2D, which are points in the image coordinates of some real 3D point that was projected onto the camera sensor at that location and time.

Structure from Motion concepts

The first distinction we should make is the difference between stereo (or indeed any multiview) 3D reconstruction using calibrated rigs, and SfM. A rig of two or more cameras assumes we already know what the "motion" between the cameras is, while in SfM, we don't know what this motion is and we wish to find it. Calibrated rigs, from a simplistic point of view, allow a much more accurate reconstruction of 3D geometry because there is no error in estimating the distance and rotation between the cameras—it is already known. The first step in implementing an SfM system is finding the motion between the cameras. OpenCV may help us in a number of ways to obtain this motion, specifically using the findFundamentalMat and findEssentialMat functions. Let's think for one moment of the goal behind choosing an SfM algorithm. In most cases, we wish to obtain the geometry of the scene, for example, where objects are in relation to the camera and what their form is. Having found the motion between the cameras picturing the same scene, from a reasonably similar point of view, we would now like to reconstruct the geometry. In computer vision jargon, this is known as triangulation, and there are plenty of ways to go about it. It may be done by way of ray intersection, where we construct two rays: one from each camera's center of projection through the corresponding point on its image plane. Ideally, these rays will intersect at the one 3D point in the real world that was imaged in each camera, as shown in the following diagram. In reality, ray intersection is highly unreliable.
This is because the rays usually do not intersect, making us fall back to using the middle point of the shortest segment connecting the two rays. OpenCV contains a simple API for a more accurate form of triangulation, the triangulatePoints function, so this part we do not need to code on our own. After you have learned how to recover 3D geometry from two views, we will see how you can incorporate more views of the same scene to get an even richer reconstruction. At that point, most SfM methods try to optimize the bundle of estimated positions of our cameras and 3D points by means of Bundle Adjustment. OpenCV contains means for Bundle Adjustment in its new Image Stitching Toolbox. However, the beauty of working with OpenCV and C++ is the abundance of external tools that can be easily integrated into the pipeline. We will, therefore, see how to integrate an external bundle adjuster, the Ceres non-linear optimization package. Now that we have sketched an outline of our approach to SfM using OpenCV, we will see how each element can be implemented.

Estimating the camera motion from a pair of images

Before we set out to actually find the motion between two cameras, let's examine the inputs and the tools we have at hand to perform this operation. First, we have two images of the same scene from (hopefully not extremely) different positions in space. This is a powerful asset, and we will make sure that we use it. As for tools, we should take a look at mathematical objects that impose constraints over our images, cameras, and the scene. Two very useful mathematical objects are the fundamental matrix (denoted by F) and the essential matrix (denoted by E). They are mostly similar, except that the essential matrix assumes the usage of calibrated cameras; this is the case for us, so we will choose it. OpenCV allows us to find the fundamental matrix via the findFundamentalMat function and the essential matrix via the findEssentialMat function. Finding the essential matrix can be done as follows:

Mat E = findEssentialMat(leftPoints, rightPoints, focal, pp);

This function makes use of matching points in the "left" image, leftPoints, and "right" image, rightPoints, which we will discuss shortly, as well as two additional pieces of information from the camera's calibration: the focal length, focal, and principal point, pp. The essential matrix, E, is a 3 x 3 matrix, which imposes the following constraint on a point in one image and a point in the other image: x'^T K^-T E K^-1 x = 0, where x is a point in the first image, x' is the corresponding point in the second image, and K is the calibration matrix. This is extremely useful, as we are about to see. Another important fact we use is that the essential matrix is all we need in order to recover the two cameras' positions from our images, although only up to an arbitrary unit of scale. So, if we obtain the essential matrix, we know where each camera is positioned in space and where it is looking. We can easily calculate the matrix if we have enough of those constraint equations, simply because each equation can be used to solve for a small part of the matrix. In fact, OpenCV internally calculates it using just five point-pairs, but through the Random Sample Consensus algorithm (RANSAC), many more pairs can be used to make for a more robust solution.

Point matching using rich feature descriptors

Now we will make use of our constraint equations to calculate the essential matrix.
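As a brief aside, the constraint above is easy to verify numerically once E, K, and a few matched pixel coordinates are available. The following sketch is not part of the original article (whose code is C++); it is a small Python/NumPy check, and the inputs E, K, x_left, and x_right are assumed to come from earlier steps:

import numpy as np

def epipolar_residual(E, K, x_left, x_right):
    """Evaluate x'^T K^-T E K^-1 x for one pixel correspondence.

    E       - 3x3 essential matrix
    K       - 3x3 calibration matrix
    x_left  - (u, v) pixel in the first image  (x in the equation)
    x_right - (u, v) pixel in the second image (x' in the equation)
    """
    K_inv = np.linalg.inv(K)
    x = np.array([x_left[0], x_left[1], 1.0])        # homogeneous point in image one
    x_prime = np.array([x_right[0], x_right[1], 1.0]) # homogeneous point in image two
    return float(x_prime @ K_inv.T @ E @ K_inv @ x)

For true inlier correspondences the returned residual should be close to zero, while gross mismatches produce noticeably larger values.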
To get our constraints, remember that for each point in image A, we must find a corresponding point in image B. We can achieve such a matching using OpenCV's extensive 2D feature-matching framework, which has greatly matured in the past few years. Feature extraction and descriptor matching is an essential process in computer vision and is used in many methods to perform all sorts of operations, for example, detecting the position and orientation of an object in the image or searching a big database of images for similar images through a given query. In essence, feature extraction means selecting points in the image that would make for good features and computing a descriptor for them. A descriptor is a vector of numbers that describes the surrounding environment around a feature point in an image. Different methods have different lengths and data types for their descriptor vectors. Descriptor matching is the process of finding a corresponding feature from one set in another using its descriptor. OpenCV provides very easy and powerful methods to support feature extraction and matching. Let's examine a very simple feature extraction and matching scheme:

vector<KeyPoint> keypts1, keypts2;
Mat desc1, desc2;

// detect keypoints and extract ORB descriptors
Ptr<Feature2D> orb = ORB::create(2000);
orb->detectAndCompute(img1, noArray(), keypts1, desc1);
orb->detectAndCompute(img2, noArray(), keypts2, desc2);

// matching descriptors
Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming");
vector<DMatch> matches;
matcher->match(desc1, desc2, matches);

You may have already seen similar OpenCV code, but let's review it quickly. Our goal is to obtain three elements: feature points for two images, descriptors for them, and a matching between the two sets of features. OpenCV provides a range of feature detectors, descriptor extractors, and matchers. In this simple example, we use the ORB class to get both the 2D location of Oriented BRIEF (ORB) feature points (where BRIEF stands for Binary Robust Independent Elementary Features) and their respective descriptors. We use a brute-force binary matcher to get the matching, which is the most straightforward way to match two feature sets by comparing each feature in the first set to each feature in the second set (hence the phrasing "brute-force"). In the following image, we will see a matching of feature points on two images from the Fountain-P11 sequence found at http://cvlab.epfl.ch/~strecha/multiview/denseMVS.html. Practically, raw matching like we just performed is good only up to a certain level, and many matches are probably erroneous. For that reason, most SfM methods perform some form of filtering on the matches to ensure correctness and reduce errors. One form of filtering, which is built into OpenCV's brute-force matcher, is cross-check filtering. That is, a match is considered true if a feature of the first image matches a feature of the second image, and the reverse check also matches the feature of the second image with the feature of the first image. Another common filtering mechanism, used in the provided code, is to filter based on the fact that the two images are of the same scene and have a certain stereo-view relationship between them. In practice, the filter tries to robustly calculate the fundamental or essential matrix and retain those feature pairs that correspond to this calculation with small errors. An alternative to using rich features, such as ORB, is to use optical flow.
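As a rough illustration of this alternative (the information box that follows explains the idea), here is a sketch of how the same point matching could be obtained with pyramidal Lucas-Kanade optical flow. It is not part of the original article, which uses C++; it goes through OpenCV's Python bindings, and the image file names are hypothetical:

import cv2
import numpy as np

# Two nearby frames of the same scene (hypothetical file names).
img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# Pick points worth tracking in the first image.
prev_pts = cv2.goodFeaturesToTrack(img1, maxCorners=2000, qualityLevel=0.01, minDistance=7)

# Track them into the second image with pyramidal Lucas-Kanade optical flow.
next_pts, status, err = cv2.calcOpticalFlowPyrLK(img1, img2, prev_pts, None)

# Keep only the successfully tracked pairs; these play the role of leftPts/rightPts.
status = status.ravel().astype(bool)
left_pts = prev_pts[status].reshape(-1, 2)
right_pts = next_pts[status].reshape(-1, 2)

The surviving left_pts/right_pts pairs can then feed findEssentialMat in exactly the same way as the descriptor-based matches do.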
The following information box provides a short overview of optical flow. It is possible to use optical flow instead of descriptor matching to find the required point matching between two images, while the rest of the SfM pipeline remains the same. OpenCV recently extended its API for getting the flow field from two images, and now it is faster and more powerful.

Optical flow

It is the process of matching selected points from one image to another, assuming that both images are part of a sequence and relatively close to one another. Most optical flow methods compare a small region, known as the search window or patch, around each point from image A to the same area in image B. Following a very common rule in computer vision, called the brightness constancy constraint (and other names), the small patches of the image will not change drastically from one image to the other, and therefore the magnitude of their subtraction should be close to zero. In addition to matching patches, newer methods of optical flow use a number of additional techniques to get better results. One is using image pyramids, which are smaller and smaller resized versions of the image, which allow for working from coarse to fine—a very well-used trick in computer vision. Another method is to define global constraints on the flow field, assuming that the points close to each other move together in the same direction.

Finding camera matrices

Now that we have obtained matches between keypoints, we can calculate the essential matrix. However, we must first align our matching points into two arrays, where an index in one array corresponds to the same index in the other. This is required by the findEssentialMat function, as we've seen in the Estimating the camera motion from a pair of images section. We would also need to convert the KeyPoint structure to a Point2f structure. We must pay special attention to the queryIdx and trainIdx member variables of DMatch, the OpenCV struct that holds a match between two keypoints, as they must align with the way we used the DescriptorMatcher::match() function. The following code section shows how to align a matching into two corresponding sets of 2D points, and how these can be used to find the essential matrix:

vector<KeyPoint> leftKpts, rightKpts;
// ... obtain keypoints using a feature extractor
vector<DMatch> matches;
// ... obtain matches using a descriptor matcher

//align left and right point sets
vector<Point2f> leftPts, rightPts;
for (size_t i = 0; i < matches.size(); i++) {
    // queryIdx is the "left" image
    leftPts.push_back(leftKpts[matches[i].queryIdx].pt);

    // trainIdx is the "right" image
    rightPts.push_back(rightKpts[matches[i].trainIdx].pt);
}

//robustly find the Essential Matrix
Mat status;
Mat E = findEssentialMat(
    leftPts,    //points from left image
    rightPts,   //points from right image
    focal,      //camera focal length factor
    pp,         //camera principal point
    cv::RANSAC, //use RANSAC for a robust solution
    0.999,      //desired solution confidence level
    1.0,        //point-to-epipolar-line threshold
    status);    //binary vector for inliers

We may later use the status binary vector to prune those points that do not align with the recovered essential matrix. Refer to the following image for an illustration of point matching after pruning. The red arrows mark feature matches that were removed in the process of finding the matrix, and the green arrows are feature matches that were kept. Now we are ready to find the camera matrices; however, the new OpenCV 3 API makes things very easy for us by introducing the recoverPose function.
First, we will briefly examine the structure of the camera matrix we will use: P = [R | t]. This is the model for our camera; it consists of two elements, rotation (denoted as R) and translation (denoted as t). The interesting thing about it is that it holds a very essential equation: x = PX, where x is a 2D point on the image and X is a 3D point in space. There is more to it, but this matrix gives us a very important relationship between the image points and the scene points. So, now that we have a motivation for finding the camera matrices, we will see how it can be done. The following code section shows how to decompose the essential matrix into the rotation and translation elements:

Mat E;
// ... find the essential matrix

Mat R, t; //placeholders for rotation and translation

//Find Pright camera matrix from the essential matrix
//Cheirality check is performed internally.
recoverPose(E, leftPts, rightPts, R, t, focal, pp, mask);

Very simple. Without going too deep into mathematical interpretation, this conversion of the essential matrix to rotation and translation is possible because the essential matrix was originally composed of these two elements. Strictly for satisfying our curiosity, we can look at the following equation for the essential matrix, which appears in the literature: E = [t]xR, where [t]x denotes the skew-symmetric matrix of the translation vector t. We see that it is composed of (some form of) a translation element, t, and a rotational element, R. Note that a cheirality check is internally performed in the recoverPose function. The cheirality check makes sure that all triangulated 3D points are in front of the reconstructed camera. Camera matrix recovery from the essential matrix has in fact four possible solutions, but the only correct solution is the one that will produce triangulated points in front of the camera, hence the need for a cheirality check. Note that what we just did only gives us one camera matrix, and for triangulation, we require two camera matrices. This operation assumes that one camera matrix is fixed and canonical (no rotation and no translation), that is, P = [I | 0]. The other camera that we recovered from the essential matrix has moved and rotated in relation to the fixed one. This also means that any of the 3D points that we recover from these two camera matrices will have the first camera at the world origin point (0, 0, 0). One more thing we can think of adding to our method is error checking. Many times, the calculation of an essential matrix from point matching is erroneous, and this affects the resulting camera matrices. Continuing to triangulate with faulty camera matrices is pointless. We can install a check to see if the rotation element is a valid rotation matrix. Keeping in mind that rotation matrices must have a determinant of 1 (or -1), we can simply do the following:

bool CheckCoherentRotation(const cv::Mat_<double>& R) {
    if (fabsf(determinant(R)) - 1.0 > 1e-07) {
        cerr << "rotation matrix is invalid" << endl;
        return false;
    }
    return true;
}

We can now see how all these elements combine into a function that recovers the P matrices.
First, we will introduce some convenience data structures and type short hands:

typedef std::vector<cv::KeyPoint> Keypoints;
typedef std::vector<cv::Point2f>  Points2f;
typedef std::vector<cv::Point3f>  Points3f;
typedef std::vector<cv::DMatch>   Matching;

struct Features { //2D features
    Keypoints keyPoints;
    Points2f  points;
    cv::Mat   descriptors;
};

struct Intrinsics { //camera intrinsic parameters
    cv::Mat K;
    cv::Mat Kinv;
    cv::Mat distortion;
};

Now, we can write the camera matrix finding function:

void findCameraMatricesFromMatch(
        const Intrinsics& intrin,
        const Matching&   matches,
        const Features&   featuresLeft,
        const Features&   featuresRight,
        cv::Matx34f&      Pleft,
        cv::Matx34f&      Pright) {
    //Note: assuming fx = fy
    const double focal = intrin.K.at<float>(0, 0);
    const cv::Point2d pp(intrin.K.at<float>(0, 2), intrin.K.at<float>(1, 2));

    //align left and right point sets using the matching
    Features left;
    Features right;
    GetAlignedPointsFromMatch(featuresLeft, featuresRight, matches, left, right);

    //find essential matrix
    Mat E, mask;
    E = findEssentialMat(left.points, right.points, focal, pp, RANSAC, 0.999, 1.0, mask);

    Mat_<double> R, t;

    //Find Pright camera matrix from the essential matrix
    recoverPose(E, left.points, right.points, R, t, focal, pp, mask);

    Pleft = Matx34f::eye();
    Pright = Matx34f(R(0,0), R(0,1), R(0,2), t(0),
                     R(1,0), R(1,1), R(1,2), t(1),
                     R(2,0), R(2,1), R(2,2), t(2));
}

At this point, we have the two cameras that we need in order to reconstruct the scene: the canonical first camera, in the Pleft variable, and the second camera we calculated from the essential matrix, in the Pright variable.
Choosing the image pair to use first
Given we have more than just two image views of the scene, we must choose which two views we will start the reconstruction from. In their paper, Snavely et al. suggest that we pick the two views that have the least number of homography inliers. A homography is a relationship between two images or sets of points that lie on a plane; the homography matrix defines the transformation from one plane to another. In case of an image or a set of 2D points, the homography matrix is of size 3 x 3. When Snavely et al. look for the lowest inlier ratio, they essentially suggest to calculate the homography matrix between all pairs of images and pick the pair whose points mostly do not correspond with the homography matrix. This means the geometry of the scene in these two views is not planar or at least not the same plane in both views, which helps when doing 3D reconstruction. For reconstruction, it is best to look at a complex scene with non-planar geometry, with things closer and farther away from the camera.
The following code snippet shows how to use OpenCV's findHomography function to count the number of inliers between two views whose features were already extracted and matched: int findHomographyInliers( const Features& left, const Features& right, const Matching& matches) { //Get aligned feature vectors Features alignedLeft; Features alignedRight; GetAlignedPointsFromMatch(left, right, matches, alignedLeft, alignedRight); //Calculate homography with at least 4 points Mat inlierMask; Mat homography; if(matches.size() >= 4) { homography = findHomography(alignedLeft.points, alignedRight.points, cv::RANSAC, RANSAC_THRESHOLD, inlierMask); } if(matches.size() < 4 or homography.empty()) { return 0; } return countNonZero(inlierMask); } The next step is to perform this operation on all pairs of image views in our bundle and sort them based on the ratio of homography inliers to outliers: //sort pairwise matches to find the lowest Homography inliers map<float, ImagePair> pairInliersCt; const size_t numImages = mImages.size(); //scan all possible image pairs (symmetric) for (size_t i = 0; i < numImages - 1; i++) { for (size_t j = i + 1; j < numImages; j++) { if (mFeatureMatchMatrix[i][j].size() < MIN_POINT_CT) { //Not enough points in matching pairInliersCt[1.0] = {i, j}; continue; } //Find number of homography inliers const int numInliers = findHomographyInliers( mImageFeatures[i], mImageFeatures[j], mFeatureMatchMatrix[i][j]); const float inliersRatio = (float)numInliers / (float)(mFeatureMatchMatrix[i][j].size()); pairInliersCt[inliersRatio] = {i, j}; } } Note that the std::map<float, ImagePair> will internally sort the pairs based on the map's key: the inliers ratio. We then simply need to traverse this map from the beginning to find the image pair with least inlier ratio, and if that pair cannot be used, we can easily skip ahead to the next pair. Summary In this article, we saw how OpenCV v3 can help us approach Structure from Motion in a manner that is both simple to code and to understand. OpenCV v3's new API contains a number of useful functions and data structures that make our lives easier and also assist in a cleaner implementation. However, the state-of-the-art SfM methods are far more complex. There are many issues we choose to disregard in favor of simplicity, and plenty more error examinations that are usually in place. Our chosen methods for the different elements of SfM can also be revisited. Some methods even use the N-view triangulation once they understand the relationship between the features in multiple images. If we would like to extend and deepen our familiarity with SfM, we will certainly benefit from looking at other open source SfM libraries. One particularly interesting project is libMV, which implements a vast array of SfM elements that may be interchanged to get the best results. There is a great body of work from University of Washington that provides tools for many flavors of SfM (Bundler and VisualSfM). This work inspired an online product from Microsoft, called PhotoSynth, and 123D Catch from Adobe. There are many more implementations of SfM readily available online, and one must only search to find quite a lot of them. Resources for Article: Further resources on this subject: Basics of Image Histograms in OpenCV [article] OpenCV: Image Processing using Morphological Filters [article] Face Detection and Tracking Using ROS, Open-CV and Dynamixel Servos [article]

TensorFlow

Packt
04 Jan 2017
17 min read
In this article by Nicholas McClure, the author of the book TensorFlow Machine Learning Cookbook, we will cover basic recipes in order to understand how TensorFlow works and how to access data for this book and additional resources: How TensorFlow works Declaring tensors Using placeholders and variables Working with matrices Declaring operations (For more resources related to this topic, see here.) Introduction Google's TensorFlow engine has a unique way of solving problems. This unique way allows us to solve machine learning problems very efficiently. We will cover the basic steps to understand how TensorFlow operates. This understanding is essential in understanding recipes for the rest of this book. How TensorFlow works At first, computation in TensorFlow may seem needlessly complicated. But there is a reason for it: because of how TensorFlow treats computation, developing more complicated algorithms is relatively easy. This recipe will talk you through the pseudo code of how a TensorFlow algorithm usually works. Getting ready Currently, TensorFlow is only supported on Mac and Linux distributions. Using TensorFlow on Windows requires the usage of a virtual machine. Throughout this book we will only concern ourselves with the Python library wrapper of TensorFlow. This book will use Python 3.4+ (https://www.python.org) and TensorFlow 0.7 (https://www.tensorflow.org). While TensorFlow can run on the CPU, it runs faster if it runs on the GPU, and it is supported on graphics cards with NVidia Compute Capability 3.0+. To run on a GPU, you will also need to download and install the NVidia Cuda Toolkit (https://developer.nvidia.com/cuda-downloads). Some of the recipes will rely on a current installation of the Python packages Scipy, Numpy, and Scikit-Learn as well. How to do it… Here we will introduce the general flow of TensorFlow algorithms. Most recipes will follow this outline: Import or generate data: All of our machine-learning algorithms will depend on data. In this book we will either generate data or use an outside source of data. Sometimes it is better to rely on generated data because we will want to know the expected outcome. Transform and normalize data: The data is usually not in the correct dimension or type that our TensorFlow algorithms expect. We will have to transform our data before we can use it. Most algorithms also expect normalized data and we will do this here as well. TensorFlow has built in functions that can normalize the data for you as follows: data = tf.nn.batch_norm_with_global_normalization(...) Set algorithm parameters: Our algorithms usually have a set of parameters that we hold constant throughout the procedure. For example, this can be the number of iterations, the learning rate, or other fixed parameters of our choosing. It is considered good form to initialize these together so the reader or user can easily find them, as follows: learning_rate = 0.01 iterations = 1000 Initialize variables and placeholders: TensorFlow depends on us telling it what it can and cannot modify. TensorFlow will modify the variables during optimization to minimize a loss function. To accomplish this, we feed in data through placeholders. We need to initialize both of these, variables and placeholders with size and type, so that TensorFlow knows what to expect. 
See the following code:

a_var = tf.constant(42)
x_input = tf.placeholder(tf.float32, [None, input_size])
y_input = tf.placeholder(tf.float32, [None, num_classes])

Define the model structure: After we have the data, and have initialized our variables and placeholders, we have to define the model. This is done by building a computational graph. We tell TensorFlow what operations must be done on the variables and placeholders to arrive at our model predictions: y_pred = tf.add(tf.mul(x_input, weight_matrix), b_matrix) Declare the loss functions: After defining the model, we must be able to evaluate the output. This is where we declare the loss function. The loss function is very important as it tells us how far off our predictions are from the actual values: loss = tf.reduce_mean(tf.square(y_actual - y_pred)) Initialize and train the model: Now that we have everything in place, we need to create an instance for our graph, feed in the data through the placeholders and let TensorFlow change the variables to better predict our training data. Here is one way to initialize the computational graph:

with tf.Session(graph=graph) as session:
    ...
    session.run(...)
    ...

Note that we can also initiate our graph with:

session = tf.Session(graph=graph)
session.run(...)

(Optional) Evaluate the model: Once we have built and trained the model, we should evaluate the model by looking at how well it does with new data through some specified criteria. (Optional) Predict new outcomes: It is also important to know how to make predictions on new, unseen data. We can do this with all of our models, once we have them trained. How it works… In TensorFlow, we have to set up the data, variables, placeholders, and model before we tell the program to train and change the variables to improve the predictions. TensorFlow accomplishes this through the computational graph. We tell it to minimize a loss function and TensorFlow does this by modifying the variables in the model. TensorFlow knows how to modify the variables because it keeps track of the computations in the model and automatically computes the gradients for every variable. Because of this, we can see how easy it can be to make changes and try different data sources. See also A great place to start is the official Python API TensorFlow documentation: https://www.tensorflow.org/versions/r0.7/api_docs/python/index.html There are also tutorials available: https://www.tensorflow.org/versions/r0.7/tutorials/index.html Declaring tensors Getting ready Tensors are the data structure that TensorFlow operates on in the computational graph. We can declare these tensors as variables or feed them in as placeholders. First we must know how to create tensors. When we create a tensor and declare it to be a variable, TensorFlow creates several graph structures in our computation graph. It is also important to point out that just by creating a tensor, TensorFlow is not adding anything to the computational graph. TensorFlow does this only after creating a variable out of the tensor. See the next section on variables and placeholders for more information. How to do it… Here we will cover the main ways to create tensors in TensorFlow. Fixed tensors: Creating a zero filled tensor. Use the following: zero_tsr = tf.zeros([row_dim, col_dim]) Creating a one filled tensor. Use the following: ones_tsr = tf.ones([row_dim, col_dim]) Creating a constant filled tensor.
Use the following: filled_tsr = tf.fill([row_dim, col_dim], 42) Creating a tensor out of an existing constant. Use the following: constant_tsr = tf.constant([1,2,3]) Note that the tf.constant() function can be used to broadcast a value into an array, mimicking the behavior of tf.fill() by writing tf.constant(42, shape=[row_dim, col_dim]) Tensors of similar shape: We can also initialize variables based on the shape of other tensors, as follows: zeros_similar = tf.zeros_like(constant_tsr) ones_similar = tf.ones_like(constant_tsr) Note that since these tensors depend on prior tensors, we must initialize them in order. Attempting to initialize all the tensors all at once will result in an error. Sequence tensors: TensorFlow allows us to specify tensors that contain defined intervals. The following functions behave very similarly to the range() outputs and numpy's linspace() outputs. See the following function: linear_tsr = tf.linspace(start=0.0, stop=1.0, num=3) The resulting tensor is the sequence [0.0, 0.5, 1.0]. Note that this function includes the specified stop value. See the following function integer_seq_tsr = tf.range(start=6, limit=15, delta=3) The result is the sequence [6, 9, 12]. Note that this function does not include the limit value. Random tensors: The following generated random numbers are from a uniform distribution: randunif_tsr = tf.random_uniform([row_dim, col_dim], minval=0, maxval=1) Know that this random uniform distribution draws from the interval that includes the minval but not the maxval ( minval<=x<maxval ). To get a tensor with random draws from a normal distribution, use the following: randnorm_tsr = tf.random_normal([row_dim, col_dim], mean=0.0, stddev=1.0) There are also times when we wish to generate normal random values that are assured within certain bounds. The truncated_normal() function always picks normal values within two standard deviations of the specified mean. See the following: truncnorm_tsr = tf.truncated_normal([row_dim, col_dim], mean=0.0, stddev=1.0) We might also be interested in randomizing entries of arrays. To accomplish this there are two functions that help us, random_shuffle() and random_crop(). See the following: shuffled_output = tf.random_shuffle(input_tensor) cropped_output = tf.random_crop(input_tensor, crop_size) Later on in this book, we will be interested in randomly cropping an image of size (height, width, 3) where there are three color spectrums. To fix a dimension in the cropped_output, you must give it the maximum size in that dimension: cropped_image = tf.random_crop(my_image, [height/2, width/2, 3]) How it works… Once we have decided on how to create the tensors, then we may also create the corresponding variables by wrapping the tensor in the Variable() function, as follows. More on this in the next section: my_var = tf.Variable(tf.zeros([row_dim, col_dim])) There's more… We are not limited to the built-in functions; we can convert any numpy array, Python list, or constant to a tensor using the function convert_to_tensor(). Know that this function also accepts tensors as an input in case we wish to generalize a computation inside a function. Using placeholders and variables Getting ready One of the most important distinctions to make with data is whether it is a placeholder or variable. Variables are the parameters of the algorithm and TensorFlow keeps track of how to change these to optimize the algorithm.
Placeholders are objects that allow you to feed in data of a specific type and shape or that depend on the results of the computational graph, like the expected outcome of a computation. How to do it… The main way to create a variable is by using the Variable() function, which takes a tensor as an input and outputs a variable. This is the declaration and we still need to initialize the variable. Initializing is what puts the variable with the corresponding methods on the computational graph. Here is an example of creating and initializing a variable: my_var = tf.Variable(tf.zeros([2,3])) sess = tf.Session() initialize_op = tf.initialize_all_variables() sess.run(initialize_op) To see whatthe computational graph looks like after creating and initializing a variable, see the next part in this section, How it works…, Figure 1. Placeholders are just holding the position for data to be fed into the graph. Placeholders get data from a feed_dict argument in the session. To put a placeholder in the graph, we must perform at least one operation on the placeholder. We initialize the graph, declare x to be a placeholder, and define y as the identity operation on x, which just returns x. We then create data to feed into the x placeholder and run the identity operation. It is worth noting that Tensorflow will not return a self-referenced placeholder in the feed dictionary. The code is shown below and the resulting graph is in the next section, How it works…: sess = tf.Session() x = tf.placeholder(tf.float32, shape=[2,2]) y = tf.identity(x) x_vals = np.random.rand(2,2) sess.run(y, feed_dict={x: x_vals}) # Note that sess.run(x, feed_dict={x: x_vals}) will result in a self-referencing error. How it works… The computational graph of initializing a variable as a tensor of zeros is seen in Figure 1to follow: Figure 1: Variable Figure 1: Here we can see what the computational graph looks like in detail with just one variable, initialized to all zeros. The grey shaded region is a very detailed view of the operations and constants involved. The main computational graph with less detail is the smaller graph outside of the grey region in the upper right. For more details on creating and visualizing graphs. Similarly, the computational graph of feeding a numpy array into a placeholder can be seen to follow, in Figure 2: Figure 2: Computational graph of an initialized placeholder Figure 2: Here is the computational graph of a placeholder initialized. The grey shaded region is a very detailed view of the operations and constants involved. The main computational graph with less detail is the smaller graph outside of the grey region in the upper right. There's more… During the run of the computational graph, we have to tell TensorFlow when to initialize the variables we have created. While each variable has an initializer method, the most common way to do this is with the helper function initialize_all_variables(). This function creates an operation in the graph that initializes all the variables we have created, as follows: initializer_op = tf.initialize_all_variables() But if we want to initialize a variable based on the results of initializing another variable, we have to initialize variables in the order we want, as follows: sess = tf.Session() first_var = tf.Variable(tf.zeros([2,3])) sess.run(first_var.initializer) second_var = tf.Variable(tf.zeros_like(first_var)) # Depends on first_var sess.run(second_var.initializer) Working with matrices Getting ready Many algorithms depend on matrix operations. 
TensorFlow gives us easy-to-use operations to perform such matrix calculations. For all of the following examples, we can create a graph session by running the following code: import tensorflow as tf sess = tf.Session() How to do it… Creating matrices: We can create two-dimensional matrices from numpy arrays or nested lists, as we described in the earlier section on tensors. We can also use the tensor creation functions and specify a two-dimensional shape for functions like zeros(), ones(), truncated_normal(), and so on: Tensorflow also allows us to create a diagonal matrix from a one dimensional array or list with the function diag(), as follows: identity_matrix = tf.diag([1.0, 1.0, 1.0]) # Identity matrix A = tf.truncated_normal([2, 3]) # 2x3 random normal matrix B = tf.fill([2,3], 5.0) # 2x3 constant matrix of 5's C = tf.random_uniform([3,2]) # 3x2 random uniform matrix D = tf.convert_to_tensor(np.array([[1., 2., 3.],[-3., -7., -1.],[0., 5., -2.]])) print(sess.run(identity_matrix)) [[ 1. 0. 0.] [ 0. 1. 0.] [ 0. 0. 1.]] print(sess.run(A)) [[ 0.96751703 0.11397751 -0.3438891 ] [-0.10132604 -0.8432678 0.29810596]] print(sess.run(B)) [[ 5. 5. 5.] [ 5. 5. 5.]] print(sess.run(C)) [[ 0.33184157 0.08907614] [ 0.53189191 0.67605299] [ 0.95889051 0.67061249]] print(sess.run(D)) [[ 1. 2. 3.] [-3. -7. -1.] [ 0. 5. -2.]] Note that if we were to run sess.run(C) again, we would reinitialize the random variables and end up with different random values. Addition and subtraction uses the following function: print(sess.run(A+B)) [[ 4.61596632 5.39771316 4.4325695 ] [ 3.26702736 5.14477345 4.98265553]] print(sess.run(B-B)) [[ 0. 0. 0.] [ 0. 0. 0.]] Multiplication print(sess.run(tf.matmul(B, identity_matrix))) [[ 5. 5. 5.] [ 5. 5. 5.]] Also, the function matmul() has arguments that specify whether or not to transpose the arguments before multiplication or whether each matrix is sparse. Transpose the arguments as follows: print(sess.run(tf.transpose(C))) [[ 0.67124544 0.26766731 0.99068872] [ 0.25006068 0.86560275 0.58411312]] Again, it is worth mentioning the reinitializing that gives us different values than before. Determinant, use the following: print(sess.run(tf.matrix_determinant(D))) -38.0 Inverse: print(sess.run(tf.matrix_inverse(D))) [[-0.5 -0.5 -0.5 ] [ 0.15789474 0.05263158 0.21052632] [ 0.39473684 0.13157895 0.02631579]] Note that the inverse method is based on the Cholesky decomposition if the matrix is symmetric positive definite or the LU decomposition otherwise. Decompositions: Cholesky decomposition, use the following: print(sess.run(tf.cholesky(identity_matrix))) [[ 1. 0. 1.] [ 0. 1. 0.] [ 0. 0. 1.]] Eigenvalues and Eigenvectors, use the following code: print(sess.run(tf.self_adjoint_eig(D)) [[-10.65907521 -0.22750691 2.88658212] [ 0.21749542 0.63250104 -0.74339638] [ 0.84526515 0.2587998 0.46749277] [ -0.4880805 0.73004459 0.47834331]] Note that the function self_adjoint_eig() outputs the eigen values in the first row and the subsequent vectors in the remaining vectors. In mathematics, this is called the eigen decomposition of a matrix. How it works… TensorFlow provides all the tools for us to get started with numerical computations and add such computations to our graphs. This notation might seem quite heavy for simple matrix operations. Remember that we are adding these operations to the graph and telling TensorFlow what tensors to run through those operations. 
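To see how these pieces fit together in a single graph, here is a short illustrative sketch (not one of the book's recipes) that chains several of the preceding operations, matmul(), transpose(), and matrix_inverse(), to solve an ordinary least-squares problem via the normal equations w = (A^T A)^(-1) A^T b; the data is synthetic and only serves to show the composition of operations:

import tensorflow as tf
import numpy as np

sess = tf.Session()

# A small synthetic linear system: b = A * [2, -1] plus a little noise
A_vals = np.random.rand(20, 2)
b_vals = A_vals.dot(np.array([[2.0], [-1.0]])) + 0.01 * np.random.randn(20, 1)

A = tf.constant(A_vals)
b = tf.constant(b_vals)

# Normal equations built from the matrix operations introduced above
AtA = tf.matmul(tf.transpose(A), A)
Atb = tf.matmul(tf.transpose(A), b)
w = tf.matmul(tf.matrix_inverse(AtA), Atb)

print(sess.run(w))  # should be close to [[2.], [-1.]]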
Declaring operations Getting ready Besides the standard arithmetic operations, TensorFlow provides us more operations that we should be aware of and how to use them before proceeding. Again, we can create a graph session by running the following code: import tensorflow as tf sess = tf.Session() How to do it… TensorFlow has the standard operations on tensors, add(), sub(), mul(), and div(). Note that all of these operations in this section will evaluate the inputs element-wise unless specified otherwise. TensorFlow provides some variations of div() and relevant functions. It is worth mentioning that div() returns the same type as the inputs. This means it really returns the floor of the division (akin to Python 2) if the inputs are integers. To return the Python 3 version, which casts integers into floats before dividing and always returns a float, TensorFlow provides the function truediv()shown as follows: print(sess.run(tf.div(3,4))) 0 print(sess.run(tf.truediv(3,4))) 0.75 If we have floats and want integer division, we can use the function floordiv(). Note that this will still return a float, but rounded down to the nearest integer. The function is shown as follows: print(sess.run(tf.floordiv(3.0,4.0))) 0.0 Another important function is mod(). This function returns the remainder after division.It is shown as follows: print(sess.run(tf.mod(22.0, 5.0))) 2.0 The cross product between two tensors is achieved by the cross() function. Remember that the cross product is only defined for two 3-dimensional vectors, so it only accepts two 3-dimensional tensors. The function is shown as follows: print(sess.run(tf.cross([1., 0., 0.], [0., 1., 0.]))) [ 0. 0. 1.0] Here is a compact list of the more common math functions. All of these functions operate element-wise: abs() Absolute value of one input tensor ceil() Ceiling function of one input tensor cos() Cosine function of one input tensor exp() Base e exponential of one input tensor floor() Floor function of one input tensor inv() Multiplicative inverse (1/x) of one input tensor log() Natural logarithm of one input tensor maximum() Element-wise max of two tensors minimum() Element-wise min of two tensors neg() Negative of one input tensor pow() The first tensor raised to the second tensor element-wise round() Rounds one input tensor rsqrt() One over the square root of one tensor sign() Returns -1, 0, or 1, depending on the sign of the tensor sin() Sine function of one input tensor sqrt() Square root of one input tensor square() Square of one input tensor Specialty mathematical functions: There are some special math functions that get used in machine learning that are worth mentioning and TensorFlow has built in functions for them. Again, these functions operate element-wise, unless specified otherwise: digamma() Psi function, the derivative of the lgamma() function erf() Gaussian error function, element-wise, of one tensor erfc() Complimentary error function of one tensor igamma() Lower regularized incomplete gamma function igammac() Upper regularized incomplete gamma function lbeta() Natural logarithm of the absolute value of the beta function lgamma() Natural logarithm of the absolute value of the gamma function squared_difference() Computes the square of the differences between two tensors How it works… It is important to know what functions are available to us to add to our computational graphs. Mostly we will be concerned with the preceding functions. 
We can also generate many different custom functions as compositions of the preceding, as follows: # Tangent function (tan(pi/4)=1) print(sess.run(tf.div(tf.sin(3.1416/4.), tf.cos(3.1416/4.)))) 1.0 There's more… If we wish to add other operations to our graphs that are not listed here, we must create our own from the preceding functions. Here is an example of an operation not listed above that we can add to our graph: # Define a custom polynomial function def custom_polynomial(value): # Return 3 * x^2 - x + 10 return(tf.sub(3 * tf.square(value), value) + 10) print(sess.run(custom_polynomial(11))) 362 Summary Thus in this article we have implemented some introductory recipes that will help us to learn the basics of TensorFlow. Resources for Article: Further resources on this subject: Data Clustering [article] The TensorFlow Toolbox [article] Implementing Artificial Neural Networks with TensorFlow [article]

Text Recognition

Packt
04 Jan 2017
7 min read
In this article by Fábio M. Soares and Alan M.F. Souza, the authors of the book Neural Network Programming with Java - Second Edition, we will cover pattern recognition, neural networks in pattern recognition, and text recognition (OCR). We all know that humans can read and recognize images faster than any supercomputer; however we have seen so far that neural networks show amazing capabilities of learning through data in both supervised and unsupervised way. In this article we present an additional case of pattern recognition involving an example of Optical Character Recognition (OCR). Neural networks can be trained to strictly recognize digits written in an image file. The topics of this article are: Pattern recognition Defined classes Undefined classes Neural networks in pattern recognition MLP Text recognition (OCR) Preprocessing and Classes definition (For more resources related to this topic, see here.) Pattern recognition Patterns are a bunch of data and elements that look similar to each other, in such a way that they can occur systematically and repeat from time to time. This is a task that can be solved mainly by unsupervised learning by clustering; however, when there are labelled data or defined classes of data, this task can be solved by supervised methods. We as humans perform this task more often than we can imagine. When we see objects and recognize them as belonging to a certain class, we are indeed recognizing a pattern. Also when we analyze charts discrete events and time series, we might find an evidence of some sequence of events that repeat systematically under certain conditions. In summary, patterns can be learned by data observations. Examples of pattern recognition tasks include, not liming to: Shapes recognition Objects classification Behavior clustering Voice recognition OCR Chemical reactions taxonomy Defined classes In the existence of a list of classes that has been predefined for a specific domain, then each class is considered to be a pattern, therefore every data record or occurrence is assigned one of these predefined classes. The predefinition of classes can usually be performed by an expert or based on a previous knowledge of the application domain. Also it is desirable to apply defined classes when we want the data to be classified strictly into one of the predefined classes. One illustrated example for pattern recognition using defined classes is animal recognition by image, shown in the next figure. The pattern recognizer however should be trained to catch all the characteristics that formally define the classes. In the example eight figures of animals are shown, belonging to two classes: mammals and birds. Since this is a supervised mode of learning, the neural network should be provided with a sufficient number of images that allow it to properly classify new images: Of course, sometimes the classification may fail, mainly due to similar hidden patterns in the images that neural networks may catch and also due to small nuances present in the shapes. For example, the dolphin has flippers but it is still a mammal. Sometimes in order to obtain a better classification, it is necessary to apply preprocessing and ensure that the neural network will receive the appropriate data that would allow for classification. Undefined classes When data are unlabeled and there is no predefined set of classes, it is a scenario for unsupervised learning. 
Shapes recognition are a good example since they may be flexible and have infinite number of edges, vertices or bindings: In the previous figure, we can see some sorts of shapes and we want to arrange them, whereby the similar ones can be grouped into the same cluster. Based on the shape information that is present in the images, it is likely for the pattern recognizer to classify the rectangle, the square and the rectangular triangle in into the same group. But if the information were presented to the pattern recognizer, not as an image, but as a graph with edges and vertices coordinates, the classification might change a little. In summary, the pattern recognition task may use both supervised and unsupervised mode of learning, basically depending of the objective of recognition. Neural networks in pattern recognition For pattern recognition the neural network architectures that can be applied are the MLPs (supervised) and the Kohonen network (unsupervised). In the first case, the problem should be set up as a classification problem, that is. the data should be transformed into the X-Y dataset, where for every data record in X there should be a corresponding class in Y. The output of the neural network for classification problems should have all of the possible classes, and this may require preprocessing of the output records. For the other case, the unsupervised learning, there is no need to apply labels on the output, however, the input data should be properly structured as well. To remind the reader the schema of both neural networks are shown in the next figure: Data pre-processing We have to deal with all possible types of data, that is., numerical (continuous and discrete) and categorical (ordinal or unscaled). But here we have the possibility to perform pattern recognition on multimedia content, such as images and videos. So how could multimedia be handled? The answer of this question lies in the way these contents are stored in files. Images, for example, are written with a representation of small colored points called pixels. Each color can be coded in an RGB notation where the intensity of red, green and blue define every color the human eye is able to see. Therefore an image of dimension 100x100 would have 10,000 pixels, each one having 3 values for red, green and blue, yielding a total of 30,000 points. That is the challenge for image processing in neural networks. Some methods, may reduce this huge number of dimensions. Afterwards an image can be treated as big matrix of numerical continuous values. For simplicity in this article we are applying only gray-scaled images with small dimension. Text recognition (OCR) Many documents are now being scanned and stored as images, making necessary the task of converting these documents back into text, for a computer to apply edition and text processing. However, this feature involves a number of challenges: Variety of text font Text size Image noise Manuscripts In the spite of that, humans can easily interpret and read even the texts written in a bad quality image. This can be explained by the fact that humans are already familiar with the text characters and the words in their language. Somehow the algorithm must become acquainted with these elements (characters, digits, signalization, and so on), in order to successfully recognize texts in images. Digits recognition Although there are a variety of tools available in the market for OCR, this remains still a big challenge for an algorithm to properly recognize texts in images. 
So we will be restricting our application in a smaller domain, so we could face simpler problems. Therefore, in this article we are going to implement a Neural Network to recognize digits from 0 to 9 represented on images. Also the images will have standardized and small dimensions, for simplicity purposes. Summary In this article we have covered pattern recognition, neural networks in pattern recognition, and text recognition (OCR). Resources for Article: Further resources on this subject: Training neural networks efficiently using Keras [article] Implementing Artificial Neural Networks with TensorFlow [article] Training and Visualizing a neural network with R [article]

Microsoft Cognitive Services

Packt
04 Jan 2017
16 min read
In this article by Leif Henning Larsen, author of the book Learning Microsoft Cognitive Services, we will look into what Microsoft Cognitive Services offer. You will then learn how to utilize one of the APIs by recognizing faces in images. Microsoft Cognitive Services give developers the possibilities of adding AI-like capabilities to their applications. Using a few lines of code, we can take advantage of powerful algorithms that would usually take a lot of time, effort, and hardware to do yourself. (For more resources related to this topic, see here.) Overview of Microsoft Cognitive Services Using Cognitive Services means you have 21 different APIs at your hand. These are in turn separated into 5 top-level domains according to what they do. They are vision, speech, language, knowledge, and search. Let's see more about them in the following sections. Vision APIs under the vision flags allows your apps to understand images and video content. It allows you to retrieve information about faces, feelings, and other visual content. You can stabilize videos and recognize celebrities. You can read text in images and generate thumbnails from videos and images. There are four APIs contained in the vision area, which we will see now. Computer Vision Using the Computer Vision API, you can retrieve actionable information from images. This means you can identify content (such as image format, image size, colors, faces, and more). You can detect whether an image is adult/racy. This API can recognize text in images and extract it to machine-readable words. It can detect celebrities from a variety of areas. Lastly, it can generate storage-efficient thumbnails with smart cropping functionality. Emotion The Emotion API allows you to recognize emotions, both in images and videos. This can allow for more personalized experiences in applications. The emotions that are detected are cross-cultural emotions: anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. Face We have already seen the very basic example of what the Face API can do. The rest of the API revolves around the same—to detect, identify, organize, and tag faces in photos. Apart from face detection, you can see how likely it is that two faces belong to the same person. You can identify faces and also find similar-looking faces. Video The Video API is about analyzing, editing, and processing videos in your app. If you have a video that is shaky, the API allows you to stabilize it. You can detect and track faces in videos. If a video contains a stationary background, you can detect motion. The API lets you to generate thumbnail summaries for videos, which allows users to see previews or snapshots quickly. Speech Adding one of the Speech APIs allows your application to hear and speak to your users. The APIs can filter noise and identify speakers. They can drive further actions in your application based on the recognized intent. Speech contains three APIs, which we will discuss now. Bing Speech Adding the Bing Speech API to your application allows you to convert speech to text and vice versa. You can convert spoken audio to text either by utilizing a microphone or other sources in real time or by converting audio from files. The API also offer speech intent recognition, which is trained by Language Understanding Intelligent Service to understand the intent. Speaker Recognition The Speaker Recognition API gives your application the ability to know who is talking. 
Using this API, you can use verify that someone speaking is who they claim to be. You can also determine who an unknown speaker is, based on a group of selected speakers. Custom Recognition To improve speech recognition, you can use the Custom Recognition API. This allows you to fine-tune speech recognition operations for anyone, anywhere. Using this API, the speech recognition model can be tailored to the vocabulary and speaking style of the user. In addition to this, the model can be customized to match the expected environment of the application. Language APIs related to language allow your application to process natural language and learn how to recognize what users want. You can add textual and linguistic analysis to your application as well as natural language understanding. The following five APIs can be found in the Language area. Bing Spell Check Bing Spell Check API allows you to add advanced spell checking to your application. Language Understanding Intelligent Service (LUIS) Language Understanding Intelligent Service, or LUIS, is an API that can help your application understand commands from your users. Using this API, you can create language models that understand intents. Using models from Bing and Cortana, you can make these models recognize common requests and entities (such as places, time, and numbers). You can add conversational intelligence to your applications. Linguistic Analysis Linguistic Analysis API lets you parse complex text to explore the structure of text. Using this API, you can find nouns, verbs, and more from text, which allows your application to understand who is doing what to whom. Text Analysis Text Analysis API will help you in extracting information from text. You can find the sentiment of a text (whether the text is positive or negative). You will be able to detect language, topic, and key phrases used throughout the text. Web Language Model Using the Web Language Model (WebLM) API, you are able to leverage the power of language models trained on web-scale data. You can use this API to predict which words or sequences follow a given sequence or word. Knowledge When talking about Knowledge APIs, we are talking about APIs that allow you to tap into rich knowledge. This may be knowledge from the Web, it may be academia, or it may be your own data. Using these APIs, you will be able to explore different nuances of knowledge. The following four APIs are contained in the Knowledge area. Academic Using the Academic API, you can explore relationships among academic papers, journals, and authors. This API allows you to interpret natural language user query strings, which allows your application to anticipate what the user is typing. It will evaluate the said expression and return academic knowledge entities. Entity Linking Entity Linking is the API you would use to extend knowledge of people, places, and events based on the context. As you may know, a single word may be used differently based on the context. Using this API allows you to recognize and identify each separate entity within a paragraph based on the context. Knowledge Exploration The Knowledge Exploration API will let you add the ability to use interactive search for structured data in your projects. It interprets natural language queries and offers auto-completions to minimize user effort. Based on the query expression received, it will retrieve detailed information about matching objects. 
Recommendations The Recommendations API allows you to provide personalized product recommendations for your customers. You can use this API to add frequently bought together functionality to your application. Another feature you can add is item-to-item recommendations, which allow customers to see what other customers who likes this also like. This API will also allow you to add recommendations based on the prior activity of the customer. Search Search APIs give you the ability to make your applications more intelligent with the power of Bing. Using these APIs, you can use a single call to access data from billions of web pages, images videos, and news. The following five APIs are in the search domain. Bing Web Search With Bing Web Search, you can search for details in billions of web documents indexed by Bing. All the results can be arranged and ordered according to the layout you specify, and the results are customized to the location of the end user. Bing Image Search Using Bing Image Search API, you can add advanced image and metadata search to your application. Results include URLs to images, thumbnails, and metadata. You will also be able to get machine-generated captions and similar images and more. This API allows you to filter the results based on image type, layout, freshness (how new is the image), and license. Bing Video Search Bing Video Search will allow you to search for videos and returns rich results. The results contain metadata from the videos, static- or motion- based thumbnails, and the video itself. You can add filters to the result based on freshness, video length, resolution, and price. Bing News Search If you add Bing News Search to your application, you can search for news articles. Results can include authoritative image, related news and categories, information on the provider, URL, and more. To be more specific, you can filter news based on topics. Bing Autosuggest Bing Autosuggest API is a small, but powerful one. It will allow your users to search faster with search suggestions, allowing you to connect powerful search to your apps. Detecting faces with the Face API We have seen what the different APIs can do. Now we will test the Face API. We will not be doing a whole lot, but we will see how simple it is to detect faces in images. The steps we need to cover to do this are as follows: Register for a free Face API preview subscription. Add necessary NuGet packages to our project. Add some UI to the test application. Detect faces on command. Head over to https://www.microsoft.com/cognitive-services/en-us/face-api to start the process of registering for a free subscription to the Face API. By clicking on the yellow button, stating Get started for free,you will be taken to a login page. Log in with your Microsoft account, or if you do not have one, register for one. Once logged in, you will need to verify that the Face API Preview has been selected in the list and accept the terms and conditions. With that out of the way, you will be presented with the following: You will need one of the two keys later, when we are accessing the API. In Visual Studio, create a new WPF application. Following the instructions at https://www.codeproject.com/articles/100175/model-view-viewmodel-mvvm-explained, create a base class that implements the INotifyPropertyChanged interface and a class implementing the ICommand interface. 
The first should be inherited by the ViewModel, the MainViewModel.cs file, while the latter should be used when creating properties to handle button commands. The Face API has a NuGet package, so we need to add that to our project. Head over to NuGet Package Manager for the project we created earlier. In the Browse tab, search for the Microsoft.ProjectOxford.Face package and install the it from Microsoft: As you will notice, another package will also be installed. This is the Newtonsoft.Json package, which is required by the Face API. The next step is to add some UI to our application. We will be adding this in the MainView.xaml file. First, we add a grid and define some rows for the grid: <Grid> <Grid.RowDefinitions> <RowDefinition Height="*" /> <RowDefinition Height="20" /> <RowDefinition Height="30" /> </Grid.RowDefinitions> Three rows are defined. The first is a row where we will have an image. The second is a line for status message, and the last is where we will place some buttons: Next, we add our image element: <Image x_Name="FaceImage" Stretch="Uniform" Source="{Binding ImageSource}" Grid.Row="0" /> We have given it a unique name. By setting the Stretch parameter to Uniform, we ensure that the image keeps its aspect ratio. Further on, we place this element in the first row. Last, we bind the image source to a BitmapImage interface in the ViewModel, which we will look at in a bit. The next row will contain a text block with some status text. The text property will be bound to a string property in the ViewModel: <TextBlock x_Name="StatusTextBlock" Text="{Binding StatusText}" Grid.Row="1" /> The last row will contain one button to browse for an image and one button to be able to detect faces. The command properties of both buttons will be bound to the DelegateCommand properties in the ViewModel: <Button x_Name="BrowseButton" Content="Browse" Height="20" Width="140" HorizontalAlignment="Left" Command="{Binding BrowseButtonCommand}" Margin="5, 0, 0, 5" Grid.Row="2" /> <Button x_Name="DetectFaceButton" Content="Detect face" Height="20" Width="140" HorizontalAlignment="Right" Command="{Binding DetectFaceCommand}" Margin="0, 0, 5, 5" Grid.Row="2"/> With the View in place, make sure that the code compiles and run it. This should present you with the following UI: The last part is to create the binding properties in our ViewModel and make the buttons execute something. Open the MainViewModel.cs file. First, we define two variables: private string _filePath; private IFaceServiceClient _faceServiceClient; The string variable will hold the path to our image, while the IFaceServiceClient variable is to interface the Face API. Next we define two properties: private BitmapImage _imageSource; public BitmapImage ImageSource { get { return _imageSource; } set { _imageSource = value; RaisePropertyChangedEvent("ImageSource"); } } private string _statusText; public string StatusText { get { return _statusText; } set { _statusText = value; RaisePropertyChangedEvent("StatusText"); } } What we have here is a property for the BitmapImage mapped to the Image element in the view. We also have a string property for the status text, mapped to the text block element in the view. As you also may notice, when either of the properties is set, we call the RaisePropertyChangedEvent method. This will ensure that the UI is updated when either of the properties has new values. 
Next, we define our two DelegateCommand objects and do some initialization through the constructor: public ICommand BrowseButtonCommand { get; private set; } public ICommand DetectFaceCommand { get; private set; } public MainViewModel() { StatusText = "Status: Waiting for image..."; _faceServiceClient = new FaceServiceClient("YOUR_API_KEY_HERE"); BrowseButtonCommand = new DelegateCommand(Browse); DetectFaceCommand = new DelegateCommand(DetectFace, CanDetectFace); } In our constructor, we start off by setting the status text. Next, we create an object of the Face API, which needs to be created with the API key we got earlier. At last, we create the DelegateCommand object for our command properties. Note how the browse command does not specify a predicate. This means it will always be possible to click on the corresponding button. To make this compile, we need to create the functions specified in the DelegateCommand constructors—the Browse, DetectFace, and CanDetectFace functions: private void Browse(object obj) { var openDialog = new Microsoft.Win32.OpenFileDialog(); openDialog.Filter = "JPEG Image(*.jpg)|*.jpg"; bool? result = openDialog.ShowDialog(); if (!(bool)result) return; We start the Browse function by creating an OpenFileDialog object. This dialog is assigned a filter for JPEG images, and in turn it is opened. When the dialog is closed, we check the result. If the dialog was cancelled, we simply stop further execution: _filePath = openDialog.FileName; Uri fileUri = new Uri(_filePath); With the dialog closed, we grab the filename of the file selected and create a new URI from it: BitmapImage image = new BitmapImage(fileUri); image.CacheOption = BitmapCacheOption.None; image.UriSource = fileUri; With the newly created URI, we want to create a new BitmapImage interface. We specify it to use no cache, and we set the URI source the URI we created: ImageSource = image; StatusText = "Status: Image loaded..."; } The last step we take is to assign the bitmap image to our BitmapImage property, so the image is shown in the UI. We also update the status text to let the user know the image has been loaded. Before we move on, it is time to make sure that the code compiles and that you are able to load an image into the View: private bool CanDetectFace(object obj) { return !string.IsNullOrEmpty(ImageSource?.UriSource.ToString()); } The CanDetectFace function checks whether or not the detect faces button should be enabled. In this case, it checks whether our image property actually has a URI. If it does, by extension that means we have an image, and we should be able to detect faces: private async void DetectFace(object obj) { FaceRectangle[] faceRects = await UploadAndDetectFacesAsync(); string textToSpeak = "No faces detected"; if (faceRects.Length == 1) textToSpeak = "1 face detected"; else if (faceRects.Length > 1) textToSpeak = $"{faceRects.Length} faces detected"; Debug.WriteLine(textToSpeak); } Our DetectFace method calls an async method to upload and detect faces. The return value contains an array of FaceRectangles. This array contains the rectangle area for all face positions in the given image. We will look into the function we call in a bit. 
After the call has finished executing, we print a line with the number of faces to the debug console window: private async Task<FaceRectangle[]> UploadAndDetectFacesAsync() { StatusText = "Status: Detecting faces..."; try { using (Stream imageFileStream = File.OpenRead(_filePath)) { In the UploadAndDetectFacesAsync function, we create a Stream object from the image. This stream will be used as input for the actual call to the Face API service: Face[] faces = await _faceServiceClient.DetectAsync(imageFileStream, true, true, new List<FaceAttributeType>() { FaceAttributeType.Age }); This line is the actual call to the detection endpoint for the Face API. The first parameter is the file stream we created in the previous step. The rest of the parameters are all optional. The second parameter should be true if you want to get a face ID. The next specifies if you want to receive face landmarks or not. The last parameter takes a list of facial attributes you may want to receive. In our case, we want the age parameter to be returned, so we need to specify that. The return type of this function call is an array of faces with all the parameters you have specified: List<double> ages = faces.Select(face => face.FaceAttributes.Age).ToList(); FaceRectangle[] faceRects = faces.Select(face => face.FaceRectangle).ToArray(); StatusText = "Status: Finished detecting faces..."; foreach(var age in ages) { Console.WriteLine(age); } return faceRects; } } The first line in the previous code iterates over all faces and retrieves the approximate age of all faces. This is later printed to the debug console window, in the following foreach loop. The second line iterates over all faces and retrieves the face rectangle with the rectangular location of all faces. This is the data we return to the calling function. Add a catch clause to finish the method. In case an exception is thrown, in our API call, we catch that. You want to show the error message and return an empty FaceRectangle array. With that code in place, you should now be able to run the full example. The end result will look like the following image: The resulting debug console window will print the following text: 1 face detected 23,7 Summary In this article, we looked at what Microsoft Cognitive Services offer. We got a brief description of all the APIs available. From there, we looked into the Face API, where we saw how to detect faces in images. Resources for Article: Further resources on this subject: Auditing and E-discovery [article] The Sales and Purchase Process [article] Manage Security in Excel [article]

What is an Artificial Neural Network?

Packt
04 Jan 2017
11 min read
In this article by Prateek Joshi, author of book Artificial Intelligence with Python, we are going to learn about artificial neural networks. We will start with an introduction to artificial neural networks and the installation of the relevant library. We will discuss perceptron and how to build a classifier based on that. We will learn about single layer neural networks and multilayer neural networks. (For more resources related to this topic, see here.) Introduction to artificial neural networks One of the fundamental premises of Artificial Intelligence is to build machines that can perform tasks that require human intelligence. The human brain is amazing at learning new things. Why not use the model of the human brain to build a machine? An artificial neural network is a model designed to simulate the learning process of the human brain. Artificial neural networks are designed such that they can identify the underlying patterns in data and learn from them. They can be used for various tasks such as classification, regression, segmentation, and so on. We need to convert any given data into the numerical form before feeding it into the neural network. For example, we deal with many different types of data including visual, textual, time-series, and so on. We need to figure out how to represent problems in a way that can be understood by artificial neural networks. Building a neural network The human learning process is hierarchical. We have various stages in our brain’s neural network and each stage corresponds to a different granularity. Some stages learn simple things and some stages learn more complex things. Let’s consider an example of visually recognizing an object. When we look at a box, the first stage identifies simple things like corners and edges. The next stage identifies the generic shape and the stage after that identifies what kind of object it is. This process differs for different tasks, but you get the idea! By building this hierarchy, our human brain quickly separates the concepts and identifies the given object. To simulate the learning process of the human brain, an artificial neural network is built using layers of neurons. These neurons are inspired from the biological neurons we discussed in the previous paragraph. Each layer in an artificial neural network is a set of independent neurons. Each neuron in a layer is connected to neurons in the adjacent layer. Training a neural network If we are dealing with N-dimensional input data, then the input layer will consist of N neurons. If we have M distinct classes in our training data, then the output layer will consist of M neurons. The layers between the input and output layers are called hidden layers. A simple neural network will consist of a couple of layers and a deep neural network will consist of many layers. Consider the case where we want to use a neural network to classify the given data. The first step is to collect the appropriate training data and label it. Each neuron acts as a simple function and the neural network trains itself until the error goes below a certain value. The error is basically the difference between the predicted output and the actual output. Based on how big the error is, the neural network adjusts itself and retrains until it gets closer to the solution. You can learn more about neural networks here: http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html. We will be using a library called NeuroLab . You can find more about it here: https://pythonhosted.org/neurolab. 
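To make the training idea described above concrete, here is a tiny illustrative sketch in plain numpy (not part of the book's code) of a single neuron whose weights are nudged, epoch after epoch, until the error between the predicted and actual output becomes small; the data and learning rate are arbitrary toy values:

import numpy as np

# Toy data: the neuron should learn the logical OR of its two inputs
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 1.])

weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1

for epoch in range(20):
    for xi, target in zip(X, y):
        predicted = 1.0 if xi.dot(weights) + bias > 0 else 0.0
        error = target - predicted             # difference between actual and predicted
        weights += learning_rate * error * xi  # adjust the weights to reduce the error
        bias += learning_rate * error

print(weights, bias)

NeuroLab wraps this kind of update loop (and far more capable variants of it) behind a simple API, which we will use next.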
You can install it by running the following command on your Terminal:

$ pip3 install neurolab

Once you have installed it, you can proceed to the next section.

Building a perceptron-based classifier

A perceptron is the building block of an artificial neural network. It is a single neuron that takes inputs, performs computation on them, and then produces an output. It uses a simple linear function to make the decision. Let's say we are dealing with an N-dimensional input datapoint. A perceptron computes the weighted summation of those N numbers and then adds a constant to produce the output. The constant is called the bias of the neuron. It is remarkable that these simple perceptrons are used to design very complex deep neural networks. Let's see how to build a perceptron-based classifier using NeuroLab. Create a new Python file and import the following packages:

import numpy as np
import matplotlib.pyplot as plt
import neurolab as nl

Load the input data from the text file data_perceptron.txt provided to you. Each line contains space-separated numbers, where the first two numbers are the features and the last number is the label:

# Load input data
text = np.loadtxt('data_perceptron.txt')

Separate the text into datapoints and labels:

# Separate datapoints and labels
data = text[:, :2]
labels = text[:, 2].reshape((text.shape[0], 1))

Plot the datapoints:

# Plot input data
plt.figure()
plt.scatter(data[:,0], data[:,1])
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.title('Input data')

Define the maximum and minimum values that each dimension can take:

# Define minimum and maximum values for each dimension
dim1_min, dim1_max, dim2_min, dim2_max = 0, 1, 0, 1

Since the data is separated into two classes, we just need one bit to represent the output, so the output layer will contain a single neuron:

# Number of neurons in the output layer
num_output = labels.shape[1]

We have a dataset where the datapoints are two-dimensional. Let's define a perceptron with 2 input neurons, where we assign one neuron for each dimension:

# Define a perceptron with 2 input neurons (because we
# have 2 dimensions in the input data)
dim1 = [dim1_min, dim1_max]
dim2 = [dim2_min, dim2_max]
perceptron = nl.net.newp([dim1, dim2], num_output)

Train the perceptron with the training data:

# Train the perceptron using the data
error_progress = perceptron.train(data, labels, epochs=100, show=20, lr=0.03)

Plot the training progress using the error metric:

# Plot the training progress
plt.figure()
plt.plot(error_progress)
plt.xlabel('Number of epochs')
plt.ylabel('Training error')
plt.title('Training error progress')
plt.grid()
plt.show()

The full code is given in the file perceptron_classifier.py. If you run the code, you will get two output figures. The first figure shows the input datapoints and the second figure shows the training progress using the error metric. As we can observe from the training progress figure, the error goes down to 0 at the end of the fourth epoch.

Constructing a single-layer neural network

A perceptron is a good start, but it cannot do much. The next step is to have a set of neurons act as a unit and see what we can achieve. Let's create a single-layer neural network that consists of independent neurons acting on input data to produce the output. Create a new Python file and import the following packages:

import numpy as np
import matplotlib.pyplot as plt
import neurolab as nl

Load the input data from the file data_simple_nn.txt provided to you. Each line in this file contains four numbers.
The first two numbers form the datapoint and the last two numbers are the labels. Why do we need two numbers for the labels? Because we have four distinct classes in our dataset, so we need two bits to represent them.

# Load input data
text = np.loadtxt('data_simple_nn.txt')

Separate the data into datapoints and labels:

# Separate it into datapoints and labels
data = text[:, 0:2]
labels = text[:, 2:]

Plot the input data:

# Plot input data
plt.figure()
plt.scatter(data[:,0], data[:,1])
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.title('Input data')

Extract the minimum and maximum values for each dimension (we don't need to hardcode them like we did in the previous section):

# Minimum and maximum values for each dimension
dim1_min, dim1_max = data[:,0].min(), data[:,0].max()
dim2_min, dim2_max = data[:,1].min(), data[:,1].max()

Define the number of neurons in the output layer:

# Define the number of neurons in the output layer
num_output = labels.shape[1]

Define a single-layer neural network using the above parameters:

# Define a single-layer neural network
dim1 = [dim1_min, dim1_max]
dim2 = [dim2_min, dim2_max]
nn = nl.net.newp([dim1, dim2], num_output)

Train the neural network using the training data:

# Train the neural network
error_progress = nn.train(data, labels, epochs=100, show=20, lr=0.03)

Plot the training progress:

# Plot the training progress
plt.figure()
plt.plot(error_progress)
plt.xlabel('Number of epochs')
plt.ylabel('Training error')
plt.title('Training error progress')
plt.grid()
plt.show()

Define some sample test datapoints and run the network on those points:

# Run the classifier on test datapoints
print('\nTest results:')
data_test = [[0.4, 4.3], [4.4, 0.6], [4.7, 8.1]]
for item in data_test:
    print(item, '-->', nn.sim([item])[0])

The full code is given in the file simple_neural_network.py. If you run the code, you will get two figures. The first figure shows the input datapoints and the second figure shows the training progress. The test results are printed on your Terminal. If you locate those test datapoints on a 2D graph, you can visually verify that the predicted outputs are correct.

Constructing a multilayer neural network

In order to achieve higher accuracy, we need to give more freedom to the neural network. This means that a neural network needs more than one layer to extract the underlying patterns in the training data. Let's create a multilayer neural network to achieve that. Create a new Python file and import the following packages:

import numpy as np
import matplotlib.pyplot as plt
import neurolab as nl

In the previous two sections, we saw how to use a neural network as a classifier. In this section, we will see how to use a multilayer neural network as a regressor. Generate some sample datapoints based on the equation y = 3x^2 + 5 and then normalize the points:

# Generate some training data
min_val = -15
max_val = 15
num_points = 130
x = np.linspace(min_val, max_val, num_points)
y = 3 * np.square(x) + 5
y /= np.linalg.norm(y)

Reshape the above variables to create a training dataset:

# Create data and labels
data = x.reshape(num_points, 1)
labels = y.reshape(num_points, 1)

Plot the input data:

# Plot input data
plt.figure()
plt.scatter(data, labels)
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.title('Input data')

Define a multilayer neural network with two hidden layers. You are free to design a neural network any way you want. For this case, let's have 10 neurons in the first layer and 6 neurons in the second layer.
Our task is to predict the value, so the output layer will contain a single neuron:

# Define a multilayer neural network with 2 hidden layers;
# First hidden layer consists of 10 neurons
# Second hidden layer consists of 6 neurons
# Output layer consists of 1 neuron
nn = nl.net.newff([[min_val, max_val]], [10, 6, 1])

Set the training algorithm to gradient descent:

# Set the training algorithm to gradient descent
nn.trainf = nl.train.train_gd

Train the neural network using the training data that was generated:

# Train the neural network
error_progress = nn.train(data, labels, epochs=2000, show=100, goal=0.01)

Run the neural network on the training datapoints:

# Run the neural network on training datapoints
output = nn.sim(data)
y_pred = output.reshape(num_points)

Plot the training progress:

# Plot training error
plt.figure()
plt.plot(error_progress)
plt.xlabel('Number of epochs')
plt.ylabel('Error')
plt.title('Training error progress')

Plot the predicted output:

# Plot the output
x_dense = np.linspace(min_val, max_val, num_points * 2)
y_dense_pred = nn.sim(x_dense.reshape(x_dense.size, 1)).reshape(x_dense.size)
plt.figure()
plt.plot(x_dense, y_dense_pred, '-', x, y, '.', x, y_pred, 'p')
plt.title('Actual vs predicted')
plt.show()

The full code is given in the file multilayer_neural_network.py. If you run the code, you will get three figures. The first figure shows the input data, the second figure shows the training progress, and the third figure shows the predicted output overlaid on top of the input data. The predicted output seems to follow the general trend. If you continue to train the network and reduce the error, you will see that the predicted output matches the input curve even more accurately. The training progress is also printed on your Terminal.

Summary

In this article, we learnt more about artificial neural networks. We discussed how to build and train neural networks. We also talked about the perceptron and built a classifier based on it, and we covered single-layer as well as multilayer neural networks.

Resources for Article:

Further resources on this subject:

Training and Visualizing a neural network with R [article]

Implementing Artificial Neural Networks with TensorFlow [article]

How to do Machine Learning with Python [article]


Introduction to Deep Learning

Packt
04 Jan 2017
19 min read
In this article by Dipayan Dev, the author of the book Deep Learning with Hadoop, we will see a brief introduction to concept of the deep learning and deep feed-forward networks. "By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it."                                                                                                                  - Eliezer Yudkowsky Ever thought, why it is often difficult to beat the computer in chess, even by the best players of the game? How Facebook is able to recognize your face among hundreds of millions photos? How your mobile phone can recognize your voice, and redirects the call to the correct person selecting from hundreds of contacts listed? The primary goal of this book is to deal with many of those queries, and to provide detailed solutions to the readers. This book can be used for a wide range of reasons by a variety of readers, however, we wrote the book with two main target audiences in mind. One of the primary target audiences are the undergraduate or graduate university students learning about deep learning and Artificial Intelligence; the second group of readers belongs to the software engineers who already have a knowledge of Big Data, deep learning, and statistical modeling, but want to rapidly gain the knowledge of how deep learning can be used for Big Data and vice versa. This article will mainly try to set the foundation of the readers by providing the basic concepts, terminologies, characteristics, and the major challenges of deep learning. The article will also put forward the classification of different deep network algorithms, which have been widely used by researchers in the last decade. Following are the main topics that this article will cover: Get started with deep learning Deep learning: A revolution in Artificial Intelligence Motivations for deep learning Classification of deep learning networks Ever since the dawn of civilization, people have always dreamt of building some artificial machines or robots which can behave and work exactly like human beings. From the Greek mythological characters to the ancient Hindu epics, there are numerous such examples, which clearly suggest people's interest and inclination towards creating and having an artificial life. During the initial computer generations, people had always wondered if the computer could ever become as intelligent as a human being! Going forward, even in medical science too, the need of automated machines became indispensable and almost unavoidable. With this need and constant research in the same field, Artificial Intelligence (AI) has turned out to be a flourishing technology with its various applications in several domains, such as image processing, video processing, and many other diagnosis tools in medical science too. Although there are many problems that are resolved by AI systems on a daily basis, nobody knows the specific rules for how an AI system is programmed! Few of the intuitive problems are as follows: Google search, which does a really good job of understanding what you type or speak As mentioned earlier, Facebook too, is somewhat good at recognizing your face, and hence, understanding your interests Moreover, with the integration of various other fields, for example, probability, linear algebra, statistics, machine learning, deep learning, and so on, AI has already gained a huge amount of popularity in the research field over the course of time. 
One of the key reasons for he early success of AI could be because it basically dealt with fundamental problems for which the computer did not require vast amount of knowledge. For example, in 1997, IBM's Deep Blue chess-playing system was able to defeat the world champion Garry Kasparov [1]. Although this kind of achievement at that time can be considered as substantial, however, chess, being limited by only a few number of rules, it was definitely not a burdensome task to train the computer with only those number of rules! Training a system with fixed and limited number of rules is termed as hard-coded knowledge of the computer. Many Artificial Intelligence projects have undergone this hard-coded knowledge about the various aspects of the world in many traditional languages. As time progresses, this hard-coded knowledge does not seem to work with systems dealing with huge amounts of data. Moreover, the number of rules that the data were following also kept changing in a frequent manner. Therefore, most of those projects following that concept failed to stand up to the height of expectation. The setbacks faced by this hard-coded knowledge implied that those artificial intelligent systems need some way of generalizing patterns and rules from the supplied raw data, without the need of external spoon-feeding. The proficiency of a system to do so is termed as machine learning. There are various successful machine learning implementations, which we use in our daily life. Few of the most common and important implementations are as follows: Spam detection: Given an e-mail in your inbox, the model can detect whether to put that e-mail in spam or in the inbox folder. A common naive Bayes model can distinguish between such e-mails. Credit card fraud detection: A model that can detect whether a number of transactions performed at a specific time interval are done by the original customer or not. One of the most popular machine learning model, given by Mor-Yosef et al. [1990], used logistic regression, which could recommend whether caesarean delivery is needed for the patient or not! There are many such models, which have been implemented with the help of machine learning techniques: The figure shows the example of different types of representation. Let's say we want to train the machine to detect some empty spaces in between the jelly beans. In the image on the right side, we have sparse jelly beans, and it would be easier for the AI system to determine the empty parts. However, in the image on the left side, we have extremely compact jelly beans, and hence, it will be an extremely difficult task for the machine to find the empty spaces. Images sourced from USC-SIPI image database. A large portion of performance of the machine learning systems depends on the data fed to the system. This is called representation of the data. All the information related to the representation is called feature of the data. For example, if logistic regression is used to detect a brain tumor in a patient, the AI system will not try to diagnose the patient directly! Rather, the concerned doctor will provide the necessary input to the systems according to the common symptoms of that patient. The AI system will then match those inputs with the already received past inputs which were used to train the system. Based on the predictive analysis of the system, it will provide its decision regarding the disease. 
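To make the idea of a model learning from supplied features a little more concrete, here is a minimal sketch (written in R purely for illustration) of a logistic regression classifier. The simulated data, feature names, and coefficients are invented for this example and have nothing to do with the studies cited above:

# Simulate a toy dataset: two numeric features and a binary outcome
set.seed(42)
n <- 200
feature1 <- rnorm(n)
feature2 <- rnorm(n)
outcome  <- rbinom(n, 1, plogis(1.5 * feature1 - 0.8 * feature2))
toy_data <- data.frame(feature1, feature2, outcome)

# Fit a logistic regression model on the supplied features
model <- glm(outcome ~ feature1 + feature2, data = toy_data, family = binomial)

# Predict the probability of the positive class for new observations
new_obs <- data.frame(feature1 = c(0.5, -1.2), feature2 = c(0.1, 0.7))
predict(model, newdata = new_obs, type = "response")

The model can only be as good as the features it is given: swap in features that are unrelated to the outcome and the predictions degrade, which is exactly the dependency on representation discussed next.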
Although logistic regression can learn and decide based on the features given, it cannot influence or modify the way features are defined. For example, if that model was provided with a cesarean patient's report instead of the brain tumor patient's report, it would surely fail to predict the outcome, as the given features would never match with the trained data. This dependency of the machine learning systems on the representation of the data is not really unknown to us! In fact, most of our computer theory performs better based on how the data is represented. For example, the quality of database is considered based on the schema design. The execution of any database query, even on a thousand of million lines of data, becomes extremely fast if the schema is indexed properly. Therefore, the dependency of data representation of the AI systems should not surprise us. There are many such daily life examples too, where the representation of the data decides our efficiency. To locate a person from among 20 people is obviously easier than to locate the same from a crowd of 500 people. A visual representation of two different types of data representation in shown in preceding figure. Therefore, if the AI systems are fed with the appropriate featured data, even the hardest problems could be resolved. However, collecting and feeding the desired data in the correct way to the system has been a serious impediment for the computer programmer. There can be numerous real-time scenarios, where extracting the features could be a cumbersome task. Therefore, the way the data are represented decides the prime factors in the intelligence of the system. Finding cats from among a group of humans and cats could be extremely complicated if the features are not appropriate. We know that cats have tails; therefore, we might like to detect the presence of tails as a prominent feature. However, given the different tail shapes and sizes, it is often difficult to describe exactly how a tail will look like in terms of pixel values! Moreover, tails could sometimes be confused with the hands of humans. Also, overlapping of some objects could omit the presence of a cat's tail, making the image even more complicated. From all the above discussions, it can really be concluded that the success of AI systems depends, mainly, on how the data is represented. Also, various representations can ensnare and cache the different explanatory factors of all the disparities behind the data. Representation learning is one of the most popular and widely practiced learning approaches used to cope with these specific problems. Learning the representations of the next layer from the existing representation of data can be defined as representation learning. Ideally, all representation learning algorithms have this advantage of learning representations, which capture the underlying factors, a subset that might be applicable for each particular sub-task. A simple illustration is given in the following figure: The figure illustrates of representation learning. The middle layers are able to discover the explanatory factors (hidden layers, in blue rectangular boxes). Some of the factors explain each task's target, whereas some explain the inputs. However, while dealing with extracting some high-level data and features from a huge amount of raw data, which requires some sort of human-level understanding, has shown its limitations. There can be many such following examples: Differentiating the cry of two similar age babies. 
Identifying the image of a cat's eye in both day and night times. This becomes clumsy, because a cat's eyes glow at night unlike during daytime. In all these preceding edge cases, representation learning does not appear to behave exceptionally, and shows deterrent behavior. Deep learning, a sub-field of machine learning can rectify this major problem of representation learning by building multiple levels of representations or learning a hierarchy of features from a series of other simple representations and features [2] [8]. The figure shows how a deep learning system can represent the human image through identifying various combinations such as corners, contours, which can be defined in terms of edges. The preceding figure shows an illustration of a deep learning model. It is generally a cumbersome task for the computer to decode the meaning of raw unstructured input data, as represented by this image, as a collection of different pixel values. A mapping function, which will convert the group of pixels to identify the image, is, ideally, difficult to achieve. Also, to directly train the computer for these kinds of mapping looks almost insuperable. For these types of tasks, deep learning resolves the difficulty by creating a series of subset of mappings to reach the desired output. Each subset of mapping corresponds to a different set of layer of the model. The input contains the variables that one can observe, and hence, represented in the visible layers. From the given input, we can incrementally extract the abstract features of the data. As these values are not available or visible in the given data, these layers are termed as hidden layers. In the image, from the first layer of data, the edges can easily be identified just by a comparative study of the neighboring pixels. The second hidden layer can distinguish the corners and contours from the first layer's description of the edges. From this second hidden layer, which describes the corners and contours, the third hidden layer can identify the different parts of the specific objects. Ultimately, the different objects present in the image can be distinctly detected from the third layer. Image reprinted with permission from Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, published by The MIT Press. Deep learning started its journey exclusively since 2006, Hinton et al. in 2006[2]; also Bengio et al. in 2007[3] initially focused on the MNIST digit classification problem. In the last few years, deep learning has seen major transitions from digits to object recognition in natural images. One of the major breakthroughs was achieved by Krizhevsky et al. in 2012 [4] using the ImageNet dataset 4. The scope of this book is mainly limited to deep learning, so before diving into it directly, the necessary definitions of deep learning should be provided. Many researchers have defined deep learning in many ways, and hence, in the last 10 years, it has gone through many explanations too! Following are few of the widely accepted definitions: As noted by GitHub, deep learning is a new area of machine learning research, which has been introduced with the objective of moving machine learning closer to one of its original goals: Artificial Intelligence. Deep learning is about learning multiple levels of representation and abstraction, which help to make sense of data such as images, sound, and text. 
As recently updated by Wikipedia, deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in the data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations. As the definitions suggest, deep learning can also be considered as a special type of machine learning. Deep learning has achieved immense popularity in the field of data science with its ability to learn complex representation from various simple features. To have an in-depth grip on deep learning, we have listed out a few terminologies. The next topic of this article will help the readers lay a foundation of deep learning by providing various terminologies and important networks used for deep learning. Getting started with deep learning To understand the journey of deep learning in this book, one must know all the terminologies and basic concepts of machine learning. However, if the reader has already got enough insight into machine learning and related terms, they should feel free to ignore this section and jump to the next topic of this article. The readers who are enthusiastic about data science, and want to learn machine learning thoroughly, can follow Machine Learning by Tom M. Mitchell (1997) [5] and Machine Learning: a Probabilistic Perspective (2012) [6]. Image shows the scattered data points of social network analysis. Image sourced from Wikipedia. Neural networks do not perform miracles. But if used sensibly, they can produce some amazing results. Deep feed-forward networks Neural networks can be recurrent as well as feed-forward. Feed-forward networks do not have any loop associated in their graph, and are arranged in a set of layers. A network with many layers is said to be a deep network. In simple words, any neural network with two or more layers (hidden) is defined as deep feed-forward network or feed-forward neural network. Figure 4 shows a generic representation of a deep feed-forward neural network. Deep feed-forward network works on the principle that with an increase in depth, the network can also execute more sequential instructions. Instructions in sequence can offer great power, as these instructions can point to the earlier instruction. The aim of a feed-forward network is to generalize some function f. For example, classifier y=f/(x) maps from input x to category y. A deep feed-forward network modified the mapping, y=f(x; α), and learns the value of the parameter α, which gives the most appropriate value of the function. The following figure shows a simple representation of the deep-forward network to provide the architectural difference with the traditional neural network. Deep neural network is feed-forward network with many hidden layers: Datasets are considered to be the building blocks of a learning process. A dataset can be defined as a collection of interrelated sets of data, which is comprised of separate entities, but which can be used as a single entity depending on the use-case. The individual data elements of a dataset are called data points. The preceding figure gives the visual representation of the following data points: Unlabeled data: This part of data consists of human-generated objects, which can be easily obtained from the surroundings. Some of the examples are X-rays, log file data, news articles, speech, videos, tweets, and so on. Labelled data: Labelled data are normalized data from a set of unlabeled data. 
These types of data are usually well formatted, classified, tagged, and easily understandable by human beings for further processing. From the top-level understanding, the machine learning techniques can be classified as supervised and unsupervised learning based on how their learning process is carried out. Unsupervised learning In unsupervised learning algorithms, there is no desired output from the given input datasets. The system learns meaningful properties and features from its experience during the analysis of the dataset. During deep learning, the system generally tries to learn from the whole probability distribution of the data points. There are various types of unsupervised learning algorithms too, which perform clustering, which means separating the data points among clusters of similar types of data. However, with this type of learning, there is no feedback based on the final output, that is, there won't be any teacher to correct you! Figure 6 shows a basic overview of unsupervised clustering. A real life example of an unsupervised clustering algorithm is Google News. When we open a topic under Google News, it shows us a number of hyper-links redirecting to several pages. Each of those topics can be considered as a cluster of hyper-links that point to independent links. Supervised learning In supervised learning, unlike unsupervised learning, there is an expected output associated with every step of the experience. The system is given a dataset, and it already knows what the desired output will look like, along with the correct relationship between the input and output of every associated layer. This type of learning is often used for classification problems. A visual representation is given in Figure 7. Real-life examples of supervised learning are face detection, face recognition, and so on. Although supervised and unsupervised learning look like different identities, they are often connected to each other by various means. Hence, that fine line between these two learnings is often hazy to the student fraternity. The preceding statement can be formulated with the following mathematical expression: The general product rule of probability states that for an n number of datasets n ε ℝk, the joint distribution can be given fragmented as follows: The distribution signifies that the appeared unsupervised problem can be resolved by k number of supervised problems. Apart from this, the conditional probability of p (k | n), which is a supervised problem, can be solved using unsupervised learning algorithms to experience the joint distribution of p (n, k): Although these two types are not completely separate identities, they often help to classify the machine learning and deep learning algorithms based on the operations performed. Generally speaking, cluster formation, identifying the density of a population based on similarity, and so on are termed as unsupervised learning, whereas, structured formatted output, regression, classification, and so on are recognized as supervised learning. Semi-supervised learning As the name suggests, in this type of learning, both labelled and unlabeled data are used during the training. It's a class of supervised learning, which uses a vast amount of unlabeled data during training. For example, semi-supervised learning is used in Deep belief network (explained network), a type of deep network, where some layers learn the structure of the data (unsupervised), whereas one layer learns how to classify the data (supervised learning). 
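As a hedged sketch of the expressions referred to above, written in our own notation (treat the exact symbols as assumptions rather than the book's), the product rule factorizes a joint distribution over a k-dimensional observation into a chain of conditionals:

p(x_1, \ldots, x_k) = p(x_1) \prod_{i=2}^{k} p(x_i \mid x_1, \ldots, x_{i-1})

and the conditional probability that links the supervised and unsupervised views is

p(k \mid n) = \frac{p(n, k)}{p(n)}

which is why an unsupervised estimate of the joint distribution can be used to answer a supervised question, and vice versa.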
In semi-supervised learning, unlabeled data from p (n) and labelled data from p (n, k) are used to predict the probability of k, given the probability of n, or p (k | n): Figure shows the impact of a large amount of unlabeled data during the semi-supervised learning technique. Art the top, it shows the decision boundary that the model puts after distinguishing the white and black circle. The figure at the bottom displays another decision boundary, which the model embraces. In that dataset, in addition to two different categories of circles, a collection of unlabeled data (grey circle) is also annexed. This type of training can be viewed as creating the cluster, and then marking those with the labelled data, which moves the decision boundary away from the high-density data region. Figure obtained from Wikipedia. Deep learning networks are all about representation of data. Therefore, semi-supervised learning is, generally, about learning a representation, whose objective function is given by the following: l = f (n) The objective of the equation is to determine the representation-based cluster. The preceding figure depicts the illustration of a semi-supervised learning. Readers can refer to Chapelle et al.'s book [7] to know more about semi-supervised learning methods. So, as we have already got a foundation of what Artificial Intelligence, machine learning, representation learning are, we can move our entire focus to elaborate on deep learning with further description. From the previously mentioned definition of deep learning, two major characteristics of deep learning can be pointed out as follows: A way of experiencing unsupervised and supervised learning of the feature representation through successive knowledge from subsequent abstract layers A model comprising of multiple abstract stages of non-linear information processing Summary In this article, we have explained most of these concepts in detail, and have also classified the various algorithms of deep learning.


Notes from the field

Packt
03 Jan 2017
7 min read
In this article by Donabel Santos author of the book Tableau 10 Business Intelligence Cookbook would like to offer you perhaps a personal, and maybe a not-so-conventional way to introduce Tableau. I’d like to highlight a few key concepts and tricks that I think would be useful to you as you go along. These are certainly points I highlight on the board whenever I do training on Tableau. If you feel like we are jumping too far ahead, please go ahead and start with the following section Tableau Primer. Come back to this section when you are ready for the tips and tricks. (For more resources related to this topic, see here.) Instead of thinking of Tableau as this software tool that has a steep learning curve, it is useful to think of it as a blank slate. You will draw on it, keep on adding things, removing things until something makes sense or something insightful pops out. After you work with Tableau for a while and get more comfortable with its functionalities, it might even feel like an extension of your brain to some degree. When you get access to data, you might automatically open Tableau to try and understand what’s in that data. Undo is your best friend Do not be afraid to make mistakes, and do not be afraid to explore in Tableau. Do not come in with strict prejudice – for example thinking that you can only use a time series graph when you have a measure and a date field. The best way to learn and explore how powerful Tableau is to try anything and everything. It’s one of the best tools to experiment. If you make a mistake, or if you don’t like what you see, no sweat. Just click on this friendly undo button and you are back to your previous view. If you are more of a shortcut person, it will be Ctrl + Z on a PC or Command + Z on a Mac. It doesn’t change your original data This is another common concern that comes up in my training sessions or whenever I talk to people about Tableau. No, Tableau does not write back to your data source. All the changes you make will be stored in Tableau like creating calculated fields, changing data types, editing aliases will be stored in your Tableau workbook or data source. Drag and drop Tableau is a highly drag and drop software. Although you can use the menu or a right click instead of a drag and drop for the same tasks, dragging and dropping is often faster. It also flows with your train of thought. Look for visual cues Tableau leverages its visual culture in your design area, so when you create views in Tableau, some of the visual cues and icons can help you along the way. A number of the visual cues have been discussed in this section. However, there may be some lesser known (or less noticeable) visual cues: Italicized field names mean they are Tableau-generated fields: Dual axis charts create fused pills. Notice the area when the two pills touch – they’re straight instead of curved: When you zoom in to maps, or when you search for a place, your map gets pinned (or fixed to this place) until you unpin it: Know the difference between blue (discrete) and green (continuous) Knowing the difference between blue and green will take you far in the Tableau world. The data type icons you will find beside your field names in the side bar are colored either blue or green. When you drag fields onto shelves and cards, the pills are also colored blue and green. Simply speaking, blue means discrete and green means continuous. Discrete means individual, separate, countable and finite. 
Continuous means range, and technically, there is an infinite number of values within this range. What’s more important is how these are manifested in Tableau. A blue discrete field will produce header, and a green continuous field will produce an axis. If dropped onto the Color shelf, for example, a blue discrete field will use individual, finite colors. A green continuous field will use a range (gradient) of colors. Some confusion also arises when we see that, by default, Tableau places numeric fields under Measures and are colored green, and categorical information under Dimensions are colored blue. These won’t always be the case. We can have numeric values that are discrete – for example an Order Number. We can also see non-numerical, discrete fields under Measures. Learn a few key shortcuts Shortcuts are great, but it’s typically faster to work when you know a few of them. Here are some of my favorite shortcuts: Shortcut What it does Right click + Drag Opens the Drop Field menu, which allows you to specify exactly which variation of the field you want to use Double click Adds the field to the view I particularly like this when creating text tables. After you place your first measure in Text, you can add more measures to your text table by double clicking on the succeeding measures Ctrl + Arrow Adjusts the height/width of the rows/columns in the view Ctrl + H Presentation mode You can find the complete list of shortcuts here: http://bit.ly/tableau-shortcuts Unpackage option The .twbx file is a Tableau packaged workbook, which means it packages local files with your Tableau workbook. When you right click a .twbx file in a machine that has Tableau Desktop installed in it, you will see a new option called Unpackage. When you unpack a .twbx file, you will get the .twb file and another folder that contains all the local files that were used in the original workbook: Just keep in mind that data (at least the file-based data sources and extracts) get packaged with your .twbx files. This is an important security and data governance consideration when you are deciding how to share your workbooks with others. Table calculations are calculations on your table. How you structure or lay out your table (or view) will affect your table calculations. Table calculations are highly influenced by: Layout Filters Scope and Direction Let’s say, for example, you are calculating Percent of Total in your view. If you swap the fields in your Rows and Columns, i.e. changing the layout, your numbers will change If you filter some of the products out, your numbers will change If you decide to compute Pane Down instead of Table Across, your numbers will change If you’re looking for the common use cases for table calculations, check out the Tableau article entitled Top 10 Tableau Table Calculations which can be found here: http://bit.ly/top10tablecalcs LODs Rock Many of the tasks that required complex table calculations or data blending have been greatly simplified by LODs (Level of Detail expressions). LODs allow us to have multiple levels of detail within a single view, and this increases the possibilities in Tableau. To learn more about Level of Detail expressions, I encourage you to check out the following: Understanding Level of Detail Expressions: http://bit.ly/UnderstandingLOD Top 15 LOD Expressions: http://bit.ly/top15LOD It is possible …. Another common question that comes up is can I do <this> or is it possible to do <this>. 
The answer to many of the questions is yes, and many will include calculations and/or parameters. However, not all solutions will be quick and straightforward. Some may require multiple calculated fields, table calculations, LOD expressions, regular expressions, R scripts etc. Summary In this article we have seen the basics of Tableau as this software tool that has a steep learning curve, it is useful to think of it as a blank slate. You will draw on it, keep on adding things, removing things until something makes sense or something insightful pops out. After you work with Tableau for a while and get more comfortable with its functionalities, it might even feel like an extension of your brain to some degree. When you get access to data, you might automatically open Tableau to try and understand what’s in that data. Resources for Article: Further resources on this subject: Say Hi to Tableau [article] Getting Started with Tableau Public [article] R and its Diverse Possibilities [article]

Dimensionality Reduction

Packt
03 Jan 2017
15 min read
In this article by Ashish Kumar and Avinash Paul, the authors of the book Mastering Text Mining with R, we will look at dimensionality reduction. Data volume and high dimensionality pose an astounding challenge in text mining tasks. The inherent noise and the computational cost of processing huge datasets make it even more arduous. The science of dimensionality reduction lies in the art of losing only a commensurately small amount of information while still reducing the high-dimensional space to a manageable proportion. (For more resources related to this topic, see here.) For classification and clustering techniques to be applied to text data, for different natural language processing activities, we need to reduce the dimensions and noise in the data so that each document can be represented using fewer dimensions, thus significantly reducing the noise that can hinder performance.

The curse of dimensionality

Topic modeling and document clustering are common text mining activities, but the text data can be very high-dimensional, which can cause a phenomenon called the curse of dimensionality. Some literature also calls it concentration of measure:

Distance is attributed to all the dimensions and assumes each of them to have the same effect on the distance. The higher the dimensions, the more similar things appear to each other.

The similarity measures do not take into account the association of attributes, which may result in inaccurate distance estimation.

The number of samples required per attribute increases exponentially with the increase in dimensions.

A lot of dimensions might be highly correlated with each other, thus causing multi-collinearity.

Extra dimensions cause a rapid volume increase that can result in high sparsity, which is a major issue in any method that requires statistical significance. It also causes huge variance in estimates, near duplicates, and poor predictors.

Distance concentration and computational infeasibility

Distance concentration is a phenomenon associated with high-dimensional space wherein pairwise distances or dissimilarities between points appear indistinguishable. All the vectors in high dimensions appear to be orthogonal to each other. The distances between each data point and its neighbors, farthest or nearest, become equal. This totally jeopardizes the utility of methods that use distance-based measures; a short R sketch at the end of this section illustrates the effect. Let's consider that the number of samples is n and the number of dimensions is d. If d is very large, the number of samples may prove to be insufficient to accurately estimate the parameters. For a dataset with d dimensions, the number of parameters in the covariance matrix will be d^2. In an ideal scenario, n should be much larger than d^2 to avoid overfitting. It may feel like a good idea to engineer more features when we are not able to solve a problem with a smaller number of features, but the computational cost and model complexity increase with the number of dimensions. For instance, if n samples are dense enough for a one-dimensional feature space, then n^k samples would be required to achieve the same density in a k-dimensional feature space.

Dimensionality reduction

The complex and noisy characteristics of high-dimensional textual data can be handled by dimensionality reduction techniques. These techniques reduce the dimension of the textual data while still preserving its underlying statistics.
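Here is the promised sketch of distance concentration, using only base R and randomly generated data (so the exact numbers will vary). It shows how the gap between the nearest and farthest pairwise distances shrinks as the number of dimensions grows:

set.seed(7)
n_points <- 100
for (d in c(2, 10, 100, 1000)) {
  # n_points random points in a d-dimensional unit hypercube
  x <- matrix(runif(n_points * d), nrow = n_points)
  pd <- as.vector(dist(x))   # all pairwise Euclidean distances
  # Relative contrast: how far apart are the extremes compared to the minimum?
  contrast <- (max(pd) - min(pd)) / min(pd)
  cat(sprintf("d = %4d  relative contrast = %.3f\n", d, contrast))
}

As d increases, the relative contrast drops towards zero, which is exactly why nearest-neighbor style reasoning becomes unreliable in very high-dimensional term spaces.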
Though the dimensions are reduced, it is important to preserve the inter-document relationships. The idea is to have the minimum number of dimensions that can preserve the intrinsic dimensionality of the data. A textual collection is mostly represented in the form of a term-document matrix wherein we have the importance of each term in a document. The dimensionality of such a collection increases with the number of unique terms. The simplest possible dimensionality reduction method would be to specify a limit or boundary on the distribution of different terms in the collection. Any term that occurs with a significantly high frequency is not going to be informative for us, and the barely present terms can undoubtedly be ignored and considered as noise. Words that generally occur with high frequency and have no particular meaning are referred to as stop words; some examples are is, was, then, and the. Words that occur just once or twice are more likely to be spelling errors or complicated words, and hence both these and stop words should not be considered for modeling the document in the Term Document Matrix (TDM). We will discuss a few dimensionality reduction techniques in brief and dive into their implementation using R.

Principal component analysis

Principal component analysis (PCA) reveals the internal structure of a dataset in a way that best explains the variance within the data. PCA identifies patterns to reduce the dimensions of the dataset without significant loss of information. The main aim of PCA is to project a high-dimensional feature space into a smaller subset to decrease computational cost. PCA helps in computing new features, which are called principal components; these principal components are uncorrelated linear combinations of the original features projected in the direction of higher variability. The important point is to map the set of features into a matrix, M, and compute the eigenvalues and eigenvectors. Eigenvectors provide simpler solutions to problems that can be modeled using linear transformations along axes by stretching, compressing, or flipping. Eigenvalues provide the length and magnitude of the eigenvectors where such transformations occur. Eigenvectors with greater eigenvalues are selected in the new feature space because they enclose more information than eigenvectors with lower eigenvalues for a data distribution. The first principal component has the greatest possible variance, that is, the largest eigenvalue; each subsequent principal component has the highest variance possible while being uncorrelated with the preceding components. The nth PC is the linear combination of maximum variance that is uncorrelated with all previous PCs. PCA comprises the following steps:

Compute the n-dimensional mean of the given dataset.

Compute the covariance matrix of the features.

Compute the eigenvectors and eigenvalues of the covariance matrix.

Rank/sort the eigenvectors by descending eigenvalue.

Choose x eigenvectors with the largest eigenvalues.

Eigenvector values represent the contribution of each variable to the principal component axis. Principal components are oriented in the direction of maximum variance in m-dimensional space. PCA is one of the most widely used multivariate methods for discovering meaningful, new, informative, and uncorrelated features. This methodology also reduces dimensionality by rejecting low-variance features and is useful in reducing the computational requirements for classification and regression analysis.
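Before turning to R's built-in functions, here is a minimal sketch of the steps listed above carried out manually in base R; the simulated matrix and the choice of three components are assumptions made only for illustration:

set.seed(123)
data_mat <- matrix(rnorm(1000 * 9), ncol = 9)   # 1,000 observations, 9 variables

# Steps 1-2: centre the data and compute the covariance matrix
centered <- scale(data_mat, center = TRUE, scale = FALSE)
cov_mat  <- cov(centered)

# Steps 3-4: eigen-decomposition; eigen() returns eigenvalues in descending order
eig <- eigen(cov_mat)

# Step 5: keep the eigenvectors with the largest eigenvalues and project the data
k <- 3
scores <- centered %*% eig$vectors[, 1:k]

# Proportion of variance explained by the first k components
sum(eig$values[1:k]) / sum(eig$values)

# The same projection via prcomp() should agree up to the sign of the axes
head(prcomp(data_mat)$x[, 1:k])

The manual route makes the mechanics explicit; in practice the built-in functions described next are preferred because they handle scaling, numerical accuracy, and reporting for you.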
Using R for PCA

R has two inbuilt functions for performing PCA: prcomp() and princomp(). These two functions expect the dataset to be organized with variables in columns and observations in rows, and to have a structure like a data frame. They also return the new data in the form of a data frame, with the principal components given in columns. prcomp() and princomp() are similar functions used for accomplishing PCA; they have slightly different implementations for computing it. Internally, the princomp() function performs PCA using eigenvectors. The prcomp() function uses a similar technique known as singular value decomposition (SVD). SVD has slightly better numerical accuracy, so prcomp() is generally the preferred function. princomp() fails in situations where the number of variables is larger than the number of observations. Each function returns a list whose class is prcomp or princomp. The information returned and the terminology are summarized in the following table:

prcomp()   princomp()   Explanation
sdev       sdev         Standard deviation of each column
rotation   loadings     Principal components
center     center       Subtracted value of each row or column to get the centered data
scale      scale        Scale factors used
x          scores       The rotated data
           n.obs        Number of observations of each variable
           call         The call to the function that created the object

Here's a list of the functions available in different R packages for performing PCA:

PCA(): FactoMineR package
acp(): amap package
prcomp(): stats package
princomp(): stats package
dudi.pca(): ade4 package
pcaMethods: this package from Bioconductor has various convenient methods to compute PCA

Understanding the FactoMineR package

FactoMineR is an R package that provides multiple functions for multivariate data analysis and dimensionality reduction. The functions provided in the package deal not only with quantitative data but also with categorical data. Apart from PCA, correspondence and multiple correspondence analyses can also be performed using this package:

library(FactoMineR)
data <- replicate(10, rnorm(1000))
result.pca = PCA(data[,1:9], scale.unit=TRUE, graph=T)
print(result.pca)

print(result.pca) reports the results of the principal component analysis: the analysis was performed on 1,000 individuals, described by nine variables, and the results are available in the following objects:

Name              Description
$eig              Eigenvalues
$var              Results for the variables
$var$coord        Coordinates for the variables
$var$cor          Correlations variables - dimensions
$var$cos2         cos2 for the variables
$var$contrib      Contributions of the variables
$ind              Results for the individuals
$ind$coord        Coordinates for the individuals
$ind$cos2         cos2 for the individuals
$ind$contrib      Contributions of the individuals
$call             Summary statistics
$call$centre      Mean of the variables
$call$ecart.type  Standard error of the variables
$call$row.w       Weights for the individuals
$call$col.w       Weights for the variables

The eigenvalues, percentage of variance, and cumulative percentage of variance per component look as follows:

comp 1  1.1573559  12.859510   12.85951
comp 2  1.0991481  12.212757   25.07227
comp 3  1.0553160  11.725734   36.79800
comp 4  1.0076069  11.195632   47.99363
comp 5  0.9841510  10.935011   58.92864
comp 6  0.9782554  10.869505   69.79815
comp 7  0.9466867  10.518741   80.31689
comp 8  0.9172075  10.191194   90.50808
comp 9  0.8542724   9.491916  100.00000

Amap package

amap is another package in the R environment that provides tools for clustering and PCA. It is an acronym for Another Multidimensional Analysis Package. One of the most widely used functions in this package is acp(), which performs PCA on a data frame.
This function is akin to princomp() and prcomp(), except that it has a slightly different graphic representation. For more intricate details, refer to the CRAN-R resource page: https://cran.r-project.org/web/packages/amap/amap.pdf

library(amap)
acp(data, center=TRUE, reduce=TRUE)

Additionally, weight vectors can also be provided as an argument. We can perform a robust PCA by using the acpgen function in the amap package:

acpgen(data, h1, h2, center=TRUE, reduce=TRUE, kernel="gaussien")
K(u, kernel="gaussien")
W(x, h, D=NULL, kernel="gaussien")
acprob(x, h, center=TRUE, reduce=TRUE, kernel="gaussien")

Proportion of variance

We look to construct components and to choose from them the minimum number of components that explains the variance of the data with high confidence. R has a prcomp() function in the base package to estimate principal components. Let's learn how to use this function to estimate the principal components and the proportion of variance they explain:

pca_base <- prcomp(data)
print(pca_base)

The pca_base object contains the standard deviations and rotations of the vectors. The rotations are also known as the principal components of the data. Let's find out the proportion of variance each component explains:

pr_variance <- (pca_base$sdev^2/sum(pca_base$sdev^2))*100
pr_variance
[1] 11.678126 11.301480 10.846161 10.482861 10.176036  9.605907  9.498072
[8]  9.218186  8.762572  8.430598

pr_variance signifies the proportion of variance explained by each component in descending order of magnitude. Let's calculate the cumulative proportion of variance for the components:

cumsum(pr_variance)
[1]  11.67813  22.97961  33.82577  44.30863  54.48467  64.09057  73.58864
[8]  82.80683  91.56940 100.00000

Components 1-8 explain about 82% of the variance in the data.

Singular value decomposition

Singular value decomposition (SVD) is a dimensionality reduction technique that gained a lot of popularity after the famous Netflix movie recommendation challenge. Since its inception, it has found usage in many applications in statistics, mathematics, and signal processing. It is primarily a technique to factorize any matrix, real or complex. A rectangular matrix can be factorized into two orthonormal matrices and a diagonal matrix of positive real values. An m*n matrix can be considered as m points in n-dimensional space; SVD attempts to find the best k-dimensional subspace that fits the data. SVD in R is used to compute approximations of singular values and singular vectors of large-scale data matrices. These approximations are made using memory-efficient algorithms, and IRLBA (the implicitly restarted Lanczos bi-diagonalization algorithm) is one of them. We shall be using the irlba package here in order to implement SVD.
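Before the irlba-based implementation that follows, here is a tiny base-R illustration of what an SVD gives us; the random matrix and the rank-3 truncation are assumptions made only for illustration:

set.seed(11)
m <- matrix(rnorm(100 * 10), nrow = 100)

# Full SVD with base R: m = U %*% diag(d) %*% t(V)
s <- svd(m)
s$d                # singular values, in decreasing order

# Rank-k reconstruction keeps only the k largest singular values
k <- 3
m_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

# Frobenius-norm error of the low-rank approximation
sqrt(sum((m - m_k)^2))

The low-rank reconstruction is the essence of using SVD for dimensionality reduction: the k leading singular vectors capture most of the structure while discarding the noisier directions.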
Implementation of SVD using R

The following code shows an implementation of SVD using R:

# List of packages for the session
packages = c("foreach", "doParallel", "irlba")

# Install CRAN packages (if not already installed)
inst <- packages %in% installed.packages()
if(length(packages[!inst]) > 0) install.packages(packages[!inst])

# Load packages into session
lapply(packages, require, character.only=TRUE)

# Register the parallel session
registerDoParallel(cores=detectCores(all.tests=TRUE))

std_svd <- function(x, k, p=25, iter=1) {
  m1 <- as.matrix(x)
  r <- nrow(m1)
  c <- ncol(m1)
  p <- min(min(r,c)-k, p)
  z <- k+p
  m2 <- matrix(rnorm(z*c), nrow=c, ncol=z)
  y <- m1 %*% m2
  q <- qr.Q(qr(y))
  b <- t(q) %*% m1
  # Iterations
  b1 <- foreach(i=1:iter) %dopar% {
    y1 <- m1 %*% t(b)
    q1 <- qr.Q(qr(y1))
    b1 <- t(q1) %*% m1
  }
  b1 <- b1[[iter]]
  b2 <- b1 %*% t(b1)
  eigens <- eigen(b2, symmetric=T)
  result <- list()
  result$svalues <- sqrt(eigens$values)[1:k]
  u1 = eigens$vectors[1:k,1:k]
  result$u <- (q %*% eigens$vectors)[,1:k]
  result$v <- (t(b) %*% eigens$vectors %*% diag(1/eigens$values))[,1:k]
  return(result)
}

svd <- std_svd(x=data, k=5)

# Singular values
svd$svalues
[1] 35.37645 33.76244 32.93265 32.72369 31.46702

We obtain the following values after running SVD using the IRLBA algorithm:

d: approximate singular values
u: nu approximate left singular vectors
v: nv approximate right singular vectors
iter: number of IRLBA algorithm iterations
mprod: number of matrix vector products performed

These values can be used for obtaining the results of SVD and understanding the overall statistics about how the algorithm performed.

Latent factors:

# svd$u, svd$v
dim(svd$u)   # u value after running IRLBA
[1] 1000 5
dim(svd$v)   # v value after running IRLBA
[1] 10 5

A modified version of the previous function can be achieved by altering the power iterations for a robust implementation:

foreach(i = 1:iter) %dopar% {
  y1 <- m1 %*% t(b)
  y2 <- t(y1) %*% y1
  r2 <- chol(y2, pivot = T)
  q1 <- y2 %*% solve(r2)
  b1 <- t(q1) %*% m1
}
b2 <- b1 %*% t(b1)

Some other functions available in R packages are as follows:

Function     Package
svd()        svd
irlba()      irlba
svdImpute()  bcv

ISOMAP – moving towards non-linearity

ISOMAP is a nonlinear dimensionality reduction method and is representative of isometric mapping methods. ISOMAP is one of the approaches for manifold learning. ISOMAP finds the map that preserves the global, nonlinear geometry of the data by preserving the geodesic manifold inter-point distances. Like multi-dimensional scaling, ISOMAP creates a visual presentation of the distances between a number of objects. A geodesic is the shortest curve along the manifold connecting two points, induced by a neighborhood graph. Multi-dimensional scaling uses the Euclidean distance measure; since the data is in a nonlinear format, ISOMAP uses the geodesic distance instead. ISOMAP can be viewed as an extension of metric multi-dimensional scaling.
At a very high level, ISOMAP can be described in four steps:

Determine the neighbors of each point
Construct a neighborhood graph
Compute the shortest path distances between all pairs of points
Construct k-dimensional coordinate vectors by applying MDS

Geodesic distance approximation is basically calculated in three ways:

Neighboring points: input-space distance
Faraway points: a sequence of short hops between neighboring points
Method: finding shortest paths in a graph with edges connecting neighboring data points

The following example uses the RDRToolbox and vegan packages:

source("http://bioconductor.org/biocLite.R")
biocLite("RDRToolbox")
library('RDRToolbox')
library(rgl)   # provides open3d() and plot3d()

swiss_Data = SwissRoll(N = 1000, Plot=TRUE)
x = SwissRoll()
open3d()
plot3d(x, col=rainbow(1050)[-c(1:50)], box=FALSE, type="s", size=1)
simData_Iso = Isomap(data=swiss_Data, dims=1:10, k=10, plotResiduals=TRUE)

library(vegan)
data(BCI)
distance <- vegdist(BCI)
tree <- spantree(distance)
pl1 <- ordiplot(cmdscale(distance), main="cmdscale")
lines(tree, pl1, col="red")
z <- isomap(distance, k=3)
rgl.isomap(z, size=4, color="red")
pl2 <- plot(isomap(distance, epsilon=0.5), main="isomap epsilon=0.5")
pl3 <- plot(isomap(distance, k=5), main="isomap k=5")
pl4 <- plot(z, main="isomap k=3")

Summary
The idea of this article was to get you familiar with some of the generic dimensionality reduction methods and their implementation using the R language. We discussed a few packages that provide functions to perform these tasks, and we also covered a few custom functions that can be utilized to perform them. Kudos, you have completed the basics of text mining with R. You must be feeling confident about various data mining methods, text mining algorithms (related to natural language processing of texts) and, after reading this article, dimensionality reduction. If you feel a little low on confidence, do not be upset. Turn a few pages back and try implementing those tiny code snippets on your own dataset, and figure out how they help you understand your data. Remember this - to mine something, you have to get into it by yourself. This holds true for text as well.

Resources for Article:
Further resources on this subject:
Data Science with R [Article]
Machine Learning with R [Article]
Data mining [Article]

Recommendation Engines Explained

Packt
02 Jan 2017
10 min read
In this article written by Suresh Kumar Gorakala, author of the book Building Recommendation Engines we will learn how to build a basic recommender system using R. In this article we will learn about various types of recommender systems in detail. This article explains neighborhood-similarity-based recommendations, personalized recommendation engines, model-based recommender systems and hybrid recommendation engines. Following are the different subtypes of recommender systems covered in this article: Neighborhood-based recommendation engines User-based collaborative filtering Item-based collaborative filtering Personalized recommendation engines Content-based recommendation engines Context-aware recommendation engines (For more resources related to this topic, see here.) Neighborhood-based recommendation engines As the name suggests, neighborhood-based recommender systems considers the preferences or likes of other users in the neighborhood before making suggestions or recommendations to the active user. While considering the preferences or tastes of neighbors, we first calculate how similar the other users are to the active user and then new items from more similar users are recommended to the user. Here the active user is the person to whom the system is serving recommendations. Since similarity calculations are involved these recommender systems are also called similarity-based recommender systems. Also since preferences or tastes are considered collaboratively from a pool of users these recommender systems are also called as collaborative filtering recommender systems. In this type of systems the main actors are the users, products and users preference information such as rating/ranking/liking towards the products. Preceding image is an example from Amazon showing collaborative filtering The collaborative filtering systems come in two flavors, They are: User-based collaborative filtering Item-based collaborative filtering Collaborative filtering When we have only the users interaction data of the products such as ratings, like/unlike, view/not viewed and we have to recommend new products then we have to choose Collaborative filtering approach. User-based Collaborative Filtering The basic intuition behind user-based collaborative filtering systems is, people with similar tastes in the past like similar items in future as well. For example, if user A and user B have very similar purchase history and if user A buys a new book which user B has not yet seen then we can suggest this book to User B as they have similar tastes. Item-based Collaborative filtering In this type of recommender systems unlike user-based collaborative filtering, we use similarity between items instead of similarity between users. Basic intuition for item-based recommender systems is if a User likes item A in past he might like item B which is similar to item A. In this approach instead of calculating similarity between users we calculate similarity between items or products. The most common similarity measure used for this approach is cosine similarity. Like user-based collaborative approach, we project the data into vector space and similarity between items is calculated using cosine angle between items. Similar to user-based collaborative filtering approach there are two steps for item-based collaborative approach. They are: Calculating the similarity between items. 
Predicting the ratings for the non-rated items for an active user, by making use of the previous ratings given to other, similar items

Advantages of user-based collaborative filtering
Easy to implement
Neither the content information of the products nor the users' profile information is required for building recommendations
New items are recommended to users, giving a surprise factor to the users

Disadvantages of user-based collaborative filtering
This approach is computationally expensive, as all the user, product, and rating information is loaded into memory for the similarity calculations.
This approach fails for new users, where we do not have any information about the users. This problem is called the cold-start problem.
This approach performs very poorly if we have little data. Since we do not have content information about users or products, we cannot generate recommendations accurately based on rating information alone.

Content-based recommender systems
In collaborative filtering, recommendations are generated by considering only the rating or interaction information of the products by the users; that is, suggesting new items for the active user is based on the ratings given to those new items by users similar to the active user. Assume the case of a person who has given a 4-star rating to a movie. In a collaborative filtering approach, we only consider this rating information for generating recommendations. In reality, a person rates a movie based on the features or content of the movie, such as its genre, actor, director, story, and screenplay. The person also watches a movie based on his personal choices. When we are building a recommendation engine to target users at a personal level, the recommendations should not be based on the tastes of other similar people, but on the individual user's tastes and the contents of the products. Recommendations which are targeted at a personalized level, and which consider individual preferences and the contents of the products, are called content-based recommender systems. Another motivation for building content-based recommendation engines is that they solve the cold-start problem which new users face in the collaborative filtering approach. When a new user comes in, we can suggest new items which are similar to his tastes based on his preferences.

Building content-based recommender systems involves three main steps, as follows:
Generating content information for products.
Generating a user profile and preferences with respect to the features of the products.
Generating recommendations by predicting a list of items which the user might like.

Let us discuss each step in detail:
Content extraction: In this step, we extract the features that represent the product. Most commonly, the content of the products is represented in the vector space model, with product names as rows and features as columns.
User profile generation: In this step, we build the user profile or preference matrix, or a vector space model matching the products' content.
Generating recommendations: Now that we have generated the product content and user profile, the next step will be to generate the recommendations. Recommender systems using machine learning or any other mathematical or statistical models to generate recommendations are called model-based systems.

Cosine similarity
In this approach, we first represent the user profiles and product content in vector form, and then we take the cosine angle between each pair of vectors, as sketched in the short example below.
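As an illustration, here is a small, hypothetical sketch in R of the cosine calculation between a user profile vector and a few product content vectors; the feature names and values are made up purely for demonstration and are not taken from any real catalogue:

# Hypothetical content features (columns) for three movies (rows)
items <- matrix(c(1, 0, 1, 0,
                  0, 1, 0, 1,
                  1, 0, 0, 1),
                nrow=3, byrow=TRUE,
                dimnames=list(c("Movie A", "Movie B", "Movie C"),
                              c("action", "comedy", "drama", "romance")))

# Hypothetical user profile expressed over the same features
user_profile <- c(action=0.8, comedy=0.1, drama=0.7, romance=0.2)

# Cosine similarity between the user profile and each item
cosine <- function(a, b) sum(a*b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
similarities <- apply(items, 1, cosine, b=user_profile)

# Items forming the smallest angle with the profile come first
sort(similarities, decreasing=TRUE)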
The product which forms less angle with the user profile is considered as the most preferable item for the user. This approach is a standard approach while using neighborhood approach for Content based recommendations. Empirical studies shown that this approach gives more accurate results compared to other similarity measures. Classification-based approach Classification-based approaches fall under model-based recommender systems. In this approach, first we build a machine learning model by using the historical information, with user profile similar to the product content as input and the like/dislike of the product as output response classes. Supervised classification tasks such as logistic regression, KNN-classification methods, probabilistic methods and so on can be used. Advantages Content-based recommender systems are targeting at individual level Recommendations are generated using the user preferences alone unlike from user community as in collaborative filtering This approaches can be employed at real time as recommendation model doesn’t need to load the entire data for processing or generating recommendations Accuracy is high compared to collaborative approaches as they deal with the content of the products instead of rating information alone Cold start problem can be easily handled Disadvantages As the system is more personalized and the generated recommendations will become narrowed down to only user preferences with more and more user information comes into the system As a result, no new products that are not related to the user preferences will be shown to the user The user will not be able to look at what is happening around or what’s trending around Context-aware recommender Systems Over the years there has been evolution in recommender systems from neighborhood approaches to personalized recommender systems which are targeted to the individual users. These personalized recommender systems have become a huge success as this is useful at end user level and for organizations these systems become catalysts to increase their business. The personalized recommender systems, also called as content-based recommender systems are also getting evolved into Context aware recommender systems. Though the personalized recommender systems are targeted at individual user level and caters recommendations based on the personal preferences of the users, still there was scope to improve or refine the systems. Same person at different places might have different requirements. Likewise same person has different requirements at different times. Our intelligent recommender systems should be evolved enough to cater to the needs of the users for different places, at different times. Recommender System should be robust enough to suggest cotton shirts to a person during summer and suggesting Leather Jacket in winter. Similarly based on the time of the day suggesting Good restaurants serving a person’s personal choice breakfast and dinner would be very helpful. These kinds of recommender systems which considers location, time, mood, and so on that defines the context of user and suggests personalized recommendations are called context aware recommender systems. At broad level, context aware recommender systems are content-based recommenders with the inclusion of new dimension called context. In context aware systems, recommendations are generated in two steps: Generating list of recommendations of products for each user based on users’ preferences, that is content-based recommendations. 
Filtering out the recommendations that are specific to a current context. For example, based on past transaction history, interaction information, browsing patterns, ratings information on e-wallet mobile app, assume that User A is a movie lover, Sports lover, fitness freak. Using this information the content-based recommender systems generate recommendations of products such as Movie Tickets, 4G data offer for watching Football matches, Discount offers at GYM. Now based on the GPS co-ordinates of the mobile if the User A found to be at a 10K RUN marathon, then my Context aware recommendation engine will take this location information as the context and filters out the offers that are relevant to the current context and recommends Discount Offers at GYM to the user A. Most common approaches for building Context Aware Recommender systems are: Post filtering Approaches Pre-filtering approaches Pre-filtering approaches In pre-filtering approach, context information is applied to the User profile and product content. This step will filter out all the non relevant features and final personalized recommendations are generated on remaining feature set. Since filtering of features are made before generating personalized recommendations, these are called pre-filtering approaches. Post filtering approaches In post-filtering, firstly personalized recommendations are generated based on the user profile and product catalogue then the context information is applied for filtering out the relevant products to the user for the current context. Advantages Context aware systems are much advanced than the personalized content-based recommenders as these systems will be constantly in sync with user movements and generate recommendations as per current context. These systems are more real-time nature. Disadvantages Serendipity or surprise factor as in other personalized recommenders will be missing in this type of recommendations as well. Summary In this article, we have learned about popular recommendation engine techniques such as, collaborative filtering, content-based recommendations, context aware systems, hybrid recommendations, model-based recommendation systems with their advantages and disadvantages. Different similarity methods such as cosine similarity, Euclidean distance, Pearson-coefficient. Sub categories within each of the recommendations are also explained. Resources for Article: Further resources on this subject: Building a Recommendation Engine with Spark [article] Machine Learning Tasks [article] Machine Learning with R [article]

Implementing RethinkDB Query Language

Packt
23 Dec 2016
5 min read
In this article by Shahid Shaikh, the author of the book Mastering RethinkDB, we will cover how you will perform geospatial queries (such as finding all the documents with locations within 5km of a given point). (For more resources related to this topic, see here.) Performing MapReduce operations MapReduce is the programming model to perform operations (mainly aggregation) on distributed set of data across various clusters in different servers. This concept was coined by Google and been used in Google file system initially and later been adopted in open source Hadoop project. MapReduce works by processing the data on each server and then combine it together to form a result set. It actually divides into two operations namely map and reduce. Map: It performs the transformation of the elements in the group or individual sequence Reduce: It performs the aggregation and combine results from Map into meaningful result set In RethinkDB, MapReduce queries operate in three steps: Group operation: To process the data into groups. This step is optional Map operation: To transform the data or group of data into sequence Reduce operation: To aggregate the sequence data to form resultset So mainly it is Group Map Reduce (GMR) operation. RethinkDB spread the mapreduce query across various clusters in order to improve efficiency. There is specific command to perform this GMR operation; however RethinkDB already integrated them internally to some aggregate functions in order to simplify the process. Let us perform some aggregation operation in RethinkDB. Grouping the data To group the data on basis of field we can use group() ReQL function. Here is sample query on our users table to group the data on the basis of name: rethinkdb.table("users").group("name").run(connection,function(err,cursor) { if(err) { throw new Error(err); } cursor.toArray(function(err,data) { console.log(JSON.stringify(data)); }); }); Here is the output for the same: [ { "group":"John", "reduction":[ { "age":24, "id":"664fced5-c7d3-4f75-8086-7d6b6171dedb", "name":"John" }, { "address":{ "address1":"suite 300", "address2":"Broadway", "map":{ "latitude":"116.4194W", "longitude":"38.8026N" }, "state":"Navada", "street":"51/A" }, "age":24, "id":"f6f1f0ce-32dd-4bc6-885d-97fe07310845", "name":"John" } ] }, { "group":"Mary", "reduction":[ { "age":32, "id":"c8e12a8c-a717-4d3a-a057-dc90caa7cfcb", "name":"Mary" } ] }, { "group":"Michael", "reduction":[ { "age":28, "id":"4228f95d-8ee4-4cbd-a4a7-a503648d2170", "name":"Michael" } ] } ] If you observe the query response, data is group by the name and each group is associated with document. Every matching data for the group resides under reductionarray. In order to work on each reductionarray, you can use ungroup() ReQL function which in turns takes grouped streams of data and convert it into array of object. It's useful to perform the operations such as sorting and so on, on grouped values. Counting the data We can count the number of documents present in the table or a sub document of a document using count() method. Here is simple example: rethinkdb.table("users").count().run(connection,function(err,data) { if(err) { throw new Error(err); } console.log(data); });  It should return the number of documents present in the table. You can also use it count the sub document by nesting the fields and running count() function at the end. Sum We can perform the addition of the sequence of data. If value is passed as an expression then sums it up else searches in the field provided in the query. 
For example, to find the sum of all the users' ages:

rethinkdb.table("users")("age").sum().run(connection, function(err, data) {
  if(err) {
    throw new Error(err);
  }
  console.log(data);
});

You can of course use an expression to perform a math operation like this:

rethinkdb.expr([1,3,4,8]).sum().run(connection, function(err, data) {
  if(err) {
    throw new Error(err);
  }
  console.log(data);
});

This should return 16.

Avg
Performs the average of the values passed as an expression, or of the field provided in the query. For example:

rethinkdb.expr([1,3,4,8]).avg().run(connection, function(err, data) {
  if(err) {
    throw new Error(err);
  }
  console.log(data);
});

Min and Max
Finds the maximum and minimum values passed as an expression or taken from a field. For example, to find the oldest user in the database:

rethinkdb.table("users")("age").max().run(connection, function(err, data) {
  if(err) {
    throw new Error(err);
  }
  console.log(data);
});

In the same way, we can find the youngest user:

rethinkdb.table("users")("age").min().run(connection, function(err, data) {
  if(err) {
    throw new Error(err);
  }
  console.log(data);
});

Distinct
Distinct finds and removes duplicate elements from the sequence, just like its SQL counterpart. For example, to find the unique user names:

rethinkdb.table("users")("name").distinct().run(connection, function(err, data) {
  if(err) {
    throw new Error(err);
  }
  console.log(data);
});

It should return an array containing the names:

[ 'John', 'Mary', 'Michael' ]

Contains
Contains looks for a value in the field and returns a boolean response: true if the field contains the value, false otherwise. For example, to check whether a user's name contains John:

rethinkdb.table("users")("name").contains("John").run(connection, function(err, data) {
  if(err) {
    throw new Error(err);
  }
  console.log(data);
});

This should return true.

Map and reduce
Aggregate functions such as count() and sum() already make use of map and reduce internally, and, if required, group() too. You can of course use them explicitly in order to perform various operations.

Summary
In this article, we covered various parts of the RethinkDB query language, which should help readers get to grips with the basic concepts of RethinkDB.

Resources for Article:
Further resources on this subject:
Introducing RethinkDB [article]
Amazon DynamoDB - Modelling relationships, Error handling [article]
Oracle 12c SQL and PL/SQL New Features [article]

Say Hi to Tableau

Packt
21 Dec 2016
9 min read
In this article by Shweta Savale, the author of the book Tableau Cookbook- Recipes for Data Visualization, we will cover how you need to install My Tableau Repository and connecting to the sample data source. (For more resources related to this topic, see here.) Introduction to My Tableau Repository and connecting to the sample data source Tableau is a very versatile tool and it is used across various industries, businesses, and organizations, such as government and non-profit organizations, BFSI sector, consulting, construction, education, healthcare, manufacturing, retail, FMCG, software and technology, telecommunications, and many more. The good thing about Tableau is that it is industry and business vertical agnostic, and hence as long as we have data, we can analyze and visualize it. Tableau can connect to a wide variety of data sources and many of the data sources are implemented as native connections in Tableau. This ensures that the connections are as robust as possible. In order to view the comprehensive list of data sources that Tableau connects to, we can visit the technical specification page on the Tableau website by clicking on the following link: http://www.tableau.com/products/desktop?qt-product_tableau_desktop=1#qt-product_tableau_desktops. Getting ready Tableau provides some sample datasets with the Desktop edition. In this article, we will frequently be using the sample datasets that have been provided by Tableau. We can find these datasets in the Data sources directory in the My Tableau Repository folder, which gets created in our Documents folder when Tableau Desktop is installed on our machine. We can look for these data sources in the repository or we can quickly download it from the link mentioned and save it in a new folder called Tableau Cookbook data under Documents/My Tableau Repository/Datasources. The link for downloading the sample datasets is as follows: https://1drv.ms/f/s!Av5QCoyLTBpngihFyZaH55JpI5BN There are two files that have been uploaded. They are as follows: Microsoft Excel data called Sample - Superstore.xls Microsoft Access data called Sample - Coffee Chain.mdb In the following section, we will see how to connect to the sample data source. We will be connecting to the Excel data called Sample - Superstore.xls. This Excel file contains transactional data for a retail store. There are three worksheets in this Excel workbook. The first sheet, which is called the Orders sheet, contains the transaction details; the Returns sheet contains the status of returned orders, and the People sheet contains the region names and the names of managers associated with those regions. Refer to the following screenshot to get a glimpse of how the Excel data is structured: Now that we have taken a look at the Excel data, let us see how to connect to this Excel data in the following recipe. To begin with, we will work on the Orders sheet of the Sample - Superstore.xls data. This worksheet contains the order details in terms of the products purchased, the name of the customer, Sales, Profits, Discounts offered, day of purchase, order shipment date, among many other transactional details. How to do it… Let’s open Tableau Desktop by double-clicking on the Tableau 10.0 icon on our Desktop. We can also right-click on the icon and select Open. We will see the start page of Tableau, as shown in the following screenshot: We will select the Excel option from under the Connect header on the left-hand side of the screen. 
Once we do that, we will have to browse the Excel file called Sample - Superstore.xls, which is saved in Documents/My Tableau Repository/Datasources/Tableau Cookbook data. Once we are able to establish a connection to the referred Excel file, we will get a view as shown in the following screenshot: Annotation 1 in the preceding screenshot is the data that we have connected to, and annotation 2 is the list of worksheets/tables/views in our data. Double-click on the Orders sheet or drag and drop the Orders sheet from the left-hand side section into the blank space that says Drag sheets here. Refer to annotation 3 in the preceding screenshot. Once we have selected the Orders sheet, we will get to see the preview of our data, as highlighted in annotation 1 in the following screenshot. We will see the column headers, their data type (#, Abc, and so on), and the individual rows of data: While connecting to a data source, we can also read data from multiple tables/sheets from that data source. However, this is something that we will explore a little later. Further moving ahead, we will need to specify what type of connection we wish to maintain with the data source. Do we wish to connect to our data directly and maintain a Live connectivity with it, or do we wish to import the data into Tableau's data engine by creating an Extract? Refer to annotation 2 in the preceding screenshot. We will understand these options in detail in the next section. However, to begin with, we will select the Live option. Next, in order to get to our Tableau workspace where we can start building our visualizations, we will click on the Go to Worksheet option/ Sheet 1. Refer to annotation 3 in the preceding screenshot. This is how we can connect to data in Tableau. In case we have a database to connect to, then we can select the relevant data source from the list and fill in the necessary information in terms of server name, username, password, and so on. Refer to the following screenshot to see what options we get when we connect to Microsoft SQL Server: How it works… Before we connect to any data, we need to make sure that our data is clean and in the right format. The Excel file that we connected to was stored in a tabular format where the first row of the sheet contains all the column headers and every other row is basically a single transaction in the data. This is the ideal data structure for making the best use of Tableau. Typically, when we connect to databases, we would get columnar/tabular data. However, flat files such as Excel can have data even in cross-tab formats. Although Tableau can read cross-tab data, we may face certain limitations in terms of options for viewing, aggregating, and slicing and dicing our data in Tableau. Having said that, there may be situations where we have to deal with such cross-tab or pre-formatted Excel files. These files will essentially need cleaning up before we pull into Tableau. Refer to the following article to understand more about how we can clean up these files and make them Tableau ready: http://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#data_tips.html In case it is a cross-tab file, then we will have to pivot it into normalized columns either at the data level or on the fly at Tableau level. We can do so by selecting multiple columns that we wish to pivot and then selecting the Pivot option from the dropdown that appears when we hover over any of the columns. 
Refer to the following screenshot: If the format of the data in our Excel file is not suitable for analysis in Tableau, then we can turn on the Data Interpreter option, which becomes available when Tableau detects any unique formatting or any extra information in our Excel file. For example, the Excel data may include some empty rows and columns, or extra headers and footers. Refer to the following screenshot: Data Interpreter can remove that extra information to help prepare our Tableau data source for analysis. Refer to the following screenshot: When we enable the Data Interpreter, the preceding view will change to what is shown in the following screenshot: This is how the Data Interpreter works in Tableau. Now many a times, there may also be situations where our data fields are compounded or clubbed in a single column. Refer to the following screenshot: In the preceding screenshot, the highlighted column is basically a concatenated field that has the Country, City, and State. For our analysis, we may want to break these and analyze each geographic level separately. In order to do so, we simply need to use the Split or Custom Split…option in Tableau. Refer to the following screenshot: Once we do that, our view would be as shown in the following screenshot: When preparing data for analysis, at times a list of fields may be easy to consume as against the preview of our data. The Metadata grid in Tableau allows us to do the same along with many other quick functions such as renaming fields, hiding columns, changing data types, changing aliases, creating calculations, splitting fields, merging fields, and also pivoting the data. Refer to the following screenshot: After having established the initial connectivity by pointing to the right data source, we need to specify as to how we wish to maintain that connectivity. We can choose between the Live option and Extract option. The Live option helps us connect to our data directly and maintains a live connection with the data source. Using this option allows Tableau to leverage the capabilities of our data source and in this case, the speed of our data source will determine the performance of our analysis. The Extract option on the other hand, helps us import the entire data source into Tableau's fast data engine as an extract. This option basically creates a .tde file, which stands for Tableau Data Extract. In case we wish to extract only a subset of our data, then we can select the Edit option, as highlighted in the following screenshot. The Add link in the right corner helps us add filters while fetching the data into Tableau. Refer to the following screenshot: A point to remember about Extract is that it is a snapshot of our data stored in a Tableau proprietary format and as opposed to a Live connection, the changes in the original data won't be reflected in our dashboard unless and until the extract is updated. Please note that we will have to decide between Live and Extract on a case to case basis. Please refer to the following article for more clarity: http://www.tableausoftware.com/learn/whitepapers/memory-or-live-data Summary This article thus helps us to install and connect to sample data sources which is very helpful to create effective dashboards in business environment for statistical purpose. Resources for Article: Further resources on this subject: Getting Started with Tableau Public [article] Data Modelling Challenges [article] Creating your first heat map in R [article]

R and its Diverse Possibilities

Packt
16 Dec 2016
11 min read
In this article by Jen Stirrup, the author of the book Advanced Analytics with R and Tableau, We will cover, with examples, the core essentials of R programming such as variables and data structures in R such as matrices, factors, vectors, and data frames. We will also focus on control mechanisms in R ( relational operators, logical operators, conditional statements, loops, functions, and apply) and how to execute these commands in R to get grips before proceeding to article that heavily rely on these concepts for scripting complex analytical operations. (For more resources related to this topic, see here.) Core essentials of R programming One of the reasons for R’s success is its use of variables. Variables are used in all aspects of R programming. For example, variables can hold data, strings to access a database, whole models, queries, and test results. Variables are a key part of the modeling process, and their selection has a fundamental impact on the usefulness of the models. Therefore, variables are an important place to start since they are at the heart of R programming. Variables In the following section we will deal with the variables—how to create variables and working with variables. Creating variables It is very simple to create variables in R, and to save values in them. To create a variable, you simply need to give the variable a name, and assign a value to it. In many other languages, such as SQL, it’s necessary to specify the type of value that the variable will hold. So, for example, if the variable is designed to hold an integer or a string, then this is specified at the point at which the variable is created. Unlike other programming languages, such as SQL, R does not require that you specify the type of the variable before it is created. Instead, R works out the type for itself, by looking at the data that is assigned to the variable. In R, we assign variables using an assignment variable, which is a less than sign (<) followed by a hyphen (-). Put together, the assignment variable looks like so: Working with variables It is important to understand what is contained in the variables. It is easy to check the content of the variables using the lscommand. If you need more details of the variables, then the ls.strcommand will provide you with more information. If you need to remove variables, then you can use the rm function. Data structures in R The power of R resides in its ability to analyze data, and this ability is largely derived from its powerful data types. Fundamentally, R is a vectorized programming language. Data structures in R are constructed from vectors that are foundational. This means that R’s operations are optimized to work with vectors. Vector The vector is a core component of R. It is a fundamental data type. Essentially, a vector is a data structure that contains an array where all of the values are the same type. For example, they could all be strings, or numbers. However, note that vectors cannot contain mixed data types. R uses the c() function to take a list of items and turns them into a vector. Lists R contains two types of lists: a basic list, and a named list. A basic list is created using the list() operator. In a named list, every item in the list has a name as well as a value. named lists are a good mapping structure to help map data between R and Tableau. In R, lists are mapped using the $ operator. Note, however, that the list label operators are case sensitive. Matrices Matrices are two-dimensional structures that have rows and columns. 
The matrices are lists of rows. It’s important to note that every cell in a matrix has the same type. Factors A factor is a list of all possible values of a variable in a string format. It is a special string type, which is chosen from a specified set of values known as levels. They are sometimes known as categorical variables. In dimensional modeling terminology, a factor is equivalent to a dimension, and the levels represent different attributes of the dimension. Note that factors are variables that can only contain a limited number of different values. Data frames The data frame is the main data structure in R. It’s possible to envisage the data frame as a table of data, with rows and columns. Unlike the list structure, the data frame can contain different types of data. In R, we use the data.frame() command in order to create a data frame. The data frame is extremely flexible for working with structured data, and it can ingest data from many different data types. Two main ways to ingest data into data frames involves the use of many data connectors, which connect to data sources such as databases, for example. There is also a command, read.table(), which takes in data. Data Frame Structure Here is an example, populated data frame. There are three columns, and two rows. The top of the data frame is the header. Each horizontal line afterwards holds a data row. This starts with the name of the row, and then followed by the data itself. Each data member of a row is called a cell. Here is an example data frame, populated with data: Example Data Frame Structure df = data.frame( Year=c(2013, 2013, 2013), Country=c("Arab World","Carribean States", "Central Europe"), LifeExpectancy=c(71, 72, 76)) As always, we should read out at least some of the data frame so we can double-check that it was set correctly. The data frame was set to the df variable, so we can read out the contents by simply typing in the variable name at the command prompt: To obtain the data held in a cell, we enter the row and column co-ordinates of the cell, and surround them by square brackets []. In this example, if we wanted to obtain the value of the second cell in the second row, then we would use the following: df[2, "Country"] We can also conduct summary statistics on our data frame. For example, if we use the following command: summary(df) Then we obtain the summary statistics of the data. The example output is as follows: You’ll notice that the summary command has summarized different values for each of the columns. It has identified Year as an integer, and produced the min, quartiles, mean, and max for year. The Country column has been listed, simply because it does not contain any numeric values. Life Expectancy is summarized correctly. We can change the Year column to a factor, using the following command: df$Year <- as.factor(df$Year) Then, we can rerun the summary command again: summary(df) On this occasion, the data frame now returns the correct results that we expect: As we proceed throughout this book, we will be building on more useful features that will help us to analyze data using data structures, and visualize the data in interesting ways using R. Control structures in R R has the appearance of a procedural programming language. However, it is built on another language, known as S. S leans towards functional programming. It also has some object-oriented characteristics. This means that there are many complexities in the way that R works. 
In this section, we will look at some of the fundamental building blocks that make up key control structures in R, and then we will move on to looping and vectorized operations.

Logical operators
Logical operators are binary operators that allow the comparison of values:

Operator     Description
<            less than
<=           less than or equal to
>            greater than
>=           greater than or equal to
==           exactly equal to
!=           not equal to
!x           not x
x | y        x OR y
x & y        x AND y
isTRUE(x)    test if x is TRUE

For loops and vectorization in R
Specifically, we will look at the constructs involved in loops. Note, however, that it is more efficient to use vectorized operations rather than loops, because R is vector-based. We investigate loops here because they are a good first step in understanding how R works, and then we can optimize this understanding by focusing on vectorized alternatives that are more efficient. More information about control flow can be obtained by executing the following command at the command line:

?Control

The control flow commands make decisions and choose between alternative actions. The main constructs are for, while, and repeat.

For loops
Let's look at a for loop in more detail. For this exercise, we will use the Fisher iris dataset, which is installed along with R by default. We are going to produce summary statistics for each species of iris in the dataset. You can see some of the iris data by typing the following command at the command prompt:

head(iris)

We can divide the iris dataset so that the data is split by species. To do this, we use the split command, and we assign the result to a variable called IrisBySpecies:

IrisBySpecies <- split(iris, iris$Species)

Now, we can use a for loop to process the data and summarize it by species. Firstly, we will set up a variable called output and set it to a list type. For each species held in the IrisBySpecies variable, we calculate the minimum, maximum, mean, and total number of cases. The results are then bound into a data frame called output.df, which is printed out to the screen:

output <- list()
for(n in names(IrisBySpecies)){
  ListData <- IrisBySpecies[[n]]
  output[[n]] <- data.frame(species=n,
                            MinPetalLength=min(ListData$Petal.Length),
                            MaxPetalLength=max(ListData$Petal.Length),
                            MeanPetalLength=mean(ListData$Petal.Length),
                            NumberofSamples=nrow(ListData))
  output.df <- do.call(rbind, output)
}
print(output.df)

The output is as follows:

We used a for loop here, but loops can be expensive in terms of processing. We can achieve the same end by using a vectorized function called tapply. Tapply processes data in groups. It has three parameters: the vector of data, the factor that defines the groups, and a function. It works by extracting each group and then applying the function to it; it then returns a vector with the results. We can see an example of tapply here, using the same dataset:

output <- data.frame(MinPetalLength=tapply(iris$Petal.Length, iris$Species, min),
                     MaxPetalLength=tapply(iris$Petal.Length, iris$Species, max),
                     MeanPetalLength=tapply(iris$Petal.Length, iris$Species, mean),
                     NumberofSamples=tapply(iris$Petal.Length, iris$Species, length))
print(output)

This time, we get the same output as previously. The only difference is that, by using a vectorized function, we have concise code that runs efficiently. To summarize, R is extremely flexible and it's possible to achieve the same objective in a number of different ways.
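For instance, the same grouped summary can also be produced with the aggregate() function from base R. The following is a small sketch of that alternative, using the same iris dataset; it is just one more equivalent approach, not a replacement for the versions above:

# aggregate() applies a function to Petal.Length within each Species group
agg_min  <- aggregate(Petal.Length ~ Species, data=iris, FUN=min)
agg_max  <- aggregate(Petal.Length ~ Species, data=iris, FUN=max)
agg_mean <- aggregate(Petal.Length ~ Species, data=iris, FUN=mean)

# Combine the pieces into one summary data frame
output_agg <- data.frame(species=agg_min$Species,
                         MinPetalLength=agg_min$Petal.Length,
                         MaxPetalLength=agg_max$Petal.Length,
                         MeanPetalLength=agg_mean$Petal.Length,
                         NumberofSamples=as.vector(table(iris$Species)))
print(output_agg)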
As we move forward through this book, we will make recommendations about the optimal method to select, and the reasons for the recommendation. Functions R has many functions that are included as part of the installation. In the first instance, let’s look to see how we can work smart by finding out what functions are available by default. In our last example, we used the split() function. To find out more about the split function, we can simply use the following command: ?split Or we can use: help(split) It’s possible to get an overview of the arguments required for a function. To do this, simply use the args command: args(split) Fortunately, it’s also possible to see examples of each function by using the following command: example(split) If you need more information than the documented help file about each function, you can use the following command. It will go and search through all the documentation for instances of the keyword: help.search("split") If you  want to search the R project site from within RStudio, you can use the RSiteSearch command. For example: RSiteSearch("split") Summary In this article, we have looked at various essential structures in working with R. We have looked at the data structures that are fundamental to using R optimally. We have also taken the view that structures such as for loops can often be done better as vectorized operations. Finally, we have looked at the ways in which R can be used to create functions in order to simply code. Resources for Article: Further resources on this subject: Getting Started with Tableau Public [article] Creating your first heat map in R [article] Data Modelling Challenges [article]