
How-To Tutorials - Data


Neural Network Architectures 101: Understanding Perceptrons

Kunal Chaudhari
12 Feb 2018
9 min read
Note: This article is an excerpt from the book Neural Network Programming with Java, Second Edition, written by Fabio M. Soares and Alan M. F. Souza. The book is for Java developers who want to master developing smarter applications, like weather forecasting and pattern recognition, using neural networks.

In this article we will discuss perceptrons, along with their features, applications, and limitations. Perceptrons are a very popular neural network architecture that implements supervised learning. Proposed by Frank Rosenblatt in 1957, a perceptron has just one layer of neurons, receiving a set of inputs and producing a set of outputs. This was one of the first representations of neural networks to gain attention, especially because of its simplicity. In our Java implementation, this is illustrated with one neural layer (the output layer). The following code creates a perceptron with three inputs and two outputs, with a linear function at the output layer:

int numberOfInputs = 3;
int numberOfOutputs = 2;
Linear outputAcFnc = new Linear(1.0);
NeuralNet perceptron = new NeuralNet(numberOfInputs, numberOfOutputs, outputAcFnc);

Applications and limitations

However, it did not take scientists long to conclude that, because of that very simplicity, a perceptron network could only be applied to simple tasks. At the time, neural networks were being used for simple classification problems, but perceptrons usually failed when faced with more complex datasets. Let's illustrate this with a very basic example (an AND function) to understand the issue better.

Linear separation

The example consists of an AND function that takes two inputs, x1 and x2, and can be plotted in a two-dimensional chart. Now let's examine how the network evolves during training with the perceptron rule, considering a pair of weights, w1 and w2, both initially 0.5, and a bias of 0.5 as well. Assume the learning rate η equals 0.2:

Epoch  x1  x2  w1     w2     b       y       t  E       Δw1     Δw2     Δb
1      0   0   0.5    0.5    0.5     0.5     0  -0.5    0       0       -0.1
1      0   1   0.5    0.5    0.4     0.9     0  -0.9    0       -0.18   -0.18
1      1   0   0.5    0.32   0.22    0.72    0  -0.72   -0.144  0       -0.144
1      1   1   0.356  0.32   0.076   0.752   1  0.248   0.0496  0.0496  0.0496
2      0   0   0.406  0.370  0.126   0.126   0  -0.126  0.000   0.000   -0.025
2      0   1   0.406  0.370  0.100   0.470   0  -0.470  0.000   -0.094  -0.094
2      1   0   0.406  0.276  0.006   0.412   0  -0.412  -0.082  0.000   -0.082
2      1   1   0.323  0.276  -0.076  0.523   1  0.477   0.095   0.095   0.095
…
89     0   0   0.625  0.562  -0.312  -0.312  0  0.312   0       0       0.062
89     0   1   0.625  0.562  -0.25   0.313   0  -0.313  0       -0.063  -0.063
89     1   0   0.625  0.500  -0.312  0.313   0  -0.313  -0.063  0       -0.063
89     1   1   0.562  0.500  -0.375  0.687   1  0.313   0.063   0.063   0.063

After 89 epochs, the network produces values near the desired outputs. Since in this example the outputs are binary (zero or one), we can assume that any value produced by the network below 0.5 is considered to be 0, and any value above 0.5 is considered to be 1. So, with the final weights and bias found by the learning algorithm (w1 = 0.562, w2 = 0.5, and b = -0.375), we can draw the linear boundary they define in the chart. This boundary is a definition of all classifications given by the network. You can see that the boundary is linear, given that the function is also linear. Thus, the perceptron network is really suitable for problems whose patterns are linearly separable.
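The table above can be reproduced in a few lines of code. What follows is an illustrative Python sketch (not the book's Java NeuralNet API) of the same per-pattern updates, Δwi = η·(t − y)·xi and Δb = η·(t − y), with a linear output:

# Perceptron (delta) rule on the AND function, as in the table above.
# Illustrative Python sketch; not the book's Java NeuralNet API.
eta = 0.2                                             # learning rate
w1, w2, b = 0.5, 0.5, 0.5                             # initial weights and bias
data = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]   # (x1, x2, target) for AND

for epoch in range(89):
    for x1, x2, t in data:
        y = w1 * x1 + w2 * x2 + b                     # linear activation
        e = t - y                                     # error
        w1 += eta * e * x1                            # delta-rule updates
        w2 += eta * e * x2
        b += eta * e

print(w1, w2, b)   # prints values close to those in the epoch-89 rows

The per-pattern updates never settle exactly; they fall into the small cycle visible in the epoch-89 rows, oscillating around the least-squares solution.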
The XOR case

Now let's analyze the XOR case. We see that in two dimensions, it is impossible to draw a line to separate the two patterns. What would happen if we tried to train a single-layer perceptron to learn this function? Let's see what happens in the following table:

Epoch  x1  x2  w1      w2      b      y      t  E       Δw1     Δw2     Δb
1      0   0   0.5     0.5     0.5    0.5    0  -0.5    0       0       -0.1
1      0   1   0.5     0.5     0.4    0.9    1  0.1     0       0.02    0.02
1      1   0   0.5     0.52    0.42   0.92   1  0.08    0.016   0       0.016
1      1   1   0.516   0.52    0.436  1.472  0  -1.472  -0.294  -0.294  -0.294
2      0   0   0.222   0.226   0.142  0.142  0  -0.142  0.000   0.000   -0.028
2      0   1   0.222   0.226   0.113  0.339  1  0.661   0.000   0.132   0.132
2      1   0   0.222   0.358   0.246  0.467  1  0.533   0.107   0.000   0.107
2      1   1   0.328   0.358   0.352  1.038  0  -1.038  -0.208  -0.208  -0.208
…
127    0   0   -0.250  -0.125  0.625  0.625  0  -0.625  0.000   0.000   -0.125
127    0   1   -0.250  -0.125  0.500  0.375  1  0.625   0.000   0.125   0.125
127    1   0   -0.250  0.000   0.625  0.375  1  0.625   0.125   0.000   0.125
127    1   1   -0.125  0.000   0.750  0.625  0  -0.625  -0.125  -0.125  -0.125

The perceptron simply could not find any pair of weights that would drive the error below 0.625. This can be explained mathematically: as we already perceived from the chart, this function is not linearly separable in two dimensions. So what if we add another dimension? Looking at the chart in three dimensions, it is possible to draw a plane that separates the patterns, provided that this additional dimension properly transforms the input data. Okay, but now there is an additional problem: how could we derive this additional dimension, since we have only two input variables? One obvious, if somewhat makeshift, answer would be to add a third variable derived from the two original ones.
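The article leaves the derivation unspecified; one common choice (assumed here purely for illustration) is the product of the two inputs, which makes XOR expressible by a purely linear function of the three variables:

# Lifting XOR into three dimensions makes it linearly separable.
# The derived variable x3 = x1 * x2 is an assumption for illustration.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x3 = x1 * x2                # derived third input
    y = x1 + x2 - 2 * x3        # linear in (x1, x2, x3)
    print((x1, x2), '->', y)    # prints 0, 1, 1, 0: exactly XOR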
With this third (derived) variable, our neural network would take the following shape: the perceptron now has three inputs, one of them being a composition of the other two. This leads to a new question: how should that composition be processed? We can see that this component could act as a neuron, giving the neural network a nested architecture. If so, there would be another new question: how would the weights of this new neuron be trained, since the error is measured at the output neuron?

Multi-layer perceptrons

As we can see, one simple example in which the patterns are not linearly separable has led us to more and more issues with the perceptron architecture. That need led to the development of multi-layer perceptrons. It is already established that natural neural networks are structured in layers as well, and that each layer captures pieces of information from a specific environment. In artificial neural networks, layers of neurons act in the same way, extracting and abstracting information from data and transforming it into another dimension or shape. In the XOR example, we found the solution to be the addition of a third component that would make a linear separation possible, but a few questions remained regarding how that third component would be computed. Now let's consider the same solution as a two-layer perceptron. We now have three neurons instead of just one, and at the output, the information transferred by the previous layer is transformed into another dimension or shape in which it is theoretically possible to establish a linear boundary on those data points. However, the question of finding the weights for the first layer remains unanswered. Or can we apply the same training rule to neurons other than the output? We are going to deal with this issue in the Generalized delta rule section.

MLP properties

Multi-layer perceptrons can have any number of layers and any number of neurons in each layer. The activation functions may differ from layer to layer. An MLP network is usually composed of at least two layers: one output layer and one hidden layer. Some references also consider the input layer to be the nodes that collect input data; for those cases, the MLP is considered to have at least three layers. For the purposes of this article, let's consider the input layer as a special type of layer that has no weights, and as the effective layers, that is, those that can be trained, we'll consider the hidden and output layers. A hidden layer is called that because it actually hides its outputs from the external world. Hidden layers can be connected in series in any number, thus forming a deep neural network. However, the more layers a neural network has, the slower both training and running will be, and according to mathematical foundations, a neural network with at most one or two hidden layers may learn as well as a deep neural network with dozens of hidden layers. But it depends on several factors.

MLP weights

In an MLP feedforward network, a particular neuron i receives data from a neuron j of the previous layer and forwards its output to a neuron k of the next layer. The mathematical description of a neural network is recursive:

$$ y_o = f_o\left( \sum_{i=1}^{n_{h_l}} w_i \, f_i\left( \sum_{j} w_{ij}\, y_j + b_i \right) \right) $$

Here, $y_o$ is the network output (should we have multiple outputs, we can replace $y_o$ with $Y$, representing a vector); $f_o$ is the activation function of the output; $l$ is the number of hidden layers; $n_{h_i}$ is the number of neurons in hidden layer $i$; $w_i$ is the weight connecting the ith neuron of the last hidden layer to the output; $f_i$ is the activation function of neuron $i$; and $b_i$ is the bias of neuron $i$. Each $y_j$ expands recursively into a sum of the same form, so the equation gets larger as the number of layers increases, and in the last (innermost) summing operation the inputs $x_i$ appear.

Recurrent MLP

The neurons of an MLP may feed signals not only to neurons in the next layer (a feedforward network), but also to neurons in the same or previous layers (a feedback, or recurrent, network). This behavior allows the neural network to maintain state over a data sequence, a feature that is especially exploited when dealing with time series or handwriting recognition. Recurrent networks are usually harder to train, and the computer may eventually run out of memory while executing them. In addition, there are recurrent architectures that work better than recurrent MLPs, such as Elman, Hopfield, echo state, and bidirectional RNNs (recurrent neural networks). But we are not going to dive deep into these architectures here.

Coding an MLP

Bringing these concepts into the OOP point of view, we can review the classes already designed so far. The neural network structure is hierarchical: a neural network is composed of layers, which are composed of neurons. In the MLP architecture, there are three types of layers: input, hidden, and output. So suppose that, in Java, we would like to define a neural network consisting of three inputs, one output (linear activation function), and one hidden layer (sigmoid function) containing five neurons.
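To make the recursive description concrete before the book's Java version below, here is a minimal Python sketch of the forward pass such a network computes (illustrative only, not the book's API; the weight values are arbitrary):

import math

def forward(x, layers):
    # layers: list of (weights, biases, activation) per effective layer;
    # weights[i][j] connects input j of the layer to its neuron i
    y = x
    for weights, biases, activation in layers:
        y = [activation(sum(w * v for w, v in zip(row, y)) + b)
             for row, b in zip(weights, biases)]
    return y

sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))
linear = lambda s: s

# three inputs -> five sigmoid hidden neurons -> one linear output
hidden = ([[0.1, -0.2, 0.3]] * 5, [0.0] * 5, sigmoid)
output = ([[0.2, 0.2, 0.2, 0.2, 0.2]], [0.0], linear)
print(forward([1.0, 2.0, 3.0], [hidden, output]))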
In the book's Java library, the resulting code would be as follows:

int numberOfInputs = 3;
int numberOfOutputs = 1;
int[] numberOfHiddenNeurons = {5};
Linear outputAcFnc = new Linear(1.0);
Sigmoid hiddenAcFnc = new Sigmoid(1.0);
NeuralNet neuralnet = new NeuralNet(numberOfInputs, numberOfOutputs,
        numberOfHiddenNeurons, hiddenAcFnc, outputAcFnc);

To summarize, we saw how perceptrons can be applied to solve linear separation problems, their limitations in classifying nonlinear data, and how to overcome those limitations with multi-layer perceptrons (MLPs). If you enjoyed this excerpt, check out the book Neural Network Programming with Java, Second Edition for a better understanding of neural networks and how they fit into different real-world projects.


Brad Miro talks TensorFlow 2.0 features and how Google is using it internally

Sugandha Lahoti
10 Dec 2019
6 min read
TensorFlow 2.0, released in October, has developers excited about a myriad of features and its ease of use. At the EuroPython Conference 2019, Brad Miro, developer programs engineer at Google, talked about the updates being made in TensorFlow 2.0. He also gave an overview of how Google is using TensorFlow, moving on to why Python is important for TensorFlow development and how to migrate from TF 1.x to TF 2.0. EuroPython is one of the most popular Python programming language community conferences. Below are some highlights from Brad's talk at EuroPython.

What is TensorFlow?

TensorFlow is an open-source deep learning library developed at Google and first released in 2015. It's a Python framework that includes a number of utilities to help you write deep neural networks, supporting both GPUs and TPUs. A lot of deep learning involves mathematics, statistics, and algebra, and performing low-level optimizations on your system. TensorFlow abstracts much of that away, leaving you to focus on actually writing your model.

How TensorFlow is used internally at Google

TensorFlow is used internally at Google to power all of its machine learning and AI. Google's data centers are powered using AI and TensorFlow to help optimize their usage: to reduce bandwidth, to ensure network connections are optimized, and to reduce power consumption. TensorFlow is also useful for performing global localization in Google Maps, and it is used heavily in the Google Pixel range of smartphones to help optimize the software. These technologies are also used in medical research, specifically in the field of computer vision. For example, TensorFlow is used to distinguish the retinal image of a healthy eye from the retinal image of an eye that has diabetic retinopathy.

Further learning: If you want to learn to build more computer vision applications with TensorFlow 2.0, check out the book Hands-On Computer Vision with TensorFlow 2 by Benjamin Planche and Eliot Andres. This book by Packt Publishing is a practical guide to building high-performance systems for object detection, segmentation, video processing, smartphone applications, and more. By the end of the book, you will have both the theoretical understanding and practical skills to solve advanced computer vision problems with TensorFlow 2.0.

Furthermore, Google is using AI and TensorFlow to predict whether or not objects in space are planets: they use AI to predict whether fluctuations in the brightness of an object are due to it being a planet.

Why Python is so important for TensorFlow

Python has always been the choice for TensorFlow, because the language is extremely easy to use and has a rich ecosystem for data science, including tools such as NumPy, scikit-learn, and pandas. When TensorFlow was being built, the idea was that it should have the simplicity of NumPy, the performance of C, and the ease of use of Python.

What does TensorFlow 2.0 bring to the table?

TensorFlow 2.0 is powerful, flexible, scalable, and easily deployable.

What's gone:
- Session.run
- tf.control_dependencies
- tf.global_variables_initializer
- tf.cond, tf.while_loop
- tf.contrib

What's new:
- Eager execution enabled by default
- tf.function
- Keras as the main high-level API
- Distribution Strategy API
- SavedModel API

TensorFlow 2.0 had a major API cleanup: many API symbols were removed or renamed for better consistency and clarity. Session.run has been replaced with eager execution, which effectively means that your TensorFlow code runs like NumPy code.
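As a minimal sketch of that point (any TensorFlow 2.x install should behave this way; the tensor values are arbitrary):

import tensorflow as tf

# Operations execute immediately and return concrete values,
# much like NumPy -- no graph building or Session.run required.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
print(tf.matmul(a, b))   # the actual result, not a symbolic node
print(a.numpy())         # tensors convert directly to NumPy arrays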
Eager execution enables fast iteration and intuitive debugging without building a graph. It also makes creating and experimenting with models using TensorFlow easier, and it can be especially useful when using the tf.keras model subclassing API. TensorFlow 2.0 has tf.function, a Python decorator that lets you run regular Python code that is later compiled down to a TensorFlow graph using AutoGraph. The Distribution Strategy API in TensorFlow 2.0 allows machine learning researchers to distribute training across a wide variety of compute configurations; this release also allows distributed training with Keras' model.fit and with custom training loops.

Keras is introduced as the main high-level API. Keras is a popular high-level API used for easy and fast prototyping, building, and training of deep learning models, and this will enable developers to easily leverage its various model-building APIs. Using Keras with TensorFlow has two main methods:

Symbolic (Keras Sequential):
- Your model is a graph of layers
- Any graph you compile will run
- TensorFlow helps you debug by catching errors at compile time

Imperative (Keras subclassing):
- Your model is Python bytecode
- Complete flexibility and control
- Harder to debug, harder to maintain

There are pros and cons to each method; it really just depends on what your specific use cases are. The SavedModel API allows you to save your trained ML model in a language-neutral format. With TensorFlow 2.0, all TensorFlow ecosystem projects, including TensorFlow Lite, TensorFlow.js, TensorFlow Serving, and TensorFlow Hub, support SavedModels. On TensorFlow Hub, you can store and download pre-built models. You can use TensorFlow Extended, a Python library that can be run on your servers to productionize your models. TensorFlow Lite lets you run your TensorFlow models on edge devices. With TensorFlow.js, you can run machine learning models using JavaScript in the browser, or run them on servers using Node.js. TensorFlow also has Swift for TensorFlow to help developers use Swift to develop machine learning models: "Swift for TensorFlow provides a new programming model that combines the performance of graphs with the flexibility and expressivity of Eager execution, with a strong focus on improved usability at every level of the stack. This is not just a TensorFlow API wrapper written in Swift — we added compiler and language enhancements to Swift to provide a first-class user experience for machine learning developers." Other packages in the TensorFlow ecosystem for niche use cases include TF Probability, TF Agents (reinforcement learning), Tensor2Tensor, TF Ranking, TF Text (natural language processing), TF Federated, TF Privacy, and more.

How to upgrade from TensorFlow 1.x to TensorFlow 2.0

There are several migration guides available on TensorFlow's website. You can also use the tf.compat.v1 library for backwards compatibility, and the tf_upgrade_v2 script, which you can execute on top of any Python script to convert TF 1.x code to 2.0 code. You can also read more about TF 2.0 migration in our book Hands-On Computer Vision with TensorFlow 2, which introduces the automatic migration tool and compares TensorFlow 1 concepts with their TensorFlow 2 counterparts, with a detailed guide on migrating to idiomatic TensorFlow 2 code.
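Pulling a few of these pieces together, here is a minimal TF 2.0-style sketch of the features described above; the layer sizes, names, and save path are illustrative assumptions, not anything from the talk:

import tensorflow as tf

# tf.function: regular Python compiled to a TensorFlow graph via AutoGraph.
@tf.function
def mean_squared_error(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))

# Symbolic style: the model is a graph of Keras layers.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# SavedModel: a language-neutral export, consumable by TF Serving,
# TensorFlow Lite, TensorFlow.js, and TensorFlow Hub.
tf.saved_model.save(model, "exported_model")  # illustrative path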
You can watch Brad's full talk on YouTube. The video is licensed under the CC BY-NC-SA 3.0 license.

Read more:
- TensorFlow.js contributor Kai Sasaki on how TensorFlow.js eases web-based machine learning application development
- Introducing Spleeter, a TensorFlow-based Python library that extracts voice and sound from any music track
- TensorFlow 2.0 released with tighter Keras integration, eager execution enabled by default, and more!


18 striking AI Trends to watch in 2018 - Part 1

Sugandha Lahoti
27 Dec 2017
14 min read
Artificial Intelligence is the talk of the town. It evolved past being merely a buzzword in 2016 to being used in a more practical manner in 2017. As 2018 rolls out, we will gradually notice AI transitioning into a necessity. We have prepared a detailed report on what we can expect from AI in the upcoming year. So sit back, relax, and enjoy the ride through the future. (Don't forget to wear your VR headgear!) Here are 18 things that will happen in 2018 that are either AI driven or driving AI:

1. Artificial General Intelligence may gain major traction in research.
2. We will turn to AI-enabled solutions to solve mission-critical problems.
3. Machine learning adoption in business will see rapid growth.
4. Safety, ethics, and transparency will become an integral part of AI application design conversations.
5. Mainstream adoption of AI on mobile devices.
6. Major research on data-efficient learning methods.
7. AI personal assistants will continue to get smarter.
8. The race to conquer the AI-optimized hardware market will heat up further.
9. We will see closer AI integration into our everyday lives.
10. The cryptocurrency hype will normalize and pave the way for AI-powered blockchain applications.
11. Advancements in AI and quantum computing will share a symbiotic relationship.
12. Deep learning will continue to play a significant role in AI development progress.
13. AI will be on both sides of the cybersecurity challenge.
14. Augmented reality content will be brought to smartphones.
15. Reinforcement learning will be applied to a large number of real-world situations.
16. Robotics development will be powered by deep reinforcement learning and meta-learning.
17. A rise in immersive media experiences enabled by AI.
18. A large number of organizations will use digital twins.

1. General AI: AGI may start gaining traction in research

AlphaZero is only the beginning. 2017 saw Google's AlphaGo Zero (and later AlphaZero) beat human players at Go, chess, and other games. In addition to this, computers are now able to recognize images, understand speech, drive cars, and diagnose diseases better with time. AGI is an advancement of AI that deals with bringing machine intelligence as close to humans as possible, so that machines can potentially do any intellectual task that a human can. The success of AlphaGo covered one of the crucial aspects of AGI systems: the ability to learn continually, avoiding catastrophic forgetting. However, there is a lot more to achieving human-level general intelligence than the ability to learn continually. For instance, today's AI systems can draw on skills learned in one game to play another, but they lack the ability to generalize the learned skill. Unlike humans, these systems do not seek solutions from previous experiences; an AI system cannot ponder and reflect on a new task, analyze its capabilities, and work out how best to apply them. In 2018, we expect to see advanced research in the areas of deep reinforcement learning, meta-learning, transfer learning, evolutionary algorithms, and other areas that aid in developing AGI systems. Detailed aspects of these ideas are highlighted in later points. We can indeed say Artificial General Intelligence is inching closer than ever before, and 2018 is expected to cover major ground in that direction.

2. Enterprise AI: Machine learning adoption in enterprises will see rapid growth
2017 saw a rise in cloud offerings from major tech players, such as Amazon SageMaker, Microsoft Azure, and Google Cloud Platform, allowing business professionals and innovators to transfer labor-intensive research and analysis to the cloud. The cloud is a $130 billion industry as of now, and it is projected to grow. Statista carried out a survey on the level of AI adoption among businesses worldwide as of 2017: almost 80% of the participants had incorporated some form of AI into their organizations or planned to do so in the future. (Source: https://www.statista.com/statistics/747790/worldwide-level-of-ai-adoption-business/) According to a report from Deloitte, medium and large enterprises are set to double their usage of machine learning by the end of 2018. Apart from this, 2018 will see better data visualization techniques, powered by machine learning, which is a critical aspect of every business. Artificial intelligence is going to automate the cycle of report generation and KPI analysis, and also bring in deeper analysis of consumer behavior. Also, with abundant big data sources coming into the picture, BI tools powered by AI will emerge that can harness the raw computing power of voluminous big data, making data models streamlined and efficient.

3. Transformative AI: We will turn to AI-enabled solutions to solve mission-critical problems

2018 will see the involvement of AI in more and more mission-critical problems that can have world-changing consequences: enabling genetic engineering, solving the energy crisis, space exploration, slowing climate change, smart cities, reducing starvation through precision farming, elder care, and so on. Recently, NASA revealed the discovery of a new exoplanet using data crunched with machine learning and AI; with this recent reveal, more AI techniques will be used for space exploration and to find other exoplanets. We will also see the real-world deployment of AI applications, so it will not only be about academic research but also about industry readiness. 2018 could very well be the year when AI becomes real for medicine. According to Mark Michalski, executive director of the Massachusetts General Hospital and Brigham and Women's Center for Clinical Data Science, "By the end of next year, a large number of leading healthcare systems are predicted to have adopted some form of AI within their diagnostic groups." We will also see the rise of robot assistants, such as virtual nurses, diagnostic apps in smartphones, and real clinical robots that can monitor patients, take care of the elderly, alert doctors, and send notifications in case of emergency. More research will be done on how AI-enabled technology can help in difficult-to-diagnose areas of health care, like mental health and the onset of hereditary diseases, among others. Facebook's attempt at detecting potentially suicidal messages using AI is a sign of things to come in this direction. As we explore AI-enabled solutions to problems that have a serious impact on individuals and societies at large, considering the ethical and moral implications of such solutions will become central to developing them, let alone hard to ignore.

4. Safe AI: Safety, ethics, and transparency in AI applications will become integral to conversations on AI adoption and app design

The rise of machine learning capabilities has also given rise to forms of bias, stereotyping, and unfair determination in such systems.
2017 saw some high-profile news stories, from gender bias in object recognition datasets like MS COCO to racial disparities in education AI systems. At NIPS 2017, Kate Crawford talked about bias in machine learning systems, which resonated greatly with the community and became pivotal to starting conversations and thinking by other influencers on how to address the problems raised. DeepMind also launched a new unit, DeepMind Ethics & Society, to help technologists put ethics into practice, and to help society anticipate and direct the impact of AI for the benefit of all. Independent bodies like the IEEE also pushed for standards in its Ethically Aligned Design paper. As news about the bro culture in Silicon Valley and the lack of diversity in the tech sector stayed in the news all of 2017, it hit closer home as the year came to an end, when Kristian Lum, lead statistician at HRDAG, described her experiences with harassment as a graduate student at prominent statistics conferences. This has had a butterfly effect of sorts, with many more women coming forward to raise the issue in the ML/AI community. They talked about the formation of a stronger code of conduct by the boards of key conferences such as NIPS, among others. Eric Horvitz, a Microsoft research director, called Lum's post a "powerful and important report." Jeff Dean, head of Google's Brain AI unit, applauded Lum for having the courage to speak about this behavior. Other key influencers from the ML and statistics community also spoke in support of Lum and added their views on how to tackle the problem. While the road to recovery is long, and machines with moral intelligence may be decades away, 2018 is expected to start that journey in the right direction by including safety, ethics, and transparency in AI/ML systems. Instead of just thinking about ML contributing to decision making in, say, hiring or criminal justice, data scientists will begin to think about the potential role of ML in the harmful representation of human identity. These policies will be included not only in the development of larger AI ecosystems but also in national and international debates in politics, business, and education.

5. Ubiquitous AI: AI will start redefining life as we know it, and we may not even know it happened

Artificial intelligence will gradually integrate into our everyday lives. We will see it in everyday decisions like what kind of food we eat, the entertainment we consume, and the clothes we wear. Artificially intelligent systems will get better at complex tasks that humans still take for granted, like walking around a room and over objects. We are going to see more and more products that contain some form of AI enter our lives, and AI-enabled products will become more common and available. We will also start seeing it in the background for the life-altering decisions we make, such as what to learn, where to work, whom to love, who our friends are, whom we should vote for, where we should invest, and where we should live, among other things.

6. Embedded AI: Mobile AI means a radically different way of interacting with the world

There is no denying that AI is the power source behind the next generation of smartphones. A large number of organizations are enabling the use of AI in smartphones, whether in the form of deep learning chips or inbuilt software with AI capabilities. Mobile AI will be a combination of on-device AI and cloud AI.
Intelligent phones will have end-to-end capabilities that support the coordinated development of chips, devices, and the cloud. The release of the iPhone X's Face ID, which uses a neural network chip to construct a mathematical model of the user's face, and self-driving cars are only the beginning. As 2018 rolls out, we will see vast applications on smartphones and other mobile devices that run deep neural networks to enable AI. AI going mobile is not just limited to the embedding of neural chips in smartphones. The next generation of mobile networks, 5G, will soon greet the world. 2018 is going to be a year of closer collaboration and increasing partnerships between telecom service providers, handset makers, chip makers, and AI tech enablers/researchers. The Baidu-Huawei partnership, to build an open AI mobile ecosystem consisting of devices, technology, internet services, and content, is one example of many steps in this direction. We will also see edge computing rapidly becoming a key part of the Industrial Internet of Things (IIoT) to accelerate digital transformation. In combination with cloud computing, other forms of network architecture, such as fog and mist, will also gain major traction. All of the above will lead to a large-scale implementation of cognitive IoT, which combines traditional IoT implementations with cognitive computing: it will make sensors capable of diagnosing and adapting to their environment without the need for human intervention, and bring in the ability to combine multiple data streams to identify patterns. This means we will be a lot closer to seeing smart cities in action.

7. Data-sparse AI: Research into data-efficient learning methods will intensify

2017 saw highly scalable solutions for problems in object detection and recognition, machine translation, text-to-speech, recommender systems, and information retrieval. The second Conference on Machine Translation happened in September 2017, and the 11th ACM Conference on Recommender Systems in August 2017 witnessed a series of paper presentations, featured keynotes, invited talks, tutorials, and workshops in the field of recommender systems. Google launched Tacotron 2 for generating human-like speech from text. However, most of these research systems attain state-of-the-art performance only when trained with large amounts of data. With GDPR and other data regulatory frameworks coming into play, 2018 is expected to witness machine learning systems that can learn efficiently, maintaining performance but with less time and less data. A data-efficient learning system allows learning in complex domains without requiring large quantities of data. For this, there will be developments in the field of semi-supervised learning techniques, where we can use generative models to better guide the training of discriminative models. More research will happen in the areas of transfer learning (reusing generalized knowledge across domains), active learning, one-shot learning, and Bayesian optimization, as well as other non-parametric methods. In addition, researchers and organizations will exploit bootstrapping and data augmentation techniques for efficient reuse of available data. Other key trends propelling data-efficient learning research are growing on-device/edge computing, advancements in robotics, AGI research, and energy optimization of data centers, among others.

8. Conversational AI: AI personal assistants will continue to get smarter

AI-powered virtual assistants are expected to skyrocket in 2018.
2017 was filled to the brim with new releases. Amazon brought out the Echo Look and the Echo Show. Google made its personal assistant more personal by allowing the linking of six accounts to the Google Assistant built into the Home via the Home app. Bank of America unveiled Erica, its AI-enabled digital assistant. As 2018 rolls out, AI personal assistants will find their way into an increasing number of homes and consumer gadgets, including increased availability of AI assistants in our smartphones and smart speakers with built-in support for platforms such as Amazon's Alexa and Google Assistant. With the beginning of the new year, we can see personal assistants integrating into our daily routines. Developers will build voice support into a host of appliances and gadgets by using various voice assistant platforms. More importantly, developers in 2018 will try their hands at conversational technology that includes emotional sensitivity (affective computing) as well as machine translation technology (the ability to communicate seamlessly between languages). Personal assistants will be able to recognize speech patterns, for instance those indicative of wanting help. AI bots may also be utilized for psychiatric counseling or for providing support to isolated people. And it's all set to begin with the AI Assistant Summit in San Francisco, scheduled for 25-26 January 2018, which will feature talks by the world's leading innovators in AI assistants and artificial intelligence.

9. AI Hardware: The race to conquer the AI-optimized hardware market will heat up further

Top tech companies (read: Google, IBM, Intel, Nvidia) are investing heavily in the development of AI/ML-optimized hardware. Research and Markets has predicted the global AI chip market will have a growth rate of about 54% between 2017 and 2021. 2018 will see further hardware designs intended to greatly accelerate the next generation of applications and run AI computational jobs. As 2018 begins, chip makers will battle it out to determine who creates the hardware that artificial intelligence lives on. Not only that, there will be a rise in the development of new AI products, both hardware and software platforms, that run deep learning programs and algorithms. Chips that move away from the traditional one-size-fits-all approach toward application-based AI hardware will also grow in popularity. 2018 will see hardware that does not only store data but also transforms it into usable information; the trend for AI will head in the direction of task-optimized hardware. 2018 may also see hardware organizations move into software domains and vice versa. Nvidia, most famous for its Volta GPUs, has come up with the NVIDIA DGX-1, an integrated hardware and software system for AI research designed to streamline the deep learning workflow. More such transitions are expected at the highly anticipated CES 2018. Phew, that was a lot of writing! But I hope you found it just as interesting to read as I found writing it. However, we are not done yet. Here is part 2 of our 18 AI trends in '18.


Twitter Sentiment Analysis

Packt
19 Feb 2016
30 min read
In this article, we will cover:

- Twitter and its importance
- Getting hands-on with Twitter's data and using various Twitter APIs
- Use of data to solve business problems: a comparison of various businesses based on tweets

Twitter and its importance

Twitter can be considered an extension of the short message service, or SMS, but on an Internet-based platform. In the words of Jack Dorsey, co-founder and co-creator of Twitter: "...We came across the word 'twitter', and it was just perfect. The definition was 'a short burst of inconsequential information,' and 'chirps from birds'. And that's exactly what the product was." Twitter acts as a utility where one can send their SMSs to the whole world. It enables people to instantaneously get heard and get a response. Since the audience of this SMS is so large, responses often come very quickly. So, Twitter facilitates the basic social instincts of humans. By sharing on Twitter, a user can easily express his/her opinion on just about everything, at any time. Friends who are connected (or, in the case of Twitter, followers) immediately get the information about what's going on in someone's life. This in turn serves another human emotion: the innate need to know what is going on in someone's life. Apart from being real time, Twitter's UI is really easy to work with; it's naturally and instinctively understood, that is, the UI is very intuitive in nature. Each tweet on Twitter is a short message with a maximum of 140 characters. Twitter is an excellent example of a microblogging service. As of July 2014, the Twitter user base had reached above 500 million, with more than 271 million active users. Around 23 percent of adult Internet users are on Twitter, which is also about 19 percent of the entire adult population. If we can properly mine what users are tweeting about, Twitter can act as a great tool for advertisement and marketing. But this is not the only information Twitter provides. Because of its non-symmetric nature in terms of followers and followings, Twitter assists better in understanding user interests than in measuring impact on the social network. An interest graph can be thought of as a method to learn the links between individuals and their diverse interests. Computing the degree of association or correlation between individuals' interests and potential advertisements is one of the most important applications of interest graphs. Based on these correlations, a user can be targeted so as to attain the maximum response to an advertisement campaign, along with follower recommendations. One interesting fact about Twitter (and Facebook) is that the user does not need to be a real person. A user on Twitter (or on Facebook) can be anything and anyone: an organization, a campaign itself, or a famous but imaginary personality (a fictional character recognizable in the media), apart from a real/actual person. If a real person follows these users on Twitter, a lot can be inferred about their personality, and hence they can be recommended ads or other followers based on such information. For example, @fakingnews is an Indian blog that publishes news satires ranging from Indian politics to typical Indian mindsets. People who follow @fakingnews are the ones who, in general, like to read satirical news. Hence, these people can be thought of as belonging to the same cluster, or community.
If we have another satirical blog, we can always recommend it to this community and improve the advertisement return on investment. The chances of getting more hits via people belonging to this community will be higher than via a community that doesn't follow @fakingnews, or any such news in general. Once you have comprehended that Twitter allows you to create, link, and investigate a community of interest for a random topic, the influence of Twitter, and the knowledge one can gain from mining it, becomes clearer.

Understanding Twitter's API

Twitter APIs provide a means to access Twitter data, that is, the tweets sent by its millions of users. Let's get to know these APIs a bit better.

Twitter vocabulary

As described earlier, Twitter is a microblogging service with a social aspect attached. It allows its users to express their views/sentiments by means of an Internet SMS, called a tweet in the context of Twitter. Tweets are entities formed of a maximum of 140 characters, and their content can be anything, ranging from a person's mood to a person's location to a person's curiosity. The platform on which these tweets are posted is called the timeline. To use Twitter's APIs, one must understand the basic terminology. Tweets are the crux of Twitter. Theoretically, a tweet is just 140 characters of text content tweeted by a user, but there is more to it than just that. There is additional metadata associated with the same tweet, which Twitter classifies as entities and places. The entities consist of hashtags, URLs, and other media data that users have included in their tweet. The places are locations from where the tweet originated; it is possible that the place is a real-world location from where the tweet was sent, or a location mentioned in the text of the tweet. Take the following tweet as an example:

Learn how to consume millions of tweets with @twitterapi at #TDC2014 in São Paulo #bigdata tomorrow at 2:10pm http://t.co/pTBlWzTvVd

The preceding tweet was posted by @TwitterDev, and it's about 132 characters long. The following are the entities mentioned in this tweet:

- Handle: @twitterapi
- Hashtags: #TDC2014, #bigdata
- URL: http://t.co/pTBlWzTvVd

São Paulo is the place mentioned in this tweet. This is one example of a tweet with a fairly good amount of metadata. Although the actual tweet's length is well within the 140-character limit, it contains more information than one might think. This actually enables us to figure out that this tweet belongs to a specific community, based on cross-referencing the topics present in the hashtags, the URL to the website, the different users mentioned in it, and so on. The interface (web or mobile) on which the tweets are displayed is called the timeline. The tweets are, in general, arranged in chronological order of posting time. On a specific user's account, only a certain number of tweets are displayed by Twitter; this is generally based on the users the given user is following and is being followed by. This is the interface a user sees upon logging in to his/her Twitter account. A Twitter stream is different from a Twitter timeline in the sense that it is not specific to a user. The tweets on a user's timeline come from only a certain number of users and are displayed/updated less frequently, while the Twitter stream is a chronological collection of all the tweets posted by all users. The number of active users on Twitter is on the order of hundreds of millions.
During public events of widespread interest, such as presidential debates, the combined tweet rate can reach several hundred thousand tweets per minute. The behavior is very similar to a stream; hence the name of such a collection is the Twitter stream. You can try the following by creating a Twitter account (it will be more insightful if you don't already have many followers). Before creating the account, it is advised that you read all its terms and conditions. You can also start reading its API documentation.

Creating a Twitter API connection

We need to have an app created at https://dev.twitter.com/apps before making any API requests to Twitter. This is the standard method for developers to gain API access and, more importantly, it helps Twitter observe and restrict developers making high-load API requests. The ROAuth package is the one we are going to use in our experiments. Tokens allow users to authorize third-party apps to access the data in any user account without the need for their passwords (or other sensitive information); ROAuth basically facilitates this.

Creating a new app

The first step to getting any kind of token access from Twitter is to create an app on it. The user has to go to https://dev.twitter.com/ and log in with their Twitter credentials. Once you are logged in with your credentials, the steps for creating an app are as follows:

1. Go to https://apps.twitter.com/app/new.
2. Put the name of your application in the Name field. This name can be anything you like.
3. Similarly, enter the description in the Description field.
4. The Website field needs to be filled with a valid URL, but again, that can be any random URL.
5. You can leave the Callback URL field blank.
6. After the creation of this app, we need to find the API Key and API Secret values from the Keys and Access Tokens tab.
7. Under the Keys and Access Tokens tab, you will find a button to generate access tokens. Click on it and you will be provided with an Access Token and an Access Token Secret value.

Before using the preceding keys, we need to install twitteR to access the data in R using the app we just created, using the following code:

install.packages(c("devtools", "rjson", "bit64", "httr"))
library(devtools)
install_github("geoffjentry/twitteR")
library(twitteR)

Here's sample code that helps us access the tweets posted since any given date and containing a specific keyword. In this example, we are searching for tweets containing the word Earthquake, posted since September 29, 2014. In order to get this information, we provide four pieces of information to obtain the authorization token:

- API key
- API secret
- access token
- access token secret

We'll show you how to use the preceding information to get an app authorized by the user and access its resources on Twitter. The setup_twitter_oauth() function in twitteR (built on ROAuth) will make our next steps very smooth and clear:

api_key <- "your_api_key"
api_secret <- "your_api_secret"
access_token <- "your_access_token"
access_token_secret <- "your_access_token_secret"
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
EarthQuakeTweets = searchTwitter("EarthQuake", since='2014-09-29')

The results of this example should simply display "Using direct authentication", with 25 tweets loaded into the EarthQuakeTweets variable, as shown here:

head(EarthQuakeTweets, 2)
[[1]]
[1] "TamamiJapan: RT @HistoricalPics: Japan. Top: One Month After Hiroshima, 1945. Bottom: One Month After The Earthquake and Tsunami, 2011. Incredible. http…"
[[2]]
[1] "OldhamDs: RT @HistoricalPics: Japan. Top: One Month After Hiroshima, 1945. Bottom: One Month After The Earthquake and Tsunami, 2011. Incredible. http…"

Here we have shown the first two of the 25 tweets containing the word Earthquake posted since September 29, 2014. If you closely observe the results, you'll find all the metadata using str(EarthQuakeTweets[1]).

Finding trending topics

Now that we understand how to create API connections to Twitter and fetch data using them, we will see how to find out what is trending on Twitter, that is, which topics (worldwide or local) are being talked about the most right now. Using the same API, we can easily access the trending information:

# Return a data frame with name, country & woeid,
# where woeid is a numerical identification code describing a location
Locs <- availableTrendLocations()
# Filter the data frame for Delhi (India) and extract its woeid
LocsIndia = subset(Locs, country == "India")
woeidDelhi = subset(LocsIndia, name == "Delhi")$woeid
# getTrends takes a specified woeid and returns the trending topics associated with it
trends = getTrends(woeid=woeidDelhi)

The function availableTrendLocations() returns an R data frame containing the name, country, and woeid parameters. We then filter this data frame for a location of our choosing; in this example, it's Delhi, India. The function getTrends() fetches the top 10 trends in the location determined by the woeid. Here are the top four trending hashtags in the region defined by woeid = 20070458, that is, Delhi, India:

head(trends)
  name                  url                                                  query                    woeid
1 #AntiHinduNGOsExposed http://twitter.com/search?q=%23AntiHinduNGOsExposed %23AntiHinduNGOsExposed 20070458
2 #KhaasAadmi           http://twitter.com/search?q=%23KhaasAadmi           %23KhaasAadmi           20070458
3 #WinGOSF14            http://twitter.com/search?q=%23WinGOSF14            %23WinGOSF14            20070458
4 #ItsForRealONeBay     http://twitter.com/search?q=%23ItsForRealONeBay     %23ItsForRealONeBay     20070458

Searching tweets

Now, similar to trends, there is one more important function that comes with the twitteR package: searchTwitter(). This function returns tweets containing the searched string, subject to other constraints. Some of the constraints that can be imposed are as follows:

- lang: restricts the tweets to the given language
- since/until: restricts the tweets to those posted since the given date or until the given date
- geocode: restricts the tweets to those from users located within a certain distance of the given latitude/longitude

For example, here we extract tweets about the cricketer Sachin Tendulkar in the month of November 2014:

head(searchTwitter('Sachin Tendulkar', since='2014-11-01', until='2014-11-30'))
[[1]]
[1] "TendulkarFC: RT @Moulinparikh: Sachin Tendulkar had a long session with the Mumbai Ranji Trophy team after today's loss."
[[2]]
[1] "tyagi_niharika: @WahidRuba @Anuj_dvn @Neel_D_ @alishatariq3 @VWellwishers @Meenal_Rathore oh... Yaadaaya....hmaraesachuuu sirxedxa0xbdxedxb8x8d..i mean sachin Tendulkar"
[[3]]
[1] "Meenal_Rathore: @WahidRuba @Anuj_dvn @tyagi_niharika @Neel_D_ @alishatariq3 @AliaaFcc @VWellwishers .. Sachin Tendulkar xedxa0xbdxedxb8x8a☺️"
[[4]]
[1] "MishraVidyanand: Vidyanand Mishra is following the Interest "The Living Legend SachinTendu..." on http://t.co/tveHXMB4BM - http://t.co/CocNMcxFge"
[[5]]
[1] "CSKalwaysWin: I have never tried to compare myself to anyone else.n - Sachin Tendulkar"

Twitter sentiment analysis

Depending on the objective, and based on the ability to search for any type of tweets from the public timeline, one can always collect the required corpus. For example, you may want to learn about customer satisfaction levels with the various cab services coming into the Indian market. These start-ups are offering various discounts and coupons to attract customers, but at the end of the day, service quality determines the business of any organization. These start-ups are constantly promoting themselves on various social media websites, and customers show various levels of sentiment on the same platforms. Let's target the following:

- Meru Cabs: A radio cab service based in Mumbai, India. Launched in 2007.
- Ola Cabs: A taxi aggregator company based in Bangalore, India. Launched in 2011.
- TaxiForSure: A taxi aggregator company based in Bangalore, India. Launched in 2011.
- Uber India: A taxi aggregator company headquartered in San Francisco, California. Launched in India in 2014.

Let's set our goal: to get the general sentiment about each of the preceding service providers, based on the customer sentiments present in the tweets on Twitter.

Collecting tweets as a corpus

We'll start with the searchTwitter() function (discussed previously) from the twitteR package to gather the tweets for each of the preceding organizations. In order to avoid writing the same code again and again, we push the following authorization code into a file called authenticate.R:

library(twitteR)
api_key <- "xx"
api_secret <- "xx"
access_token <- "xx"
access_token_secret <- "xx"
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)

We run the following script to get the required tweets:

# Load the necessary packages
source('authenticate.R')
Meru_tweets = searchTwitter("MeruCabs", n=2000, lang="en")
Ola_tweets = searchTwitter("OlaCabs", n=2000, lang="en")
TaxiForSure_tweets = searchTwitter("TaxiForSure", n=2000, lang="en")
Uber_tweets = searchTwitter("Uber_Delhi", n=2000, lang="en")

Now, as mentioned in Twitter's REST API documentation, "Due to capacity constraints, the index currently only covers about a week's worth of tweets", so we do not always get the desired number of tweets (here, 2000). Instead, these are the sizes of each of the above tweet lists:

> length(Meru_tweets)
[1] 393
> length(Ola_tweets)
[1] 984
> length(TaxiForSure_tweets)
[1] 720
> length(Uber_tweets)
[1] 2000

As you can see, the lengths of these tweet lists are not equal to the number of tweets we asked for in our query scripts. There are many takeaways from this information. Since these tweets are only from the last week on Twitter, they suggest there is more discussion about these taxi services in the following order:

1. Uber India
2. Ola Cabs
3. TaxiForSure
4. Meru Cabs

A ban was imposed on Uber India after an alleged rape incident by an Uber India driver. The decision to ban the entire organization because one of its drivers committed a crime became a matter of public outcry; hence, the number of tweets about Uber increased on social media. Meru Cabs, on the other hand, has been in India for almost 7 years now, so it is quite a stable organization. The amount of promotion Ola Cabs and TaxiForSure are doing is way higher than that of Meru Cabs.
This can be one reason for Meru Cabs having the least number (393) of tweets in the last week. The numbers of tweets in the last week are comparable for Ola Cabs (984) and TaxiForSure (720). There can be several reasons for this: they both started their business in the same year and, more importantly, they follow the same business model. Meru Cabs is a radio taxi service that owns and manages a fleet of cars, while Ola Cabs, TaxiForSure, and Uber are marketplaces for users to compare the offerings of various operators and book easily. Let's dive deeper into the data and get more insights.

Cleaning the corpus

Before applying any intelligent algorithms to gather more insights from the tweets collected so far, let's first clean them. In order to clean up, we should understand what the list of tweets looks like:

head(Meru_tweets)
[[1]]
[1] "MeruCares: @KapilTwitts 2&gt;...and other details at feedback@merucabs.com We'll check back and reach out soon."
[[2]]
[1] "vikasraidhan: @MeruCabs really disappointed with @GenieCabs. Cab is never assigned on time. Driver calls after 30 minutes. Why would I ride with Meru?"
[[3]]
[1] "shiprachowdhary: fallback of #ubershame , #MERUCABS taking customers for a ride"
[[4]]
[1] "shiprachowdhary: They book Genie, but JIT inform of cancellation &amp; send full fare #MERUCABS . Very disappointed.Always used these guys 4 and recommend them."
[[5]]
[1] "shiprachowdhary: No choice bt to take the #merucabs premium service. Driver told me that this happens a lot with #merucabs."
[[6]]
[1] "shiprachowdhary: booked #Merucabsyestrdy. Asked for Meru Genie. 10 mins 4 pick up time, they call to say Genie not available, so sending the full fare cab"

The first tweet here is a grievance response, while the second, fourth, and fifth are actual customer sentiments about the services provided by Meru Cabs. We see:

- Lots of meta information, such as @people, URLs, and #hashtags
- Punctuation marks, numbers, and unnecessary spaces
- Some of these tweets are retweets from other users; for the given application, we would not like to consider retweets (RTs) in the sentiment analysis

We clean all this data using the following code block:

MeruTweets <- sapply(Meru_tweets, function(x) x$getText())
OlaTweets = sapply(Ola_tweets, function(x) x$getText())
TaxiForSureTweets = sapply(TaxiForSure_tweets, function(x) x$getText())
UberTweets = sapply(Uber_tweets, function(x) x$getText())

catch.error = function(x) {
  # let us create a missing value for test purposes
  y = NA
  # try to catch the error (NA) we just created
  catch_error = tryCatch(tolower(x), error=function(e) e)
  # if not an error
  if (!inherits(catch_error, "error"))
    y = tolower(x)
  # check the result: if an error exists, NA is kept; otherwise the function works fine
  return(y)
}

cleanTweets <- function(tweet) {
  # Clean the tweet for sentiment analysis
  # remove html links, which are not required for sentiment analysis
  tweet = gsub("(f|ht)(tp)(s?)(://)(.*)[.|/](.*)", " ", tweet)
  # first we remove retweet entities from the stored tweets (text)
  tweet = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", tweet)
  # then remove all "#Hashtag"
  tweet = gsub("#\\w+", " ", tweet)
  # then remove all "@people"
  tweet = gsub("@\\w+", " ", tweet)
  # then remove all the punctuation
  tweet = gsub("[[:punct:]]", " ", tweet)
  # then remove numbers; we need only text for analytics
  tweet = gsub("[[:digit:]]", " ", tweet)
  # finally, we remove unnecessary spaces (white spaces, tabs etc)
  tweet = gsub("[ \t]{2,}", " ", tweet)
  tweet = gsub("^\\s+|\\s+$", "", tweet)
  # if you feel anything else should be removed (for example slang words),
  # you can do so using the above functions and methods
  # next we convert all the words to lower case; this makes a uniform pattern
  tweet = catch.error(tweet)
  tweet
}

cleanTweetsAndRemoveNAs <- function(Tweets) {
  TweetsCleaned = sapply(Tweets, cleanTweets)
  # Remove the "NA" tweets from this tweet list
  TweetsCleaned = TweetsCleaned[!is.na(TweetsCleaned)]
  names(TweetsCleaned) = NULL
  # Remove repetitive tweets from this tweet list
  TweetsCleaned = unique(TweetsCleaned)
  TweetsCleaned
}

MeruTweetsCleaned = cleanTweetsAndRemoveNAs(MeruTweets)
OlaTweetsCleaned = cleanTweetsAndRemoveNAs(OlaTweets)
TaxiForSureTweetsCleaned <- cleanTweetsAndRemoveNAs(TaxiForSureTweets)
UberTweetsCleaned = cleanTweetsAndRemoveNAs(UberTweets)

Here are the sizes of the cleaned tweet lists:

> length(MeruTweetsCleaned)
[1] 309
> length(OlaTweetsCleaned)
[1] 811
> length(TaxiForSureTweetsCleaned)
[1] 574
> length(UberTweetsCleaned)
[1] 1355

Estimating sentiment (A)

There are many sophisticated resources available for estimating sentiment. Many research papers and software packages are available open source, and they implement very complex algorithms for sentiment analysis. Now that we have the cleaned Twitter data, we are going to use a few of the R packages available to assess the sentiment in the tweets. It's worth mentioning here that not all tweets represent a sentiment: some tweets can be pure information/facts, while others can be customer care responses. Ideally, those should not be used to assess the customer sentiment about a particular organization. As a first step, we'll use a naive algorithm, which gives a score based on the number of times a positive or a negative word occurs in the given sentence (in our case, in a tweet). Please download the list of positive and negative opinion/sentiment words in English (nearly 6,800 words). This opinion lexicon will be used as a first example in our sentiment analysis experiment. The good thing about this approach is that we are relying on a highly researched, and at the same time customizable, input.
Here are a few examples of existing positive and negative sentiment words:

Positive: love, best, cool, great, good, and amazing
Negative: hate, worst, sucks, awful, and nightmare

> opinion.lexicon.pos = scan('opinion-lexicon-English/positive-words.txt', what='character', comment.char=';')
> opinion.lexicon.neg = scan('opinion-lexicon-English/negative-words.txt', what='character', comment.char=';')
> head(opinion.lexicon.neg)
[1] "2-faced" "2-faces" "abnormal" "abolish" "abominable" "abominably"
> head(opinion.lexicon.pos)
[1] "a+" "abound" "abounds" "abundance" "abundant" "accessable"

We'll add a few industry-specific and/or especially emphatic terms based on our requirements:

pos.words = c(opinion.lexicon.pos, 'upgrade')
neg.words = c(opinion.lexicon.neg, 'wait', 'waiting', 'wtf', 'cancellation')

Now, we create a function, getSentimentScore(), which computes the raw sentiment based on this simple matching algorithm:

getSentimentScore = function(sentences, words.positive, words.negative, .progress='none') {
  require(plyr)
  require(stringr)
  scores = laply(sentences, function(sentence, words.positive, words.negative) {
    # first remove the digits, punctuation characters, and control characters
    sentence = gsub('[[:cntrl:]]', '', gsub('[[:punct:]]', '', gsub('\\d+', '', sentence)))
    # then convert the sentence to lower case
    sentence = tolower(sentence)
    # now split each sentence by the space delimiter
    words = unlist(str_split(sentence, '\\s+'))
    # get the boolean match of each word against the positive and negative opinion lexicons
    pos.matches = !is.na(match(words, words.positive))
    neg.matches = !is.na(match(words, words.negative))
    # the score is the total of positive matches minus the total of negative matches
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  }, words.positive, words.negative, .progress=.progress)
  # return a data frame with each sentence and its score
  return(data.frame(text=sentences, score=scores))
}

Now, we apply the preceding function to the corpus of tweets collected and cleaned so far (passing the extended word lists we built above):

MeruResult = getSentimentScore(MeruTweetsCleaned, pos.words, neg.words)
OlaResult = getSentimentScore(OlaTweetsCleaned, pos.words, neg.words)
TaxiForSureResult = getSentimentScore(TaxiForSureTweetsCleaned, pos.words, neg.words)
UberResult = getSentimentScore(UberTweetsCleaned, pos.words, neg.words)

Here are some sample results:

Tweet for Meru Cabs | Score
gt and other details at feedback com we ll check back and reach out soon | 0
really disappointed with cab is never assigned on time driver calls after minutes why would i ride with meru | -1
so after years of bashing today i m pleasantly surprised clean car courteous driver prompt pickup mins efficient route | 4
a min drive cost hrs used to cost less ur unreliable and expensive trying to lose ur customers | -3

Tweet for Ola Cabs | Score
the service is going from bad to worse the drivers deny to come after a confirmed booking | -3
love the olacabs app give it a swirl sign up with my referral code dxf n and earn rs download the app from | 1
crn kept me waiting for mins amp at last moment driver refused pickup so unreliable amp irresponsible | -4
this is not the first time has delighted me punctuality and free upgrade awesome that | 4

Tweet for TaxiForSure | Score
great service now i have become a regular customer of tfs thank you for the upgrade as well happy taxi ing saving | 5
really disappointed with cab is never assigned on time driver calls after minutes why would i ride with meru | -1
horrible taxi service had to wait for one hour with a new born in the chilly weather of new delhi waiting for them | -4
what do i get now if you resolve the issue after i lost a crucial business because of the taxi delay | -3

Tweet for Uber India | Score
that s good uber s fares will prob be competitive til they gain local monopoly then will go sky high as in new york amp delhi saving | 3
from a shabby backend app stack to daily pr fuck ups its increasingly obvious that is run by child minded blow hards | -3
you say that uber is illegally running were you stupid to not ban earlier and only ban it now after the rape | -3
perhaps uber biz model does need some looking into it s not just in delhi that this happens but in boston too | 0

From the preceding observations, it's clear that this basic sentiment analysis method works fine in normal circumstances, but in the case of Uber India the results deviate too much from a subjective score. It's safe to say that basic word matching gives a good indicator of overall customer sentiment, except when the data itself is not reliable. In our case, the tweets about Uber India are not really about the services Uber provides but about one incident of crime by one of its drivers, and the whole score went haywire. Let's now compute a point statistic of the scores we have computed so far. Since the numbers of tweets are not equal for the four organizations, we compute a mean and standard deviation for each:

Organization | Mean Sentiment Score | Standard Deviation
Meru Cabs | -0.2218543 | 1.301846
Ola Cabs | 0.197724 | 1.170334
TaxiForSure | -0.09841828 | 1.154056
Uber India | -0.6132666 | 1.071094

Estimating sentiment (B)

Let's now move one step further. Instead of using simple matching against an opinion lexicon, we'll use a Naive Bayes classifier to decide on the emotion present in any tweet. We will require packages called Rstem and sentiment to assist in this. It's important to mention here that both these packages are no longer available on CRAN, and hence we have to provide the repository location as a parameter to the install.packages() function, or install from an archived source. Here's the R script to install the required packages:

install.packages("Rstem", repos = "http://www.omegahat.org/R", type="source")
require(devtools)
install_url("http://cran.r-project.org/src/contrib/Archive/sentiment/sentiment_0.2.tar.gz")
require(sentiment)
ls("package:sentiment")

Now that we have the sentiment and Rstem packages installed in our R workspace, we can build the Bayes classifier for sentiment analysis:

library(sentiment)
# classify_emotion() returns an object of class data frame
# with seven columns (anger, disgust, fear, joy, sadness, surprise,
# best_fit) and one row for each document:
MeruTweetsClassEmo = classify_emotion(MeruTweetsCleaned, algorithm="bayes", prior=1.0)
OlaTweetsClassEmo = classify_emotion(OlaTweetsCleaned, algorithm="bayes", prior=1.0)
TaxiForSureTweetsClassEmo = classify_emotion(TaxiForSureTweetsCleaned, algorithm="bayes", prior=1.0)
UberTweetsClassEmo = classify_emotion(UberTweetsCleaned, algorithm="bayes", prior=1.0)

The following figure shows a few results from the Bayesian analysis of the Meru Cabs tweets using the sentiment package. Similarly, we generated results for the other cab services in our problem setup. The sentiment package was built on a trained dataset of emotion words (nearly 1,500 words). The classify_emotion() function assigns each document one of the following six emotions: anger, disgust, fear, joy, sadness, and surprise.
Hence, when the system is not able to classify the overall emotion as any of the six, NA is returned. Let's substitute these NA values with the word unknown to make the further analysis easier:

# we will fetch the emotion category best_fit for our analysis purposes
MeruEmotion = MeruTweetsClassEmo[,7]
OlaEmotion = OlaTweetsClassEmo[,7]
TaxiForSureEmotion = TaxiForSureTweetsClassEmo[,7]
UberEmotion = UberTweetsClassEmo[,7]

MeruEmotion[is.na(MeruEmotion)] = "unknown"
OlaEmotion[is.na(OlaEmotion)] = "unknown"
TaxiForSureEmotion[is.na(TaxiForSureEmotion)] = "unknown"
UberEmotion[is.na(UberEmotion)] = "unknown"

The best-fit emotions present in these tweets are as follows:

Further, we'll use another function, classify_polarity(), provided by the sentiment package to classify the tweets into two classes, pos (positive sentiment) or neg (negative sentiment). The idea is to compute the log-likelihood of a tweet under each of the two classes. Once these likelihoods are calculated, the ratio of the pos-likelihood to the neg-likelihood is computed, and based on this ratio the tweets are assigned to a class. It's important to note that if this ratio turns out to be 1, the overall sentiment of the tweet is assumed to be "neutral". The code is as follows:

MeruTweetsClassPol = classify_polarity(MeruTweetsCleaned, algorithm="bayes")
OlaTweetsClassPol = classify_polarity(OlaTweetsCleaned, algorithm="bayes")
TaxiForSureTweetsClassPol = classify_polarity(TaxiForSureTweetsCleaned, algorithm="bayes")
UberTweetsClassPol = classify_polarity(UberTweetsCleaned, algorithm="bayes")

We get the following output: the preceding figure shows a few results obtained using the classify_polarity() function of the sentiment package on the Meru Cabs tweets.
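Before consolidating, it's worth eyeballing the raw polarity output; a quick sketch, relying on the column layout described above (positive and negative log-likelihoods, their ratio, and the best-fit label in the fourth column):

head(MeruTweetsClassPol)
# distribution of the best-fit polarity labels for Meru Cabs
table(MeruTweetsClassPol[,4])

A quick table() like this gives an early sense of the positive/negative split before any plotting is done.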
We'll now consolidate the results from the two functions into a data frame for each cab service, for plotting purposes:

# we will fetch the polarity category best_fit for our analysis purposes
MeruPol = MeruTweetsClassPol[,4]
OlaPol = OlaTweetsClassPol[,4]
TaxiForSurePol = TaxiForSureTweetsClassPol[,4]
UberPol = UberTweetsClassPol[,4]

# let us now create a data frame with the above results
MeruSentimentDataFrame = data.frame(text=MeruTweetsCleaned, emotion=MeruEmotion, polarity=MeruPol, stringsAsFactors=FALSE)
OlaSentimentDataFrame = data.frame(text=OlaTweetsCleaned, emotion=OlaEmotion, polarity=OlaPol, stringsAsFactors=FALSE)
TaxiForSureSentimentDataFrame = data.frame(text=TaxiForSureTweetsCleaned, emotion=TaxiForSureEmotion, polarity=TaxiForSurePol, stringsAsFactors=FALSE)
UberSentimentDataFrame = data.frame(text=UberTweetsCleaned, emotion=UberEmotion, polarity=UberPol, stringsAsFactors=FALSE)

# rearrange the data inside each frame by sorting the emotion levels by frequency
MeruSentimentDataFrame = within(MeruSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
OlaSentimentDataFrame = within(OlaSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
TaxiForSureSentimentDataFrame = within(TaxiForSureSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
UberSentimentDataFrame = within(UberSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))

Let's now plot the emotions one by one. First, we create a single plotting function, plotSentiments1(), to be used for each business's tweets, and then call it for each business:

plotSentiments1 <- function (sentiment_dataframe, title) {
  library(ggplot2)
  ggplot(sentiment_dataframe, aes(x=emotion)) +
    geom_bar(aes(y=..count.., fill=emotion)) +
    scale_fill_brewer(palette="Dark2") +
    ggtitle(title) +
    theme(legend.position='right') +
    ylab('Number of Tweets') +
    xlab('Emotion Categories')
}

plotSentiments1(MeruSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about MeruCabs')
plotSentiments1(OlaSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about OlaCabs')
plotSentiments1(TaxiForSureSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about TaxiForSure')
plotSentiments1(UberSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about UberIndia')

The first dashboard shows the analysis for Meru Cabs.
The following dashboard shows the analysis for Ola Cabs:
The following dashboard shows the analysis for TaxiForSure:
The following dashboard shows the analysis for Uber India:

These plots reflect more or less the same observations as the basic word-matching algorithm. Tweets with joy constitute the largest part of the tweets for all these organizations, indicating that they are trying their best to provide good service in the country. The sadness tweets are less numerous than the joy tweets. However, compared with each other, the plots indicate the overall market share versus the level of customer satisfaction of each service provider in question. Similarly, these graphs can be used to assess the level of dissatisfaction in terms of anger and disgust in the tweets.
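Since the four corpora differ in size, comparing raw counts across companies can mislead; a brief sketch normalizing the emotion counts to proportions with base R:

# proportion of each emotion label per company, rounded for readability
round(prop.table(table(MeruEmotion)), 3)
round(prop.table(table(UberEmotion)), 3)

The same two lines work for the Ola Cabs and TaxiForSure vectors, making the cross-company comparison independent of corpus size.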
Let's now consider only the positive and negative sentiments present in the tweets:

# similarly, we will plot the distribution of polarity in the tweets
plotSentiments2 <- function (sentiment_dataframe, title) {
  library(ggplot2)
  ggplot(sentiment_dataframe, aes(x=polarity)) +
    geom_bar(aes(y=..count.., fill=polarity)) +
    scale_fill_brewer(palette="RdGy") +
    ggtitle(title) +
    theme(legend.position='right') +
    ylab('Number of Tweets') +
    xlab('Polarity Categories')
}

plotSentiments2(MeruSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about MeruCabs')
plotSentiments2(OlaSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about OlaCabs')
plotSentiments2(TaxiForSureSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about TaxiForSure')
plotSentiments2(UberSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about UberIndia')

The first dashboard shows the polarity analysis for Meru Cabs.
The following dashboard shows the polarity analysis for Ola Cabs:
The following dashboard shows the analysis for TaxiForSure:
The following dashboard shows the analysis for Uber India:

It's a basic human trait to report what went wrong rather than what went right. That is to say, we tend to tweet when something bad has happened rather than when the experience was good. Hence, negative tweets generally outnumber positive ones. Still, over a period of time (a week in our case), the ratio of the two is a fair reflection of the overall market share versus the level of customer satisfaction of each service provider in question. Next, we try to get a sense of the overall content of the tweets using word clouds:

removeCustomWords <- function (TweetsCleaned) {
  for (i in 1:length(TweetsCleaned)) {
    TweetsCleaned[i] <- tryCatch({
      TweetsCleaned[i] = removeWords(TweetsCleaned[i], c(stopwords("english"), "care", "guys", "can", "dis", "didn", "guy", "booked", "plz"))
      TweetsCleaned[i]
    }, error=function(cond) {
      TweetsCleaned[i]
    }, warning=function(cond) {
      TweetsCleaned[i]
    })
  }
  return(TweetsCleaned)
}

getWordCloud <- function (sentiment_dataframe, TweetsCleaned, Emotion) {
  emos = levels(factor(sentiment_dataframe$emotion))
  n_emos = length(emos)
  emo.docs = rep("", n_emos)
  TweetsCleaned = removeCustomWords(TweetsCleaned)
  for (i in 1:n_emos) {
    emo.docs[i] = paste(TweetsCleaned[Emotion == emos[i]], collapse=" ")
  }
  corpus = Corpus(VectorSource(emo.docs))
  tdm = TermDocumentMatrix(corpus)
  tdm = as.matrix(tdm)
  colnames(tdm) = emos
  require(wordcloud)
  suppressWarnings(comparison.cloud(tdm, colors = brewer.pal(n_emos, "Dark2"), scale = c(3,.5), random.order = FALSE, title.size = 1.5))
}

getWordCloud(MeruSentimentDataFrame, MeruTweetsCleaned, MeruEmotion)
getWordCloud(OlaSentimentDataFrame, OlaTweetsCleaned, OlaEmotion)
getWordCloud(TaxiForSureSentimentDataFrame, TaxiForSureTweetsCleaned, TaxiForSureEmotion)
getWordCloud(UberSentimentDataFrame, UberTweetsCleaned, UberEmotion)

The preceding figures show the word clouds from the tweets about Meru Cabs, Ola Cabs, TaxiForSure, and Uber India, respectively.
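Before summarizing, the polarity labels can also be condensed into a single comparison table; a minimal sketch, assuming the best-fit labels produced by classify_polarity() are "positive", "neutral", and "negative":

polaritySummary <- function(pol, company) {
  # fix the factor levels so every company's row has the same columns
  counts = table(factor(pol, levels=c("positive", "neutral", "negative")))
  data.frame(company=company,
             positive=as.integer(counts["positive"]),
             neutral=as.integer(counts["neutral"]),
             negative=as.integer(counts["negative"]),
             pos.neg.ratio=round(as.integer(counts["positive"]) / as.integer(counts["negative"]), 2),
             row.names=NULL)
}

rbind(polaritySummary(MeruPol, "Meru Cabs"),
      polaritySummary(OlaPol, "Ola Cabs"),
      polaritySummary(TaxiForSurePol, "TaxiForSure"),
      polaritySummary(UberPol, "Uber India"))

One row per company, with the positive-to-negative ratio discussed above, makes the week-on-week comparison a one-liner.

Summary

In this article, we gained knowledge of the various Twitter APIs, discussed how to create a connection with Twitter, and saw how to retrieve tweets with various attributes. We saw the power of Twitter in helping us determine customer attitudes toward today's various businesses.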
The analysis can be repeated on a weekly basis, so one can easily track the monthly, quarterly, or yearly changes in customer sentiment. This not only helps customers spot trending businesses; the business itself gets a well-defined metric of its own performance and can use such scores and graphs to improve. We also discussed various methods of sentiment analysis, from basic word matching to the more advanced Bayesian algorithms.

Resources for Article:
Further resources on this subject:
Find Friends on Facebook [article]
Supervised learning [article]
Warming Up [article]
How to perform Numeric Metric Aggregations with Elasticsearch

Pravin Dhandre
22 Feb 2018
7 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book Learning Elastic Stack 6.0 written by Pranav Shukla and Sharath Kumar M N. This book provides detailed coverage of the fundamentals of each component of Elastic Stack, making it easy to search, analyze, and visualize data across different sources in real time.[/box] Today, we are going to demonstrate how to run numeric and statistical queries, such as summation, average, count, and similar metric aggregations, on the Elastic Stack to serve as a better analytics engine for your dataset.

Metric aggregations

Metric aggregations work with numeric data, computing one or more aggregate metrics within the given context. The context could be a query, a filter, or no query at all to include the whole index/type. Metric aggregations can also be nested inside other bucket aggregations. In this case, these metrics will be computed for each bucket in the bucket aggregations. We will start with simple metric aggregations without nesting them inside bucket aggregations. When we learn about bucket aggregations later in the chapter, we will also learn how to use metric aggregations inside bucket aggregations. We will learn about the following metric aggregations:

Sum, average, min, and max aggregations
Stats and extended stats aggregations
Cardinality aggregation

Let us learn about them one by one.

Sum, average, min, and max aggregations

Finding the sum of a field, the minimum value for a field, the maximum value for a field, or an average are very common operations. For people who are familiar with SQL, the query to find the sum would look like the following:

SELECT sum(downloadTotal) FROM usageReport;

The preceding query will calculate the sum of the downloadTotal field across all records in the table. This requires going through all records of the table, or all records in the given context, and adding the values of the given field. In Elasticsearch, a similar query can be written using the sum aggregation. Let us understand the sum aggregation first.

Sum aggregation

Here is how to write a simple sum aggregation:

GET bigginsight/_search
{
  "aggregations": {                  (1)
    "download_sum": {                (2)
      "sum": {                       (3)
        "field": "downloadTotal"     (4)
      }
    }
  },
  "size": 0                          (5)
}

1. The aggs or aggregations element at the top level should wrap any aggregation.
2. Give a name to the aggregation; here we are doing a sum aggregation on the downloadTotal field, and hence the name we chose is download_sum. You can name it anything. This name will be useful while looking up this particular aggregation's result in the response.
3. We are doing a sum aggregation, hence the sum element.
4. We want the sum computed over the downloadTotal field.
5. Specify size = 0 to prevent raw search results from being returned. We just want the aggregation results and not the search results in this case. Since we haven't specified any top-level query elements, it matches all documents. We do not want any raw documents (or search hits) in the result.

The response should look like the following:

{
  "took": 92,
  ...
  "hits": {
    "total": 242836,                 (1)
    "max_score": 0,
    "hits": []
  },
  "aggregations": {                  (2)
    "download_sum": {                (3)
      "value": 2197438700            (4)
    }
  }
}

Let us understand the key aspects of the response. The key parts are numbered 1, 2, 3, and so on, and are explained in the following points:

1. The hits.total element shows the number of documents that were considered, or were in the context of the query. If there was no additional query or filter specified, it will include all documents in the type or index.
2. Just like the request, this response is wrapped inside aggregations to indicate it as such.
3. The response of the aggregation we requested was named download_sum, hence we get our response from the sum aggregation inside an element with the same name.
4. This is the actual value after applying the sum aggregation.

The average, min, and max aggregations are very similar. Let's look at them briefly.

Average aggregation

The average aggregation finds an average across all documents in the querying context:

GET bigginsight/_search
{
  "aggregations": {
    "download_average": {
      "avg": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The only notable differences from the sum aggregation are as follows:

We chose a different name, download_average, to make it apparent that the aggregation is computing the average.
The type of aggregation that we are doing is avg instead of the sum aggregation that we were doing earlier.

The response structure is identical, but the value field will now represent the average of the requested field. The min and max aggregations are exactly the same.

Min aggregation

Here is how we will find the minimum value of the downloadTotal field in the entire index/type:

GET bigginsight/_search
{
  "aggregations": {
    "download_min": {
      "min": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

Let's finally look at the max aggregation too.

Max aggregation

Here is how we will find the maximum value of the downloadTotal field in the entire index/type:

GET bigginsight/_search
{
  "aggregations": {
    "download_max": {
      "max": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

These aggregations were really simple. Now let's look at some more advanced, yet still simple, stats and extended stats aggregations.

Stats and extended stats aggregations

These aggregations compute some common statistics in a single request, without having to issue multiple requests. This saves resources on the Elasticsearch side as well, because the statistics are computed in a single pass rather than being requested multiple times. The client code also becomes simpler if you are interested in more than one of these statistics. Let's look at the stats aggregation first.

Stats aggregation

The stats aggregation computes the sum, average, min, max, and count of documents in a single pass:

GET bigginsight/_search
{
  "aggregations": {
    "download_stats": {
      "stats": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The structure of the stats request is the same as for the other metric aggregations we have seen so far, so nothing special is going on here. The response should look like the following:

{
  "took": 4,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "download_stats": {
      "count": 242835,
      "min": 0,
      "max": 241213,
      "avg": 9049.102065188297,
      "sum": 2197438700
    }
  }
}

As you can see, the response with the download_stats element contains count, min, max, average, and sum; everything is included in the same response. This is very handy, as it reduces the overhead of multiple requests and also simplifies the client code.
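Before that, note that the same metric aggregations can be scoped to a query context instead of the whole index. Here is a hedged sketch that averages downloadTotal for a single customer; the customer field and its value are assumptions about the dataset, not taken from the book:

GET bigginsight/_search
{
  "query": {
    "term": { "customer": "Linkedin" }
  },
  "aggregations": {
    "download_average": {
      "avg": { "field": "downloadTotal" }
    }
  },
  "size": 0
}

Only the documents matching the query are considered, so hits.total in the response will reflect the filtered count rather than the whole index. Let us now look at the extended stats aggregation.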
Extended stats aggregation

The extended stats aggregation returns a few more statistics in addition to the ones returned by the stats aggregation:

GET bigginsight/_search
{
  "aggregations": {
    "download_estats": {
      "extended_stats": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The response looks like the following:

{
  "took": 15,
  "timed_out": false,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "download_estats": {
      "count": 242835,
      "min": 0,
      "max": 241213,
      "avg": 9049.102065188297,
      "sum": 2197438700,
      "sum_of_squares": 133545882701698,
      "variance": 468058704.9782911,
      "std_deviation": 21634.664429528162,
      "std_deviation_bounds": {
        "upper": 52318.43092424462,
        "lower": -34220.22679386803
      }
    }
  }
}

It also returns the sum of squares, variance, standard deviation, and standard deviation bounds.

Cardinality aggregation

Finding the count of unique elements can be done with the cardinality aggregation. It is similar to finding the result of a query such as the following:

select count(*) from (select distinct username from usageReport) u;

Finding the cardinality, or the number of unique values, for a specific field is a very common requirement. If you have a click-stream from the different visitors on your website, you may want to find out how many unique visitors you got in a given day, week, or month. Let us understand how we find the count of unique users for whom we have network traffic data:

GET bigginsight/_search
{
  "aggregations": {
    "unique_visitors": {
      "cardinality": {
        "field": "username"
      }
    }
  },
  "size": 0
}

The cardinality aggregation response is just like the other metric aggregations:

{
  "took": 110,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "unique_visitors": {
      "value": 79
    }
  }
}
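As mentioned at the beginning, metric aggregations can also be nested inside bucket aggregations, in which case the metric is computed once per bucket. A brief sketch is shown below, assuming a keyword-typed customer field in the same index (an assumption about the dataset):

GET bigginsight/_search
{
  "aggregations": {
    "by_customer": {
      "terms": { "field": "customer" },
      "aggregations": {
        "download_sum": {
          "sum": { "field": "downloadTotal" }
        }
      }
    }
  },
  "size": 0
}

The response contains one bucket per customer, each carrying its own download_sum value. To summarize, we learned how to perform numerous metric aggregations on numeric datasets and how Elasticsearch can easily power a strong analytics application. If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of the Elastic Stack in detail and start developing solutions for problems like logging, site search, app search, metrics, and more.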
How to Build TensorFlow Models for Mobile and Embedded devices

Savia Lobo
15 May 2018
12 min read
TensorFlow models can be used in applications running on mobile and embedded platforms. TensorFlow Lite and TensorFlow Mobile are two flavors of TensorFlow for resource-constrained mobile devices. TensorFlow Lite supports a subset of the functionality of TensorFlow Mobile and delivers better performance due to its smaller binary size and fewer dependencies. This article covers what you need to integrate TensorFlow into a mobile application: a model is trained, saved, and then used for inference and prediction in the mobile application. [box type="note" align="" class="" width=""]This article is an excerpt from the book Mastering TensorFlow 1.x written by Armando Fandango. This book will help you leverage the power of TensorFlow and Keras to build deep learning models, using concepts such as transfer learning, generative adversarial networks, and deep reinforcement learning.[/box] To learn how to use TensorFlow models on mobile devices, the following topics are covered:

TensorFlow on mobile platforms
TF Mobile in Android apps
TF Mobile demo on Android
TF Mobile demo on iOS
TensorFlow Lite
TF Lite demo on Android
TF Lite demo on iOS

TensorFlow on mobile platforms

TensorFlow can be integrated into mobile apps for many use cases that involve one or more of the following machine learning tasks:

Speech recognition
Image recognition
Gesture recognition
Optical character recognition
Image or text classification
Image, text, or speech synthesis
Object identification

To run TensorFlow on mobile apps, we need two major ingredients:

A trained and saved model that can be used for predictions
A TensorFlow binary that can receive the inputs, apply the model, produce the predictions, and send the predictions as output

The high-level architecture looks like the following figure: the mobile application code sends the inputs to the TensorFlow binary, which uses the trained model to compute predictions and send them back.

TF Mobile in Android apps

The TensorFlow ecosystem enables it to be used in Android apps through the interface class TensorFlowInferenceInterface, and the TensorFlow Java API in the jar file libandroid_tensorflow_inference_java.jar. You can either use the jar file from JCenter, download a precompiled jar from ci.tensorflow.org, or build it yourself. The inference interface has been made available as a JCenter package and can be included in the Android project by adding the following code to the build.gradle file:

allprojects {
  repositories {
    jcenter()
  }
}
dependencies {
  compile 'org.tensorflow:tensorflow-android:+'
}

Note: Instead of using the pre-built binaries from JCenter, you can also build them yourself using Bazel or Cmake by following the instructions at this link: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/contrib/android/README.md

Once the TF library is configured in your Android project, you can call the TF model with the following four steps:

1. Load the model:
TensorFlowInferenceInterface inferenceInterface = new TensorFlowInferenceInterface(assetManager, modelFilename);

2. Send the input data to the TensorFlow binary:
inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);

3. Run the prediction or inference:
inferenceInterface.run(outputNames, logStats);

4. Receive the output from the TensorFlow binary:
inferenceInterface.fetch(outputName, outputs);
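Putting the four steps together, a minimal sketch of what this looks like inside an activity is shown below; the model path, node names, input size, and the preprocessing helper are placeholder assumptions, not values from the demo:

// A minimal sketch; node names and shapes depend on your trained model.
TensorFlowInferenceInterface inferenceInterface =
    new TensorFlowInferenceInterface(getAssets(), "file:///android_asset/my_model.pb");

float[] floatValues = preprocessBitmap(bitmap);   // hypothetical helper returning normalized pixels
inferenceInterface.feed("input", floatValues, 1, 224, 224, 3);
inferenceInterface.run(new String[] {"output"}, false);

float[] outputs = new float[NUM_CLASSES];         // hypothetical constant: number of labels
inferenceInterface.fetch("output", outputs);      // outputs now holds the class scores

TF Mobile demo on Android

In this section, we shall learn about recreating the Android demo app provided by the TensorFlow team in their official repo.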
The Android demo will install the following four apps on your Android device:

TF Classify: This is an object identification app that identifies the images in the input from the device camera and classifies them into one of the pre-defined classes. It does not learn new types of pictures, but tries to classify them into one of the categories that it has already learned. The app is built using the Inception model pre-trained by Google.
TF Detect: This is an object detection app that detects multiple objects in the input from the device camera. It continues to identify the objects as you move the camera around in continuous picture-feed mode.
TF Stylize: This is a style transfer app that transfers one of the selected predefined styles to the input from the device camera.
TF Speech: This is a speech recognition app that identifies your speech, and if it matches one of the predefined commands in the app, it highlights that specific command on the device screen.

Note: The sample demo only works for Android devices with an API level greater than 21, and the device must have a modern camera that supports FOCUS_MODE_CONTINUOUS_PICTURE. If your device camera does not support this feature, then you have to apply the patch submitted to TensorFlow by the author: https://github.com/tensorflow/tensorflow/pull/15489/files.

The easiest way to build and deploy the demo app on your device is using Android Studio. To build it this way, follow these steps:

1. Install Android Studio. We installed Android Studio on Ubuntu 16.04 from the instructions at the following link: https://developer.android.com/studio/install.html
2. Check out the TensorFlow repository, and apply the patch mentioned in the previous tip. Let's assume you checked out the code in the tensorflow folder in your home directory.
3. Using Android Studio, open the Android project in the path ~/tensorflow/tensorflow/examples/android. Your screen will look similar to this:
4. Expand the Gradle Scripts option from the left bar and then open the build.gradle file.
5. In the build.gradle file, locate the def nativeBuildSystem definition and set it to 'none'. In the version of the code we checked out, this definition is at line 43:
def nativeBuildSystem = 'none'
6. Build the demo and run it on either a real or simulated device. We tested the app on these devices.
7. You can also build the APK and install it on a virtual or actual connected device. Once the app installs on the device, you will see the four apps we discussed earlier.

You can also build the whole demo app from source using Bazel or Cmake by following the instructions at this link: https://github.com/tensorflow/tensorflow/tree/r1.4/tensorflow/examples/android

TF Mobile in iOS apps

TensorFlow enables support for iOS apps by following these steps:

1. Include TF Mobile in your app by adding a file named Podfile in the root directory of your project. Add the following content to the Podfile:
target 'Name-Of-Your-Project'
pod 'TensorFlow-experimental'
2. Run the pod install command to download and install the TensorFlow Experimental pod.
3. Open the myproject.xcworkspace file to open the workspace, so you can add the prediction code to your application logic.
Note: To create your own TensorFlow binaries for iOS projects, follow the instructions at this link: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/ios

Once the TF library is configured in your iOS project, you can call the TF model with the following four steps:

1. Load the model:
PortableReadFileToProto(file_path, &tensorflow_graph);
2. Create a session:
tensorflow::Status s = session->Create(tensorflow_graph);
3. Run the prediction or inference and get the outputs:
std::string input_layer = "input";
std::string output_layer = "output";
std::vector<tensorflow::Tensor> outputs;
tensorflow::Status run_status = session->Run(
    {{input_layer, image_tensor}},
    {output_layer}, {}, &outputs);
4. Fetch the output data:
tensorflow::Tensor* output = &outputs[0];

TF Mobile demo on iOS

In order to build the demo on iOS, you need Xcode 7.3 or later. Follow these steps to build the iOS demo apps:

1. Check out the TensorFlow code in a tensorflow folder in your home directory.
2. Open a terminal window and execute the following commands from your home folder to download the Inception V1 model, extract the label and graph files, and move these files into the data folders inside the sample app code:
$ mkdir -p ~/Downloads
$ curl -o ~/Downloads/inception5h.zip https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip && unzip ~/Downloads/inception5h.zip -d ~/Downloads/inception5h
$ cp ~/Downloads/inception5h/* ~/tensorflow/tensorflow/examples/ios/benchmark/data/
$ cp ~/Downloads/inception5h/* ~/tensorflow/tensorflow/examples/ios/camera/data/
$ cp ~/Downloads/inception5h/* ~/tensorflow/tensorflow/examples/ios/simple/data/
3. Navigate to one of the sample folders and download the experimental pod:
$ cd ~/tensorflow/tensorflow/examples/ios/camera
$ pod install
4. Open the Xcode workspace:
$ open tf_simple_example.xcworkspace
5. Run the sample app in the device simulator. The sample app will appear with a Run Model button. The camera app requires an Apple device to be connected, while the other two can run in a simulator too.

TensorFlow Lite

TF Lite is the new kid on the block and, at the time of writing this book, is still in developer preview. TF Lite is a very small subset of TensorFlow Mobile and TensorFlow, so the binaries compiled with TF Lite are very small in size and deliver superior performance. Apart from reducing the size of binaries, TensorFlow employs various other techniques, such as:

The kernels are optimized for various device and mobile architectures
The values used in the computations are quantized
The activation functions are pre-fused
It leverages specialized machine learning software or hardware available on the device, such as the Android NN API

The workflow for using models in TF Lite is as follows:

1. Get the model: You can train your own model or pick a pre-trained model available from different sources, use the pre-trained model as is, retrain it with your own data, or retrain it after modifying some parts of the model. As long as you have a trained model in a file with the extension .pb or .pbtxt, you are good to proceed to the next step. We learned how to save models in the previous chapters.
2. Checkpoint the model: The model file only contains the structure of the graph, so you need to save the checkpoint file. The checkpoint file contains the serialized variables of the model, such as weights and biases. We learned how to save a checkpoint in the previous chapters.
3. Freeze the model: The checkpoint and the model files are merged, which is also known as freezing the graph. TensorFlow provides the freeze_graph tool for this step, which can be executed as follows:
$ freeze_graph --input_graph=mymodel.pb --input_checkpoint=mycheckpoint.ckpt --input_binary=true --output_graph=frozen_model.pb --output_node_names=mymodel_nodes
4. Convert the model: The frozen model from step 3 needs to be converted to TF Lite format with the toco tool provided by TensorFlow:
$ toco --input_file=frozen_model.pb --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE --input_type=FLOAT --input_arrays=input_nodes --output_arrays=mymodel_nodes --input_shapes=n,h,w,c
5. The .tflite model saved in step 4 can now be used inside an Android or iOS app that employs the TFLite binary for inference. The process of including the TFLite binary in your app is continuously evolving, so we recommend that the reader follows the information at this link to include the TFLite binary in your Android or iOS app: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite/g3doc

Generally, you would use the graph_transforms:summarize_graph tool to prune the model obtained in step 1. The pruned model will only have the paths that lead from input to output at the time of inference or prediction. Any other nodes and paths that are required only for training or for debugging purposes, such as saving checkpoints, are removed, thus making the size of the final model very small.

The official TensorFlow repository comes with a TF Lite demo that uses a pre-trained mobilenet to classify the input from the device camera into 1001 categories. The demo app displays the probabilities of the top three categories.

TF Lite demo on Android

To build a TF Lite demo on Android, follow these steps:

1. Install Android Studio. We installed Android Studio on Ubuntu 16.04 from the instructions at the following link: https://developer.android.com/studio/install.html
2. Check out the TensorFlow repository, and apply the patch mentioned in the previous tip. Let's assume you checked out the code in the tensorflow folder in your home directory.
3. Using Android Studio, open the Android project from the path ~/tensorflow/tensorflow/contrib/lite/java/demo. If it complains about a missing SDK or Gradle components, please install those components and sync Gradle.
4. Build the project and run it on a virtual device with API > 21.

We received the following warnings, but the build succeeded. You may want to resolve the warnings if the build fails:

Warning: The Jack toolchain is deprecated and will not run. To enable support for Java 8 language features built into the plugin, remove 'jackOptions { ... }' from your build.gradle file, and add
android.compileOptions.sourceCompatibility 1.8
android.compileOptions.targetCompatibility 1.8
Note: Future versions of the plugin will not support usage of 'jackOptions' in build.gradle. To learn more, go to https://d.android.com/r/tools/java-8-support-message.html

Warning: The specified Android SDK Build Tools version (26.0.1) is ignored, as it is below the minimum supported version (26.0.2) for Android Gradle Plugin 3.0.1. Android SDK Build Tools 26.0.2 will be used. To suppress this warning, remove "buildToolsVersion '26.0.1'" from your build.gradle file, as each version of the Android Gradle Plugin now has a default version of the build tools.
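Under the hood, the demo drives the TFLite binary through the Java Interpreter API. A minimal sketch of that call path is shown below; the model file, tensor shapes, and label count are placeholder assumptions (1001 matching the mobilenet demo mentioned above):

import org.tensorflow.lite.Interpreter;
// ...
// modelFile: a java.io.File pointing to your converted .tflite model
try (Interpreter tflite = new Interpreter(modelFile)) {
    float[][][][] input = new float[1][224][224][3];  // hypothetical input shape
    float[][] output = new float[1][1001];            // one score per category
    tflite.run(input, output);                        // fills output with class probabilities
}

The Interpreter takes care of allocating tensors and dispatching to the optimized kernels, so the app code stays as small as the snippet above.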
TF Lite demo on iOS

In order to build the demo on iOS, you need Xcode 7.3 or later. Follow these steps to build the iOS demo apps:

1. Check out the TensorFlow code in a tensorflow folder in your home directory.
2. Build the TF Lite binary for iOS from the instructions at this link: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite
3. Navigate to the sample folder and download the pod:
$ cd ~/tensorflow/tensorflow/contrib/lite/examples/ios/camera
$ pod install
4. Open the Xcode workspace:
$ open tflite_camera_example.xcworkspace
5. Run the sample app in the device simulator.

We learned about using TensorFlow models in mobile applications and on mobile devices. TensorFlow provides two ways to run on mobile devices: TF Mobile and TF Lite. We learned how to build TF Mobile and TF Lite apps for iOS and Android, using the TensorFlow demo apps as examples.

If you found this post useful, do check out the book Mastering TensorFlow 1.x to skill up for building smarter, faster, and efficient machine learning and deep learning systems.

The 5 biggest announcements from TensorFlow Developer Summit 2018
Getting started with Q-learning using TensorFlow
Implement Long-short Term Memory (LSTM) with TensorFlow
Microsoft open sources Infer.NET, its popular model-based machine learning framework

Melisha Dsouza
08 Oct 2018
3 min read
Last week, Microsoft open sourced Infer.NET, the cross-platform framework used for model-based machine learning. This popular machine learning engine, used in Office, Xbox, and Azure, will be available on GitHub under the permissive MIT license for free use in commercial applications.

Features of Infer.NET

The team at Microsoft Research in Cambridge initially envisioned Infer.NET as a research tool and released it for academic use in 2008. The framework has served as a base for hundreds of published papers across a variety of fields, including information retrieval and healthcare. The team then started using the framework as a machine learning engine within a wide range of Microsoft products.

A model-based approach to machine learning

Infer.NET allows users to incorporate domain knowledge into their model. The framework can be used to build bespoke machine learning algorithms directly from that model. To sum it up, this framework actually constructs a learning algorithm for users based on the model they have provided.

Facilitates interpretability

Infer.NET also facilitates interpretability. If users have designed the model themselves, and the learning algorithm follows that model, they can understand why the system behaves in a particular way or makes certain predictions.

Probabilistic approach

In Infer.NET, models are described using a probabilistic program, which is used to describe real-world processes in a language that machines understand. Infer.NET compiles the probabilistic program into high-performance code implementing something cryptically called deterministic approximate Bayesian inference. This approach allows a notable amount of scalability. For instance, it can be used in a system that automatically extracts knowledge from billions of web pages, comprising petabytes of data.

Additional features

The framework also supports online learning, that is, the ability of the system to learn as new data arrives. The team is also working towards developing and growing it further. Infer.NET will become a part of ML.NET (the machine learning framework for .NET developers). They have already set up the repository under the .NET Foundation and moved the package and namespaces to Microsoft.ML.Probabilistic. Being cross-platform, Infer.NET supports .NET Framework 4.6.1, .NET Core 2.0, and Mono 5.0. Windows users get to use Visual Studio 2017, while macOS and Linux folks have command-line options, which can be incorporated into the code wrangler of their choice.

Download the framework to learn more about Infer.NET. You can also check the documentation for a detailed user guide. To know more about this news, head over to Microsoft's official blog.
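To make the model-based approach concrete, here is a brief sketch of the classic two-coins example in C#; the namespace reflects the Microsoft.ML.Probabilistic move mentioned above, and the example is illustrative rather than taken from the announcement:

using System;
using Microsoft.ML.Probabilistic.Models;

class TwoCoins
{
    static void Main()
    {
        // The probabilistic program: two fair coins and their conjunction.
        Variable<bool> firstCoin = Variable.Bernoulli(0.5);
        Variable<bool> secondCoin = Variable.Bernoulli(0.5);
        Variable<bool> bothHeads = firstCoin & secondCoin;

        // The engine compiles this model into an inference algorithm.
        InferenceEngine engine = new InferenceEngine();
        Console.WriteLine("P(both heads) = " + engine.Infer(bothHeads));
    }
}

Running it prints a Bernoulli distribution with probability 0.25, which is exactly what the hand-written model implies; the same pattern scales to the bespoke models described above.

Microsoft announces new Surface devices to enhance user productivity, with style and elegance
Neural Network Intelligence: Microsoft's open source automated machine learning toolkit
Microsoft's new neural text-to-speech service lets machines speak like people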
How PyTorch is bridging the gap between research and production at Facebook: PyTorch team at F8 conference

Vincy Davis
04 Dec 2019
7 min read
PyTorch, the machine learning library which was originally developed as a research framework by a Facebook intern in 2017, has now grown into a popular deep learning workflow. One of the most loved products by Facebook, PyTorch is free, open source, and used for applications like computer vision and natural language processing (NLP). At the F8 conference held this year, the PyTorch team, consisting of Joe Spisak, the project manager for PyTorch at Facebook AI, and Dmytro Dzhulgakov, the tech lead at Facebook AI, gave a talk on how Facebook is developing and scaling AI experiences with PyTorch.

Spisak describes PyTorch as an eager, define-by-run execution model: when a user executes Python code, the graph is generated on the fly. Execution is dynamic in nature, while still allowing compilation into a static graph. The dynamic neural networks are accessible, thus allowing the user to change parameters very quickly. This feature comes in handy for applications like control flow in NLP. Another important feature of PyTorch, according to Spisak, is the ability to accurately train distributed models with close to a billion parameters, including cutting-edge ones. It also has a simple and intuitive API. This is one of the qualities that has endeared PyTorch to many developers, claims Spisak.

Become a pro at Deep Learning with PyTorch! If you want to become an expert in building and training neural network models with high speed and flexibility in text, vision, and advanced analytics using PyTorch 1.x, read our book Deep Learning with PyTorch 1.x - Second Edition written by Sri. Yogesh K., Laura Mitchell, et al. It will give you an insight into solving real-world problems using CNNs, RNNs, and LSTMs, along with discovering state-of-the-art modern deep learning architectures, such as ResNet, DenseNet, and Inception.

How PyTorch is bridging the gap between research and production at Facebook

Dzhulgakov points out how general advances in AI are driven by innovative research in academia and industry, and why it's necessary to bridge the big lag between research and production. He says, "If you have a new idea and you want to take it all the way through to deployment, you usually need to go through multiple steps - figure out what the approach is and then find the training data maybe prepare massage it a little bit. Actually, build and train your model after that and then there is this painful step of transferring your model to a production environment which often historically involved reimplementation of a lot of code so you can actually take and deploy it and scale-up."

According to Dzhulgakov, PyTorch is trying to minimize this big gap by encouraging advances and experimentation in the field, so that research is brought into production in a few days instead of months.

Challenges in bringing research to production

The following are the various classes of challenges associated with bringing research to production, according to the PyTorch team.

Hardware efficiency: In a tight latency-constrained environment, users are required to fit all the work into the hardware performance budget. On the other hand, underused hardware leads to an increase in cost.

Scalability: In Facebook's recent work, Dzhulgakov says, they have trained on billions of public images, showing significant accuracy gains as compared to regular datasets like ImageNet.
Similarly, when models are taken to inference, billions of inferences per second run with multiple diverse models sharing the same hardware.

Cross-platform: Neural networks are mostly not isolated, as they need to be deployed inside their target application. They have a lot of interdependence with the surrounding code and application, which poses different constraints: the target may not be able to run Python code, or may offer very constrained compute capabilities, as on a mobile device, and more.

Reliability: A lot of PyTorch jobs run for multiple weeks on hundreds of GPUs, hence it is important to design reliable software that can tolerate hardware failures and deliver results.

How PyTorch is tackling these challenges

In order to tackle the above-listed challenges, Dzhulgakov says, Facebook develops systems that can take a training job and apply performance-focused optimizations to the performance-critical pieces. The system also applies "recipes for reliability" so that developer-written modeling code is automatically transformed. The Jit package comes into the picture here: it is a key component, built to capture the structure of the Python program with minimal changes. The main goal of the Jit package is to make this process almost seamless. He asserts that PyTorch has been successful since it feels like regular programming in Python, and most of its users start developing in traditional PyTorch mode (eager mode), just by writing and prototyping in the program. "For the subset of promising models which show the results you need to bring to production or scale up, you can apply techniques provided by Jit to existing model code and annotate it in order to run in so-called script mode."

The Jit operates on a restricted subset of Python semantics, which allows it to apply transparent transformations to the user's eager-mode code. Annotating means adding a few lines of Python code on top of a function, and it can be done incrementally, in a function-by-function or module-by-module fashion. This hybrid approach ensures that the model keeps working along the way. Such powerful PyTorch tools permit the user to share the same code base between research and production environments.
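As an illustration of what such an annotation looks like, here is a brief sketch using the public torch.jit API; the function and threshold are made-up examples, not code from the talk:

import torch

@torch.jit.script
def clipped_relu(x: torch.Tensor, threshold: float) -> torch.Tensor:
    # data-dependent control flow like this is captured by the Jit,
    # so the same function runs eagerly and as a compiled script
    if bool(x.max() > threshold):
        x = torch.clamp(x, max=threshold)
    return torch.relu(x)

print(clipped_relu(torch.randn(4), 1.0))

Because the decorator is just a few lines on top of an ordinary function, it can be rolled out incrementally, matching the function-by-function workflow described above.

Next, Dzhulgakov observes that the common factor between research and production is that both teams of developers work on the same code base, built on top of PyTorch. Thus, code is shared among teams working on a common domain, such as text classification, object detection, or reinforcement learning. These developers prototype models, train new algorithms, and address new tasks, quickly transitioning this functionality to the opposite environment. Watch the full talk to see Dzhulgakov's examples of PyTorch bridging the gap between research and production at Facebook.

If you want to become an expert at implementing deep learning applications in PyTorch, check out our latest book Deep Learning with PyTorch 1.x - Second Edition written by Sri. Yogesh K., Laura Mitchell, and et al. This book will show you how to apply neural networks to domains such as computer vision and NLP. It will also guide you to build, train, and scale a model with PyTorch and cover complex neural networks such as GANs and autoencoders for producing text and images.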
NVIDIA releases Kaolin, a PyTorch library to accelerate research in 3D computer vision and AI
Introducing ESPRESSO, an open-source, PyTorch based, end-to-end neural automatic speech recognition (ASR) toolkit for distributed training across GPUs
Facebook releases PyTorch 1.3 with named tensors, PyTorch Mobile, 8-bit model quantization, and more
Transformers 2.0: NLP library with deep interoperability between TensorFlow 2.0 and PyTorch, and 32+ pretrained models in 100+ languages
PyTorch announces the availability of PyTorch Hub for improving machine learning research reproducibility
DynamoDB Best Practices

Packt
15 Sep 2015
24 min read
In this article by Tanmay Deshpande, the author of the book DynamoDB Cookbook, we will cover the following topics:

Using a standalone cache for frequently accessed items
Using AWS ElastiCache for frequently accessed items
Compressing large data before storing it in DynamoDB
Using AWS S3 for storing large items
Catching DynamoDB errors
Performing auto-retries on DynamoDB errors
Performing atomic transactions on DynamoDB tables
Performing asynchronous requests to DynamoDB

Introduction

We are going to talk about DynamoDB implementation best practices, which will help you improve performance while reducing operation cost. So let's get started.

Using a standalone cache for frequently accessed items

In this recipe, we will see how to use a standalone cache for frequently accessed items. A cache is a temporary data store, which saves items in memory and serves them from memory instead of making a DynamoDB call. Note that this should be used only for items that you don't expect to change frequently.

Getting ready

We will perform this recipe using Java libraries, so the prerequisite is that you should have performed the recipes that use the AWS SDK for Java.

How to do it…

Here, we will be using the AWS SDK for Java, so create a Maven project with the SDK dependency. Apart from the SDK, we will also be using one of the most widely used open source caches, EhCache. To know more about EhCache, refer to http://ehcache.org/. Let's use a standalone cache for frequently accessed items:

To use EhCache, we need to include the following repository in pom.xml:

<repositories>
  <repository>
    <id>sourceforge</id>
    <name>sourceforge</name>
    <url>https://oss.sonatype.org/content/repositories/sourceforge-releases/</url>
  </repository>
</repositories>

We will also need to add the following dependency:

<dependency>
  <groupId>net.sf.ehcache</groupId>
  <artifactId>ehcache</artifactId>
  <version>2.9.0</version>
</dependency>

Once the project setup is done, we will create a cache manager class, which will be used in the following code:

public class ProductCacheManager {
  // EhCache cache manager
  CacheManager cacheManager = CacheManager.getInstance();
  private Cache productCache;

  public Cache getProductCache() {
    return productCache;
  }

  // create an instance of the cache using the cache manager
  public ProductCacheManager() {
    cacheManager.addCache("productCache");
    this.productCache = cacheManager.getCache("productCache");
  }

  public void shutdown() {
    cacheManager.shutdown();
  }
}

Now, we will create another class where we will write the code to get the item from DynamoDB. Here, we first initiate the ProductCacheManager:

static ProductCacheManager cacheManager = new ProductCacheManager();

Next, we will write a method to get the item from DynamoDB. Before we fetch the data from DynamoDB, we first check whether the item with the given key is available in the cache. If it is available, we return it from the cache itself. If the item is not found in the cache, we first fetch it from DynamoDB and immediately put it into the cache.
Once the item is cached, every time we need it, we will get it from the cache, unless the cached item is evicted:

private static Item getItem(int id, String type) {
  Item product = null;
  if (cacheManager.getProductCache().isKeyInCache(id + ":" + type)) {
    Element prod = cacheManager.getProductCache().get(id + ":" + type);
    product = (Item) prod.getObjectValue();
    System.out.println("Returning from Cache");
  } else {
    AmazonDynamoDBClient client = new AmazonDynamoDBClient(new ProfileCredentialsProvider());
    client.setRegion(Region.getRegion(Regions.US_EAST_1));
    DynamoDB dynamoDB = new DynamoDB(client);
    Table table = dynamoDB.getTable("product");
    product = table.getItem(new PrimaryKey("id", id, "type", type));
    cacheManager.getProductCache().put(new Element(id + ":" + type, product));
    System.out.println("Making DynamoDB Call for getting the item");
  }
  return product;
}

Now we can use this method whenever needed. Here is how we can test it:

Item product = getItem(10, "book");
System.out.println("First call :Item: " + product);
Item product1 = getItem(10, "book");
System.out.println("Second call :Item: " + product1);
cacheManager.shutdown();

How it works…

EhCache is one of the most popular standalone caches used in the industry. Here, we are using EhCache to store frequently accessed items from the product table. The cache keeps all its data in memory, and we save every cached item against its key. The product table has composite hash and range keys, so we store each item against the combination of its hash and range keys. Note that caching should be used only for tables that expect few updates, that is, tables holding mostly static data. If you use a cache for tables that are not static, you will get stale data. You can also go to the next level and implement a time-based cache, which holds the data for a certain time and clears it afterwards. We can also implement algorithms, such as Least Recently Used (LRU) or First In First Out (FIFO), to make the cache more efficient. This way, we make comparatively fewer calls to DynamoDB and, ultimately, save some cost for ourselves.

Using AWS ElastiCache for frequently accessed items

In this recipe, we will do the same thing that we did in the previous recipe. The only change is that we will use a cloud-hosted distributed caching solution instead of a local standalone cache. ElastiCache is a hosted caching solution provided by Amazon Web Services. There are two options for the caching technology: one is Memcached and the other is Redis. Depending upon your requirements, you can decide which one to use. Here are links that will give you more information on the two options:

http://memcached.org/
http://redis.io/

Getting ready

To get started with this recipe, we will need to have an ElastiCache cluster launched. If you are not aware of how to do it, you can refer to http://aws.amazon.com/elasticache/.

How to do it…

Here, I am using a Memcached cluster. You can choose the size of the instance as you wish. We will need a Memcached client to access the cluster. Amazon has provided a compiled version of the Memcached client, which can be downloaded from https://github.com/amazonwebservices/aws-elasticache-cluster-client-memcached-for-java.
Using AWS ElastiCache for frequently accessed items

In this recipe, we will do the same thing that we did in the previous one. The only change is that we will use a cloud-hosted, distributed caching solution instead of a local standalone cache. ElastiCache is a hosted caching solution provided by Amazon Web Services. There are two caching technologies to select from: one option is Memcached and the other is Redis. Depending upon your requirements, you can decide which one to use. Here are links with more information on the two options:

- http://memcached.org/
- http://redis.io/

Getting ready

To get started with this recipe, we will need to have an ElastiCache cluster launched. If you are not aware of how to do it, you can refer to http://aws.amazon.com/elasticache/.

How to do it…

Here, I am using a Memcached cluster. You can choose the size of the instance as you wish. We will need a Memcached client to access the cluster. Amazon has provided a compiled version of the Memcached client, which can be downloaded from https://github.com/amazonwebservices/aws-elasticache-cluster-client-memcached-for-java. Once the JAR download is complete, you can add it to your Java project's class path:

1. To start with, we will need to get the configuration endpoint of the Memcached cluster that we launched. This configuration endpoint can be found on the AWS ElastiCache console itself. Here is how we can save the configuration endpoint and port:

   static String configEndpoint = "my-elastic-cache.mlvymb.cfg.usw2.cache.amazonaws.com";
   static Integer clusterPort = 11211;

2. Similarly, we can instantiate the Memcached client:

   static MemcachedClient client;
   static {
     try {
       client = new MemcachedClient(new InetSocketAddress(configEndpoint, clusterPort));
     } catch (IOException e) {
       e.printStackTrace();
     }
   }

3. Now, we can write the getItem method as we did for the previous recipe. Here, we will first check whether the item is present in the cache; if not, we will fetch it from DynamoDB and put it into the cache. If the same request comes the next time, we will return it from the cache itself. While putting the item into the cache, we also set the expiry time of the item. We are going to set it to 3,600 seconds; that is, after 1 hour, the key entry will be deleted automatically:

   private static Item getItem(int id, String type) {
     Item product = null;
     if (null != client.get(id + ":" + type)) {
       System.out.println("Returning from Cache");
       return (Item) client.get(id + ":" + type);
     } else {
       AmazonDynamoDBClient dynamoDBClient = new AmazonDynamoDBClient(
           new ProfileCredentialsProvider());
       dynamoDBClient.setRegion(Region.getRegion(Regions.US_EAST_1));
       DynamoDB dynamoDB = new DynamoDB(dynamoDBClient);
       Table table = dynamoDB.getTable("product");
       product = table.getItem(new PrimaryKey("id", id, "type", type));
       System.out.println("Making DynamoDB Call for getting the item");
       // Cache the item with a one-hour expiry
       client.add(id + ":" + type, 3600, product);
     }
     return product;
   }

How it works…

A distributed cache works in the same fashion as a local one. A standalone cache keeps the data in memory and returns it if it finds the key. In a distributed cache, we have multiple nodes, and the keys are kept in a distributed manner, divided between nodes based on the hash values of the keys. So, when any request comes, it is redirected to the node holding the key and the value is returned from there.

Note that ElastiCache will give you faster retrieval of items at the additional cost of the ElastiCache cluster. Also note that the preceding code will work only if you execute the application from an EC2 instance; if you try to execute it on a local machine, you will get connection errors.

Compressing large data before storing it in DynamoDB

We are all aware of DynamoDB's storage limitation on item size. Suppose that we get into a situation where storing large attributes in an item is a must. In that case, it's always a good choice to compress these attributes and then save them in DynamoDB. In this recipe, we are going to see how to compress large items before storing them.

Getting ready

To get started with this recipe, you should have your workstation ready with Eclipse or any other IDE of your choice.

How to do it…

There are numerous algorithms with which we can compress large items, for example, GZIP, LZO, BZ2, and so on. Each algorithm has a trade-off between compression time and compression ratio, so it's your choice whether to go with a faster algorithm or with one that provides a higher compression ratio. Consider a scenario in our e-commerce website where we need to save the product reviews written by various users.
For this, we created a ProductReviews table, where we will save the reviewer's name, the detailed product review, and the time when the review was submitted. Here, there are chances that the product review messages will be large, and it would not be a good idea to store them as they are. So, it is important to understand how to compress these messages before storing them.

Let's see how to compress large data:

1. First of all, we will write a method that accepts a string input and returns a compressed byte buffer. Here, we are using the GZIP algorithm for compression. Java has built-in support for it, so we don't need any third-party library:

   private static ByteBuffer compressString(String input)
       throws UnsupportedEncodingException, IOException {
     // Write the input as a GZIP output stream using UTF-8 encoding
     ByteArrayOutputStream baos = new ByteArrayOutputStream();
     GZIPOutputStream os = new GZIPOutputStream(baos);
     os.write(input.getBytes("UTF-8"));
     os.finish();
     byte[] compressedBytes = baos.toByteArray();
     // Write the bytes to a byte buffer
     ByteBuffer buffer = ByteBuffer.allocate(compressedBytes.length);
     buffer.put(compressedBytes, 0, compressedBytes.length);
     buffer.position(0);
     return buffer;
   }

2. Now, we can simply use this method to compress the data before saving it in DynamoDB. Here is an example of how to use it in our code:

   private static void putReviewItem()
       throws UnsupportedEncodingException, IOException {
     AmazonDynamoDBClient client = new AmazonDynamoDBClient(
         new ProfileCredentialsProvider());
     client.setRegion(Region.getRegion(Regions.US_EAST_1));
     DynamoDB dynamoDB = new DynamoDB(client);
     Table table = dynamoDB.getTable("ProductReviews");
     Item product = new Item()
         .withPrimaryKey(new PrimaryKey("id", 10))
         .withString("reviewerName", "John White")
         .withString("dateTime", "20-06-2015T08:09:30")
         .withBinary("reviewMessage", compressString("My Review Message"));
     PutItemOutcome outcome = table.putItem(product);
     System.out.println(outcome.getPutItemResult());
   }

3. In a similar way, we can write a method that decompresses the data on retrieval from DynamoDB. Here is an example:

   private static String uncompressString(ByteBuffer input) throws IOException {
     byte[] bytes = input.array();
     ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
     ByteArrayOutputStream baos = new ByteArrayOutputStream();
     GZIPInputStream is = new GZIPInputStream(bais);
     int chunkSize = 1024;
     byte[] buffer = new byte[chunkSize];
     int length = 0;
     while ((length = is.read(buffer, 0, chunkSize)) != -1) {
       baos.write(buffer, 0, length);
     }
     return new String(baos.toByteArray(), "UTF-8");
   }

How it works…

Compressing data on the client side has numerous advantages. Smaller size means less use of network and disk resources. Compression algorithms generally maintain a dictionary of words; while compressing, if they see words getting repeated, those words are replaced by their positions in the dictionary. In this way, redundant data is eliminated and only references to it are kept in the compressed string. While uncompressing the same data, the word references are replaced with the actual words, and we get our normal string back. Different compression algorithms use different techniques, so the algorithm you choose will depend on your needs.

Using AWS S3 for storing large items

Sometimes, we might get into a situation where storing data in a compressed format is still not sufficient.
Consider a case where we need to store large images or binaries that exceed DynamoDB's per-item storage limitation. In this case, we can use AWS S3 to store such items and only save the S3 location in our DynamoDB table. AWS S3 (Simple Storage Service) allows us to store data in a cheap and efficient manner. To know more about AWS S3, you can visit http://aws.amazon.com/s3/.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

Consider a case in our e-commerce website where we would like to store the product images along with the product data. So, we will save the images on AWS S3 and only store their locations along with the product information in the product table:

1. First of all, we will see how to store data in AWS S3. For this, we need to go to the AWS console and create an S3 bucket. Here, I created a bucket called e-commerce-product-images, and inside this bucket, I created folders to store the images, for example, /phone/apple/iphone6.

2. Now, let's write the code to upload the images to S3:

   private static void uploadFileToS3() {
     String bucketName = "e-commerce-product-images";
     String keyName = "phone/apple/iphone6/iphone.jpg";
     String uploadFileName = "C:\\tmp\\iphone.jpg";
     // Create an instance of the S3 client
     AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
     // Start the file upload
     File file = new File(uploadFileName);
     s3client.putObject(new PutObjectRequest(bucketName, keyName, file));
   }

3. Once the file is uploaded, you can save its path in one of the attributes of the product table, as follows:

   private static void putItemWithS3Link() {
     AmazonDynamoDBClient client = new AmazonDynamoDBClient(
         new ProfileCredentialsProvider());
     client.setRegion(Region.getRegion(Regions.US_EAST_1));
     DynamoDB dynamoDB = new DynamoDB(client);
     Table table = dynamoDB.getTable("productTable");

     Map<String, String> features = new HashMap<String, String>();
     features.put("camera", "13MP");
     features.put("intMem", "16GB");
     features.put("processor", "Dual-Core 1.4 GHz Cyclone (ARM v8-based)");

     Set<String> imagesSet = new HashSet<String>();
     imagesSet.add("https://s3-us-west-2.amazonaws.com/e-commerce-product-images/phone/apple/iphone6/iphone.jpg");

     Item product = new Item()
         .withPrimaryKey(new PrimaryKey("id", 250, "type", "phone"))
         .withString("mnfr", "Apple").withNumber("stock", 15)
         .withString("name", "iPhone 6").withNumber("price", 45)
         .withMap("features", features)
         .withStringSet("productImages", imagesSet);
     PutItemOutcome outcome = table.putItem(product);
     System.out.println(outcome.getPutItemResult());
   }

4. So whenever required, we can fetch the item by its key and fetch the actual images from S3 using the URL saved in the productImages attribute.

How it works…

AWS S3 provides storage services at very cheap rates. It's like a flat data dumping ground where we can store any type of file. So, it's always a good option to store large datasets in S3 and only keep their URL references in DynamoDB attributes. The URL reference is the connecting link between the DynamoDB item and the S3 file. If your file is too large to be sent in one S3 client call, you may want to explore the S3 multipart API, which allows you to send the file in chunks, as sketched below.
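As a hedged illustration of that multipart option, the AWS SDK's TransferManager splits large files into parts and uploads them in parallel. The bucket, key, and file paths below reuse the placeholders from the steps above:

import java.io.File;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.Upload;

TransferManager tm = new TransferManager(new ProfileCredentialsProvider());
// upload() returns immediately; the transfer continues in background threads
Upload upload = tm.upload("e-commerce-product-images",
    "phone/apple/iphone6/iphone.jpg", new File("C:\\tmp\\iphone.jpg"));
try {
  upload.waitForCompletion(); // block until all parts have been uploaded
} catch (InterruptedException e) {
  e.printStackTrace();
}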
Catching DynamoDB errors

Till now, we discussed how to perform various operations in DynamoDB. We saw how to use the AWS-provided SDK and play around with DynamoDB items and attributes. Amazon claims that AWS provides high availability and reliability, which is quite true considering my years of experience using their services, but we still cannot rule out the possibility that services such as DynamoDB might not perform as expected. So, it's important to have a proper error-catching mechanism in place to ensure that the disaster recovery system kicks in when needed. In this recipe, we are going to see how to catch such errors.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

Catching errors in DynamoDB is quite easy. Whenever we perform any operation, we need to put it in a try block. Along with it, we need to add a couple of catch blocks in order to catch the errors. Here, we will consider a simple operation to put an item into the DynamoDB table:

   try {
     AmazonDynamoDBClient client = new AmazonDynamoDBClient(
         new ProfileCredentialsProvider());
     client.setRegion(Region.getRegion(Regions.US_EAST_1));
     DynamoDB dynamoDB = new DynamoDB(client);
     Table table = dynamoDB.getTable("productTable");
     Item product = new Item()
         .withPrimaryKey(new PrimaryKey("id", 10, "type", "mobile"))
         .withString("mnfr", "Samsung").withNumber("stock", 15)
         .withBoolean("isProductionStopped", true)
         .withNumber("price", 45);
     PutItemOutcome outcome = table.putItem(product);
     System.out.println(outcome.getPutItemResult());
   } catch (AmazonServiceException ase) {
     System.out.println("Error Message: " + ase.getMessage());
     System.out.println("HTTP Status Code: " + ase.getStatusCode());
     System.out.println("AWS Error Code: " + ase.getErrorCode());
     System.out.println("Error Type: " + ase.getErrorType());
     System.out.println("Request ID: " + ase.getRequestId());
   } catch (AmazonClientException e) {
     System.out.println("Amazon Client Exception :" + e.getMessage());
   }

We should first catch AmazonServiceException, which is raised if the service you are trying to access throws an exception. AmazonClientException should be put last in order to catch any client-related exceptions.

How it works…

Amazon assigns a unique request ID to each and every request that it receives. Keeping this request ID is very important if something goes wrong: if you would like to know what happened, this request ID is the only source of information, and we need to contact Amazon to learn more about it. There are two types of errors in AWS:

- Client errors: These errors normally occur when the request we submit is incorrect. Client errors are shown with a status code starting with 4XX. They typically occur when there is an authentication failure, a bad request, a missing required attribute, or when the provisioned throughput is exceeded. In short, these errors occur when users provide invalid inputs.
- Server errors: These errors occur when there is something wrong on Amazon's side, and they occur at runtime. The only way to handle such errors is to retry; if that does not succeed, you should log the request ID, and then you can reach Amazon support with that ID to learn more about the details.

You can read more about DynamoDB-specific errors at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ErrorHandling.html. A sketch of acting differently on these two error types follows.
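Here is a minimal sketch, not part of the original recipe, of acting on that distinction. AmazonServiceException exposes the error type, so we can retry only when the failure originated on the service side:

try {
  PutItemOutcome outcome = table.putItem(product);
  System.out.println(outcome.getPutItemResult());
} catch (AmazonServiceException ase) {
  if (ase.getErrorType() == AmazonServiceException.ErrorType.Service) {
    // 5XX: the problem is on Amazon's side, so a retry may succeed;
    // log the request ID in case we need to contact support later
    System.out.println("Server error, retrying. Request ID: " + ase.getRequestId());
  } else {
    // 4XX: the request itself is invalid; retrying the same request will not help
    System.out.println("Client error, fix the request: " + ase.getErrorCode());
  }
}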
Performing auto-retries on DynamoDB errors

As mentioned in the previous recipe, we can perform auto-retries on DynamoDB requests if we get errors. In this recipe, we are going to see how to perform auto-retries.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

Auto-retries are required if we get any errors during the first request. We can use the Amazon client configurations to set our retry strategy. By default, the DynamoDB client auto-retries a request three times if an error is generated. If we think that this is not sufficient for us, then we can define the strategy on our own, as follows:

1. First of all, we need to create a custom implementation of RetryCondition. It contains a method called shouldRetry, which we need to implement as per our needs. Here is a sample CustomRetryCondition class:

   public class CustomRetryCondition implements RetryCondition {
     public boolean shouldRetry(AmazonWebServiceRequest originalRequest,
         AmazonClientException exception, int retriesAttempted) {
       if (retriesAttempted < 3 && exception.isRetryable()) {
         return true;
       } else {
         return false;
       }
     }
   }

2. Similarly, we can implement a CustomBackoffStrategy. The back-off strategy gives a hint on how long to wait before a request is retried. You can choose either a flat back-off time or an exponential back-off time:

   public class CustomBackoffStrategy implements BackoffStrategy {
     /** Base sleep time (milliseconds) **/
     private static final int SCALE_FACTOR = 25;

     /** Maximum exponential back-off time before retrying a request */
     private static final int MAX_BACKOFF_IN_MILLISECONDS = 20 * 1000;

     public long delayBeforeNextRetry(AmazonWebServiceRequest originalRequest,
         AmazonClientException exception, int retriesAttempted) {
       if (retriesAttempted < 0) return 0;
       long delay = (1 << retriesAttempted) * SCALE_FACTOR;
       delay = Math.min(delay, MAX_BACKOFF_IN_MILLISECONDS);
       return delay;
     }
   }

3. Next, we need to create an instance of RetryPolicy and set the RetryCondition and BackoffStrategy classes that we created. Apart from this, we can also set a maximum number of retries. The last parameter is honorMaxErrorRetryInClientConfig, which controls whether this retry policy should honor the maximum error retry count set by ClientConfiguration.setMaxErrorRetry(int):

   RetryPolicy retryPolicy = new RetryPolicy(customRetryCondition,
       customBackoffStrategy, 3, false);

4. Now, initiate the ClientConfiguration and set the RetryPolicy we created earlier:

   ClientConfiguration clientConfiguration = new ClientConfiguration();
   clientConfiguration.setRetryPolicy(retryPolicy);

5. Finally, we need to set this client configuration when we initiate the AmazonDynamoDBClient; once done, your retry policy with a custom back-off strategy will be in place:

   AmazonDynamoDBClient client = new AmazonDynamoDBClient(
       new ProfileCredentialsProvider(), clientConfiguration);

How it works…

Auto-retries are quite handy when we receive a sudden burst in DynamoDB requests. If there are more requests than the provisioned throughput can serve, auto-retries with an exponential back-off strategy will definitely help in handling the load: if the client gets an exception, the request is retried automatically after some time, and if by then the load has dropped, your application loses nothing. The Amazon DynamoDB client internally uses HttpClient to make the calls, which is quite a popular and reliable implementation. In the case of batch operations, if any failure occurs, DynamoDB does not fail the complete operation. For batch write operations, if a particular operation fails, DynamoDB returns the unprocessed items, which can be retried, as sketched below.
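To make that last point concrete, here is a hedged sketch of resubmitting unprocessed items after a batch write; populating requestItems is left as a placeholder, and the loop is the part that matters:

Map<String, List<WriteRequest>> requestItems = new HashMap<String, List<WriteRequest>>();
// ... populate requestItems with PutRequest/DeleteRequest entries for your tables ...
BatchWriteItemResult result = client.batchWriteItem(
    new BatchWriteItemRequest().withRequestItems(requestItems));
// Resubmit whatever DynamoDB could not process; the client's retry policy
// and back-off strategy space out these repeated calls
while (!result.getUnprocessedItems().isEmpty()) {
  result = client.batchWriteItem(new BatchWriteItemRequest()
      .withRequestItems(result.getUnprocessedItems()));
}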
Performing atomic transactions on DynamoDB tables

I hope we are all aware that operations in DynamoDB are eventually consistent. Considering this nature, it obviously does not support transactions the way we do in an RDBMS. A transaction is a group of operations that need to be performed in one go, and they should be handled in an atomic manner: if one operation fails, the complete transaction should be rolled back. There might be use cases where you need to perform transactions in your application. Considering this need, AWS has provided an open source, client-side transaction library, which helps us achieve atomic transactions in DynamoDB. In this recipe, we are going to see how to perform transactions on DynamoDB.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

To get started, we will first need to download the source code of the library from GitHub and build it to generate the JAR file. You can download the code from https://github.com/awslabs/dynamodb-transactions/archive/master.zip:

1. Extract the code and run the following command to generate the JAR file:

   mvn clean install -DskipTests

2. On a successful build, you will see a JAR file generated in the target folder. Add this JAR to the project by choosing Configure Build Path in Eclipse.

3. Now, let's understand how to use transactions. For this, we need to create the DynamoDB client and use it to create two helper tables. The first table is the Transactions table, which stores the transactions, while the second is the TransactionImages table, which keeps snapshots of the items modified in a transaction:

   AmazonDynamoDBClient client = new AmazonDynamoDBClient(
       new ProfileCredentialsProvider());
   client.setRegion(Region.getRegion(Regions.US_EAST_1));

   // Create the transactions table
   TransactionManager.verifyOrCreateTransactionTable(client,
       "Transactions", 10, 10, (long) (10 * 60));

   // Create the transaction images table
   TransactionManager.verifyOrCreateTransactionImagesTable(client,
       "TransactionImages", 10, 10, (long) (60 * 10));

4. Next, we need to create a transaction manager by providing the names of the tables we created earlier:

   TransactionManager txManager = new TransactionManager(client,
       "Transactions", "TransactionImages");

5. Now, we create one transaction and perform the operations we need to do in one go. Consider our product table, where we need to add two new products in one single transaction, and the changes should be reflected only if both operations are successful. We can do this using transactions, as follows. Note that each item needs its own AttributeValue instance; reusing one instance for both items would silently overwrite the first item's key:

   Transaction t1 = txManager.newTransaction();

   Map<String, AttributeValue> product = new HashMap<String, AttributeValue>();
   AttributeValue id = new AttributeValue();
   id.setN("250");
   product.put("id", id);
   product.put("type", new AttributeValue("phone"));
   product.put("name", new AttributeValue("MI4"));
   t1.putItem(new PutItemRequest("productTable", product));

   Map<String, AttributeValue> product1 = new HashMap<String, AttributeValue>();
   AttributeValue id1 = new AttributeValue();
   id1.setN("350");
   product1.put("id", id1);
   product1.put("type", new AttributeValue("phone"));
   product1.put("name", new AttributeValue("MI3"));
   t1.putItem(new PutItemRequest("productTable", product1));

   t1.commit();

6. Now, execute the code to see the results. If everything goes fine, you will see two new entries in the product table. In case of an error, neither of the entries will be in the table.
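As a hedged sketch, not shown in the original recipe, a failure around commit can be handled by rolling the transaction back explicitly; the transaction library exposes a rollback() method for this purpose:

Transaction t2 = txManager.newTransaction();
try {
  t2.putItem(new PutItemRequest("productTable", product));
  t2.putItem(new PutItemRequest("productTable", product1));
  t2.commit();
} catch (Exception e) {
  // Undo any partially applied writes so that neither item becomes visible
  t2.rollback();
  System.out.println("Transaction rolled back: " + e.getMessage());
}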
How it works…

When invoked, the transaction library first writes the changes to the Transactions table and then to the actual table. If we perform an update item operation, it keeps the old values of that item in the TransactionImages table. The library also supports multi-attribute and multi-table transactions. This way, we can use the transaction library to perform atomic writes. It also supports isolated reads. You can refer to the code and examples for more details at https://github.com/awslabs/dynamodb-transactions.

Performing asynchronous requests to DynamoDB

Till now, we have used a synchronous DynamoDB client to make requests to DynamoDB. Synchronous requests block the thread until the operation is complete. Due to network issues, it can sometimes take a while for an operation to finish. In such cases, we can go for asynchronous client requests so that we can submit a request and do some other work in the meantime.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

The asynchronous client is easy to use:

1. First, we need to instantiate the AmazonDynamoDBAsync client:

   AmazonDynamoDBAsync dynamoDBAsync = new AmazonDynamoDBAsyncClient(
       new ProfileCredentialsProvider());

2. Next, we need to create the request to be performed in an asynchronous manner. Let's say we need to delete a certain item from our product table. Then, we can create the DeleteItemRequest, as shown in the following code snippet:

   Map<String, AttributeValue> key = new HashMap<String, AttributeValue>();
   AttributeValue id = new AttributeValue();
   id.setN("10");
   key.put("id", id);
   key.put("type", new AttributeValue("phone"));
   DeleteItemRequest deleteItemRequest = new DeleteItemRequest(
       "productTable", key);

3. Next, invoke the deleteItemAsync method to delete the item. Here, we can optionally define an AsyncHandler if we want to use the result of the request we invoked. I am also printing the messages with timestamps so that we can confirm the asynchronous nature of the calls:

   dynamoDBAsync.deleteItemAsync(deleteItemRequest,
       new AsyncHandler<DeleteItemRequest, DeleteItemResult>() {
         public void onSuccess(DeleteItemRequest request, DeleteItemResult result) {
           System.out.println("Item deleted successfully: " + System.currentTimeMillis());
         }

         public void onError(Exception exception) {
           System.out.println("Error deleting item in async way");
         }
       });
   System.out.println("Delete item initiated " + System.currentTimeMillis());

How it works…

The asynchronous client uses AsyncHttpClient to invoke the DynamoDB APIs. This is a wrapper implementation on top of the Java asynchronous APIs, so it is quite easy to use and understand. The AsyncHandler is an optional configuration you can use in order to consume the results of asynchronous calls. We can also use the Java Future object to handle the response.

Summary

We have covered various recipes on the cost- and performance-efficient use of DynamoDB. Recipes like error handling and auto-retries help readers make their applications robust. We also highlighted the use of the transaction library in order to implement atomic transactions on DynamoDB.

Further resources on this subject:

- The EMR Architecture
- Amazon DynamoDB - Modelling relationships, Error handling
- Index, Item Sharding, and Projection in DynamoDB

How to perform regression analysis using SAS

Gebin George
27 Feb 2018
7 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book, Big Data Analysis with SAS, written by David Pope. This book will help you leverage the power of SAS for data management, analysis and reporting. It contains practical use-cases and real-world examples on predictive modelling, forecasting, optimizing, and reporting your Big Data analysis using SAS.[/box]

Today, we will perform regression analysis using SAS in a step-by-step manner with a practical use-case.

Regression analysis is one of the earliest predictive techniques most people learn because it can be applied across a wide variety of problems dealing with data that is related in linear and non-linear ways. Linear data is one of the easier use cases, and as such PROC REG is a well-known and often-used procedure to help predict likely outcomes before they happen.

The REG procedure provides extensive capabilities for fitting linear regression models that involve individual numeric independent variables. Many other procedures can also fit regression models, but they focus on more specialized forms of regression, such as robust regression, generalized linear regression, nonlinear regression, nonparametric regression, quantile regression, regression modeling of survey data, regression modeling of survival data, and regression modeling of transformed variables. The SAS/STAT procedures that can fit regression models include the ADAPTIVEREG, CATMOD, GAM, GENMOD, GLIMMIX, GLM, GLMSELECT, LIFEREG, LOESS, LOGISTIC, MIXED, NLIN, NLMIXED, ORTHOREG, PHREG, PLS, PROBIT, QUANTREG, QUANTSELECT, REG, ROBUSTREG, RSREG, SURVEYLOGISTIC, SURVEYPHREG, SURVEYREG, TPSPLINE, and TRANSREG procedures. Several procedures in SAS/ETS software also fit regression models. (SAS/STAT 14.2 / SAS/STAT User's Guide, Introduction to Regression Procedures, Overview: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_introreg_sect001.htm&locale=en&showBanner=yes)

Regression analysis attempts to model the relationship between a response or output variable and a set of input variables. The response is the target variable, the variable one is trying to predict, while the rest of the input variables make up the parameters used as input to the algorithm; they are used to derive the predicted value for the response variable.

PROC REG

One of the easiest ways to determine whether regression analysis is applicable to a question is to check whether the question has only two answers. For example, should a bank lend an applicant money: yes or no? This is known as a binary response, and as such, regression analysis can be applied to help determine the answer.

In the following example, the reader will use the SASHELP.BASEBALL dataset to create a regression model to predict the value of a baseball player's salary. The SASHELP.BASEBALL dataset contains salary and performance information for Major League Baseball players who played at least one game in both the 1986 and 1987 seasons, excluding pitchers. The salaries (Sports Illustrated, April 20, 1987) are for the 1987 season and the performance measures are from 1986 (Collier Books, The 1987 Baseball Encyclopedia Update). (SAS/STAT 14.2 / SAS/STAT User's Guide, Example 99: Modeling Salaries of Major League Baseball Players: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_examples01.htm&locale=en&showBanner=yes)
Let's first use PROC UNIVARIATE to learn something about this baseball data by submitting the following code:

   proc univariate data=sashelp.baseball;
   quit;

While reviewing the output, the reader will notice that the variance associated with logSalary, 0.79066, is much less than the variance associated with the actual target variable Salary, 203508. In this case, it makes better sense to attempt to predict the logSalary value of a player instead of Salary. Write the following code in a SAS Studio program section and submit it:

   proc reg data=sashelp.baseball;
      id name team league;
      model logSalary = nAtBat nHits nHome nRuns nRBI YrMajor
                        CrAtBat CrHits CrHome CrRuns CrRbi;
   quit;

Notice that there are 59 observations, as reported in the first output table, with missing values in at least one of the input variables; those observations are not used in the development of the regression model. The Root Mean Squared Error (RMSE) and R-square are statistics that typically inform the analyst how good the model is at predicting the target. R-square ranges from 0 to 1.0, with higher values typically indicating a better model. However, a high R-square can sometimes reflect over-fitting rather than true predictive power. Over-fitting can happen when an analyst doesn't have enough real-life data and chooses data, or a sample of data, that over-represents the target event; such a model performs poorly when real-world data is used as input.

Since several of the input variables appear to have little predictive power on the target, an analyst may decide to drop them, thereby reducing the amount of information needed to make a decent prediction. In this case, it appears we only need four input variables: YrMajor, nHits, nRuns, and nAtBat. Modify the code as follows and submit it again:

   proc reg data=sashelp.baseball;
      id name team league;
      model logSalary = YrMajor nHits nRuns nAtBat;
   quit;

The p-value associated with each input variable provides the analyst with insight into which variables have the biggest impact on predicting the target variable; the smaller the value, the higher the predictive value of the input variable. Both the RMSE and R-square values for this second model are slightly lower than those of the original. However, the adjusted R-square value is slightly higher. In this case, an analyst may choose to use the second model, since it requires much less data and provides basically the same predictive power.

Prior to accepting any model, an analyst should determine whether a few observations may be over-influencing the results by investigating the influence and fit diagnostics. The default output from PROC REG provides this type of visual insight. The top-right corner plot, showing the externally studentized residuals (RStudent) by leverage values, shows that there are a few observations with high leverage that may be overly influencing the fit produced. In order to investigate this further, we will add a plots statement to our PROC REG to produce a labeled version of this plot.
Type the following code in a SAS Studio program section and submit it:

   proc reg data=sashelp.baseball plots(only label)=(RStudentByLeverage);
      id name team league;
      model logSalary = YrMajor nHits nRuns nAtBat;
   quit;

Sure enough, there are three to five individuals whose input variables may have excessive influence on fitting this model. Let's remove those points and see if the model improves. Type this code in a SAS Studio program section and submit it:

   proc reg data=sashelp.baseball plots=(residuals(smooth));
      where name NOT IN ("Mattingly, Don", "Henderson, Rickey",
                         "Boggs, Wade", "Davis, Eric", "Rose, Pete");
      id name team league;
      model logSalary = YrMajor nHits nRuns nAtBat;
   quit;

This change, in itself, has not improved the model; it has actually made the model worse, as can be seen in the R-square value of 0.5592. However, the plots=(residuals(smooth)) option gives some insight as it pertains to YrMajor: players at the beginning and the end of their careers tend to be paid less compared to others, as can be seen in Figure 4.12. In order to address this lack of fit, an analyst can use a polynomial of degree two for the YrMajor variable. Type the following code in a SAS Studio program section and submit it:

   data work.baseball;
      set sashelp.baseball;
      where name NOT IN ("Mattingly, Don", "Henderson, Rickey",
                         "Boggs, Wade", "Davis, Eric", "Rose, Pete");
      YrMajor2 = YrMajor*YrMajor;
   run;

   proc reg data=work.baseball;
      id name team league;
      model logSalary = YrMajor YrMajor2 nHits nRuns nAtBat;
   quit;

After removing some outliers and adjusting for the YrMajor variable, the model's predictive power has improved significantly, as can be seen in the much improved R-square value of 0.7149.

We saw an effective way of performing regression analysis using the SAS platform. If you found our post useful, do check out this book, Big Data Analysis with SAS, to understand other data analysis models and perform them practically using SAS.

How to build a Scatterplot in IBM SPSS

Kartikey Pandey
27 Nov 2017
4 min read
[box type="note" align="" class="" width=""]The following excerpt is from the title Data Analysis with IBM SPSS Statistics, Chapter 5, written by Kenneth Stehlik-Barry and Anthony J. Babinec. Analytical tools such as SPSS can readily provide even a novice user with an overwhelming amount of information and a broad range of options for analyzing patterns in the data.[/box]

In this article we help you learn the techniques of SPSS to build a scatterplot using the Chart Builder feature. One of the most valuable methods for examining the relationship between two variables containing scale-level data is a scatterplot. In the previous chapter, scatterplots were used to detect points that deviated from the typical pattern, that is, multivariate outliers.

To produce a similar scatterplot using two fields from the 2016 General Social Survey data, navigate to Graphs | Chart Builder. An information box is displayed indicating that each field's measurement properties will be used to identify the types of graphs available, so adjusting these properties is advisable. In this example, the properties will be modified as a part of the graph specification process, but you may want to alter the properties of some variables permanently so that they don't need to be changed for each use. For now, just select OK to move ahead.

In the main Chart Builder window, select Scatter/Dot from the menu at the lower left, double-click on the first graph to the right (Simple Scatter) to place it in the preview pane at the upper right, and then right-click on the first field labeled HIGHEST YEAR OF SCHOOL. Change this variable from Nominal to Scale. After changing the respondent's education to Scale, drag this field to the X-Axis location in the preview pane and drag spouse's education to the Y-Axis location. Once both elements are in place, the OK choice becomes available; select it to produce the scatterplot.

The scatterplot produced by default provides some sense of the trend, in that the denser circles are concentrated in a band from the lower left to the upper right. This pattern, however, is rather subtle visually. With some editing, the relationship can be made more evident. Double-click on the graph to open the Chart Editor, select the X icon at the top, and change the major increment to 4 so that there are numbers corresponding to completing high school and college. Do the same for the y-axis values.

Select a point on the graph to highlight all the "dots" and right-click to display the marker dialog. Click on the Marker tab and change the symbol to the star shape, increase the size to 6, increase the border to 2, and change the border color to a dark blue. Use Apply to make the changes visible on the scatterplot. Then use the Add Fit Line at Total icon above the graph to show the regression line for this data. Drag the R2 box from the upper right to the bottom, below the graph, and drag the box with the equation displayed to the lower left, away from the points.

These modifications make the pattern easier to see, since the "stars" near the line are darker and denser than those farther from it, indicating that fewer cases are associated with the distant points. The SPSS scatterplot capabilities covered in this article will give you a foundation for creating visual representations of data, both for deeper pattern discovery and for communicating results to a broader audience.
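If you prefer command syntax to the Chart Builder dialogs, a minimal sketch of the equivalent command is shown below. The variable names educ and speduc are assumptions standing in for the respondent's and spouse's years of schooling in your General Social Survey file; substitute the names used in your dataset:

* Hypothetical syntax equivalent of the Chart Builder steps above.
GRAPH
  /SCATTERPLOT(BIVAR)=educ WITH speduc
  /MISSING=LISTWISE.

The fit line, marker styling, and axis increments would still be applied in the Chart Editor as described above.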
Several other graph types, such as pie charts and multiple line charts, can be built and edited using the approach shown in Chapter 5, Visually Exploring the Data, from our title Data Analysis with IBM SPSS Statistics. Go on, explore these alternative graph styles to see when they may be better suited to your needs.

How to use M functions within Microsoft Power BI for querying data

Amarabha Banerjee
21 May 2018
10 min read
Microsoft Power BI Desktop contains a rich set of data source connectors and transformation capabilities that support the integration and enhancement of source data. These features are all driven by a powerful functional language and query engine, M, which leverages source system resources when possible and can greatly extend the scope and robustness of the data retrieval process beyond the possibilities of the standard query editor interface alone. As with almost all BI projects, the design and development of the data access and retrieval process has great implications for the analytical value, scalability, and sustainability of the overall Power BI solution.

[box type="note" align="" class="" width=""]Our article is an excerpt from the book Microsoft Power BI Cookbook, written by Brett Powell. This book shows how to leverage Microsoft Power BI and the development tools to create better data driven analytics and visualizations.[/box]

In this article, we dive into Power BI Desktop's Get Data experience and go through the process of establishing and managing data source connections and queries. Examples are provided of using the Query Editor interface and the M language directly to construct and refine queries to meet common data transformation and cleansing needs. In practice, and as per the examples, a combination of both tools is recommended to aid the query development process.

Viewing and analyzing M functions

Every time you click on a button to connect to any of Power BI Desktop's supported data sources or apply any transformation to a data source object, such as changing a column's data type, one or more M expressions are created reflecting your choices. These M expressions are automatically written to dedicated M documents and, if saved, are stored within the Power BI Desktop file as queries. M is a functional programming language like F#, and it's important that Power BI developers become familiar with analyzing, and later writing and enhancing, the M code that supports their queries.

Getting ready

Build a query through the user interface that connects to the AdventureWorksDW2016CTP3 SQL Server database on the ATLAS server and retrieves the DimGeography table, filtered to United States for English:

1. Click on Get Data from the Home tab of the ribbon, select SQL Server from the list of database sources, and provide the server and database names. For the Data Connectivity mode, select Import.
2. A navigation window will appear, with the different objects and schemas of the database. Select the DimGeography table from the Navigation window and click on Edit.
3. In the Query Editor window, select the EnglishCountryRegionName column and then filter on United States from its dropdown (Figure 2: Filtering for United States only in the Query Editor).

At this point, a preview of the filtered table is exposed in the Query Editor, and the Query Settings pane displays the steps applied so far (Figure 3: The Query Settings pane in the Query Editor).

How to do it…

Formula Bar

With the Formula Bar visible in the Query Editor, click on the Source step under Applied Steps in the Query Settings pane.
You should see the formula expression created for the Source step, the Sql.Database() function call (Figure 4). Click on the Navigation step to expose the metadata record created for that step (Figure 5). The navigation expression (2) references the source expression (1).

The Formula Bar in the Query Editor displays individual query steps, which are technically individual M expressions. It's convenient, and very often essential, to view and edit all the expressions in a centralized window, and for this there's the Advanced Editor. M is a functional language, and it can be useful to think of query evaluation in M as similar to Excel spreadsheet formulas, in which multiple formulas can reference each other. The M engine can determine which expressions are required by the final expression to return, and evaluates only those expressions.

When configuring Power BI development tools, the display settings for both the Query Settings pane and the Formula Bar should be enabled under GLOBAL | Query Editor options (Figure 6: Global layout options for the Query Editor). Alternatively, on a per-file basis, you can control these settings and others from the View tab of the Query Editor toolbar (Figure 7: Property settings of the View tab in the Query Editor).

Advanced Editor window

Given its importance to the query development process, the Advanced Editor dialog is exposed on both the Home and View tabs of the Query Editor. It's recommended to use the Query Editor when getting started with a new query and when learning the M language. After several steps have been applied, use the Advanced Editor to review and optionally enhance or customize the M query. As a rich, functional programming language, M has many functions and optional parameters not exposed via the Query Editor; going beyond the limits of the Query Editor enables more robust data retrieval and integration processes.

Click on Advanced Editor from either the View or Home tabs (Figure 8 and Figure 9, respectively). All M function expressions and any comments are exposed (Figure 9: The Advanced Editor view of the DimGeography query); a sketch of the full query text follows below.

When developing retrieval processes for Power BI models, consider these common ETL questions:

- How are our queries impacting the source systems?
- Can we make our retrieval queries more resilient to changes in source data, such that they avoid failure?
- Is our retrieval process efficient and simple to follow and support, or are there unnecessary steps and queries?
- Are our retrieval queries delivering sufficient performance to the BI application?
- Is our process flexible, such that we can quickly apply changes to data sources and logic?

M queries are not intended as a substitute for the workloads typically handled by enterprise ETL tools such as SSIS or Informatica. However, just as BI professionals would carefully review the logic and test the performance of SQL stored procedures and ETL packages supporting their cubes and reports, they should also review the M queries created to support Power BI models and reports.
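For reference, the full query text that the Advanced Editor displays for this recipe should look roughly like the following sketch; the step names are the defaults generated by the Query Editor, and your server and database names may differ:

let
    Source = Sql.Database("ATLAS", "AdventureWorksDW2016CTP3"),
    dbo_DimGeography = Source{[Schema = "dbo", Item = "DimGeography"]}[Data],
    #"Filtered Rows" = Table.SelectRows(dbo_DimGeography,
        each [EnglishCountryRegionName] = "United States")
in
    #"Filtered Rows"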
How it works…

Two of the top performance and scalability features of M's engine are query folding and lazy evaluation. If possible, the M queries developed in Power BI Desktop are converted (folded) into SQL statements and passed to source systems for processing. M can also reduce the resources required for a given query by ignoring any unnecessary or redundant steps (variables).

M is a case-sensitive language. This includes referencing variables in M expressions (RenameColumns versus Renamecolumns) as well as the values in M queries. For example, the values "Apple" and "apple" are considered unique values in an M query; the Table.Distinct() function will not remove rows for one of the values.

Variable names in M expressions cannot have spaces without a hash sign and double quotes. Per Figure 10, when the Query Editor graphical interface is used to create M queries, this syntax is applied automatically, along with a name describing the M transformation applied. Applying short, descriptive variable names (with no spaces) improves the readability of M queries.

Query folding

The query from this recipe was "folded" into a SQL statement and sent to the ATLAS server for processing (Figure 10: The SQL statement generated from the DimGeography M query). Right-click on the Filtered Rows step and select View Native Query to access the Native Query window (Figure 11: View Native Query in Query Settings). Finding and revising queries that are not being folded to source systems is a top technique for enhancing large Power BI datasets. See the Pushing Query Processing Back to Source Systems recipe of Chapter 11, Enhancing and Optimizing Existing Power BI Solutions, for an example of this process.

M query structure

The great majority of queries created for Power BI will follow the let...in structure, as per this recipe, since they contain multiple steps with dependencies among them. Individual expressions are separated by commas, and the expression referred to following the in keyword is the expression returned by the query. The individual step expressions are technically variables, and if the identifiers for these variables (the names of the query steps) contain spaces, then the step is placed in quotes and prefixed with a # sign, as per the Filtered Rows step shown in the sketch above.

Lazy evaluation

The M engine also has powerful "lazy evaluation" logic for ignoring any redundant or unnecessary variables, as well as short-circuiting evaluation (computation) once a result is determinate, such as when one side (operand) of an OR logical operator is computed as True. The order of evaluation of the expressions is determined at runtime; it doesn't have to be sequential from top to bottom. In the following example, a step for retrieving Canada was added and the step for the United States was ignored. Since the CanadaOnly variable satisfies the overall let expression of the query, only the Canada query is issued to the server, as if the United States row were commented out or didn't exist (Figure 12: Revised query that ignores the Filtered Rows step to evaluate Canada only). View Native Query is not available given this revision, but a SQL Profiler trace against the source database server (and a refresh of the M query) confirms that CanadaOnly was the only SQL query passed to the source database.
Figure 13: Capturing the SQL statement passed to the server via a SQL Server Profiler trace

There's more…

Partial query folding

- A query can be "partially folded", in which a SQL statement is created resolving only part of the overall query.
- The results of this SQL statement are returned to Power BI Desktop (or the on-premises data gateway), and the remaining logic is computed using M's in-memory engine with local resources.
- M queries can be designed to maximize the use of source system resources by using standard expressions supported by query folding early in the query process.
- Minimizing the use of local or on-premises data gateway resources is a top consideration.

Limitations of query folding

- No folding will take place once a native SQL query has been passed to the source system, for example, by passing a SQL query directly through the Get Data dialog. Such a query, specified in the Get Data dialog, is included in the Source step (Figure 14: Providing a user-defined native SQL query), and any transformations applied after it will use local system resources. The general implication for query development with native or user-defined SQL queries is that, if they're used, you should try to include all required transformations in them (that is, joins and derived columns), or use them to exploit an important feature of the source database not used by the folded query, such as an index.
- Not all data sources support query folding; text and Excel files, for example, do not.
- Not all transformations available in the Query Editor or via M functions directly are supported by some data sources.
- The privacy levels defined for the data sources will also impact whether folding is used or not.
- SQL statements are not parsed before they're sent to the source system.
- The Table.Buffer() function can be used to avoid query folding: the table output of this function is loaded into local memory, and transformations against it remain local.

We have discussed effective techniques for accessing and retrieving data using Microsoft Power BI. Do check out the book Microsoft Power BI Cookbook for more information on using Microsoft Power BI for data analysis and visualization.

Further resources on this subject:

- Expert Interview: Unlocking the secrets of Microsoft Power BI
- Tutorial: Building a Microsoft Power BI Data Model
- Expert Insights: Ride the third wave of BI with Microsoft Power BI

TensorFlow.js contributor Kai Sasaki on how TensorFlow.js eases web-based machine learning application development

Sugandha Lahoti
28 Nov 2019
6 min read
Running machine learning applications in the web browser is one of the hottest trends in software development right now, and many notable machine learning projects are being built with TensorFlow.js. It is one of the most popular frameworks for building performant machine learning applications that run smoothly in a web browser. Recently, we spoke with Kai Sasaki, one of the initial contributors to TensorFlow.js. He talked about current and future versions of TF.js, how it compares to other browser-based ML tools, and his contributions to the community. He also shared his views on why he thinks JavaScript is good for machine learning. If you are a web developer with a working knowledge of JavaScript who wants to learn how to integrate machine learning techniques with web-based applications, we recommend you read the book Hands-On Machine Learning with TensorFlow.js. This hands-on course covers important aspects of machine learning with TensorFlow.js using practical examples; throughout the course, you'll learn how different algorithms work and follow step-by-step instructions to implement them.

On how TensorFlow.js has improved web-based machine learning

How do you think machine learning for the web has evolved in the last 2-3 years? What are some current applications of web-based machine learning and TensorFlow.js? What can we expect in future releases?

Machine learning on the web platform is a field attracting more developers and machine learning practitioners. There are two reasons. First, the web platform is universally available; the web browser mostly provides us a way to access the underlying resources transparently. The second reason is security. Training a model on the client side means you can keep sensitive data inside the client environment, as the entire training process is completed on the client side itself. The data is not sent to the cloud, making it more secure and less susceptible to vulnerabilities or hacking. In future releases as well, TensorFlow.js is expected to provide more secure and accessible functionality. You can find various kinds of TensorFlow.js based applications here.

How does TensorFlow.js compare with other web and browser-based machine learning tools? Does it make web-based machine learning application development easier?

The most significant advantage of TensorFlow.js is its full compatibility with the TensorFlow ecosystem. Not only can a TensorFlow model be seamlessly used in TensorFlow.js, but tools for visualization and model deployment in the TensorFlow ecosystem can also be used with TensorFlow.js.

TensorFlow 2 was released in October. What are some new changes made specific to TensorFlow.js as a part of TF 2.0 that machine learning developers will find useful? What are your first impressions of this new release?

Although there is nothing special related to TensorFlow 2.0 itself, full support for new backends, such as WASM and WebGPU, is under active development. These hardware acceleration mechanisms provided by the web platform can enhance performance for any TensorFlow.js application. It surely makes the potential of TensorFlow.js stronger and the range of possible use cases broader.

On Kai's experience working on his book, Hands-On Machine Learning with TensorFlow.js

Tell us the motivation behind writing your book Hands-On Machine Learning with TensorFlow.js. What are some of your favorite chapters/projects from the book?

TensorFlow.js does not have much history; only three years have passed since its initial publication.
Due to the lack of resources for learning TensorFlow.js usage, I was motivated to write a book illustrating how to use TensorFlow.js practically. I think chapters 4 to 9 of my book Hands-On Machine Learning with TensorFlow.js provide readers with good material to practice writing ML applications with TensorFlow.js.

Why JavaScript for machine learning

Why do you think JavaScript is good for machine learning? What are some of the good machine learning packages available in JavaScript? How does it compare to other languages like Python, R, and MATLAB, especially in terms of performance?

JavaScript is the primary programming language of the web platform, so it can work as a bridge between the web and machine learning applications. We have several other libraries working similarly; for example, machinelearn.js is a general machine learning framework running in JavaScript. Although JavaScript is not a highly performant language, its universal availability on the web platform is attractive to developers, as they can build machine learning applications that are "write once, run anywhere". We can compare the performance practically by running state-of-the-art machine learning models such as MobileNet or ResNet.

On his contribution towards TF.js

You are a contributor to TensorFlow.js and were awarded by the Google Open Source Peer Bonus Program. What were your main contributions? How was your experience working on TF.js?

One of the significant contributions I made was the fast Fourier transformation operations. I created the initial implementations of fft, ifft, rfft, and irfft. I also added stft (short-term Fourier transformation). These operators are mainly used for performing signal analysis in audio applications. I have done several bug fixes and test enhancements in TensorFlow.js too.

What are the biggest challenges today in the field of machine learning and AI in web development? What do you see as some of the greatest technology disruptors in the next 5 years?

While many developers write Python in the machine learning field, not many web developers are familiar with machine learning, in spite of the substantial advantage of integrating machine learning with the web platform. I believe machine learning technologies will be democratized among web developers, so that a vast amount of creativity can flourish in the next five years. By cooperating with these enthusiastic developers in the community, I believe machine learning on the client side or edge device will become one of the major contributions to the machine learning field.

About the author

Kai Sasaki works as a software engineer at Treasure Data, building large-scale distributed systems. He is one of the initial contributors to TensorFlow.js and contributes to developing operators for newer machine learning models. He also received the Google Open Source Peer Bonus in 2018. You can find him on Twitter, LinkedIn, and GitHub.

About the book

Hands-On Machine Learning with TensorFlow.js is a comprehensive guide that will help you easily get started with machine learning algorithms and techniques using TensorFlow.js. Throughout the course, you'll learn how different algorithms work and follow step-by-step instructions to implement them through various examples. By the end of this book, you will be able to create and optimize your own web-based machine learning applications.
Further resources on this subject:

- Baidu adds Paddle Lite 2.0, new development kits, EasyDL Pro, and other upgrades to its PaddlePaddle platform
- Introducing Spleeter, a TensorFlow based Python library that extracts voice and sound from any music track
- TensorFlow 2.0 released with tighter Keras integration, eager execution enabled by default, and more!
OpenCV: Tracking Faces with Haar Cascades

Packt
13 May 2013
4 min read
Conceptualizing Haar cascades

When we talk about classifying objects and tracking their location, what exactly are we hoping to pinpoint? What constitutes a recognizable part of an object?

Photographic images, even from a webcam, may contain a lot of detail for our (human) viewing pleasure. However, image detail tends to be unstable with respect to variations in lighting, viewing angle, viewing distance, camera shake, and digital noise. Moreover, even real differences in physical detail might not interest us for the purpose of classification. I was taught in school that no two snowflakes look alike under a microscope. Fortunately, as a Canadian child, I had already learned how to recognize snowflakes without a microscope, as the similarities are more obvious in bulk.

Thus, some means of abstracting image detail is useful in producing stable classification and tracking results. The abstractions are called features, which are said to be extracted from the image data. There should be far fewer features than pixels, though any pixel might influence multiple features. The level of similarity between two images can be evaluated based on distances between the images' corresponding features. For example, distance might be defined in terms of spatial coordinates or color coordinates.

Haar-like features are one type of feature that is often applied to real-time face tracking. They were first used for this purpose by Paul Viola and Michael Jones in 2001. Each Haar-like feature describes the pattern of contrast among adjacent image regions. For example, edges, vertices, and thin lines each generate distinctive features. For any given image, the features may vary depending on the regions' size, which may be called the window size. Two images that differ only in scale should be capable of yielding similar features, albeit for different window sizes. Thus, it is useful to generate features for multiple window sizes. Such a collection of features is called a cascade. We may say a Haar cascade is scale-invariant or, in other words, robust to changes in scale. OpenCV provides a classifier and tracker for scale-invariant Haar cascades, which it expects to be in a certain file format.

Haar cascades, as implemented in OpenCV, are not robust to changes in rotation. For example, an upside-down face is not considered similar to an upright face, and a face viewed in profile is not considered similar to a face viewed from the front. A more complex and more resource-intensive implementation could improve Haar cascades' robustness to rotation by considering multiple transformations of images as well as multiple window sizes. However, we will confine ourselves to the implementation in OpenCV.

Getting Haar cascade data

As part of your OpenCV setup, you probably have a directory called haarcascades. It contains cascades that are trained for certain subjects using tools that come with OpenCV.
The directory's full path depends on your system and method of setting up OpenCV, as follows:

Build from source archive: <unzip_destination>/data/haarcascades
Windows with self-extracting ZIP: <unzip_destination>/data/haarcascades
Mac with MacPorts: /opt/local/share/OpenCV/haarcascades
Mac with Homebrew: The haarcascades directory is not included; to get it, download the source archive
Ubuntu with apt or Software Center: The haarcascades directory is not included; to get it, download the source archive

If you cannot find haarcascades, then download the source archive from http://sourceforge.net/projects/opencvlibrary/files/opencv-unix/2.4.3/OpenCV-2.4.3.tar.bz2/download (or the Windows self-extracting ZIP from http://sourceforge.net/projects/opencvlibrary/files/opencvwin/2.4.3/OpenCV-2.4.3.exe/download), unzip it, and look for <unzip_destination>/data/haarcascades.

Once you find haarcascades, create a directory called cascades in the same folder as cameo.py and copy the following files from haarcascades into cascades:

haarcascade_frontalface_alt.xml
haarcascade_eye.xml
haarcascade_mcs_nose.xml
haarcascade_mcs_mouth.xml

As their names suggest, these cascades are for tracking faces, eyes, noses, and mouths. They require a frontal, upright view of the subject. With a lot of patience and a powerful computer, you can make your own cascades, trained for various types of objects.

Creating modules

We should continue to maintain good separation between application-specific code and reusable code. Let's make new modules for tracking classes and their helpers.

A file called trackers.py should be created in the same directory as cameo.py (and, equivalently, in the parent directory of cascades). Let's put the following import statements at the start of trackers.py:

import cv2
import rects
import utils

Alongside trackers.py and cameo.py, let's make another file called rects.py containing the following import statement:

import cv2

Our face tracker and a definition of a face will go in trackers.py, while various helpers will go in rects.py and our preexisting utils.py file.
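Before we build those modules, it may help to see a cascade in action. The following is a minimal sketch, separate from the Cameo application, that loads the frontal face cascade and marks any detected faces in a still image; the input and output filenames are illustrative assumptions, not files from the book:

import cv2

# Load one of the cascade files copied into the cascades directory.
face_cascade = cv2.CascadeClassifier('cascades/haarcascade_frontalface_alt.xml')

# Haar classifiers operate on single-channel intensity data, so convert
# the image to grayscale first.
image = cv2.imread('people.jpg')  # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale scans a range of window sizes, which is what makes
# the cascade robust to changes in scale.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=4)

# Draw a rectangle around each detected face and save the result.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)
cv2.imwrite('people_faces.jpg', image)

Remember that this cascade expects a frontal, upright view; a face in profile or upside down will usually be missed.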

Recurrent neural networks and the LSTM architecture

Richard Gall
11 Apr 2018
2 min read
A recurrent neural network is a class of artificial neural networks in which the connections between nodes form cycles, so information can persist from one step of a sequence to the next. These nodes can be classified as either input, output, or hidden. Input nodes receive data from outside of the network, hidden nodes modify the input data, and output nodes provide the intended results. RNNs are well known for their extensive usage in NLP tasks. The video tutorial above has been taken from Natural Language Processing with Python.

Why are recurrent neural networks well suited for NLP?

What makes RNNs so popular and effective for natural language processing tasks is that they operate sequentially over data sets. For example, a movie review is an arbitrary sequence of letters and characters, which the RNN can take as an input. The subsequent hidden and output layers are also capable of working with sequences. In a basic sentiment analysis example, you might just have a binary output, like classifying movie reviews as positive or negative. RNNs can do more than this: they are capable of generating a sequential output, such as taking an input sentence in English and translating it into Spanish. This ability to process data sequentially is what makes recurrent neural networks so well suited to NLP tasks.

RNNs and long short-term memory

Recurrent neural networks can sometimes become unstable due to the complexity of the connections they are built upon. That's where the LSTM architecture helps. LSTM introduces something called a memory cell. The memory cell simplifies what could otherwise be incredibly complex dynamics by using a series of gates to govern the way its state changes within the network:

The input gate manages inputs
The output gate manages outputs
A self-recurrent connection keeps the memory cell in a consistent state between steps
The forget gate allows the memory cell to 'forget' its previous state

A minimal sketch of this setup, applied to the movie-review example above, follows the Read Next list below.

Read Next

High-level concepts
Neural Network Architectures 101: Understanding Perceptrons
4 ways to enable Continual learning into Neural Networks

Tutorials
Build a generative chatbot using recurrent neural networks (LSTM RNNs)
Training RNNs for Time Series Forecasting
Implement Long-short Term Memory (LSTM) with TensorFlow
How to auto-generate texts from Shakespeare writing using deep recurrent neural networks

Research in this area
Paper in Two minutes: Attention Is All You Need
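As promised above, here is a minimal Keras sketch of the binary sentiment-classification setup described in this article. It is not taken from the video tutorial; the vocabulary size, sequence length, layer sizes, and the dummy data are illustrative assumptions:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 10000  # assumed vocabulary size
MAX_LEN = 200       # assumed review length after padding

model = Sequential()
# Map word indices to dense vectors that the LSTM can consume.
model.add(Embedding(VOCAB_SIZE, 64, input_length=MAX_LEN))
# A single LSTM layer; its input, output, and forget gates are created
# internally and govern how the memory cell's state changes over time.
model.add(LSTM(32))
# Binary output: positive versus negative review.
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Dummy data, just to show the expected input and output shapes.
x = np.random.randint(0, VOCAB_SIZE, size=(8, MAX_LEN))
y = np.random.randint(0, 2, size=(8, 1))
model.fit(x, y, epochs=1, batch_size=4)

Note that the gates are not exposed as separate layers; passing the sequence through LSTM(32) is enough for Keras to create and train them.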