Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials - Data

1229 Articles
article-image-challenge-deep-learning-sustain-current-pace-innovation-ivan-vasilev-machine-learning-engineer
Sugandha Lahoti
13 Dec 2019
8 min read
Save for later

“The challenge in Deep Learning is to sustain the current pace of innovation”, explains Ivan Vasilev, machine learning engineer

Sugandha Lahoti
13 Dec 2019
8 min read
If we talk about recent breakthroughs in the software community, machine learning and deep learning is a major contender - the usage, adoption, and experimentation of deep learning has exponentially increased. Especially in the areas of computer vision, speech, natural language processing and understanding, deep learning has made unprecedented progress. GANs, variational autoencoders and deep reinforcement learning are also creating impressive AI results. To know more about the progress of deep learning, we interviewed Ivan Vasilev, a machine learning engineer and researcher based in Bulgaria. Ivan is also the author of the book Advanced Deep Learning with Python. In this book, he teaches advanced deep learning topics like attention mechanism, meta-learning, graph neural networks, memory augmented neural networks, and more using the Python ecosystem. In this interview, he shares his experiences working on this book, compares TensorFlow and PyTorch, as well as talks about computer vision, NLP, and GANs. On why he chose Computer Vision and NLP as two major focus areas of his book Computer Vision and Natural Language processing are two popular areas where a number of developments are ongoing. In his book, Advanced Deep Learning with Python, Ivan delves deep into these two broad application areas. “One of the reasons I emphasized computer vision and NLP”, he clarifies, “is that these fields have a broad range of real-world commercial applications, which makes them interesting for a large number of people.” The other reason for focusing on Computer Vision, he says “is because of the natural (or human-driven if you wish) progress of deep learning. One of the first modern breakthroughs was in 2012, when a solution based on convolutional network won the ImageNet competition of that year with a large margin compared to any previous algorithms. Thanks in part to this impressive result, the interest in the field was renewed and brought many other advances including solving complex tasks like object detection and new generative models like generative adversarial networks. In parallel, the NLP domain saw its own wave of innovation with things like word vector embeddings and the attention mechanism.” On the ongoing battle between TensorFlow and PyTorch There are two popular machine learning frameworks that are currently at par - TensorFlow and PyTorch (Both had new releases in the past month, TensorFlow 2.0 and PyTorch 1.3). There is an ongoing debate that pitches TensorFlow and PyTorch as rivaling tech and communities. Ivan does not think there is a clear winner between the two libraries and this is why he has included them both in the book. He explains, “On the one hand, it seems that the API of PyTorch is more streamlined and the library is more popular with the academic community. On the other hand, TensorFlow seems to have better cloud support and enterprise features. In any case, developers will only benefit from the competition. For example, PyTorch has demonstrated the importance of eager execution and TensorFlow 2.0 now has much better support for eager execution to the point that it is enabled by default. In the past, TensorFlow had internal competing APIs, whereas now Keras is promoted as its main high-level API. On the other hand, PyTorch 1.3 has introduced experimental support for iOS and Android devices and quantization (computation operations with reduced precision for increased efficiency).” Using Machine Learning in the stock trading process can make markets more efficient Ivan discusses his venture into the field of financial machine learning, being the author of an ML-oriented event-based algorithmic trading library. However, financial machine learning (and stock price prediction in particular) is usually not in the focus of mainstream deep learning research. “One reason”, Ivan states, “is that the field isn’t as appealing as, say, computer vision or NLP. At first glance, it might even appear gimmicky to predict stock prices.” He adds, “Another reason is that quality training data isn’t freely available and can be quite expensive to obtain. Even if you have such data, pre-processing it in an ML-friendly way is not a straightforward process, because the noise-to-signal ratio is a lot higher compared to images or text. Additionally, the data itself could have huge volume.” “However”, he counters, “using ML in finance could have benefits, besides the obvious (getting rich by trading stocks). The participation of ML algorithms in the stock trading process can make the markets more efficient. This efficiency will make it harder for market imbalances to stay unnoticed for long periods of time. Such imbalances will be corrected early, thus preventing painful market corrections, which could otherwise lead to economic recessions.” GANs can be used for nefarious purposes, but that doesn’t warrant discarding them Ivan has also given a special emphasis to Generative adversarial networks in his book. Although extremely useful, in recent times GANs have been used to generate high-dimensional fake data that look very convincing. Many researchers and developers have raised concerns about the negative repercussions of using GANs and wondered if it is even possible to prevent and counter its misuse/abuse. Ivan acknowledges that GANs may have unintended outcomes but that shouldn’t be the sole reason to discard them. He says, “Besides great entertainment value, GANs have some very useful applications and could help us better understand the inner workings of neural networks. But as you mentioned, they can be used for nefarious purposes as well. Still, we shouldn’t discard GANs (or any algorithm with similar purpose) because of this. If only because the bad actors won’t discard them. I think the solution to this problem lies beyond the realm of deep learning. We should strive to educate the public on the possible adverse effects of these algorithms, but also to their benefits. In this way we can raise the awareness of machine learning and spark an honest debate about its role in our society.” Machine learning can have both intentional and unintentional harmful effects Awareness and Ethics go in parallel. Ethics is one of the most important topics to emerge in machine learning and artificial intelligence over the last year. Ivan agrees that the ethics and algorithmic bias in machine learning are of extreme importance. He says, “We can view the potential harmful effects of machine learning as either intentional and unintentional. For example, the bad actors I mentioned when we discussed GANs fall into the intentional category. We can limit their influence by striving to keep the cutting edge of ML research publicly available, thus denying them any unfair advantage of potentially better algorithms. Fortunately, this is largely the case now and hopefully will remain that way in the future. “ “I don't think algorithmic bias is necessarily intentional,'' he says. “Instead, I believe that it is the result of the underlying injustices in our society, which creep into ML through either skewed training datasets or unconscious bias of the researchers. Although the bias might not be intentional, we still have a responsibility to put a conscious effort to eliminate it.” Challenges in the Machine learning ecosystem “The field of ML exploded (in a good sense) a few years ago,'' says Ivan, “thanks to a combination of algorithmic and computer hardware advances. Since then, the researches have introduced new smarter and more elegant deep learning algorithms. But history has shown that AI can generate such a great hype that even the impressive achievements of the last few years could fall short of the expectations of the general public.” “So, in a broader sense, the challenge in front of ML is to sustain the current pace of innovation. In particular, current deep learning algorithms fall short in some key intelligence areas, where humans excel. For example, neural networks have a hard time learning multiple unrelated tasks. They also tend to perform better when working with unstructured data (like images), compared to structured data (like graphs).” “Another issue is that neural networks sometimes struggle to remember long-distance dependencies in sequential data. Solving these problems might require new fundamental breakthroughs, and it’s hard to give an estimation of such one time events. But even at the current level, ML can fundamentally change our society (hopefully for the better). For instance, in the next 5 to 10 years, we can see the widespread introduction of fully autonomous vehicles, which have the potential to transform our lives.” This is just a snapshot of some of the important focus areas in the deep learning ecosystem. You can check out more of Ivan’s work in his book Advanced Deep Learning with Python. In this book you will investigate and train CNN models with GPU accelerated libraries like TensorFlow and PyTorch. You will also apply deep neural networks to state-of-the-art domains like computer vision problems, NLP, GANs, and more. Author Bio Ivan Vasilev started working on the first open source Java Deep Learning library with GPU support in 2013. The library was acquired by a German company, where he continued its development. He has also worked as a machine learning engineer and researcher in the area of medical image classification and segmentation with deep neural networks. Since 2017 he has focused on financial machine learning. He is working on a Python based platform, which provides the infrastructure to rapidly experiment with different ML algorithms for algorithmic trading. You can find him on Linkedin and GitHub. Kaggle’s Rachel Tatman on what to do when applying deep learning is overkill  Brad Miro talks TensorFlow 2.0 features and how Google is using it internally François Chollet, creator of Keras on TensorFlow 2.0 and Keras integration, tricky design decisions in deep learning and more
Read more
  • 0
  • 0
  • 24286

article-image-postgresql-12-beta-1-released
Fatema Patrawala
24 May 2019
6 min read
Save for later

PostgreSQL 12 Beta 1 released

Fatema Patrawala
24 May 2019
6 min read
The PostgreSQL Global Development Group announced yesterday its first beta release of PostgreSQL 12. It is now also available for download. This release contains previews of all features that will be available in the final release of PostgreSQL 12, though some details of the release could also change. PostgreSQL 12 feature highlights Indexing Performance, Functionality, and Management PostgreSQL 12 will improve the overall performance of the standard B-tree indexes with improvements to the space management of these indexes as well. These improvements also provide a reduction of index size for B-tree indexes that are frequently modified, in addition to a performance gain. Additionally, PostgreSQL 12 adds the ability to rebuild indexes concurrently, which lets you perform a REINDEX operation without blocking any writes to the index. This feature should help with lengthy index rebuilds that could cause downtime when managing a PostgreSQL database in a production environment. PostgreSQL 12 extends the abilities of several of the specialized indexing mechanisms. The ability to create covering indexes, i.e. the INCLUDE clause that was introduced in PostgreSQL 11, has now been added to GiST indexes. SP-GiST indexes now support the ability to perform K-nearest neighbor (K-NN) queries for data types that support the distance (<->) operation. The amount of write-ahead log (WAL) overhead generated when creating a GiST, GIN, or SP-GiST index is also significantly reduced in PostgreSQL 12, which provides several benefits to the disk utilization of a PostgreSQL cluster and features such as continuous archiving and streaming replication. Inlined WITH queries (Common table expressions) Common table expressions (or WITH queries) can now be automatically inlined in a query if they: a) are not recursive b) do not have any side-effects c) are only referenced once in a later part of a query This removes an "optimization fence" that has existed since the introduction of the WITH clause in PostgreSQL 8.4 Partitioning PostgreSQL 12 while processing tables with thousands of partitions for operations, it only needs to use a small number of partitions. This release also provides improvements to the performance of both INSERT and COPY into a partitioned table. ATTACH PARTITION can now be performed without blocking concurrent queries on the partitioned table. Additionally, the ability to use foreign keys to reference partitioned tables is now permitted in PostgreSQL 12. JSON path queries per SQL/JSON specification PostgreSQL 12 now allows execution of JSON path queries per the SQL/JSON specification in the SQL:2016 standard. Similar to XPath expressions for XML, JSON path expressions let you evaluate a variety of arithmetic expressions and functions in addition to comparing values within JSON documents. A subset of these expressions can be accelerated with GIN indexes, allowing the execution of highly performant lookups across sets of JSON data. Collations PostgreSQL 12 now supports case-insensitive and accent-insensitive comparisons for ICU provided collations, also known as "nondeterministic collations". When used, these collations can provide convenience for comparisons and sorts, but can also lead to a performance penalty as a collation may need to make additional checks on a string. Most-common Value Extended Statistics CREATE STATISTICS, introduced in PostgreSQL 12 to help collect more complex statistics over multiple columns to improve query planning, now supports most-common value statistics. This leads to improved query plans for distributions that are non-uniform. Generated Columns PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet. Pluggable Table Storage Interface PostgreSQL 12 introduces the pluggable table storage interface that allows for the creation and use of different methods for table storage. New access methods can be added to a PostgreSQL cluster using the CREATE ACCESS METHOD command and subsequently added to tables with the new USING clause on CREATE TABLE. A table storage interface can be defined by creating a new table access method. In PostgreSQL 12, the storage interface that is used by default is the heap access method, which is currently is the only built-in method. Page Checksums The pg_verify_checkums command has been renamed to pg_checksums and now supports the ability to enable and disable page checksums across a PostgreSQL cluster that is offline. Previously, page checksums could only be enabled during the initialization of a cluster with initdb. Authentication & Connection Security GSSAPI now supports client-side and server-side encryption and can be specified in the pg_hba.conf file using the hostgssenc and hostnogssencrecord types. PostgreSQL 12 also allows for discovery of LDAP servers based on DNS SRV records if PostgreSQL was compiled with OpenLDAP. Few noted behavior changes in PostgreSQL 12 There are several changes introduced in PostgreSQL 12 that can affect the behavior as well as management of your ongoing operations. A few of these are noted below; for other changes, visit the "Migrating to Version 12" section of the release notes. The recovery.conf configuration file is now merged into the main postgresql.conf file. PostgreSQL will not start if it detects thatrecovery.conf is present. To put PostgreSQL into a non-primary mode, you can use the recovery.signal and the standby.signal files. You can read more about archive recovery here: https://www.postgresql.org/docs/devel/runtime-config-wal.html#RUNTIME-CONFIG-WAL-ARCHIVE-RECOVERY Just-in-Time (JIT) compilation is now enabled by default. OIDs can no longer be added to user created tables using the WITH OIDs clause. Operations on tables that have columns that were created using WITH OIDS (i.e. columns named "OID") will need to be adjusted. Running a SELECT * command on a system table will now also output the OID for the rows in the system table as well, instead of the old behavior which required the OID column to be specified explicitly. Testing for Bugs & Compatibility The stability of each PostgreSQL release greatly depends on the community, to test the upcoming version with the workloads and testing tools in order to find bugs and regressions before the general availability of PostgreSQL 12. As this is a Beta, minor changes to database behaviors, feature details, and APIs are still possible. The PostgreSQL team encourages the community to test the new features of PostgreSQL 12 in their database systems to help eliminate any bugs or other issues that may exist. A list of open issues is publicly available in the PostgreSQL wiki. You can report bugs using this form on the PostgreSQL website: Beta Schedule This is the first beta release of version 12. The PostgreSQL Project will release additional betas as required for testing, followed by one or more release candidates, until the final release in late 2019. For further information please see the Beta Testing page. Many other new features and improvements have been added to PostgreSQL 12. Please see the Release Notes for a complete list of new and changed features. PostgreSQL 12 progress update Building a scalable PostgreSQL solution PostgreSQL security: a quick look at authentication best practices [Tutorial]
Read more
  • 0
  • 0
  • 24274

article-image-k-means-clustering-python
Aaron Lazar
09 Nov 2017
9 min read
Save for later

Implementing K-Means Clustering in Python

Aaron Lazar
09 Nov 2017
9 min read
This article is an adaptation of content from the book Data Science Algorithms in a Week, by David Natingga. I’ve modified it a bit and made turned it into a sequence from a thriller, starring Agents Hobbs and O’Connor, from the FBI. The idea is to practically show you how to implement a k-means cluster in your friendly neighborhood language, Python. Agent Hobbs: Agent… Agent O’Connor… O’Connor! Agent O’Connor: Blimey! Uh.. Ohh.. Sorry, sir! Hobbs: ‘Tat’s abou’ the fifth time oive’ caught you sleeping on duty, young man! O’Connor: Umm. Apologies, sir. I just arrived here, and didn’t have much to… Hobbs: Cut the bull, agent! There’s an important case oime workin’ on and oi’ need information on this righ’ awai’! Here’s the list of missing persons kidnapped so far by the suspects. The suspects now taunt us with open clues abou’ their next target! Based on the information, we’ve narrowed their target list down to Miss. Gibbons and Mr. Hudson. Hobbs throws a file across O’Connor’s desk. Hobbs says as he storms out the door: You ‘ave an hour to find out who needs the special security, so better get working. O’Connor: Yes, sir! Bloody hell, that was close! Here’s the information O’Connor has: He needs to find what the probability is, that the 11th person with a height of 172cm, weight of 60kg, and with long hair is a man. O’Connor gets to work. To simplify matters, he removes the column Hair length as well as the column Gender, since he would like to cluster the people in the table based on their height and weight. To find out whether the 11th person in the table is more likely to be a man or a woman, he uses Clustering: Analysis O’Connor may apply scaling to the initial data, but to simplify the matters, he uses the unscaled data in the algorithm. He clusters the data into the two clusters since there are two possibilities for genders – a male or a female. Then he aims to classify a person with the height 172cm and weight 60kg to be more likely a man if and only if there are more men in that cluster. The clustering algorithm is a very efficient technique. Thus classifying this way is very fast, especially if there are a large number of the features to classify. So he goes on to apply the k-means clustering algorithm to the data he has. First, he picks up the initial centroids. He assumes the first centroid be, for example, a person with the height 180cm and the weight 75kg denoted in a vector as (180,75). Then the point that is furthest away from (180,75) is (155,46). So that will be the second centroid. The points that are closer to the first centroid (180,75) by taking Euclidean distance are (180,75), (174,71), (184,83), (168,63), (178,70), (170,59), (172,60). So these points will be in the first cluster. The points that are closer to the second centroid (155,46) are (155,46), (164,53), (162,52), (166,55). So these points will be in the second cluster. He displays the current situation of these two clusters in the image as below. Clustering of people by their height and weight He then recomputes the centroids of the clusters. The blue cluster with the features (180,75), (174,71), (184,83), (168,63), (178,70), (170,59), (172,60) will have the centroid ((180+174+184+168+178+170+172)/7 (75+71+83+63+70+59+60)/7)~(175.14,68.71). The red cluster with the features (155,46), (164,53), (162,52), (166,55) will have the centroid ((155+164+162+166)/4, (46+53+52+55)/4) = (161.75, 51.5). Reclassifying the points using the new centroid, the classes of the points do not change. The blue cluster will have the points (180,75), (174,71), (184,83), (168,63), (178,70), (170,59), (172,60). The red cluster will have the points (155,46), (164,53), (162,52), (166,55). Therefore the clustering algorithm terminates with clusters as displayed in the following image: Clustering of people by their height and weight Now he classifies the instance (172,60) as to whether it is a male or a female. The instance (172,60) is in the blue cluster. So it is similar to the features in the blue cluster. Are the remaining features in the blue cluster more likely males or females? 5 out of 6 features are males, only 1 is a female. Since the majority of the features are males in the blue cluster and the person (172,60) is in the blue cluster as well, he classifies the person with the height 172cm and the weight 60kg as a male. Implementing K-Means clustering in Python O’Connor implements the k-means clustering algorithm in Python. It takes as an input a CSV file with one data item per line. A data item is converted to a point. The algorithm classifies these points into the specified number of clusters. In the end, the clusters are visualized on the graph using the matplotlib library: # source_code/5/k-means_clustering.py import math import imp import sys import matplotlib.pyplot as plt import matplotlib import sys sys.path.append('../common') import common # noqa matplotlib.style.use('ggplot') # Returns k initial centroids for the given points. def choose_init_centroids(points, k): centroids = [] centroids.append(points[0]) while len(centroids) < k: # Find the centroid that with the greatest possible distance # to the closest already chosen centroid. candidate = points[0] candidate_dist = min_dist(points[0], centroids) for point in points: dist = min_dist(point, centroids) if dist > candidate_dist: candidate = point candidate_dist = dist centroids.append(candidate) return centroids # Returns the distance of a point from the closest point in points. def min_dist(point, points): min_dist = euclidean_dist(point, points[0]) for point2 in points: dist = euclidean_dist(point, point2) if dist < min_dist: min_dist = dist return min_dist # Returns an Euclidean distance of two 2-dimensional points. def euclidean_dist((x1, y1), (x2, y2)): return math.sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2)) # PointGroup is a tuple that contains in the first coordinate a 2d point # and in the second coordinate a group which a point is classified to. def choose_centroids(point_groups, k): centroid_xs = [0] * k centroid_ys = [0] * k group_counts = [0] * k for ((x, y), group) in point_groups: centroid_xs[group] += x centroid_ys[group] += y group_counts[group] += 1 centroids = [] for group in range(0, k): centroids.append(( float(centroid_xs[group]) / group_counts[group], float(centroid_ys[group]) / group_counts[group])) return centroids # Returns the number of the centroid which is closest to the point. # This number of the centroid is the number of the group where # the point belongs to. def closest_group(point, centroids): selected_group = 0 selected_dist = euclidean_dist(point, centroids[0]) for i in range(1, len(centroids)): dist = euclidean_dist(point, centroids[i]) if dist < selected_dist: selected_group = i selected_dist = dist return selected_group # Reassigns the groups to the points according to which centroid # a point is closest to. def assign_groups(point_groups, centroids): new_point_groups = [] for (point, group) in point_groups: new_point_groups.append( (point, closest_group(point, centroids))) return new_point_groups # Returns a list of pointgroups given a list of points. def points_to_point_groups(points): point_groups = [] for point in points: point_groups.append((point, 0)) return point_groups # Clusters points into the k groups adding every stage # of the algorithm to the history which is returned. def cluster_with_history(points, k): history = [] centroids = choose_init_centroids(points, k) point_groups = points_to_point_groups(points) while True: point_groups = assign_groups(point_groups, centroids) history.append((point_groups, centroids)) new_centroids = choose_centroids(point_groups, k) done = True for i in range(0, len(centroids)): if centroids[i] != new_centroids[i]: done = False Break if done: return history centroids = new_centroids # Program start csv_file = sys.argv[1] k = int(sys.argv[2]) everything = False # The third argument sys.argv[3] represents the number of the step of the # algorithm starting from 0 to be shown or "last" for displaying the last # step and the number of the steps. if sys.argv[3] == "last": everything = True Else: step = int(sys.argv[3]) data = common.csv_file_to_list(csv_file) points = data_to_points(data) # Represent every data item by a point. history = cluster_with_history(points, k) if everything: print "The total number of steps:", len(history) print "The history of the algorithm:" (point_groups, centroids) = history[len(history) - 1] # Print all the history. print_cluster_history(history) # But display the situation graphically at the last step only. draw(point_groups, centroids) else: (point_groups, centroids) = history[step] print "Data for the step number", step, ":" print point_groups, centroids draw(point_groups, centroids) Input data from gender classification He saves data from the classification into the CSV file: # source_code/5/persons_by_height_and_weight.csv 180,75 174,71 184,83 168,63 178,70 170,59 164,53 155,46 162,52 166,55 172,60 Program output for the classification data O’Connor runs the program implementing k-means clustering algorithm on the data from the classification. The numerical argument 2 means that he would like to cluster the data into 2 clusters: $ python k-means_clustering.py persons_by_height_weight.csv 2 last The total number of steps: 2 The history of the algorithm: Step number 0: point_groups = [((180.0, 75.0), 0), ((174.0, 71.0), 0), ((184.0, 83.0), 0), ((168.0, 63.0), 0), ((178.0, 70.0), 0), ((170.0, 59.0), 0), ((164.0, 53.0), 1), ((155.0, 46.0), 1), ((162.0, 52.0), 1), ((166.0, 55.0), 1), ((172.0, 60.0), 0)] centroids = [(180.0, 75.0), (155.0, 46.0)] Step number 1: point_groups = [((180.0, 75.0), 0), ((174.0, 71.0), 0), ((184.0, 83.0), 0), ((168.0, 63.0), 0), ((178.0, 70.0), 0), ((170.0, 59.0), 0), ((164.0, 53.0), 1), ((155.0, 46.0), 1), ((162.0, 52.0), 1), ((166.0, 55.0), 1), ((172.0, 60.0), 0)] centroids = [(175.14285714285714, 68.71428571428571), (161.75, 51.5)] The program also outputs a graph visible in the 2nd image. The parameter last means that O’Connor would like the program to do the clustering until the last step. If he wants to display only the first step (step 0), he can change last to 0 to run: $ python k-means_clustering.py persons_by_height_weight.csv 2 0 Upon the execution of the program, O’Connor gets the graph of the clusters and their centroids at the initial step, as in image 1. He heaves a sigh of relief. Hobbs returns just then: Oye there O’Connor, not snoozing again now O’are ya? O’Connor: Not at all, sir. I think we need to provide Mr. Hudson with special protection because it looks like he’s the next target. Hobbs raises an eyebrow as he adjusts his gun in it’s holster: Emm, O’are ya sure, agent? O’Connor replies with a smile: 83.33% confident, sir! Hobbs: Wha’ are we waiting for then, eh? Let’s go get em! If you liked reading this mystery, go ahead and buy the book it was inspired by: Data Science Algorithms in a Week, by David Natingga.
Read more
  • 0
  • 1
  • 24066

article-image-four-ibm-facial-recognition-patents-in-2018-we-found-intriguing
Natasha Mathur
11 Aug 2018
10 min read
Save for later

Four IBM facial recognition patents in 2018, we found intriguing

Natasha Mathur
11 Aug 2018
10 min read
The media has gone into a frenzy over Google’s latest facial recognition patent that shows an algorithm can track you across social media and gather your personal details. We thought, we’d dive further into what other patents Google has applied for in facial recognition tehnology in 2018. What we discovered was an eye opener (pun intended). Google is only the 3rd largest applicant with IBM and Samsung leading the patents race in facial recognition. As of 10th Aug, 2018, 1292 patents have been granted in 2018 on Facial recognition. Of those, IBM received 53. Here is the summary comparison of leading companies in facial recognition patents in 2018. Read Also: Top four Amazon patents in 2018 that use machine learning, AR, and robotics IBM has always been at the forefront of innovation. Let’s go back about a quarter of a century, when IBM invented its first general-purpose computer for business. It built complex software programs that helped in launching Apollo missions, putting the first man on the moon. It’s chess playing computer, Deep Blue, back in 1997,  beat Garry Kasparov, in a traditional chess match (the first time a computer beat a world champion). Its researchers are known for winning Nobel Prizes. Coming back to 2018, IBM unveiled the world’s fastest supercomputer with AI capabilities, and beat the Wall Street expectations by making $20 billion in revenue in Q3 2018 last month, with market capitalization worth $132.14 billion as of August 9, 2018. Its patents are a major part of why it continues to be valuable highly. IBM continues to come up with cutting-edge innovations and to protect these proprietary inventions, it applies for patent grants. United States is the largest consumer market in the world, so patenting the technologies that the companies come out with is a standard way to attain competitive advantage. As per the United States Patent and Trademark Office (USPTO), Patent is an exclusive right to invention and “the right to exclude others from making, using, offering for sale, or selling the invention in the United States or “importing” the invention into the United States”. As always, IBM has applied for patents for a wide spectrum of technologies this year from Artificial Intelligence, Cloud, Blockchain, Cybersecurity, to Quantum Computing. Today we focus on IBM’s patents in facial recognition field in 2018. Four IBM facial recognition innovations patented in 2018 Facial recognition is a technology which identifies and verifies a person from a digital image or a video frame from a video source and IBM seems quite invested in it. Controlling privacy in a face recognition application Date of patent: January 2, 2018 Filed: December 15, 2015 Features: IBM has patented for a face-recognition application titled “Controlling privacy in a face recognition application”. Face recognition technologies can be used on mobile phones and wearable devices which may hamper the user privacy. This happens when a "sensor" mobile user identifies a "target" mobile user without his or her consent. The present mobile device manufacturers don’t provide the privacy mechanisms for addressing this issue. This is the major reason why IBM has patented this technology. Editor’s Note: This looks like an answer to the concerns raised over Google’s recent social media profiling facial recognition patent.   How it works? Controlling privacy in a face recognition application It consists of a privacy control system, which is implemented using a cloud computing node. The system uses a camera to find out information about the people, by using a face recognition service deployed in the cloud. As per the patent application “the face recognition service may have access to a face database, privacy database, and a profile database”. Controlling privacy in a face recognition application The facial database consists of one or more facial signatures of one or more users. The privacy database includes privacy preferences of target users. Privacy preferences will be provided by the target user and stored in the privacy database.The profile database contains information about the target user such as name, age, gender, and location. It works by receiving an input which includes a face recognition query and a digital image of a face. The privacy control system then detects a facial signature from the digital image. The target user associated with the facial signature is identified, and profile of the target user is extracted. It then checks the privacy preferences of the user. If there are no privacy preferences set, then it transmits the profile to the sensor user. But, if there are privacy preferences then the censored profile of the user is generated omitting out the private elements in the profile. There are no announcements, as for now, regarding when this technology will hit the market. Evaluating an impact of a user's content utilized in a social network Date of patent: January 30, 2018 Filed: April 11, 2015 Features:  IBM has patented for an application titled “Evaluating an impact of a user's content utilized in a social network”.  With so much data floating around on social network websites, it is quite common for the content of a document (e.g., e-mail message, a post, a word processing document, a presentation) to be reused, without the knowledge of an original author. Evaluating an impact of a user's content utilised in a social network Evaluating an impact of a user's content utilized in a social network Because of this, the original author of the content may not receive any credit, which creates less motivation for the users to post their original content in a social network. This is why IBM has decided to patent for this application. Evaluating an impact of a user's content utilized in a social network As per the patent application, the method/system/product  “comprises detecting content in a document posted on a social network environment being reused by a second user. The method further comprises identifying an author of the content. The method additionally comprises incrementing a first counter keeping track of a number of times the content has been adopted in derivative works”. There’s a processor, which generates an “impact score” which  represents the author's ability to influence other users to adopt the content. This is based on the number of times the content has been adopted in the derivative works. Also, “the method comprises providing social credit to the author of the content using the impact score”. Editor’s Note: This is particularly interesting to us as IBM, unlike other tech giants, doesn’t own a popular social network or media product. (Google has Google+, Microsoft has LinkedIn, Facebook and Twitter are social, even Amazon has stakes in a media entity in the form of Washington Post). No information is present about when or if this system will be used among social network sites. Spoof detection for facial recognition Date of patent: February 20, 2018 Filed: December 10, 2015 Features: IBM patented an application named “Spoof detection for facial recognition”.  It provides a method to determine whether the image is authentic or not. As per the patent “A facial recognition system is a computer application for automatically identifying or verifying a person from a digital image or a video frame from a video source.” Editor’s Note: This seems to have a direct impact on the work around tackling deepFakes, which incidentally is something DARPA is very keen on. Could IBM be vying for a long term contract with the government? How it works? The patent consists of a system that helps detect “if a face in a facial recognition authentication system is a three-dimensional structure based on multiple selected images from the input video”.                                      Spoof detection for facial recognition There are four or more two-dimensional feature points which are located via an image processing device connected to the camera. Here the two-dimensional feature points do not lie on the same two-dimensional plane. The patent reads that “one or more additional images of the user's face can be received with the camera; and, the at least four two-dimensional feature points can be located on each additional image with the image processor. The image processor can identify displacements between the two-dimensional feature points on the additional image and the two-dimensional feature points on the first image for each additional image” Spoof detection for facial recognition There is also a processor connected to the image processing device that helps figure out whether the displacements conform to a three-dimensional surface model. The processor can then determine whether to authenticate the user depending on whether the displacements conform to the three-dimensional surface model. Facial feature location using symmetry line Date of patent: June 5, 2018 Filed: July 20, 2015 Features: IBM patented for an application titled “Facial feature location using symmetry line”. As per the patent, “In many image processing applications, identifying facial features of the subject may be desired. Currently, location of facial features require a search in four dimensions using local templates that match the target features. Such a search tends to be complex and prone to errors because it has to locate both (x, y) coordinates, scale parameter and rotation parameter”. Facial feature location using symmetry line Facial feature location using symmetry line The application consists of a computer-implemented method that obtains an image of the subject’s face. After that it automatically detects a symmetry line of the face in the image, where the symmetry line intersects at least a mouth region of the face. It then automatically locates a facial feature of the face using the symmetry line. There’s also a computerised apparatus with a processor which performs the steps of obtaining an image of a subject’s face and helps locate the facial feature.  Editor’s note: Atleast, this patent makes direct sense to us. IBM is majorly focusing on bring AI to healthcare. A patent like this can find a lot of use in not just diagnostics and patient care, but also in cutting edge areas like robotics enabled surgeries. IBM is continually working on new technologies to provide the world with groundbreaking innovations. Its big investments in facial recognition technology speaks volumes about how IBM is well-versed with its endless possibilities. With the facial recognition technological progress,  come the privacy fears. But, IBM’s facial recognition application patent has got it covered as it lets the users set privacy preferences. This can be a great benchmark for IBM as no many existing applications are currently doing it. The social credit score evaluating app can really help bring the voice back to the users interested in posting content on social media platforms. The spoof detection application will help maintain authenticity by detecting forged images. Lastly, the facial feature detection can act as a great additional feature for image processing applications. IBM has been heavily investing in facial recognition technology. There are no guarantees by IBM as to whether these patents will ever make it to practical applications, but it does say a lot about how IBM thinks about the technology. Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics Facebook patents its news feed filter tool to provide more relevant news to its users Google’s new facial recognition patent uses your social network to identify you!  
Read more
  • 0
  • 0
  • 24018

article-image-amazon-remars-day-1-kicks-off-showcasing-amazons-next-gen-ai-robots-spot-the-robo-dog-and-a-guest-appearance-from-iron-man
Savia Lobo
06 Jun 2019
11 min read
Save for later

Amazon re:MARS Day 1 kicks off showcasing Amazon’s next-gen AI robots; Spot, the robo-dog and a guest appearance from ‘Iron Man’

Savia Lobo
06 Jun 2019
11 min read
Amazon’s inaugural re:MARS event kicked off on Tuesday, June 4 at the Aria in Las Vegas. This 4-day event is inspired by MARS, a yearly invite-only event hosted by Jeff Bezos that brings together innovative minds in Machine learning, Automation, Robotics, and Space to share new ideas across these rapidly advancing domains. re:MARS featured a lot of announcements revealing a range of robots each engineered for a different purpose. Some of them include helicopter drones for delivery, two robot dogs by Boston Dynamics, Autonomous human-like acrobats by Walt Disney Imagineering, and much more. Amazon also revealed Alexa’s new Dialog Modeling for Natural, Cross-Skill Conversations. Let us have a brief look at each of the announcements. Robert Downey Jr. announces ‘The Footprint Coalition’ project to clean up the environment using Robotics Popularly known as the “Iron Man”, Robert Downey Jr.’s visit was one of the exciting moments where he announced a new project called The Footprint Coalition to clean up the planet using advanced technologies at re:MARS. “Between robotics and nanotechnology we could probably clean up the planet significantly, if not entirely, within a decade,” he said. According to The Forbes, “Amazon did not immediately respond to questions about whether it was investing financially or technologically in Downey Jr.’s project.” “At this point, the effort is severely light on details, with only a bare-bones website to accompany Downey’s public statement, but the actor said he plans to officially launch the project by April 2020,” Forbes reports. A recent United Nations report found that humans are having an unprecedented and devastating effect on global biodiversity, and researchers have found microplastics polluting the air, ocean, and soil. The announcement of this project has been opened to the public because the “company itself is under fire for its policies around the environment and climate change”. Additionally, Morgan Pope and Tony Dohi of Walt Disney Imagineering, also demonstrated their work to create autonomous acrobats. https://twitter.com/jillianiles/status/1136082571081555968 https://twitter.com/thesullivan/status/1136080570549563393 Amazon will soon deliver orders using drones On Wednesday, Amazon unveiled a revolutionary new drone that will test deliver toothpaste and other household goods starting within months. This drone is “part helicopter and part science-fiction aircraft” with built-in AI features and sensors that will help it fly robotically without threatening traditional aircraft or people on the ground. Gur Kimchi, vice president of Amazon Prime Air, said in an interview to Bloomberg, “We have a design that is amazing. It has performance that we think is just incredible. We think the autonomy system makes the aircraft independently safe.” However, he refused to provide details on where the delivery tests will be conducted. Also, the drones have received a year’s approval from the FAA to test the devices in limited ways that still won't allow deliveries. According to a Bloomberg report, “It can take years for traditional aircraft manufacturers to get U.S. Federal Aviation Administration approval for new designs and the agency is still developing regulations to allow drone flights over populated areas and to address national security concerns. The new drone presents even more challenges for regulators because there aren’t standards yet for its robotic features”. Competitors to Amazon’s unnamed drone include Alphabet Inc.’s Wing, which became the first drone to win an FAA approval to operate as a small airline, in April. Also, United Parcel Service Inc. and drone startup Matternet Inc. began using drones to move medical samples between hospitals in Raleigh, North Carolina, in March. Amazon’s drone is about six feet across with six propellers that lift it vertically off the ground. It is surrounded by a six-sided shroud that will protect people from the propellers, and also serves as a high-efficiency wing such that it can fly more horizontally like a plane. Once it gets off the ground, the craft tilts and flies sideways -- the helicopter blades becoming more like airplane propellers. Kimchi said, “Amazon’s business model for the device is to make deliveries within 7.5 miles (12 kilometers) from a company warehouse and to reach customers within 30 minutes. It can carry packages weighing as much as five pounds. More than 80% of packages sold by the retail behemoth are within that weight limit.” According to the company, one of the things the drone has mastered is detecting utility wires and clotheslines. They have been notoriously difficult to identify reliably and pose a hazard for a device attempting to make deliveries in urban and suburban areas. To know more about these high-tech drones in detail, head over to Amazon’s official blogpost. Boston Dynamics’ first commercial robot, Spot Boston Dynamics revealed its first commercial product, a quadrupedal robot named Spot.  Boston Dynamics’ CEO Marc Raibert told The Verge, “Spot is currently being tested in a number of “proof-of-concept” environments, including package delivery and surveying work.” He also said that although there’s no firm launch date for the commercial version of Spot, it should be available within months, certainly before the end of the year. “We’re just doing some final tweaks to the design. We’ve been testing them relentlessly”, Raibert said. These Spot robots are capable of navigating environments autonomously, but only when their surroundings have been mapped in advance. They can withstand kicks and shoves and keep their balance on tricky terrain, but they don’t decide for themselves where to walk. These robots are simple to control; using a D-pad, users can steer the robot as just like an RC car or mechanical toy. A quick tap on the video feed streamed live from the robot’s front-facing camera allows to select a destination for it to walk to, and another tap lets the user assume control of a robot arm mounted on top of the chassis. With 3D cameras mounted atop, a Spot robot can map environments like construction sites, identifying hazards and work progress. It also has a robot arm which gives it greater flexibility and helps it open doors and manipulate objects. https://twitter.com/jjvincent/status/1136096290016595968 The commercial version will be “much less expensive than prototypes [and] we think they’ll be less expensive than other peoples’ quadrupeds”, Raibert said. Here’s a demo video of the Spot robot at the re:MARS event. https://youtu.be/xy_XrAxS3ro Alexa gets new dialog modeling for improved natural, cross-skill conversations Amazon unveiled new features in Alexa that would help the conversational agent to answer more complex questions and carry out more complex tasks. Rohit Prasad, Alexa vice president and head scientist, said, “We envision a world where customers will converse more naturally with Alexa: seamlessly transitioning between skills, asking questions, making choices, and speaking the same way they would with a friend, family member, or co-worker. Our objective is to shift the cognitive burden from the customer to Alexa.” This new update to Alexa is a set of AI modules that work together to generate responses to customers’ questions and requests. With every round of dialog, the system produces a vector — a fixed-length string of numbers — that represents the context and the semantic content of the conversation. “With this new approach, Alexa will predict a customer’s latent goal from the direction of the dialog and proactively enable the conversation flow across topics and skills,” Prasad says. “This is a big leap for conversational AI.” At re:MARS, Prasad also announced the developer preview of Alexa Conversations, a new deep learning-based approach for skill developers to create more-natural voice experiences with less effort, fewer lines of code, and less training data than before. The preview allows skill developers to create natural, flexible dialogs within a single skill; upcoming releases will allow developers to incorporate multiple skills into a single conversation. With Alexa Conversations, developers provide: (1) application programming interfaces, or APIs, that provide access to their skills’ functionality; (2) a list of entities that the APIs can take as inputs, such as restaurant names or movie times;  (3) a handful of sample dialogs annotated to identify entities and actions and mapped to API calls. Alexa Conversations’ AI technology handles the rest. “It’s way easier to build a complex voice experience with Alexa Conversations due to its underlying deep-learning-based dialog modeling,” Prasad said. To know more about this announcement in detail, head over to Alexa’s official blogpost. Amazon Robotics unveiled two new robots at its fulfillment centers Brad Porter, vice president of robotics at Amazon, announced two new robots, one is, code-named Pegasus and the other one, Xanthus. Pegasus, which is built to sort packages, is a 3-foot-wide robot equipped with a conveyor belt on top to drop the right box in the right location. “We sort billions of packages a year. The challenge in package sortation is, how do you do it quickly and accurately? In a world of Prime one-day [delivery], accuracy is super-important. If you drop a package off a conveyor, lose track of it for a few hours  — or worse, you mis-sort it to the wrong destination, or even worse, if you drop it and damage the package and the inventory inside — we can’t make that customer promise anymore”, Porter said. Porter said Pegasus robots have already driven a total of 2 million miles, and have reduced the number of wrongly sorted packages by 50 percent. Porter said the Xanthus, represents the latest incarnation of Amazon’s drive robot. Amazon uses tens of thousands of the current-generation robot, known as Hercules, in its fulfillment centers. Amazon unveiled Xanthus Sort Bot and Xanthus Tote Mover. “The Xanthus family of drives brings innovative design, enabling engineers to develop a portfolio of operational solutions, all of the same hardware base through the addition of new functional attachments. We believe that adding robotics and new technologies to our operations network will continue to improve the associate and customer experience,” Porter says. To know more about these new robots watch the video below: https://youtu.be/4MH7LSLK8Dk StyleSnap: An AI-powered shopping Amazon announced StyleSnap, a recent move to promote AI-powered shopping. StyleSnap helps users pick out clothes and accessories. All they need to do is upload a photo or screenshot of what they are looking for, when they are unable to describe what they want. https://twitter.com/amazonnews/status/1136340356964999168 Amazon said, "You are not a poet. You struggle to find the right words to explain the shape of a neckline, or the spacing of a polka dot pattern, and when you attempt your text-based search, the results are far from the trend you were after." To use StyleSnap, just open the Amazon app, click the camera icon in the upper right-hand corner, select the StyleSnap option, and then upload an image of the outfit. Post this, StyleSnap provides recommendations of similar outfits on Amazon to purchase, with users able to filter across brand, pricing, and reviews. Amazon's AI system can identify colors and edges, and then patterns like floral and denim. Using this information, its algorithm can then accurately pick a matching style. To know more about StyleSnap in detail, head over to Amazon’s official blog post. Amazon Go trains cashierless store algorithms using synthetic data Amazon at the re:MARS shared more details about Amazon Go, the company’s brand for its cashierless stores. They said Amazon Go uses synthetic data to intentionally introduce errors to its computer vision system. Challenges that had to be addressed before opening stores to avoid queues include the need to make vision systems that account for sunlight streaming into a store, little time for latency delays, and small amounts of data for certain tasks. Synthetic data is being used in a number of ways to power few-shot learning, improve AI systems that control robots, train AI agents to walk, or beat humans in games of Quake III. Dilip Kumar, VP of Amazon Go, said, “As our application improved in accuracy — and we have a very highly accurate application today — we had this interesting problem that there were very few negative examples, or errors, which we could use to train our machine learning models.” He further added, “So we created synthetic datasets for one of our challenging conditions, which allowed us to be able to boost the diversity of the data that we needed. But at the same time, we have to be careful that we weren’t introducing artifacts that were only visible in the synthetic data sets, [and] that the data translates well to real-world situations — a tricky balance.” To know more about this news in detail, check out this video: https://youtu.be/jthXoS51hHA The Amazon re:MARS event is still ongoing and will have many more updates. To catch live updates from Vegas visit Amazon’s blog. World’s first touch-transmitting telerobotic hand debuts at Amazon re:MARS tech showcase Amazon introduces S3 batch operations to process millions of S3 objects Amazon Managed Streaming for Apache Kafka (Amazon MSK) is now generally available
Read more
  • 0
  • 0
  • 23940

article-image-neural-network-architectures-101-understanding-perceptrons
Kunal Chaudhari
12 Feb 2018
9 min read
Save for later

Neural Network Architectures 101: Understanding Perceptrons

Kunal Chaudhari
12 Feb 2018
9 min read
[box type="note" align="" class="" width=""]This article is an excerpt taken from a book Neural Network Programming with Java Second Edition written by Fabio M. Soares and Alan M. F. Souza. This book is for Java developers who want to master developing smarter applications like weather forecasting, pattern recognition etc using neural networks. [/box] In this article we will discuss about perceptrons along with their features, applications and limitations. Perceptrons are a very popular neural network architecture that implements supervised learning. Projected by Frank Rosenblatt in 1957, it has just one layer of neurons, receiving a set of inputs and producing another set of outputs. This was one of the first representations of neural networks to gain attention, especially because of their simplicity. In our Java implementation, this is illustrated with one neural layer (the output layer). The following code creates a perceptron with three inputs and two outputs, having the linear function at the output layer: int numberOfInputs=3; int numberOfOutputs=2; Linear outputAcFnc = new Linear(1.0); NeuralNet perceptron = new NeuralNet(numberOfInputs,numberOfOutputs,             outputAcFnc); Applications and limitations However, scientists did not take long to conclude that a perceptron neural network could only be applied to simple tasks, according to that simplicity. At that time, neural networks were being used for simple classification problems, but perceptrons usually failed when faced with more complex datasets. Let's illustrate this with a very basic example (an AND function) to understand better this issue. Linear separation The example consists of an AND function that takes two inputs, x1 and x2. That function can be plotted in a two-dimensional chart as follows: And now let's examine how the neural network evolves the training using the perceptron rule, considering a pair of two weights, w1 and w2, initially 0.5, and bias valued 0.5 as well. Assume learning rate η equals 0.2: Epoch x1 x2 w1 w2 b y t E Δw1 Δw2 Δb 1 0 0 0.5 0.5 0.5 0.5 0 -0.5 0 0 -0.1 1 0 1 0.5 0.5 0.4 0.9 0 -0.9 0 -0.18 -0.18 1 1 0 0.5 0.32 0.22 0.72 0 -0.72 -0.144 0 -0.144 1 1 1 0.356 0.32 0.076 0.752 1 0.248 0.0496 0.0496 0.0496 2 0 0 0.406 0.370 0.126 0.126 0 -0.126 0.000 0.000 -0.025 2 0 1 0.406 0.370 0.100 0.470 0 -0.470 0.000 -0.094 -0.094 2 1 0 0.406 0.276 0.006 0.412 0 -0.412 -0.082 0.000 -0.082 2 1 1 0.323 0.276 -0.076 0.523 1 0.477 0.095 0.095 0.095 … … 89 0 0 0.625 0.562 -0.312 -0.312 0 0.312 0 0 0.062 89 0 1 0.625 0.562 -0.25 0.313 0 -0.313 0 -0.063 -0.063 89 1 0 0.625 0.500 -0.312 0.313 0 -0.313 -0.063 0 -0.063 89 1 1 0.562 0.500 -0.375 0.687 1 0.313 0.063 0.063 0.063 After 89 epochs, we find the network to produce values near to the desired output. Since in this example the outputs are binary (zero or one), we can assume that any value produced by the network that is below 0.5 is considered to be 0 and any value above 0.5 is considered to be 1. So, we can draw a function , with the final weights and bias found by the learning algorithm w1=0.562, w2=0.5 and b=-0.375, defining the linear boundary in the chart: This boundary is a definition of all classifications given by the network. You can see that the boundary is linear, given that the function is also linear. Thus, the perceptron network is really suitable for problems whose patterns are linearly separable. The XOR case Now let's analyze the XOR case: We see that in two dimensions, it is impossible to draw a line to separate the two patterns. What would happen if we tried to train a single layer perceptron to learn this function? Suppose we tried, let's see what happened in the following table: Epoch x1 x2 w1 w2 b y t E Δw1 Δw2 Δb 1 0 0 0.5 0.5 0.5 0.5 0 -0.5 0 0 -0.1 1 0 1 0.5 0.5 0.4 0.9 1 0.1 0 0.02 0.02 1 1 0 0.5 0.52 0.42 0.92 1 0.08 0.016 0 0.016 1 1 1 0.516 0.52 0.436 1.472 0 -1.472 -0.294 -0.294 -0.294 2 0 0 0.222 0.226 0.142 0.142 0 -0.142 0.000 0.000 -0.028 2 0 1 0.222 0.226 0.113 0.339 1 0.661 0.000 0.132 0.132 2 1 0 0.222 0.358 0.246 0.467 1 0.533 0.107 0.000 0.107 2 1 1 0.328 0.358 0.352 1.038 0 -1.038 -0.208 -0.208 -0.208 … … 127 0 0 -0.250 -0.125 0.625 0.625 0 -0.625 0.000 0.000 -0.125 127 0 1 -0.250 -0.125 0.500 0.375 1 0.625 0.000 0.125 0.125 127 1 0 -0.250 0.000 0.625 0.375 1 0.625 0.125 0.000 0.125 127 1 1 -0.125 0.000 0.750 0.625 0 -0.625 -0.125 -0.125 -0.125 The perceptron just could not find any pair of weights that would drive the following error 0.625. This can be explained mathematically as we already perceived from the chart that this function cannot be linearly separable in two dimensions. So what if we add another dimension? Let's see the chart in three dimensions: In three dimensions, it is possible to draw a plane that would separate the patterns, provided that this additional dimension could properly transform the input data. Okay, but now there is an additional problem: how could we derive this additional dimension since we have only two input variables? One obvious, but also workaround, answer would be adding a third variable as a derivation from the two original ones. And being this third variable a (derivation), our neural network would probably get the following shape: Okay, now the perceptron has three inputs, one of them being a composition of the other. This also leads to a new question: how should that composition be processed? We can see that this component could act as a neuron, so giving the neural network a nested architecture. If so, there would another new question: how would the weights of this new neuron be trained, since the error is on the output neuron? Multi-layer perceptrons As we can see, one simple example in which the patterns are not linearly separable has led us to more and more issue using the perceptron architecture. That need led to the application of multilayer perceptrons. The fact that the natural neural network is structured in layers as well, and each layer captures pieces of information from a specific environment is already established. In artificial neural networks, layers of neurons act in this way, by extracting and abstracting information from data, transforming them into another dimension or shape. In the XOR example, we found the solution to be the addition of a third component that would make possible a linear separation. But there remained a few questions regarding how that third component would be computed. Now let's consider the same solution as a two-layer perceptron: Now we have three neurons instead of just one, but in the output the information transferred by the previous layer is transformed into another dimension or shape, whereby it would be theoretically possible to establish a linear boundary on those data points. However, the question on finding the weights for the first layer remains unanswered, or can we apply the same training rule to neurons other than the output? We are going to deal with this issue in the Generalized delta rule section. MLP properties Multi-layer perceptrons can have any number of layers and also any number of neurons in each layer. The activation functions may be different on any layer. An MLP network is usually composed of at least two layers, one for the output and one hidden layer. There are also some references that consider the input layer as the nodes that collect input data; therefore, for those cases, the MLP is considered to have at least three layers. For the purpose of this article, let's consider the input layer as a special type of layer which has no weights, and as the effective layers, that is, those enabled to be trained, we'll consider the hidden and output layers. A hidden layer is called that because it actually hides its outputs from the external world. Hidden layers can be connected in series in any number, thus forming a deep neural network. However, the more layers a neural network has, the slower would be both training and running, and according to mathematical foundations, a neural network with one or two hidden layers at most may learn as well as deep neural networks with dozens of hidden layers. But it depends on several factors. MLP weights In an MLP feedforward network, one particular neuron i receives data from a neuron j of the previous layer and forwards its output to a neuron k of the next layer: The mathematical description of a neural network is recursive: Here, yo is the network output (should we have multiple outputs, we can replace yo with Y, representing a vector); fo is the activation function of the output; l is the number of hidden layers; nhi is the number of neurons in the hidden layer i; wi is the weight connecting the i th neuron of the last hidden layer to the output; fi is the activation function of the neuron i; and bi is the bias of the neuron i. It can be seen that this equation gets larger as the number of layers increases. In the last summing operation, there will be the inputs xi. Recurrent MLP The neurons on an MLP may feed signals not only to neurons in the next layers (feedforward network), but also to neurons in the same or previous layers (feedback or recurrent). This behavior allows the neural network to maintain state on some data sequence, and this feature is especially exploited when dealing with time series or handwriting recognition. Recurrent networks are usually harder to train, and eventually the computer may run out of memory while executing them. In addition, there are recurrent network architectures better than MLPs, such as Elman, Hopfield, Echo state, Bidirectional RNNs (recurrent neural networks). But we are not going to dive deep into these architectures. Coding an MLP Bringing these concepts into the OOP point of view, we can review the classes already designed so far: One can see that the neural network structure is hierarchical. A neural network is composed of layers that are composed of neurons. In the MLP architecture, there are three types of layers: input, hidden, and output. So suppose that in Java, we would like to define a neural network consisting of three inputs, one output (linear activation function) and one hidden layer (sigmoid function) containing five neurons. The resulting code would be as follows: int numberOfInputs=3; int numberOfOutputs=1; int[] numberOfHiddenNeurons={5};     Linear outputAcFnc = new Linear(1.0); Sigmoid hiddenAcFnc = new Sigmoid(1.0); NeuralNet neuralnet = new NeuralNet(numberOfInputs, numberOfOutputs, numberOfHiddenNeurons, hiddenAcFnc, outputAcFnc); To summarize, we saw how perceptrons can be applied to solve linear separation problems, their limitations in classifying nonlinear data and how to suppress those limitations with multi-layer perceptrons (MLPs). If you enjoyed this excerpt, check out the book Neural Network Programming with Java Second Edition for a better understanding of neural networks and how they fit in different real-world projects.
Read more
  • 0
  • 0
  • 23853
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-brad-miro-talks-tensorflow-2-0-features-and-how-google-is-using-it-internally
Sugandha Lahoti
10 Dec 2019
6 min read
Save for later

Brad Miro talks TensorFlow 2.0 features and how Google is using it internally

Sugandha Lahoti
10 Dec 2019
6 min read
TensorFlow 2.0, released in October, has got developers excited about a myriad of features and its ease of use.  At the EuroPython Conference 2019, Brad Miro, developer programs engineer at Google talked about the updates being made to TensorFlow 2.0. He also gave an overview of how Google is using TensorFlow, moving on to why Python is important for TensorFlow development and how to migrate from TF 1.x to TF 2.0. EuroPython is one of the most popular Python programming language community conferences. Below are some highlights from Brad’s talk at EuroPython. What is TensorFlow? TensorFlow, an open-source deep learning library developed at Google, first released in 2015. It’s a Python framework that includes a number of utilities for helping you write deep neural networks supporting both GPUs and TPUs. A lot of deep learning involves using mathematics, statistics, and algebra and perform low-level optimizations with your system. TensorFlow removes a lot of those abstractions leaving you to focus on actually writing your model. How TensorFlow is used internally at Google Tensorflow is used internally at Google to power all of its machine learning and AI. Google’s data centers are powered using AI and TensorFlow to help optimize the usage of these data centers to reduce bandwidth, to ensure network connections are optimized, and to reduce power consumption. TensorFlow also is useful for performing global localization in Google Maps. It is also used heavily in the Google Pixel range of smartphones to help optimize the software. These technologies are also used in medical research specifically in the field of Computer Vision. For example, Tensorflow is used to distinguish between the retinal image of a healthy eye from the retinal image of an eye that has diabetic retinopathy.   Further Learning If you want to learn to build more computer vision applications with TensorFlow 2.0, check out the book Hands-On Computer Vision with TensorFlow 2 by Benjamin Planche, and Eliot Andres. This book by Packt Publishing is a practical guide to building high-performance systems for object detection, segmentation, video processing, smartphone applications, and more. By the end of the book, you will have both the theoretical understanding and practical skills to solve advanced computer vision problems with TensorFlow 2.0. Furthermore, Google is using AI and TensorFlow to predict whether or not objects in space are planets. To summarize, they use AI to predict whether or not fluctuations in the brightness of an object is due to it being a planet.   Why Python is so important for TensorFlow Python has always been the choice for TensorFlow due to the language being extremely easy to use and having a rich ecosystem for data science including tools such as numpy, scikit-learn, and pandas. When TensorFlow was being built, the idea was that it should have the simplicity of numpy, performance of C but ease of use of Python.  What does TensorFlow 2.0 bring to the table TensorFlow 2.0 is powerful, flexible, scalable and easily deployable.  What’s gone Session.run tf.control_dependencies tf.global_variables_initializer tf.cond, tf.while_loop Tf.contrib What’s new Eager execution enabled by default  tf.function Keras as main high level API Distribution Strategy API SavedModel API  TensorFlow 2.0 had a major API Cleanup. Many API symbols are removed or renamed for better consistency and clarity.  Session.run has been replaced with eager execution which effectively means that your tensorflow code runs like numpy code.  Eager execution enables fast iteration and intuitive debugging without building a graph. It also makes creating and experimenting with models using TensorFlow easier. It can be especially useful when using the tf.keras model subclassing API. TensorFlow 2.0 has tf.function, a python decorator that lets you run regular Python code which is later compiled down to TensorFlow code using AutoGraph. The Distribution Strategy API in TensorFlow 2.0 allows machine learning researchers to distribute training across a wide variety of compute configurations. This release also allows distributed training with Keras’ model.fit and custom training loops. Keras is introduced as the main high-level API. Keras is a popular high-level API used for easy and fast prototyping, building, and training of deep learning models. This will enable developers to easily leverage their various model-building APIs. Using Keras with TensorFlow has two main methods.  Symbolic (Keras sequential) Your model is a graph of layers Any graph you compile will run  TensorFlow helps you debug by catching errors at compile time Imperative method (Keras subclassing) Your model is Python bytecode Complete flexibility and control  Harder to debug/ Harder to maintain  There are pros and cons of using each method; it really just depends on what your specific use cases are. The SavedModel API allows you to save your trained ML model into a language-neutral format. With TensorFlow 2.0, all TensorFlow ecosystem projects including TensorFlow Lite, TensorFlow JS, TensorFlow Serving, and TensorFlow Hub, support SavedModels. On Tensorflow Hub, you can store and download pre-built models. You can use TensorFlow Extended which is a Python library that can be run on your servers to productionalize your models. TensorFlow Lite lets you run your TensorFlow models on edge devices. With TensorFlow.js, you can run machine learning models using javascript in the browser or run them on servers using node. TensorFlow also has Swift for TensorFlow to help developers use Swift to develop machine learning models. “Swift for TensorFlow provides a new programming model that combines the performance of graphs with the flexibility and expressivity of Eager execution, with a strong focus on improved usability at every level of the stack. This is not just a TensorFlow API wrapper written in Swift — we added compiler and language enhancements to Swift to provide a first-class user experience for machine learning developers.”  Other packages that exist in the TensorFlow ecosystem used for niche use cases are TF Probability, TF Agents (reinforcement learning), Tensor2Tensor, TF Ranking, TF Text (natural language processing), TF Federated, TF privacy and more.  How to upgrade from TensorFlow 1.x to TensorFlow 2.0 There are several migration guides available on TensorFlow’s website. You can also use the tf.compat.v1 library for backwards compatibility and the tf_upgrade_v2 script which you can execute on top of any Python script to convert TF 1.x code to 2.0 code. You can also read more about TF 2.0 migration in our book Hands-On Computer Vision with TensorFlow 2 which introduces the automatic migration tool  and compares TensorFlow 1 concepts with their TensorFlow 2 counterparts with a detailed guide on migrating to idiomatic TensorFlow 2 code. You can watch Brad’s full talk on YouTube. This video is licensed under the CC BY-NC-SA 3.0 license.  TensorFlow.js contributor Kai Sasaki on how TensorFlow.js eases web-based machine learning application development Introducing Spleeter, a Tensorflow based python library that extracts voice and sound from any music track. TensorFlow 2.0 released with tighter Keras integration, eager execution enabled by default, and more!
Read more
  • 0
  • 0
  • 23677

article-image-18-striking-ai-trends-2018-part-1
Sugandha Lahoti
27 Dec 2017
14 min read
Save for later

18 striking AI Trends to watch in 2018 - Part 1

Sugandha Lahoti
27 Dec 2017
14 min read
Artificial Intelligence is the talk of the town. It has evolved past merely being a buzzword in 2016, to be used in a more practical manner in 2017. As 2018 rolls out, we will gradually notice AI transitioning into a necessity. We have prepared a detailed report, on what we can expect from AI in the upcoming year. So sit back, relax, and enjoy the ride through the future. (Don’t forget to wear your VR headgear! ) Here are 18 things that will happen in 2018 that are either AI driven or driving AI: Artificial General Intelligence may gain major traction in research. We will turn to AI enabled solution to solve mission-critical problems. Machine Learning adoption in business will see rapid growth. Safety, ethics, and transparency will become an integral part of AI application design conversations. Mainstream adoption of AI on mobile devices Major research on data efficient learning methods AI personal assistants will continue to get smarter Race to conquer the AI optimized hardware market will heat up further We will see closer AI integration into our everyday lives. The cryptocurrency hype will normalize and pave way for AI-powered Blockchain applications. Advancements in AI and Quantum Computing will share a symbiotic relationship Deep learning will continue to play a significant role in AI development progress. AI will be on both sides of the cybersecurity challenge. Augmented reality content will be brought to smartphones. Reinforcement learning will be applied to a large number of real-world situations. Robotics development will be powered by Deep Reinforcement learning and Meta-learning Rise in immersive media experiences enabled by AI. A large number of organizations will use Digital Twin. 1. General AI: AGI may start gaining traction in research. AlphaZero is only the beginning. 2017 saw Google’s AlphaGo Zero (and later AlphaZero) beat human players at Go, Chess, and other games. In addition to this, computers are now able to recognize images, understand speech, drive cars, and diagnose diseases better with time. AGI is an advancement of AI which deals with bringing machine intelligence as close to humans as possible. So, machines can possibly do any intellectual task that a human can! The success of AlphaGo covered one of the crucial aspects of AGI systems—the ability to learn continually, avoiding catastrophic forgetting. However, there is a lot more to achieving human-level general intelligence than the ability to learn continually. For instance, AI systems of today can draw on skills it learned on one game to play another. But they lack the ability to generalize the learned skill. Unlike humans, these systems do not seek solutions from previous experiences. An AI system cannot ponder and reflect on a new task, analyze its capabilities, and work out how best to apply them. In 2018, we expect to see advanced research in the areas of deep reinforcement learning, meta-learning, transfer learning, evolutionary algorithms and other areas that aid in developing AGI systems. Detailed aspects of these ideas are highlighted in later points. We can indeed say, Artificial General Intelligence is inching closer than ever before and 2018 is expected to cover major ground in that direction. 2. Enterprise AI: Machine Learning adoption in enterprises will see rapid growth. 2017 saw a rise in cloud offerings by major tech players, such as the Amazon Sagemaker, Microsoft Azure Cloud, Google Cloud Platform, allowing business professionals and innovators to transfer labor-intensive research and analysis to the cloud. Cloud is a $130 billion industry as of now, and it is projected to grow.  Statista carried out a survey to present the level of AI adoption among businesses worldwide, as of 2017.  Almost 80% of the participants had incorporated some or other form of AI into their organizations or planned to do so in the future. Source: https://www.statista.com/statistics/747790/worldwide-level-of-ai-adoption-business/ According to a report from Deloitte, medium and large enterprises are set to double their usage of machine learning by the end of 2018. Apart from these, 2018 will see better data visualization techniques, powered by machine learning, which is a critical aspect of every business.  Artificial intelligence is going to automate the cycle of report generation and KPI analysis, and also, bring in deeper analysis of consumer behavior. Also with abundant Big data sources coming into the picture, BI tools powered by AI will emerge, which can harness the raw computing power of voluminous big data for data models to become streamlined and efficient. 3. Transformative AI: We will turn to AI enabled solutions to solve mission-critical problems. 2018 will see the involvement of AI in more and more mission-critical problems that can have world-changing consequences: read enabling genetic engineering, solving the energy crisis, space exploration, slowing climate change, smart cities, reducing starvation through precision farming, elder care etc. Recently NASA revealed the discovery of a new exoplanet, using data crunched from Machine learning and AI. With this recent reveal, more AI techniques would be used for space exploration and to find other exoplanets. We will also see the real-world deployment of AI applications. So it will not be only about academic research, but also about industry readiness. 2018 could very well be the year when AI becomes real for medicine. According to Mark Michalski, executive director, Massachusetts General Hospital and Brigham and Women’s Center for Clinical Data Science, “By the end of next year, a large number of leading healthcare systems are predicted to have adopted some form of AI within their diagnostic groups.”  We would also see the rise of robot assistants, such as virtual nurses, diagnostic apps in smartphones, and real clinical robots that can monitor patients, take care of the elderly, alert doctors, and send notifications in case of emergency. More research will be done on how AI enabled technology can help in difficult to diagnose areas in health care like mental health, the onset of hereditary diseases among others. Facebook's attempt at detection of potential suicidal messages using AI is a sign of things to come in this direction. As we explore AI enabled solutions to solve problems that have a serious impact on individuals and societies at large, considering the ethical and moral implications of such solutions will become central to developing them, let alone hard to ignore. 4. Safe AI: Safety, Ethics, and Transparency in AI applications will become integral to conversations on AI adoption and app design. The rise of machine learning capabilities has also given rise to forms of bias, stereotyping and unfair determination in such systems. 2017 saw some high profile news stories about gender bias, object recognition datasets like MS COCO, to racial disparities in education AI systems. At NIPS 2017, Kate Crawford talked about bias in machine learning systems which resonated greatly with the community and became pivotal to starting conversations and thinking by other influencers on how to address the problems raised.  DeepMind also launched a new unit, the DeepMind Ethics & Society,  to help technologists put ethics into practice, and to help society anticipate and direct the impact of AI for the benefit of all. Independent bodies like IEEE also pushed for standards in it’s ethically aligned design paper. As news about the bro culture in Silicon Valley and the lack of diversity in the tech sector continued to stay in the news all of 2017, it hit closer home as the year came to an end, when Kristian Lum, Lead Statistician at HRDAG, described her experiences with harassment as a graduate student at prominent stat conferences. This has had a butterfly effect of sorts with many more women coming forward to raise the issue in the ML/AI community. They talked about the formation of a stronger code of conduct by boards of key conferences such as NIPS among others. Eric Horvitz, a Microsoft research director, called Lum’s post a "powerful and important report." Jeff Dean, head of Google’s Brain AI unit applauded Lum for having the courage to speak about this behavior. Other key influencers from the ML and statisticians community also spoke in support of Lum and added their views on how to tackle the problem. While the road to recovery is long and machines with moral intelligence may be decades away, 2018 is expected to start that journey in the right direction by including safety, ethics, and transparency in AI/ML systems. Instead of just thinking about ML contributing to decision making in say hiring or criminal justice, data scientists would begin to think of the potential role of ML in the harmful representation of human identity. These policies will not only be included in the development of larger AI ecosystems but also in national and international debates in politics, businesses, and education. 5. Ubiquitous AI: AI will start redefining life as we know it, and we may not even know it happened. Artificial Intelligence will gradually integrate into our everyday lives. We will see it in our everyday decisions like what kind of food we eat, the entertainment we consume, the clothes we wear, etc.  Artificially intelligent systems will get better at complex tasks that humans still take for granted, like walking around a room and over objects. We’re going to see more and more products that contain some form of AI enter our lives. AI enabled stuff will become more common and available. We will also start seeing it in the background for life-altering decisions we make such as what to learn, where to work, whom to love, who our friends are,  whom should we vote for, where should we invest, and where should we live among other things. 6. Embedded AI: Mobile AI means a radically different way of interacting with the world. There is no denying that AI is the power source behind the next generation of smartphones. A large number of organizations are enabling the use of AI in smartphones, whether in the form of deep learning chips, or inbuilt software with AI capabilities. The mobile AI will be a  combination of on-device AI and cloud AI. Intelligent phones will have end-to-end capabilities that support coordinated development of chips, devices, and the cloud. The release of iPhone X’s FaceID—which uses a neural network chip to construct a mathematical model of the user’s face— and self-driving cars are only the beginning. As 2018 rolls out we will see vast applications on smartphones and other mobile devices which will run deep neural networks to enable AI. AI going mobile is not just limited to the embedding of neural chips in smartphones. The next generation of mobile networks 5G will soon greet the world. 2018 is going to be a year of closer collaborations and increasing partnerships between telecom service providers, handset makers, chip markers and AI tech enablers/researchers. The Baidu-Huawei partnership—to build an open AI mobile ecosystem, consisting of devices, technology, internet services, and content—is an example of many steps in this direction. We will also see edge computing rapidly becoming a key part of the Industrial Internet of Things (IIoT) to accelerate digital transformation. In combination with cloud computing, other forms of network architectures such as fog and mist would also gain major traction. All of the above will lead to a large-scale implementation of cognitive IoT, which combines traditional IoT implementations with cognitive computing. It will make sensors capable of diagnosing and adapting to their environment without the need for human intervention. Also bringing in the ability to combine multiple data streams that can identify patterns. This means we will be a lot closer to seeing smart cities in action. 7. Data-sparse AI: Research into data efficient learning methods will intensify 2017 saw highly scalable solutions for problems in object detection and recognition, machine translation, text-to-speech, recommender systems, and information retrieval.  The second conference on Machine Translation happened in September 2017.  The 11th ACM Conference on Recommender Systems in August 2017 witnessed a series of papers presentations, featured keynotes, invited talks, tutorials, and workshops in the field of recommendation system. Google launched the Tacotron 2 for generating human-like speech from text. However, most of these researches and systems attain state-of-the-art performance only when trained with large amounts of data. With GDPR and other data regulatory frameworks coming into play, 2018 is expected to witness machine learning systems which can learn efficiently maintaining performance, but in less time and with less data. A data-efficient learning system allows learning in complex domains without requiring large quantities of data. For this, there would be developments in the field of semi-supervised learning techniques, where we can use generative models to better guide the training of discriminative models. More research would happen in the area of transfer learning (reuse generalize knowledge across domains), active learning, one-shot learning, Bayesian optimization as well as other non-parametric methods.  In addition, researchers and organizations will exploit bootstrapping and data augmentation techniques for efficient reuse of available data. Other key trends propelling data efficient learning research are growing in-device/edge computing, advancements in robotics, AGI research, and energy optimization of data centers, among others. 8. Conversational AI: AI personal assistants will continue to get smarter AI-powered virtual assistants are expected to skyrocket in 2018. 2017 was filled to the brim with new releases. Amazon brought out the Echo Look and the Echo Show. Google made its personal assistant more personal by allowing linking of six accounts to the Google Assistant built into the Home via the Home app. Bank of America unveiled Erica, it’s AI-enabled digital assistant. As 2018 rolls out, AI personal assistants will find its way into an increasing number of homes and consumer gadgets. These include increased availability of AI assistants in our smartphones and smart speakers with built-in support for platforms such as Amazon’s Alexa and Google Assistant. With the beginning of the new year, we can see personal assistants integrating into our daily routines. Developers will build voice support into a host of appliances and gadgets by using various voice assistant platforms. More importantly, developers in 2018 will try their hands on conversational technology which will include emotional sensitivity (affective computing) as well as machine translational technology (the ability to communicate seamlessly between languages). Personal assistants would be able to recognize speech patterns, for instance, of those indicative of wanting help. AI bots may also be utilized for psychiatric counseling or providing support for isolated people.  And it’s all set to begin with the AI assistant summit in San Francisco scheduled on 25 - 26 January 2018. It will witness talks by world's leading innovators in advances in AI Assistants and artificial intelligence. 9. AI Hardware: Race to conquer the AI optimized hardware market will heat up further Top tech companies (read Google, IBM, Intel, Nvidia) are investing heavily in the development of AI/ML optimized hardware. Research and Markets have predicted the global AI chip market will have a growth rate of about 54% between 2017 and 2021. 2018 will see further hardware designs intended to greatly accelerate the next generation of applications and run AI computational jobs. With the beginning of 2018 chip makers will battle it out to determine who creates the hardware that artificial intelligence lives on. Not only that, there would be a rise in the development of new AI products, both for hardware and software platforms that run deep learning programs and algorithms. Also, chips which move away from the traditional one-size-fits-all approach to application-based AI hardware will grow in popularity. 2018 would see hardware which does not only store data, but also transform it into usable information. The trend for AI will head in the direction of task-optimized hardware. 2018 may also see hardware organizations move to software domains and vice-versa. Nvidia, most famous for their Volta GPUs have come up with NVIDIA DGX-1, a software for AI research, designed to streamline the deep learning workflow. More such transitions are expected at the highly anticipated CES 2018. [dropcap]P[/dropcap]hew, that was a lot of writing! But I hope you found it just as interesting to read as I found writing it. However, we are not done yet. And here is part 2 of our 18 AI trends in ‘18. 
Read more
  • 0
  • 0
  • 23572

article-image-twitter-sentiment-analysis
Packt
19 Feb 2016
30 min read
Save for later

Twitter Sentiment Analysis

Packt
19 Feb 2016
30 min read
In this article, we will cover: Twitter and it's importance Getting hands on with Twitter's data and using various Twitter APIs Use of data to solve business problems—comparison of various businesses based on tweets (For more resources related to this topic, see here.) Twitter and its importance Twitter can be considered as extension of the short messages service or SMS but on an Internet-based platform. In the words of Jack Dorsey, co-founder and co-creator of Twitter: "...We came across the word 'twitter', and it was just perfect. The definition was 'a short burst of inconsequential information,' and 'chirps from birds'. And that's exactly what the product was" Twitter acts as a utility where one can send their SMSs to the whole world. It enables people to instantaneously get heard and get a response. Since the audience of this SMS is so large, many a times responses are very quick. So, Twitter facilitates the basic social instincts of humans. By sharing on Twitter, a user can easily express his/her opinion for just about everything and at anytime. Friends who are connected or, in case of Twitter, followers, immediately get the information about what's going on in someone's life. This in turn severs another humanemotion—the innate need to know about what is going on in someone's life. Apart from being real time, Twitter's UI is really easy to work with. It's naturally and instinctively understood, that is, the UI is very intuitive in nature. Each tweet on Twitter is a short message with maximum of 140 characters. Twitter is an excellent example of a microblogging service. As of July 2014, the Twitter user base reached above 500 million, with more than 271 million active users. Around 23 percent are adult Internet users, which is also about 19 percent of the entire adult population. If we can properly mine what users are tweeting about, Twitter can act as a great tool for advertisement and marketing. But this not the only information Twitter provides. Because of its non-symmetric nature in terms of followers and followings, Twitter assists better in terms of understanding user interests rather than its impact on the social network. An interest graph can be thought of as a method to learn the links between individuals and their diverse interests. Computing the degree of association or correlations between individual's interests and the potential advertisements are one of the most important applications of the interest graphs. Based on these correlations, a user can be targeted so as to attain a maximum response to an advertisement campaign along with followers' recommendations. One interesting fact about Twitter (and Facebook) is that the user does not need to be a real person. A user on Twitter (or on Facebook) can be anything and anyone, for example, an organization, a campaign itself, a famous but imaginary personality (a fictional character recognizable in the media) apart from a real/actual person. If a real person follows these users on Twitter, a lot can be inferred about their personality and hence they can be recommended ads or other followers based on such information. For example, @fakingnews is an Indian blog that publishes news satires ranging from Indian politics to typical Indian mindsets. People who follow @fakingnews are the ones who, in general, like to read sarcasm news. Hence, these people can be thought of as to belonging to the same cluster or a community. If we have another sarcastic blog, we can always recommend it to this community and improve on advertisement return on investment. The chances of getting more hits via people belonging to this community will be higher than a community who don't follows @fakingnews, or any such news, in general. Once you have comprehended that Twitter allows you to create, link, and investigate a community of interest for a random topic, the influence of Twitter and the knowledge one can find from mining it becomes clearer. Understanding Twitter's API Twitter APIs provide a means to access the Twitter data, that is, tweets sent by its millions of users. Let's get to know these APIs a bit better. Twitter vocabulary As described earlier, Twitter is a microblogging service with social aspect associated. It allows its users to express their views/sentiments with the means of Internet SMS, called tweets in the context of Twitter. These tweets are entities formed of maximum of 140 characters. The content of these tweets can be anything ranging from a person's mood to person's location to a person's curiosity. The platform where these tweets are posted is called Timeline. To use Twitter's APIs, one must understand the basic terminology. Tweets are the crux of Twitter. Theoretically, a tweet is just 140 characters of text content tweeted by a user, but there is more to it than just that. There is more metadata associated with the same tweet, which are classified by Twitter as entities and places. The entities constitute of hash tags, URLs, and other media data that users have included in their tweet. The places are nothing but locations from where the tweet originated. It possible the place is a real world location from where the tweet was sent, or it is a location mentioned in the text of the tweet. Take the following tweet as an example: Learn how to consume millions of tweets with @twitterapi at #TDC2014 in São Paulo #bigdata tomorrow at 2:10pm http://t.co/pTBlWzTvVd The preceding tweet was tweeted by @TwitterDev and it's about 132 characters long. The following are the entities mentioned in this tweet: Handle: @twitterapi Hashtags: #TDC2014, #bigdata URL: http://t.co/pTBlWzTvVd São Paulo is the place mentioned in this tweet. This is a one such example of a tweet with a fairly good amount of metadata. Although the actual tweet's length is well within the 140-character limit, it contains more information than one can think of. This actually enables us to figure out that this tweet belongs to a specific community based on the cross referencing the topics presents in the hash tags, the URL to the website, the different users mentioned in it, and so on. The interface (web or mobile) on to which the tweets are displayed is called timeline. The tweets are, in general, arranged in chronological order of posting time. On a specific user's account, only certain number of tweets are displayed by Twitter. This is generally based on users the given user is following and is being followed by. This is the interface a user will see when he/she login his/her Twitter account. A Twitter stream is different from Twitter timeline in the sense that they are not for a specific user. The Tweets on a user's Twitter timeline will be displayed from only certain number of users will be displayed/updated less frequently while the Twitter stream is chronological collection of the all the tweets posted by all the users. The number of active users on Twitter is in orders of hundreds of millions. All the users tweeting during some public events of widespread interest such as presidential debates can achieve speeds of several hundreds of thousands of tweets per minute. The behavior is very similar to a stream; hence the name of such collection is Twitter stream. You can try the following by creating a Twitter account (it would be more insightful if you have less number of followers already with you). Before creating the account, it is advised that you read all the terms and conditions of the same. You can also start reading its API's documentation. Creating a Twitter API connection We need to have an app created at https://dev.twitter.com/apps before making any API requests to Twitter. It's a standard method for developers to gain API access and more important it helps Twitter to observe and restricts developer from making high load API requests. The ROAuth package is the one we are going to use in our experiments. Tokens allow users to authorize third-party apps to access the data from any user account without the need to have their passwords (or other sensitive information). ROAuth basically facilitates the same. Creating new app The first step to getting any kind of token access from twitter is to create an app on it. The user has to go to https://dev.twitter.com/ and log in with their Twitter credentials. With you logged in using your credentials, the step for creating app are as follows: Go to https://apps.twitter.com/app/new. Put the name of your application in the Name field. This name can be anything you like. Similarly, enter the description in the Description field. The Website field needs to be filled with a valid URL, but again that can be any random URL. You can leave the Callback URL field blank. After the creation of this app, we need to find the API Key and API Secret values from the Key and Access Token tab. Consider the example shown in the following figure:   Under the Key and Access Tokens tab, you will find a button to generate access tokens. Click on it and you will be provided with an Access Token and Access Token Secret value. Before using the preceding keys, we need to install twitteRto access the data in R using the app we just created, using following code: Install.packages(c("devtools", "rjson", "bit64", "httr")) library(devtools) install_github("geoffjentry/twitteR"). library(twitteR) Here's sample code that helps us access the tweets posted since any give date and which contain a specific keyword. In this example, we are searching for tweeting containing the word Earthquake in the tweets posted since September 29, 2014. In order to get this information, we provide four special types of information to get the authorization token: key secret access token access token secret We'll show you how to use the preceding information to get an app authorized by the user and access its resources on Twitter. The ROAuh function in twitteR will make our next steps very smooth and clear: api_key<- "your_api_key" api_secret<- "your_api_secret" access_token<- "your_access_token" access_token_secret<- "your_access_token_secret" setup_twitter_oauth (api_key,api_secret,access_token,access_token_secret) EarthQuakeTweets = searchTwitter("EarthQuake", since='2014-09-29') The results of this example should simply display Using direct authentication with 25 tweets loaded in the EarthQuakeTweets variable as shown here. head(EarthQuakeTweets,2) [[1]] [1] "TamamiJapan: RT @HistoricalPics: Japan. Top: One Month After Hiroshima, 1945. Bottom: One Month After The Earthquake and Tsunami, 2011. Incredible. http…" [[2]] [1] "OldhamDs: RT @HistoricalPics: Japan. Top: One Month After Hiroshima, 1945. Bottom: One Month After The Earthquake and Tsunami, 2011. Incredible. http…" We have shown in the first two of the 25 tweets containing the word Earthquake since September 29, 2014. If you closely observe the results, you'll find all the metadata using str(EarthQuakeTweets[1]). Finding trending topics Now that we understand how to create API connections to Twitter and fetch data using it, we will see how to get answer to what is trending on Twitter to list what topic (worldwide or local) is being talked about the most right now. Using the same API, we can easily access the trending information: #return data frame with name, country & woeid. Locs <- availableTrendLocations() # Where woeid is a numerical identification code describing a location ID # Filter the data frame for Delhi (India) and extract the woeid of the same LocsIndia = subset(Locs, country == "India") woeidDelhi = subset(LocsIndia, name == "Delhi")$woeid # getTrends takes a specified woeid and returns the trending topics associated with that woeid trends = getTrends(woeid=woeidDelhi) The function availableTrendLocations() returns R data frame containing the name, country, and woeid parameters. We than filter this data frame for a location of our choosing; in this example, its Delhi, India. The function getTrends() fetches the top 10 trends in the location determined by the woeid. Here are the top four trending hash tags in the region defined by woeid = 20070458, that is, Delhi, India. head(trends) name url query woeid 1 #AntiHinduNGOsExposed http://twitter.com/search?q=%23AntiHinduNGOsExposed %23AntiHinduNGOsExposed 20070458 2 #KhaasAadmi http://twitter.com/search?q=%23KhaasAadmi %23KhaasAadmi 20070458 3 #WinGOSF14 http://twitter.com/search?q=%23WinGOSF14 %23WinGOSF14 20070458 4 #ItsForRealONeBay http://twitter.com/search?q=%23ItsForRealONeBay %23ItsForRealONeBay 20070458 Searching tweets Now, similar to the trends there is one more important function that comes with the TwitteR package: searchTwitter(). This function will return tweets containing the searched string along with the other constraints. Some of the constraints that can be imposed are as follows: lang: This constraints the tweets of given language. since/until: This constraints the tweets to be since the given date or until the given date. geocode: This constraints tweets to be from only those users who are located within certain distance from the given latitude/longitude. For example, extracting tweets about the cricketer Sachin Tendulkar in the month of November 2014: head(searchTwitter('Sachin Tendulkar', since='2014-11-01', until= '2014-11-30')) [[1]] [1] "TendulkarFC: RT @Moulinparikh: Sachin Tendulkar had a long session with the Mumbai Ranji Trophy team after today's loss." [[2]] [1] "tyagi_niharika: @WahidRuba @Anuj_dvn @Neel_D_ @alishatariq3 @VWellwishers @Meenal_Rathore oh... Yaadaaya....hmaraesachuuu sirxedxa0xbdxedxb8x8d..i mean sachin Tendulkar" [[3]] [1] "Meenal_Rathore: @WahidRuba @Anuj_dvn @tyagi_niharika @Neel_D_ @alishatariq3 @AliaaFcc @VWellwishers .. Sachin Tendulkar xedxa0xbdxedxb8x8a☺️" [[4]] [1] "MishraVidyanand: Vidyanand Mishra is following the Interest "The Living Legend SachinTendu..." on http://t.co/tveHXMB4BM - http://t.co/CocNMcxFge" [[5]] [1] "CSKalwaysWin: I have never tried to compare myself to anyone else.n - Sachin Tendulkar" Twitter sentiment analysis Depending on the objective and based on the functionality to search any type of tweets from the public timeline, one can always collect the required corpus. For example, you may want to learn about customer satisfaction levels with various cab services, which are coming in Indian market. These start-ups are offering various discounts and coupons to attract customers but at the end of the day, the service quality determines the business of any organization. These startups are constantly promoting themselves on various social media websites. Customers are showing various levels of sentiments on the same platform. Let's target the following: Meru Cabs: A radio cabs service based in Mumbai, India. Launched in 2007. Ola Cabs: A taxi aggregator company based in Bangalore, India. Launched in 2011. TaxiForSure: A taxi aggregator company based in Bangalore, India. Launched in 2011. Uber India: A taxi aggregator company headquartered in San Francisco, California. Launched in India in 2014. Let's set our goal to get the general sentiments about each of the preceding services providers based on the customer sentiments present in the tweets on Twitter. Collecting tweets as a corpus We'll start with the searchTwitter()function (discussed previously) on the TwitteR package to gather the tweets for each of the preceding organizations. Now, in order to avoid writing same code again and again, we pushed the following authorization code in the file called authenticate.R. library(twitteR) api_key<- "xx" api_secret<- "xx" access_token<- "xx" access_token_secret<- "xx" setup_twitter_oauth(api_key,api_secret,access_token, access_token_secret) We run the following scripts to get the required tweets: # Load the necessary packages source('authenticate.R') Meru_tweets = searchTwitter("MeruCabs", n=2000, lang="en") Ola_tweets = searchTwitter("OlaCabs", n=2000, lang="en") TaxiForSure_tweets = searchTwitter("TaxiForSure", n=2000, lang="en") Uber_tweets = searchTwitter("Uber_Delhi", n=2000, lang="en") Now, as mentioned in Twitter's Rest API documentation, we get the message "Due to capacity constraints, the index currently only covers about a week's worth of tweets". We do not always get the desired number of tweets (for example, here it's 2000). Instead, the following are the size of each of the above Tweet lists we get the following: >length(Meru_tweets) [1] 393 >length(Ola_tweets) [1] 984 > length(TaxiForSure_tweets) [1] 720 > length(Uber_tweets) [1] 2000 As you can see from the preceding code, the length of these tweets is not equal to the number of tweets we had asked for in our query scripts. There are many takeaways from this information. Since these tweets are only from last one week's tweets on Twitter, they suggest there is more discussion about these taxi services in the following order: Uber India Ola Cabs TaxiForSure Meru Cabs A ban was imposed on Uber India after an alleged rape incident by one Uber India driver. The decision to put a ban on the entire organization because one of its drivers committed a crime became a matter of public outcry. Hence, the number of tweets about Uber increased on social media. Now, Meru Cabs have been in India for almost 7 years now. Hence, they are quite a stable organization. They amount of promotion Ola Cabs and TaxiForSure are doing is way higher than that of Meru Cabs. This can be one reason for Meru Cabs having theleast number (393) of tweets in last week. The number of tweets in last week is comparable for Ola Cabs (984) and TaxiForSure (720). There can be several numbers of reasons for the same. They were both started their business in same year and more importantly they follow the same business model. While Meru Cabs is a radio taxi service and they own and manage a fleet of cars while Ola Cabs, TaxiForSure, or Uber are a marketplace for users to compare the offerings of various operators and book easily. Let's dive deep into the data and get more insights. Cleaning the corpus Before applying any intelligent algorithms to gather more insights out of the tweets collected so far, let's first clean it. In order to clean up, we should understand how the list of tweets looks like: head(Meru_tweets) [[1]] [1] "MeruCares: @KapilTwitts 2&gt;...and other details at feedback@merucabs.com We'll check back and reach out soon." [[2]] [1] "vikasraidhan: @MeruCabs really disappointed with @GenieCabs. Cab is never assigned on time. Driver calls after 30 minutes. Why would I ride with Meru?" [[3]] [1] "shiprachowdhary: fallback of #ubershame , #MERUCABS taking customers for a ride" [[4]] [1] "shiprachowdhary: They book Genie, but JIT inform of cancellation &amp; send full fare #MERUCABS . Very disappointed.Always used these guys 4 and recommend them." [[5]] [1] "shiprachowdhary: No choice bt to take the #merucabs premium service. Driver told me that this happens a lot with #merucabs." [[6]] [1] "shiprachowdhary: booked #Merucabsyestrdy. Asked for Meru Genie. 10 mins 4 pick up time, they call to say Genie not available, so sending the full fare cab" The first tweet here is a grievance solution, while the second, fourth and fifth are actually customer sentiments about the services provided by Meru Cabs. We see: Lots of meta information such as @people, URLs and #hashtags Punctuation marks, numbers, and unnecessary spaces Some of these tweets are retweets from other users; for the given application, we would not like to consider retweets (RTs) in sentiment analysis We clean all these data using the following code block: MeruTweets <- sapply(Meru_tweets, function(x) x$getText()) OlaTweets = sapply(Ola_tweets, function(x) x$getText()) TaxiForSureTweets = sapply(TaxiForSure_tweets, function(x) x$getText()) UberTweets = sapply(Uber_tweets, function(x) x$getText()) catch.error = function(x) { # let us create a missing value for test purpose y = NA # Try to catch that error (NA) we just created catch_error = tryCatch(tolower(x), error=function(e) e) # if not an error if (!inherits(catch_error, "error")) y = tolower(x) # check result if error exists, otherwise the function works fine. return(y) } cleanTweets<- function(tweet){ # Clean the tweet for sentiment analysis # remove html links, which are not required for sentiment analysis tweet = gsub("(f|ht)(tp)(s?)(://)(.*)[.|/](.*)", " ", tweet) # First we will remove retweet entities from the stored tweets (text) tweet = gsub("(RT|via)((?:\b\W*@\w+)+)", " ", tweet) # Then remove all "#Hashtag" tweet = gsub("#\w+", " ", tweet) # Then remove all "@people" tweet = gsub("@\w+", " ", tweet) # Then remove all the punctuation tweet = gsub("[[:punct:]]", " ", tweet) # Then remove numbers, we need only text for analytics tweet = gsub("[[:digit:]]", " ", tweet) # finally, we remove unnecessary spaces (white spaces, tabs etc) tweet = gsub("[ t]{2,}", " ", tweet) tweet = gsub("^\s+|\s+$", "", tweet) # if anything else, you feel, should be removed, you can. For example "slang words" etc using the above function and methods. # Next we'll convert all the word in lower case. This makes uniform pattern. tweet = catch.error(tweet) tweet } cleanTweetsAndRemoveNAs<- function(Tweets) { TweetsCleaned = sapply(Tweets, cleanTweets) # Remove the "NA" tweets from this tweet list TweetsCleaned = TweetsCleaned[!is.na(TweetsCleaned)] names(TweetsCleaned) = NULL # Remove the repetitive tweets from this tweet list TweetsCleaned = unique(TweetsCleaned) TweetsCleaned } MeruTweetsCleaned = cleanTweetsAndRemoveNAs(MeruTweets) OlaTweetsCleaned = cleanTweetsAndRemoveNAs(OlaTweets) TaxiForSureTweetsCleaned <- cleanTweetsAndRemoveNAs(TaxiForSureTweets) UberTweetsCleaned = cleanTweetsAndRemoveNAs(UberTweets) Here's the size of each of the cleaned tweet lists: > length(MeruTweetsCleaned) [1] 309 > length(OlaTweetsCleaned) [1] 811 > length(TaxiForSureTweetsCleaned) [1] 574 > length(UberTweetsCleaned) [1] 1355 Estimating sentiment (A) There are many sophisticated resources available to estimate sentiments. Many research papers and software packages are available open source,and they implement very complex algorithms for sentiments analysis. After getting the cleaned Twitter data, we are going to use few of such R packages available to assess the sentiments in the tweets. It's worth mentioning here that not all the tweets represent a sentiment. Few tweets can be just information/facts, while others can be customer care responses. Ideally, they should not be used to assess the customer sentiment about a particular organization. As a first step, we'll use a Naïve algorithm, which gives a score based on the number of times a positive or a negative word occurred in the given sentence (and in our case, in a tweet). Please download the positive and negative opinion/sentiment (nearly 68, 000) words from English language. These opinion lexicon will be used as a first example in our sentiment analysis experiment. The good thing about this approach is that we are relying on a highly researched upon and at the same time customizable input parameters. Here are a few examples of existing positive and negative sentiments words: Positive: Love, best, cool, great, good, and amazing Negative: Hate, worst, sucks, awful, and nightmare >opinion.lexicon.pos = scan('opinion-lexicon-English/positive-words.txt', what='character', comment.char=';') >opinion.lexicon.neg = scan('opinion-lexicon-English/negative-words.txt', what='character', comment.char=';') > head(opinion.lexicon.neg) [1] "2-faced" "2-faces" "abnormal" "abolish" "abominable" "abominably" > head(opinion.lexicon.pos) [1] "a+" "abound" "abounds" "abundance" "abundant" "accessable" We'll add a few industry-specific and/or especially emphatic terms based on our requirements: pos.words = c(opinion.lexicon.pos,'upgrade') neg.words = c(opinion.lexicon.neg,'wait', 'waiting', 'wtf', 'cancellation') Now, we create a function score.sentiment(), which computes the raw sentiment based on the simple matching algorithm: getSentimentScore = function(sentences, words.positive, words.negative, .progress='none') { require(plyr) require(stringr) scores = laply(sentences, function(sentence, words.positive, words.negative) { # Let first remove the Digit, Punctuation character and Control characters: sentence = gsub('[[:cntrl:]]', '', gsub('[[:punct:]]', '', gsub('\d+', '', sentence))) # Then lets convert all to lower sentence case: sentence = tolower(sentence) # Now lets split each sentence by the space delimiter words = unlist(str_split(sentence, '\s+')) # Get the boolean match of each words with the positive & negative opinion-lexicon pos.matches = !is.na(match(words, words.positive)) neg.matches = !is.na(match(words, words.negative)) # Now get the score as total positive sentiment minus the total negatives score = sum(pos.matches) - sum(neg.matches) return(score) }, words.positive, words.negative, .progress=.progress ) # Return a data frame with respective sentence and the score return(data.frame(text=sentences, score=scores)) } Now, we apply the preceding function on the corpus of tweets collected and cleaned so far: MeruResult = getSentimentScore(MeruTweetsCleaned, words.positive , words.negative) OlaResult = getSentimentScore(OlaTweetsCleaned, words.positive , words.negative) TaxiForSureResult = getSentimentScore(TaxiForSureTweetsCleaned, words.positive , words.negative) UberResult = getSentimentScore(UberTweetsCleaned, words.positive , words.negative) Here are some sample results: Tweet for Meru Cabs Score gt and other details at feedback com we ll check back and reach out soon 0 really disappointed with cab is never assigned on time driver calls after minutes why would i ride with meru -1 so after years of bashing today i m pleasantly surprised clean car courteous driver prompt pickup mins efficient route 4 a min drive cost hrs used to cost less ur unreliable and expensive trying to lose ur customers -3 Tweet For Ola Cabs Score the service is going from bad to worse the drivers deny to come after a confirmed booking -3 love the olacabs app give it a swirl sign up with my referral code dxf n and earn rs download the app from 1 crn kept me waiting for mins amp at last moment driver refused pickup so unreliable amp irresponsible -4 this is not the first time has delighted me punctuality and free upgrade awesome that 4 Tweet For TaxiForSure Score great service now i have become a regular customer of tfs thank you for the upgrade as well happy taxi ing saving 5 really disappointed with cab is never assigned on time driver calls after minutes why would i ride with meru -1 horrible taxi service had to wait for one hour with a new born in the chilly weather of new delhi waiting for them -4 what do i get now if you resolve the issue after i lost a crucial business because of the taxi delay -3 Tweet For Uber India Score that s good uber s fares will prob be competitive til they gain local monopoly then will go sky high as in new york amp delhi saving 3 from a shabby backend app stack to daily pr fuck ups its increasingly obvious that is run by child minded blow hards -3 you say that uber is illegally running were you stupid to not ban earlier and only ban it now after the rape -3 perhaps uber biz model does need some looking into it s not just in delhi that this happens but in boston too 0 From the preceding observations, it's clear that this basic sentiment analysis method works fine in normal circumstances, but in case of Uber India the results deviated too much from a subjective score. It's safe to say that basic word matching gives a good indicator of overall customer sentiments, except in the case when the data itself is not reliable. In our case, the tweets from Uber India are not really related to the services that Uber provides, rather the one incident of crime by its driver and whole score went haywire. Let's not compute a point statistic of the scores we have computed so far. Since the numbers of tweets are not equal for each of the four organizations, we compute a mean and standard deviation for each. Organization Mean Sentiment Score Standard Deviation Meru Cabs -0.2218543 1.301846 Ola Cabs 0.197724 1.170334 TaxiForSure -0.09841828 1.154056 Uber India -0.6132666 1.071094 Estimating sentiment (B) Let's now move one step further. Now instead of using simple matching of opinion lexicon, we'll use something called Naive Bayes to decide on the emotion present in any tweet. We would require packages called Rstem and sentiment to assist in this. It's important to mention here that both these packages are no longer available in CRAN and hence we have to provide either the repository location as a parameter install.package() function. Here's the R script to install the required packages: install.packages("Rstem", repos = "http://www.omegahat.org/R", type="source") require(devtools) install_url("http://cran.r-project.org/src/contrib/Archive/sentiment/sentiment_0.2.tar.gz") require(sentiment) ls("package:sentiment") Now that we have the sentiment and Rstem packages installed in our R workspace, we can build the bayes classifier for sentiment analysis: library(sentiment) # classify_emotion function returns an object of class data frame # with seven columns (anger, disgust, fear, joy, sadness, surprise, # # best_fit) and one row for each document: MeruTweetsClassEmo = classify_emotion(MeruTweetsCleaned, algorithm="bayes", prior=1.0) OlaTweetsClassEmo = classify_emotion(OlaTweetsCleaned, algorithm="bayes", prior=1.0) TaxiForSureTweetsClassEmo = classify_emotion(TaxiForSureTweetsCleaned, algorithm="bayes", prior=1.0) UberTweetsClassEmo = classify_emotion(UberTweetsCleaned, algorithm="bayes", prior=1.0) The following figure shows few results from Bayesian analysis using thesentiment package for Meru Cabs tweets. Similarly, we generated results for other cab-services from our problem setup. The sentiment package was built to use a trained dataset of emotion words (nearly 1500 words). The function classify_emotion() generates results belonging to one of the following six emotions: anger, disgust, fear, joy, sadness, and surprise. Hence, when the system is not able to classify the overall emotion to any of the six,NA is returned: Let's substitute these NA values with the word unknown to make the further analysis easier: # we will fetch emotion category best_fit for our analysis purposes. MeruEmotion = MeruTweetsClassEmo[,7] OlaEmotion = OlaTweetsClassEmo[,7] TaxiForSureEmotion = TaxiForSureTweetsClassEmo[,7] UberEmotion = UberTweetsClassEmo[,7] MeruEmotion[is.na(MeruEmotion)] = "unknown" OlaEmotion[is.na(OlaEmotion)] = "unknown" TaxiForSureEmotion[is.na(TaxiForSureEmotion)] = "unknown" UberEmotion[is.na(UberEmotion)] = "unknown" The best-fit emotions present in these tweets are as follows: Further, we'll use another function classify_polarity() provided by the sentiment package to classify the tweets into two classes, pos (positive sentiment) or neg (negative sentiment). The idea is to compute the log likelihood of a tweet assuming it to belong to either of two classes. Once these likelihoods are calculated, a ratio of the pos-likelihood to neg-likelihood is calculated and based on this ratio the tweets are classified to belong to a particular class. It's important to note that if this ratio turns out to be 1, then the overall sentiment of the tweet is assumed to be "neutral". The code is as follows: MeruTweetsClassPol = classify_polarity(MeruTweetsCleaned, algorithm="bayes") OlaTweetsClassPol = classify_polarity(OlaTweetsCleaned, algorithm="bayes") TaxiForSureTweetsClassPol = classify_polarity(TaxiForSureTweetsCleaned, algorithm="bayes") UberTweetsClassPol = classify_polarity(UberTweetsCleaned, algorithm="bayes") We get the following output: The preceding figure shows few results from obtained using the classify_polarity() function of sentiment package for Meru Cabs tweets. We'll now generate consolidated results from the two functions in a data frame for each cab service for plotting purposes: # we will fetch polarity category best_fit for our analysis purposes, MeruPol = MeruTweetsClassPol[,4] OlaPol = OlaTweetsClassPol[,4] TaxiForSurePol = TaxiForSureTweetsClassPol[,4] UberPol = UberTweetsClassPol[,4] # Let us now create a data frame with the above results MeruSentimentDataFrame = data.frame(text=MeruTweetsCleaned, emotion=MeruEmotion, polarity=MeruPol, stringsAsFactors=FALSE) OlaSentimentDataFrame = data.frame(text=OlaTweetsCleaned, emotion=OlaEmotion, polarity=OlaPol, stringsAsFactors=FALSE) TaxiForSureSentimentDataFrame = data.frame(text=TaxiForSureTweetsCleaned, emotion=TaxiForSureEmotion, polarity=TaxiForSurePol, stringsAsFactors=FALSE) UberSentimentDataFrame = data.frame(text=UberTweetsCleaned, emotion=UberEmotion, polarity=UberPol, stringsAsFactors=FALSE) # rearrange data inside the frame by sorting it MeruSentimentDataFrame = within(MeruSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE)))) OlaSentimentDataFrame = within(OlaSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE)))) TaxiForSureSentimentDataFrame = within(TaxiForSureSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE)))) UberSentimentDataFrame = within(UberSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE)))) plotSentiments1<- function (sentiment_dataframe,title) { library(ggplot2) ggplot(sentiment_dataframe, aes(x=emotion)) + geom_bar(aes(y=..count.., fill=emotion)) + scale_fill_brewer(palette="Dark2") + ggtitle(title) + theme(legend.position='right') + ylab('Number of Tweets') + xlab('Emotion Categories') } plotSentiments1(MeruSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about MeruCabs') plotSentiments1(OlaSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about OlaCabs') plotSentiments1(TaxiForSureSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about TaxiForSure') plotSentiments1(UberSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about UberIndia') The output is as follows: In the preceding figure, we showed sample results using generated results on Meru Cabs tweets using both the functions. Let's now plot them one by one. First, let's create a single function to be used by each business's tweets. We call it plotSentiments1() and then we plot it for each business: The following dashboard shows the analysis for Ola Cabs: The following dashboard shows the analysis for TaxiForSure: The following dashboard shows the analysis for Uber India: These sentiments basically reflect the more or less the same observations as we did with the basic word-matching algorithm. The number of tweets with joy constitute the largest part of tweets for all these organizations, indicating that these organizations are trying their best to provide good business in the country. The sadness tweets are less numerous than the joy tweets. However, if compared with each other, they indicate the overall market share versus level of customer satisfaction of each service provider in question. Similarly, these graphs can be used to assess the level of dissatisfaction in terms of anger and disgust in the tweets. Let's now consider only the positive and negative sentiments present in the tweets: # Similarly we will plot distribution of polarity in the tweets plotSentiments2 <- function (sentiment_dataframe,title) { library(ggplot2) ggplot(sentiment_dataframe, aes(x=polarity)) + geom_bar(aes(y=..count.., fill=polarity)) + scale_fill_brewer(palette="RdGy") + ggtitle(title) + theme(legend.position='right') + ylab('Number of Tweets') + xlab('Polarity Categories') } plotSentiments2(MeruSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about MeruCabs') plotSentiments2(OlaSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about OlaCabs') plotSentiments2(TaxiForSureSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about TaxiForSure') plotSentiments2(UberSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about UberIndia') The output is as follows: The following dashboard shows the polarity analysis for Ola Cabs: The following dashboard shows the analysis for TaxiForSure: The following dashboard shows the analysis for Uber India: It's a basic human trait to inform about other's what's wrong rather than informing if there was something right. That is say that we tend to tweets/report if something bad had happened rather reporting/tweeting if the experience was rather good. Hence, the negative tweets are supposed to be larger than the positive tweets in general. Still over a period of time (a week in our case) the ratio of the two easily reflect the overall market share versus the level of customer satisfaction of each service provider in question. Next, we try to get the sense of the overall content of the tweets using the word clouds. removeCustomeWords <- function (TweetsCleaned) { for(i in 1:length(TweetsCleaned)){ TweetsCleaned[i] <- tryCatch({ TweetsCleaned[i] = removeWords(TweetsCleaned[i], c(stopwords("english"), "care", "guys", "can", "dis", "didn", "guy" ,"booked", "plz")) TweetsCleaned[i] }, error=function(cond) { TweetsCleaned[i] }, warning=function(cond) { TweetsCleaned[i] }) } return(TweetsCleaned) } getWordCloud <- function (sentiment_dataframe, TweetsCleaned, Emotion) { emos = levels(factor(sentiment_dataframe$emotion)) n_emos = length(emos) emo.docs = rep("", n_emos) TweetsCleaned = removeCustomeWords(TweetsCleaned) for (i in 1:n_emos){ emo.docs[i] = paste(TweetsCleaned[Emotion == emos[i]], collapse=" ") } corpus = Corpus(VectorSource(emo.docs)) tdm = TermDocumentMatrix(corpus) tdm = as.matrix(tdm) colnames(tdm) = emos require(wordcloud) suppressWarnings(comparison.cloud(tdm, colors = brewer.pal(n_emos, "Dark2"), scale = c(3,.5), random.order = FALSE, title.size = 1.5)) } getWordCloud(MeruSentimentDataFrame, MeruTweetsCleaned, MeruEmotion) getWordCloud(OlaSentimentDataFrame, OlaTweetsCleaned, OlaEmotion) getWordCloud(TaxiForSureSentimentDataFrame, TaxiForSureTweetsCleaned, TaxiForSureEmotion) getWordCloud(UberSentimentDataFrame, UberTweetsCleaned, UberEmotion) The preceding figure shows word cloud from tweets about Meru Cabs. The preceding figure shows word cloud from tweets about Ola Cabs. The preceding figure shows word cloud from tweets about TaxiForSure. The preceding figure shows word cloud from tweets about Uber India. Summary In this article, we gained knowledge of the various Twitter APIs, we discussed how to create a connection with Twitter, and we saw how to retrieve the tweets with various attributes. We saw the power of Twitter in helping us determine the customer attitude toward today's various businesses. The activity can be done on the weekly basis and one can easily get the monthly or quarterly or yearly changes in customer sentiments. This can not only help the customer decide the trending businesses, but the business itself can get a well-defined metric of its own performance. It can use such scores/graphs to improve. We also discussed various methods of sentiment analysis varying from basic word matching to the advanced Bayesian algorithms. Resources for Article: Further resources on this subject: Find Friends on Facebook [article] Supervised learning[article] Warming Up [article]
Read more
  • 0
  • 0
  • 23321

article-image-how-to-perform-numeric-metric-aggregations-with-elasticsearch
Pravin Dhandre
22 Feb 2018
7 min read
Save for later

How to perform Numeric Metric Aggregations with Elasticsearch

Pravin Dhandre
22 Feb 2018
7 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book Learning Elastic Stack 6.0 written by Pranav Shukla and Sharath Kumar M N . This book provides detailed coverage on fundamentals of each components of Elastic Stack, making it easy to search, analyze and visualize data across different sources in real-time.[/box] Today, we are going to demonstrate how to run numeric and statistical queries such as summation, average, count and various similar metric aggregations on Elastic Stack to serve a better analytics engine on your dataset. Metric aggregations   Metric aggregations work with numeric data, computing one or more aggregate metrics within the given context. The context could be a query, filter, or no query to include the whole index/type. Metric aggregations can also be nested inside other bucket aggregations. In this case, these metrics will be computed for each bucket in the bucket aggregations. We will start with simple metric aggregations without nesting them inside bucket aggregations. When we learn about bucket aggregations later in the chapter, we will also learn how to use metric aggregations inside bucket aggregations. We will learn about the following metric aggregations: Sum, average, min, and max aggregations Stats and extended stats aggregations Cardinality aggregation Let us learn about them one by one. Sum, average, min, and max aggregations Finding the sum of a field, the minimum value for a field, the maximum value for a field, or an average, are very common operations. For the people who are familiar with SQL, the query to find the sum would look like the following: SELECT sum(downloadTotal) FROM usageReport; The preceding query will calculate the sum of the downloadTotal field across all records in the table. This requires going through all records of the table or all records in the given context and adding the values of the given fields. In Elasticsearch, a similar query can be written using the sum aggregation. Let us understand the sum aggregation first. Sum aggregation Here is how to write a simple sum aggregation: GET bigginsight/_search { "aggregations": { 1 "download_sum": { 2 "sum": { 3 "field": "downloadTotal" 4 } } }, "size": 0 5 } The aggs or aggregations element at the top level should wrap any aggregation. Give a name to the aggregation; here we are doing the sum aggregation on the downloadTotal field and hence the name we chose is download_sum. You can name it anything. This field will be useful while looking up this particular aggregation's result in the response. We are doing a sum aggregation, hence the sum element. We want to do term aggregation on the downloadTotal field. Specify size = 0 to prevent raw search results from being returned. We just want aggregation results and not the search results in this case. Since we haven't specified any top level query elements, it matches all documents. We do not want any raw documents (or search hits) in the result. The response should look like the following: { "took": 92, ... "hits": { "total": 242836, 1 "max_score": 0, "hits": [] }, "aggregations": { 2 "download_sum": { 3 "value": 2197438700 4 } } } Let us understand the key aspects of the response. The key parts are numbered 1, 2, 3, and so on, and are explained in the following points: The hits.total element shows the number of documents that were considered or were in the context of the query. If there was no additional query or filter specified, it will include all documents in the type or index. Just like the request, this response is wrapped inside aggregations to indicate as Such. The response of the aggregation requested by us was named download_sum, hence we get our response from the sum aggregation inside an element with the same name. The actual value after applying the sum aggregation. The average, min, and max aggregations are very similar. Let's look at them briefly. Average aggregation The average aggregation finds an average across all documents in the querying context: GET bigginsight/_search { "aggregations": { "download_average": { 1 "avg": { 2 "field": "downloadTotal" } } }, "size": 0 } The only notable differences from the sum aggregation are as follows: We chose a different name, download_average, to make it apparent that the aggregation is trying to compute the average. The type of aggregation that we are doing is avg instead of the sum aggregation that we were doing earlier. The response structure is identical but the value field will now represent the average of the requested field. The min and max aggregations are the exactly same. Min aggregation Here is how we will find the minimum value of the downloadTotal field in the entire index/type: GET bigginsight/_search { "aggregations": { "download_min": { "min": { "field": "downloadTotal" } } }, "size": 0 } Let's finally look at max aggregation also. Max aggregation Here is how we will find the maximum value of the downloadTotal field in the entire index/type: GET bigginsight/_search { "aggregations": { "download_max": { "max": { "field": "downloadTotal" } } }, "size": 0 } These aggregations were really simple. Now let's look at some more advanced yet simple stats and extended stats aggregations. Stats and extended stats aggregations These aggregations compute some common statistics in a single request without having to issue multiple requests. This saves resources on the Elasticsearch side as well because the statistics are computed in a single pass rather than being requested multiple times. The client code also becomes simpler if you are interested in more than one of these statistics. Let's look at the stats aggregation first. Stats aggregation The stats aggregation computes the sum, average, min, max, and count of documents in a single pass: GET bigginsight/_search { "aggregations": { "download_stats": { "stats": { "field": "downloadTotal" } } }, "size": 0 } The structure of the stats request is the same as the other metric aggregations we have seen so far, so nothing special is going on here. The response should look like the following: { "took": 4, ..., "hits": { "total": 242836, "max_score": 0, "hits": [] }, "aggregations": { "download_stats": { "count": 242835, "min": 0, "max": 241213, "avg": 9049.102065188297, "sum": 2197438700 } } } As you can see, the response with the download_stats element contains count, min, max, average, and sum; everything is included in the same response. This is very handy as it reduces the overhead of multiple requests and also simplifies the client code. Let us look at the extended stats aggregation. Extended stats Aggregation The extended stats aggregation returns a few more statistics in addition to the ones returned by the stats aggregation: GET bigginsight/_search { "aggregations": { "download_estats": { "extended_stats": { "field": "downloadTotal" } } }, "size": 0 } The response looks like the following: { "took": 15, "timed_out": false, ..., "hits": { "total": 242836, "max_score": 0, "hits": [] }, "aggregations": { "download_estats": { "count": 242835, "min": 0, "max": 241213, "avg": 9049.102065188297, "sum": 2197438700, "sum_of_squares": 133545882701698, "variance": 468058704.9782911, "std_deviation": 21634.664429528162, "std_deviation_bounds": { "upper": 52318.43092424462, "lower": -34220.22679386803 } } } } It also returns the sum of squares, variance, standard deviation, and standard deviation Bounds. Cardinality aggregation Finding the count of unique elements can be done with the cardinality aggregation. It is similar to finding the result of a query such as the following: select count(*) from (select distinct username from usageReport) u; Finding the cardinality or the number of unique values for a specific field is a very common requirement. If you have click-stream from the different visitors on your website, you may want to find out how many unique visitors you got in a given day, week, or month. Let us understand how we find out the count of unique users for which we have network traffic data: GET bigginsight/_search { "aggregations": { "unique_visitors": { "cardinality": { "field": "username" } } }, "size": 0 } The cardinality aggregation response is just like the other metric aggregations: { "took": 110, ..., "hits": { "total": 242836, "max_score": 0, "hits": [] }, "aggregations": { "unique_visitors": { "value": 79 } } } To summarize, we learned how to perform numerous metric aggregations on numeric datasets and easily deploy elasticsearch in building powerful analytics application. If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of Elastic Stack in detail and start developing solutions for problems like logging, site search, app search, metrics and more.      
Read more
  • 0
  • 0
  • 23309
article-image-tensorflow-models-mobile-embedded-devices
Savia Lobo
15 May 2018
12 min read
Save for later

How to Build TensorFlow Models for Mobile and Embedded devices

Savia Lobo
15 May 2018
12 min read
TensorFlow models can be used in applications running on mobile and embedded platforms. TensorFlow Lite and TensorFlow Mobile are two flavors of TensorFlow for resource-constrained mobile devices. TensorFlow Lite supports a subset of the functionality compared to TensorFlow Mobile. It results in better performance due to smaller binary size with fewer dependencies. The article covers topics for training a model to integrate TensorFlow into an application. The model can then be saved and used for inference and prediction in the mobile application. [box type="note" align="" class="" width=""]This article is an excerpt from the book Mastering TensorFlow 1.x written by Armando Fandango. This book will help you leverage the power of TensorFlow and Keras to build deep learning models, using concepts such as transfer learning, generative adversarial networks, and deep reinforcement learning.[/box] To learn how to use TensorFlow models on mobile devices, following topics are covered: TensorFlow on mobile platforms TF Mobile in Android apps TF Mobile demo on Android TF Mobile demo on iOS TensorFlow Lite TF Lite demo on Android TF Lite demo on iOS TensorFlow on mobile platforms TensorFlow can be integrated into mobile apps for many use cases that involve one or more of the following machine learning tasks: Speech recognition Image recognition Gesture recognition Optical character recognition Image or text classification Image, text, or speech synthesis Object identification To run TensorFlow on mobile apps, we need two major ingredients: A trained and saved model that can be used for predictions A TensorFlow binary that can receive the inputs, apply the model, produce the predictions, and send the predictions as output The high-level architecture looks like the following figure: The mobile application code sends the inputs to the TensorFlow binary, which uses the trained model to compute predictions and send the predictions back. TF Mobile in Android apps The TensorFlow ecosystem enables it to be used in Android apps through the interface class  TensorFlowInferenceInterface, and the TensorFlow Java API in the jar file libandroid_tensorflow_inference_java.jar. You can either use the jar file from the JCenter, download a precompiled jar from ci.tensorflow.org, or build it yourself. The inference interface has been made available as a JCenter package and can be included in the Android project by adding the following code to the build.gradle file: allprojects  { repositories  { jcenter() } } dependencies  { compile  'org.tensorflow:tensorflow-android:+' } Note : Instead of using the pre-built binaries from the JCenter, you can also build them yourself using Bazel or Cmake by following the instructions at this link: https://github.com/tensorflow/tensorflow/blob/r1.4/ tensorflow/contrib/android/README.md Once the TF library is configured in your Android project, you can call the TF model with the following four steps:  Load the model: TensorFlowInferenceInterface  inferenceInterface  = new  TensorFlowInferenceInterface(assetManager,  modelFilename);  Send the input data to the TensorFlow binary: inferenceInterface.feed(inputName, floatValues,  1,  inputSize,  inputSize,  3);  Run the prediction or inference: inferenceInterface.run(outputNames,  logStats);  Receive the output from the TensorFlow binary: inferenceInterface.fetch(outputName,  outputs); TF Mobile demo on Android In this section, we shall learn about recreating the Android demo app provided by the TensorFlow team in their official repo. The Android demo will install the following four apps on your Android device: TF  Classify: This is an object identification app that identifies the images in the input from the device camera and classifies them in one of the pre-defined classes. It does not learn new types of pictures but tries to classify them into one of the categories that it has already learned. The app is built using the inception model pre-trained by Google. TF  Detect: This is an object detection app that detects multiple objects in the input from the device camera. It continues to identify the objects as you move the camera around in continuous picture feed mode. TF  Stylize: This is a style transfer app that transfers one of the selected predefined styles to the input from the device camera. TF  Speech: This is a speech recognition app that identifies your speech and if it matches one of the predefined commands in the app, then it highlights that specific command on the device screen. Note: The sample demo only works for Android devices with an API level greater than 21 and the device must have a modern camera that supports FOCUS_MODE_CONTINUOUS_PICTURE. If your device camera does not have this feature supported, then you have to add the path submitted to TensorFlow by the author: https://github.com/ tensorflow/tensorflow/pull/15489/files. The easiest way to build and deploy the demo app on your device is using Android Studio. To build it this way, follow these steps:  Install Android Studio. We installed Android Studio on Ubuntu 16.04 from the instructions at the following link: https://developer.android.com/studio/ install.html  Check out the TensorFlow repository, and apply the patch mentioned in the previous tip. Let's assume you checked out the code in the tensorflow folder in your home directory.  Using Android Studio, open the Android project in the path ~/tensorflow/tensorflow/examples/Android.     Your screen will look similar to this:  Expand the Gradle Scripts option from the left bar and then open the  build.gradle file.  In the build.gradle file, locate the def  nativeBuildSystem definition and set it to 'none'. In the version of  the code we checked out, this definition is at line 43: def  nativeBuildSystem  =  'none'  Build the demo and run it on either a real or simulated device. We tested the app on these devices: 7.  You can also build the apk and install the apk file on the virtual or actual connected device. Once the app installs on the device, you will see the four apps we discussed earlier: You can also build the whole demo app from the source using Bazel or Cmake by following the instructions at this link: https://github.com/tensorflow/tensorflow/tree/r1.4/tensorflow/examples/android TF Mobile in iOS apps TensorFlow enables support for iOS apps by following these steps:  Include TF Mobile in your app by adding a file named Profile in the root directory of your project. Add the following content to the Profile: target  'Name-Of-Your-Project' pod  'TensorFlow-experimental'  Run the pod  install command to download and install the TensorFlow Experimental pod.  Run the myproject.xcworkspace command to open the workspace so you can add the      prediction code to your application logic. Note: To create your own TensorFlow binaries for iOS projects, follow the instructions at this link: https://github.com/tensorflow/tensorflow/ tree/master/tensorflow/examples/ios Once the TF library is configured in your iOS project, you can call the TF model with the following four steps:  Load the model: PortableReadFileToProto(file_path,  &tensorflow_graph);  Create a session: tensorflow::Status  s  =  session->Create(tensorflow_graph);  Run the prediction or inference and get the outputs: std::string  input_layer  =  "input"; std::string  output_layer  =  "output"; std::vector<tensorflow::Tensor>  outputs; tensorflow::Status  run_status  =  session->Run( {{input_layer,  image_tensor}}, {output_layer},  {},  &outputs);  Fetch the output data: tensorflow::Tensor*  output  =  &outputs[0]; TF Mobile demo on iOS In order to build the demo on iOS, you need Xcode 7.3 or later. Follow these steps to build the iOS demo apps:  Check out the TensorFlow code in a tensorflow folder in your home directory.  Open a terminal window and execute the following commands from your home folder to download the Inception V1 model, extract the label and graph files, and move these files into the data folders inside the sample app code: $ mkdir -p ~/Downloads $ curl -o ~/Downloads/inception5h.zip https://storage.googleapis.com/download.tensorflow.org/models/incep tion5h.zip && unzip ~/Downloads/inception5h.zip -d ~/Downloads/inception5h $ cp ~/Downloads/inception5h/* ~/tensorflow/tensorflow/examples/ios/benchmark/data/ $ cp ~/Downloads/inception5h/* ~/tensorflow/tensorflow/examples/ios/camera/data/ $ cp ~/Downloads/inception5h/* ~/tensorflow/tensorflow/examples/ios/simple/data/  Navigate to one of the sample folders and download the experimental pod: $ cd ~/tensorflow/tensorflow/examples/ios/camera $ pod install  Open the Xcode workspace: $ open tf_simple_example.xcworkspace  Run the sample app in the device simulator. The sample app will appear with a Run Model button. The camera app requires an Apple device to be connected, while the other two can run in a simulator too. TensorFlow Lite TF Lite is the new kid on the block and still in the developer view at the time of writing this book. TF Lite is a very small subset of TensorFlow Mobile and TensorFlow, so the binaries compiled with TF Lite are very small in size and deliver superior performance. Apart from reducing the size of binaries, TensorFlow employs various other techniques, such as: The kernels are optimized for various device and mobile architectures The values used in the computations are quantized The activation functions are pre-fused It leverages specialized machine learning software or hardware available on the device, such as the Android NN API The workflow for using the models in TF Lite is as follows:  Get the model: You can train your own model or pick a pre-trained model available from different sources, and use the pre-trained as is or retrain it with your own data, or retrain after modifying some parts of the model. As long as you have a trained model in the file with an extension .pb or .pbtxt, you are good to proceed to the next step. We learned how to save the models in the previous chapters.  Checkpoint the model: The model file only contains the structure of the graph, so you need to save the checkpoint file. The checkpoint file contains the serialized variables of the model, such as weights and biases. We learned how to save a checkpoint in the previous chapters.  Freeze the model: The checkpoint and the model files are merged, also known as freezing the graph. TensorFlow provides the freeze_graph tool for this step, which can be executed as follows: $ freeze_graph --input_graph=mymodel.pb --input_checkpoint=mycheckpoint.ckpt --input_binary=true --output_graph=frozen_model.pb --output_node_name=mymodel_nodes  Convert the model: The frozen model from step 3 needs to be converted to TF Lite format with the toco tool provided by TensorFlow: $ toco --input_file=frozen_model.pb --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE --input_type=FLOAT --input_arrays=input_nodes --output_arrays=mymodel_nodes --input_shapes=n,h,w,c  The .tflite model saved in step 4 can now be used inside an Android or iOS app that employs the TFLite binary for inference. The process of including the TFLite binary in your app is continuously evolving, so we recommend the reader follows the information at this link to include the TFLite binary in your Android or iOS app: https://github.com/tensorflow/tensorflow/tree/master/ tensorflow/contrib/lite/g3doc Generally, you would use the graph_transforms:summarize_graph tool to prune the model obtained in step 1. The pruned model will only have the paths that lead from input to output at the time of inference or prediction. Any other nodes and paths that are required only for training or for debugging purposes, such as saving checkpoints, are removed, thus making the size of the final model very small. The official TensorFlow repository comes with a TF Lite demo that uses a pre-trained mobilenet to classify the input from the device camera in the 1001 categories. The demo app displays the probabilities of the top three categories. TF Lite Demo on Android To build a TF Lite demo on Android, follow these steps: Install Android Studio. We installed Android Studio on Ubuntu 16.04 from the instructions at the following link: https://developer.android.com/studio/ install.html Check out the TensorFlow repository, and apply the patch mentioned in the previous tip. Let's assume you checked out the code in the tensorflow folder in your home directory. Using Android Studio, open the Android project from the path ~/tensorflow/tensorflow/contrib/lite/java/demo. If it complains about a missing SDK or Gradle components, please install those components and sync Gradle. Build the project and run it on a virtual device with API > 21. We received the following warnings, but the build succeeded. You may want to resolve the warnings if the build fails: Warning:The  Jack  toolchain  is  deprecated  and  will  not run.  To  enable  support  for  Java  8 language  features  built into  the  plugin,  remove  'jackOptions  {  ...  }'  from  your build.gradle  file, and  add android.compileOptions.sourceCompatibility  1.8 android.compileOptions.targetCompatibility  1.8 Note:  Future  versions  of  the  plugin  will  not  support  usage 'jackOptions'  in  build.gradle. To learn  more,  go  to https://d.android.com/r/tools/java-8-support-message.html Warning:The  specified  Android  SDK  Build  Tools  version (26.0.1)  is  ignored,  as  it  is  below  the minimum  supported version  (26.0.2)  for  Android  Gradle  Plugin  3.0.1. Android  SDK  Build  Tools 26.0.2  will  be  used. To  suppress  this  warning,  remove  "buildToolsVersion '26.0.1'"  from  your  build.gradle  file,  as  each  version  of the  Android  Gradle  Plugin  now  has  a  default  version  of the  build  tools. TF Lite demo on iOS In order to build the demo on iOS, you need Xcode 7.3 or later. Follow these steps to build the iOS demo apps:  Check out the TensorFlow code in a  tensorflow folder in your home directory.  Build the TF Lite binary for iOS from the instructions at this link: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/lite  Navigate to the sample folder and download the pod: $ cd ~/tensorflow/tensorflow/contrib/lite/examples/ios/camera $ pod install  Open the Xcode workspace: $ open tflite_camera_example.xcworkspace  Run the sample app in the device simulator. We learned about using TensorFlow models on mobile applications and devices. TensorFlow provides two ways to run on mobile devices: TF Mobile and TF Lite. We learned how to build TF Mobile and TF Lite apps for iOs and Android. We used TensorFlow demo apps as an example.   If you found this post useful, do check out the book Mastering TensorFlow 1.x  to skill up for building smarter, faster, and efficient machine learning and deep learning systems. The 5 biggest announcements from TensorFlow Developer Summit 2018 Getting started with Q-learning using TensorFlow Implement Long-short Term Memory (LSTM) with TensorFlow  
Read more
  • 0
  • 0
  • 23157

article-image-microsoft-open-sources-infer-net-its-popular-model-based-machine-learning-framework
Melisha Dsouza
08 Oct 2018
3 min read
Save for later

Microsoft open sources Infer.NET, it’s popular model-based machine learning framework

Melisha Dsouza
08 Oct 2018
3 min read
Last week, Microsoft open sourced Infer.NET, the cross-platform framework used for model-based machine learning. This popular machine learning engine used in Office, Xbox and Azure, will be available on GitHub under the permissive MIT license for free use in commercial applications. Features of  Infer.NET The team at Microsoft Research in Cambridge initially envisioned Infer.NET as a research tool and released it for academic use in 2008. The framework has served as a base to publish hundreds of papers across a variety of fields, including information retrieval and healthcare. The team then started using the framework as a machine learning engine within a wide range of Microsoft products. A model-based approach to machine learning Infer.NET allows users to incorporate domain knowledge into their model. The framework can be used to build bespoke machine learning algorithms directly from their model. To sum it up, this framework actually constructs a learning algorithm for users based on the model they have provided. Facilitates interpretability Infer.NET also facilitates interpretability. If users have designed the model themselves and the learning algorithm follows that model, they can understand why the system behaves in a particular way or makes certain predictions. Probabilistic Approach In Infer.NET, models are described using a probabilistic program. This is used to describe real-world processes in a language that machines understand. Infer.NET compiles the probabilistic program into high-performance code for implementing something cryptically called deterministic approximate Bayesian inference. This approach allows a notable amount of scalability. For instance, it can be used in a system that automatically extracts knowledge from billions of web pages, comprising petabytes of data. Additional Features The framework also supports the ability of the system to learn as new data arrives. The team is also working towards developing and growing it further. Infer.NET will become a part of ML.NET (the machine learning framework for .NET developers). They have already set up the repository under the .NET Foundation and moved the package and namespaces to Microsoft.ML.Probabilistic.  Being cross platform, Infer.NET supports .NET Framework 4.6.1, .NET Core 2.0, and Mono 5.0. Windows users get to use Visual Studio 2017, while macOS and Linux folks have command-line options, which could be incorporated into the code wrangler of their choice. Download the framework to learn more about Infer.NET. You can also check the documentation for a detailed User Guide. To know more about this news, head over to Microsoft’s official blog. Microsoft announces new Surface devices to enhance user productivity, with style and elegance Neural Network Intelligence: Microsoft’s open source automated machine learning toolkit Microsoft’s new neural text-to-speech service lets machines speak like people
Read more
  • 0
  • 0
  • 23092

article-image-how-pytorch-is-bridging-the-gap-between-research-and-production-at-facebook-pytorch-team-at-f8-conference
Vincy Davis
04 Dec 2019
7 min read
Save for later

How PyTorch is bridging the gap between research and production at Facebook: PyTorch team at F8 conference

Vincy Davis
04 Dec 2019
7 min read
PyTorch, the machine learning library which was originally developed as a research framework by a Facebook intern in 2017, has now grown into a popular deep learning workflow. One of the most loved products by Facebook, PyTorch is free, open source, and used for applications like computer vision and natural language processing (NLP).  At the F8 conference held this year, the PyTorch team consisting of Joe Spisak, the project manager for PyTorch at Facebook AI and Dmytro Dzhulgakov, the tech lead at Facebook AI gave a talk on how Facebook is developing and scaling AI experiences with PyTorch.  Spisak describes PyTorch as an eager and graph-based execution that is defined by ‘run’. This means that when a user executes a Python code, it generates a graph on the fly. It is dynamic in nature and allows the compilation of the static graph. The dynamic neural networks are accessible, thus, allowing the user to change the parameters very quickly. This feature comes in handy for applications like control flow in NLP. Another important feature of PyTorch, according to Spisak, is the ability to generate accurately distributed training models that possess close to billion parameters, including the cutting-edge ones. It also has a simple and easy API that is very intuitive by nature. This is one of the qualities of PyTorch which has endeared many developers, claims Spisak.  Become a pro at Deep Learning with PyTorch! If you want to become an expert in building and training neural network models with high speed and flexibility in text, vision, and advanced analytics using PyTorch 1.x, read our book Deep Learning with PyTorch 1.x - Second Edition written by Sri. Yogesh K., Laura Mitchell, et al.  It will give you an insight into solving real-world problems using CNNs, RNNs, and LSTMs, along with discovering state-of-the-art modern deep learning architectures, such as ResNet, DenseNet, and Inception. How PyTorch is bridging the gap between research and production at Facebook Dzhulgakov points out how general advances in AI are driven by innovative research in the fields of academia or industry and why it’s necessary to bridge this big lag between research and production. He says, “If you have a new idea and you want to take it all the way through to deployment, you usually need to go through multiple steps - figure out what the approach is and then find the training data maybe prepare massage it a little bit. Actually, build and train your model after that and then there is this painful step of transferring your model to a production environment which often historically involved reimplementation of a lot of code so you can actually take and deploy it and scale-up.”  According to Dzhulgakov, PyTorch is trying to minimize this big gap by encouraging advances and experimentations in the field, so that the research is brought into production in a few days, instead of months. Challenges in bringing research to production Following are the various classes of challenges associated with bringing research to production, according to the PyTorch team. Hardware efficiency: In case of a tight latency constraint environment, users are required to fit all the hardware into the performance budget. On the other hand, an underused hardware environment can lead to an increase in cost. Scalability: In Facebook’s recent work, Dzhulgakov says, they have trained on billions of public images, thus indicating significant accuracy gains as compared to regular datasets like imageNet. Similarly, when models are taken to inference, it means that billions of inferences per second are running with multiple diverse models sharing the same hardware. Cross-platform: Neural networks are mostly not isolated as they need to be deployed inside their target application. It has a lot of interdependence with the surrounding code and application, thus posing different constraints like the user will not be able to run Python code or the user will have to work on very constrained computer capabilities if running a mobile device, and more. Reliability: A lot of PyTorch jobs run for multiple weeks on hundreds of GPUs, hence it is important to design a reliable software which can tolerate hardware failures and deliver results.  How PyTorch is tackling these challenges In order to tackle the above-listed challenges, Dzhulgakov says Facebook develops systems that can take up a training job and perform optimizations focused on performance for the performance-critical pieces. The system also applies “recipes for reliability” so that the developer written modeling code is automatically transformed. The Jit package comes into the picture here and acts like a key factor that is built to capture the structure of the Python program with minimal changes. The main goal of the Jit package is to make this process almost seamless. He asserts that PyTorch has been successful since it feels like regular programming in Python and most of its users start developing in traditional PyTorch mode (eager mode) just by writing and prototyping in the program. “For the subset of promising models which shall show what results you need to bring to production either scale up, so you can apply techniques provided by Jit to exist mental codes and annotated in order to run in so-called script code.”   The Jit is like a subset of Python with a thread list of request semantics, which allows the user to apply transparent transformations for the eager mode to the user. The annotations include adding a few lines of Python code on top of the function in such a way that it can be done incrementally on function by function or module by module fashion. This hybrid fashion ensures that the model works along the way. Such powerful PyTorch tools permit the user to share the same code base between research and production environments.  Next, Dzhulgakov deduces that the common factor between research and production is that both teams of developers work on the same code base built on top of PyTorch. Thus, they share the codes among the teams that have a common domain like text classification or object detection or reinforcement learning. These developers prototype models, train new algorithms and address new tasks for quickly transitioning this functionality to the opposite environment. Watch the full talk to see Dzhulgakov’s examples of PyTorch bridging the gap between research and production at Facebook. If you want to become an expert at implementing deep learning applications in PyTorch, check out our latest book Deep Learning with PyTorch 1.x - Second Edition written by Sri. Yogesh K., Laura Mitchell, and Et al. This book will show you how to apply neural networks to domains such as computer vision and NLP. It will also guide you to build, train, and scale a model with PyTorch and cover complex neural networks such as GANs and autoencoders for producing text and images. NVIDIA releases Kaolin, a PyTorch library to accelerate research in 3D computer vision and AI Introducing ESPRESSO, an open-source, PyTorch based, end-to-end neural automatic speech recognition (ASR) toolkit for distributed training across GPUs Facebook releases PyTorch 1.3 with named tensors, PyTorch Mobile, 8-bit model quantization, and more Transformers 2.0: NLP library with deep interoperability between TensorFlow 2.0 and PyTorch, and 32+ pretrained models in 100+ languages PyTorch announces the availability of PyTorch Hub for improving machine learning research reproducibility
Read more
  • 0
  • 0
  • 23083
article-image-dynamodb-best-practices
Packt
15 Sep 2015
24 min read
Save for later

DynamoDB Best Practices

Packt
15 Sep 2015
24 min read
 In this article by Tanmay Deshpande, the author of the book DynamoDB Cookbook, we will cover the following topics: Using a standalone cache for frequently accessed items Using the AWS ElastiCache for frequently accessed items Compressing large data before storing it in DynamoDB Using AWS S3 for storing large items Catching DynamoDB errors Performing auto-retries on DynamoDB errors Performing atomic transactions on DynamoDB tables Performing asynchronous requests to DynamoDB (For more resources related to this topic, see here.) Introduction We are going to talk about DynamoDB implementation best practices, which will help you improve the performance while reducing the operation cost. So let's get started. Using a standalone cache for frequently accessed items In this recipe, we will see how to use a standalone cache for frequently accessed items. Cache is a temporary data store, which will save the items in memory and will provide those from the memory itself instead of making a DynamoDB call. Make a note that this should be used for items, which you expect to not be changed frequently. Getting ready We will perform this recipe using Java libraries. So the prerequisite is that you should have performed recipes, which use the AWS SDK for Java. How to do it… Here, we will be using the AWS SDK for Java, so create a Maven project with the SDK dependency. Apart from the SDK, we will also be using one of the most widely used open source caches, that is, EhCache. To know about EhCache, refer to http://ehcache.org/. Let's use a standalone cache for frequently accessed items: To use EhCache, we need to include the following repository in pom.xml: <repositories> <repository> <id>sourceforge</id> <name>sourceforge</name> <url>https://oss.sonatype.org/content/repositories/ sourceforge-releases/</url> </repository> </repositories> We will also need to add the following dependency: <dependency> <groupId>net.sf.ehcache</groupId> <artifactId>ehcache</artifactId> <version>2.9.0</version> </dependency> Once the project setup is done, we will create a cachemanager class, which will be used in the following code: public class ProductCacheManager { // Ehcache cache manager CacheManager cacheManager = CacheManager.getInstance(); private Cache productCache; public Cache getProductCache() { return productCache; } //Create an instance of cache using cache manager public ProductCacheManager() { cacheManager.addCache("productCache"); this.productCache = cacheManager.getCache("productCache"); } public void shutdown() { cacheManager.shutdown(); } } Now, we will create another class where we will write a code to get the item from DynamoDB. Here, we will first initiate the ProductCacheManager: static ProductCacheManager cacheManager = new ProductCacheManager(); Next, we will write a method to get the item from DynamoDB. Before we fetch the data from DynamoDB, we will first check whether the item with the given key is available in cache. If it is available in cache, we will return it from cache itself. If the item is not found in cache, we will first fetch it from DynamoDB and immediately put it into cache. Once the item is cached, every time we need this item, we will get it from cache, unless the cached item is evicted: private static Item getItem(int id, String type) { Item product = null; if (cacheManager.getProductCache().isKeyInCache(id + ":" + type)) { Element prod = cacheManager.getProductCache().get(id + ":" + type); product = (Item) prod.getObjectValue(); System.out.println("Returning from Cache"); } else { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("product"); product = table.getItem(new PrimaryKey("id", id, "type", type)); cacheManager.getProductCache().put( new Element(id + ":" + type, product)); System.out.println("Making DynamoDB Call for getting the item"); } return product; } Now we can use this method whenever needed. Here is how we can test it: Item product = getItem(10, "book"); System.out.println("First call :Item: " + product); Item product1 = getItem(10, "book"); System.out.println("Second call :Item: " + product1); cacheManager.shutdown(); How it works… EhCache is one of the most popular standalone caches used in the industry. Here, we are using EhCache to store frequently accessed items from the product table. Cache keeps all its data in memory. Here, we will save every item against its keys that are cached. We have the product table, which has the composite hash and range keys, so we will also store the items against the key of (Hash Key and Range Key). Note that caching should be used for only those tables that expect lesser updates. It should only be used for the table, which holds static data. If at all anyone uses cache for not so static tables, then you will get stale data. You can also go to the next level and implement a time-based cache, which holds the data for a certain time, and after that, it clears the cache. We can also implement algorithms, such as Least Recently Used (LRU), First In First Out (FIFO), to make the cache more efficient. Here, we will make comparatively lesser calls to DynamoDB, and ultimately, save some cost for ourselves. Using AWS ElastiCache for frequently accessed items In this recipe, we will do the same thing that we did in the previous recipe. The only thing we will change is that we will use a cloud hosted distributed caching solution instead of saving it on the local standalone cache. ElastiCache is a hosted caching solution provided by Amazon Web Services. We have two options to select which caching technology you would need. One option is Memcached and another option is Redis. Depending upon your requirements, you can decide which one to use. Here are links that will help you with more information on the two options: http://memcached.org/ http://redis.io/ Getting ready To get started with this recipe, we will need to have an ElastiCache cluster launched. If you are not aware of how to do it, you can refer to http://aws.amazon.com/elasticache/. How to do it… Here, I am using the Memcached cluster. You can choose the size of the instance as you wish. We will need a Memcached client to access the cluster. Amazon has provided a compiled version of the Memcached client, which can be downloaded from https://github.com/amazonwebservices/aws-elasticache-cluster-client-memcached-for-java. Once the JAR download is complete, you can add it to your Java Project class path: To start with, we will need to get the configuration endpoint of the Memcached cluster that we launched. This configuration endpoint can be found on the AWS ElastiCache console itself. Here is how we can save the configuration endpoint and port: static String configEndpoint = "my-elastic- cache.mlvymb.cfg.usw2.cache.amazonaws.com"; static Integer clusterPort = 11211; Similarly, we can instantiate the Memcached client: static MemcachedClient client; static { try { client = new MemcachedClient(new InetSocketAddress(configEndpoint, clusterPort)); } catch (IOException e) { e.printStackTrace(); } } Now, we can write the getItem method as we did for the previous recipe. Here, we will first check whether the item is present in cache; if not, we will fetch it from DynamoDB, and put it into cache. If the same request comes the next time, we will return it from the cache itself. While putting the item into cache, we are also going to put the expiry time of the item. We are going to set it to 3,600 seconds; that is, after 1 hour, the key entry will be deleted automatically: private static Item getItem(int id, String type) { Item product = null; if (null != client.get(id + ":" + type)) { System.out.println("Returning from Cache"); return (Item) client.get(id + ":" + type); } else { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("product"); product = table.getItem(new PrimaryKey("id", id, "type", type)); System.out.println("Making DynamoDB Call for getting the item"); ElasticCache.client.add(id + ":" + type, 3600, product); } return product; } How it works… A distributed cache also works in the same fashion as the local one works. A standalone cache keeps the data in memory and returns it if it finds the key. In distributed cache, we have multiple nodes; here, keys are kept in a distributed manner. The distributed nature helps you divide the keys based on the hash value of the keys. So, when any request comes, it is redirected to a specified node and the value is returned from there. Note that ElastiCache will help you provide a faster retrieval of items at the additional cost of the ElastiCache cluster. Also note that the preceding code will work if you execute the application from the EC2 instance only. If you try to execute this on the local machine, you will get connection errors. Compressing large data before storing it in DynamoDB We are all aware of DynamoDB's storage limitations for the item's size. Suppose that we get into a situation where storing large attributes in an item is a must. In that case, it's always a good choice to compress these attributes, and then save them in DynamoDB. In this recipe, we are going to see how to compress large items before storing them. Getting ready To get started with this recipe, you should have your workstation ready with Eclipse or any other IDE of your choice. How to do it… There are numerous algorithms with which we can compress the large items, for example, GZIP, LZO, BZ2, and so on. Each algorithm has a trade-off between the compression time and rate. So, it's your choice whether to go with a faster algorithm or with an algorithm, which provides a higher compression rate. Consider a scenario in our e-commerce website, where we need to save the product reviews written by various users. For this, we created a ProductReviews table, where we will save the reviewer's name, its detailed product review, and the time when the review was submitted. Here, there are chances that the product review messages can be large, and it would not be a good idea to store them as they are. So, it is important to understand how to compress these messages before storing them. Let's see how to compress large data: First of all, we will write a method that accepts the string input and returns the compressed byte buffer. Here, we are using the GZIP algorithm for compressions. Java has a built-in support, so we don't need to use any third-party library for this: private static ByteBuffer compressString(String input) throws UnsupportedEncodingException, IOException { // Write the input as GZIP output stream using UTF-8 encoding ByteArrayOutputStream baos = new ByteArrayOutputStream(); GZIPOutputStream os = new GZIPOutputStream(baos); os.write(input.getBytes("UTF-8")); os.finish(); byte[] compressedBytes = baos.toByteArray(); // Writing bytes to byte buffer ByteBuffer buffer = ByteBuffer.allocate(compressedBytes.length); buffer.put(compressedBytes, 0, compressedBytes.length); buffer.position(0); return buffer; } Now, we can simply use this method to store the data before saving it in DynamoDB. Here is an example of how to use this method in our code: private static void putReviewItem() throws UnsupportedEncodingException, IOException { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("ProductReviews"); Item product = new Item() .withPrimaryKey(new PrimaryKey("id", 10)) .withString("reviewerName", "John White") .withString("dateTime", "20-06-2015T08:09:30") .withBinary("reviewMessage", compressString("My Review Message")); PutItemOutcome outcome = table.putItem(product); System.out.println(outcome.getPutItemResult()); } In a similar way, we can write a method that decompresses the data on retrieval from DynamoDB. Here is an example: private static String uncompressString(ByteBuffer input) throws IOException { byte[] bytes = input.array(); ByteArrayInputStream bais = new ByteArrayInputStream(bytes); ByteArrayOutputStream baos = new ByteArrayOutputStream(); GZIPInputStream is = new GZIPInputStream(bais); int chunkSize = 1024; byte[] buffer = new byte[chunkSize]; int length = 0; while ((length = is.read(buffer, 0, chunkSize)) != -1) { baos.write(buffer, 0, length); } return new String(baos.toByteArray(), "UTF-8"); } How it works… Compressing data at client side has numerous advantages. Lesser size means lesser use of network and disk resources. Compression algorithms generally maintain a dictionary of words. While compressing, if they see the words getting repeated, then those words are replaced by their positions in the dictionary. In this way, the redundant data is eliminated and only their references are kept in the compressed string. While uncompressing the same data, the word references are replaced with the actual words, and we get our normal string back. Various compression algorithms contain various compression techniques. Therefore, the compression algorithm you choose will depend on your need. Using AWS S3 for storing large items Sometimes, we might get into a situation where storing data in a compressed format might not be sufficient enough. Consider a case where we might need to store large images or binaries that might exceed the DynamoDB's storage limitation per items. In this case, we can use AWS S3 to store such items and only save the S3 location in our DynamoDB table. AWS S3: Simple Storage Service allows us to store data in a cheaper and efficient manner. To know more about AWS S3, you can visit http://aws.amazon.com/s3/. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… Consider a case in our e-commerce website where we would like to store the product images along with the product data. So, we will save the images on AWS S3, and only store their locations along with the product information in the product table: First of all, we will see how to store data in AWS S3. For this, we need to go to the AWS console, and create an S3 bucket. Here, I created a bucket called e-commerce-product-images, and inside this bucket, I created folders to store the images. For example, /phone/apple/iphone6. Now, let's write the code to upload the images to S3: private static void uploadFileToS3() { String bucketName = "e-commerce-product-images"; String keyName = "phone/apple/iphone6/iphone.jpg"; String uploadFileName = "C:\tmp\iphone.jpg"; // Create an instance of S3 client AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider()); // Start the file uploading File file = new File(uploadFileName); s3client.putObject(new PutObjectRequest(bucketName, keyName, file)); } Once the file is uploaded, you can save its path in one of the attributes of the product table, as follows: private static void putItemWithS3Link() { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("productTable"); Map<String, String> features = new HashMap<String, String>(); features.put("camera", "13MP"); features.put("intMem", "16GB"); features.put("processor", "Dual-Core 1.4 GHz Cyclone (ARM v8-based)"); Set<String> imagesSet = new HashSet<String>(); imagesSet.add("https://s3-us-west-2.amazonaws.com/ e-commerce-product-images/phone/apple/iphone6/iphone.jpg"); Item product = new Item() .withPrimaryKey(new PrimaryKey("id", 250, "type", "phone")) .withString("mnfr", "Apple").withNumber("stock", 15) .withString("name", "iPhone 6").withNumber("price", 45) .withMap("features", features) .withStringSet("productImages", imagesSet); PutItemOutcome outcome = table.putItem(product); System.out.println(outcome.getPutItemResult()); } So whenever required, we can fetch the item by its key, and fetch the actual images from S3 using the URL saved in the productImages attribute. How it works… AWS S3 provides storage services at very cheaper rates. It's like a flat data dumping ground where we can store any type of file. So, it's always a good option to store large datasets in S3 and only keep its URL references in DynamoDB attributes. The URL reference will be the connecting link between the DynamoDB item and the S3 file. If your file is too large to be sent in one S3 client call, you may want to explore its multipart API, which allows you to send the file in chunks. Catching DynamoDB errors Till now, we discussed how to perform various operations in DynamoDB. We saw how to use AWS provided by SDK and play around with DynamoDB items and attributes. Amazon claims that AWS provides high availability and reliability, which is quite true considering the years of experience I have been using their services, but we still cannot deny the possibility where services such as DynamoDB might not perform as expected. So, it's important to make sure that we have a proper error catching mechanism to ensure that the disaster recovery system is in place. In this recipe, we are going to see how to catch such errors. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… Catching errors in DynamoDB is quite easy. Whenever we perform any operations, we need to put them in the try block. Along with it, we need to put a couple of catch blocks in order to catch the errors. Here, we will consider a simple operation to put an item into the DynamoDB table: try { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("productTable"); Item product = new Item() .withPrimaryKey(new PrimaryKey("id", 10, "type", "mobile")) .withString("mnfr", "Samsung").withNumber("stock", 15) .withBoolean("isProductionStopped", true) .withNumber("price", 45); PutItemOutcome outcome = table.putItem(product); System.out.println(outcome.getPutItemResult()); } catch (AmazonServiceException ase) { System.out.println("Error Message: " + ase.getMessage()); System.out.println("HTTP Status Code: " + ase.getStatusCode()); System.out.println("AWS Error Code: " + ase.getErrorCode()); System.out.println("Error Type: " + ase.getErrorType()); System.out.println("Request ID: " + ase.getRequestId()); } catch (AmazonClientException e) { System.out.println("Amazon Client Exception :" + e.getMessage()); } We should first catch AmazonServiceException, which arrives if the service you are trying to access throws any exception. AmazonClientException should be put last in order to catch any client-related exceptions. How it works… Amazon assigns a unique request ID for each and every request that it receives. Keeping this request ID is very important if something goes wrong, and if you would like to know what happened, then this request ID is the only source of information. We need to contact Amazon to know more about the request ID. There are two types of errors in AWS: Client errors: These errors normally occur when the request we submit is incorrect. The client errors are normally shown with a status code starting with 4XX. These errors normally occur when there is an authentication failure, bad requests, missing required attributes, or for exceeding the provisioned throughput. These errors normally occur when users provide invalid inputs. Server errors: These errors occur when there is something wrong from Amazon's side and they occur at runtime. The only way to handle such errors is retries; and if it does not succeed, you should log the request ID, and then you can reach the Amazon support with that ID to know more about the details. You can read more about DynamoDB specific errors at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ErrorHandling.html. Performing auto-retries on DynamoDB errors As mentioned in the previous recipe, we can perform auto-retries on DynamoDB requests if we get errors. In this recipe, we are going to see how to perform auto=retries. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… Auto-retries are required if we get any errors during the first request. We can use the Amazon client configurations to set our retry strategy. By default, the DynamoDB client auto-retries a request if any error is generated three times. If we think that this is not efficient for us, then we can define this on our own, as follows: First of all, we need to create a custom implementation of RetryCondition. It contains a method called shouldRetry, which we need to implement as per our needs. Here is a sample CustomRetryCondition class: public class CustomRetryCondition implements RetryCondition { public boolean shouldRetry(AmazonWebServiceRequest originalRequest, AmazonClientException exception, int retriesAttempted) { if (retriesAttempted < 3 && exception.isRetryable()) { return true; } else { return false; } } } Similarly, we can implement CustomBackoffStrategy. The back-off strategy gives a hint on after what time the request should be retried. You can choose either a flat back-off time or an exponential back-off time: public class CustomBackoffStrategy implements BackoffStrategy { /** Base sleep time (milliseconds) **/ private static final int SCALE_FACTOR = 25; /** Maximum exponential back-off time before retrying a request */ private static final int MAX_BACKOFF_IN_MILLISECONDS = 20 * 1000; public long delayBeforeNextRetry(AmazonWebServiceRequest originalRequest, AmazonClientException exception, int retriesAttempted) { if (retriesAttempted < 0) return 0; long delay = (1 << retriesAttempted) * SCALE_FACTOR; delay = Math.min(delay, MAX_BACKOFF_IN_MILLISECONDS); return delay; } } Next, we need to create an instance of RetryPolicy, and set the RetryCondition and BackoffStrategy classes, which we created. Apart from this, we can also set a maximum number of retries. The last parameter is honorMaxErrorRetryInClientConfig. It means whether this retry policy should honor the maximum error retry set by ClientConfiguration.setMaxErrorRetry(int): RetryPolicy retryPolicy = new RetryPolicy(customRetryCondition, customBackoffStrategy, 3, false); Now, initiate the ClientConfiguration, and set the RetryPolicy we created earlier: ClientConfiguration clientConfiguration = new ClientConfiguration(); clientConfiguration.setRetryPolicy(retryPolicy); Now, we need to set this client configuration when we initiate the AmazonDynamoDBClient; and once done, your retry policy with a custom back-off strategy will be in place: AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider(), clientConfiguration); How it works… Auto-retries are quite handy when we receive a sudden burst in DynamoDB requests. If there are more number of requests than the provisioned throughputs, then auto-retries with an exponential back-off strategy will definitely help in handling the load. So if the client gets an exception, then it will get auto retried after sometime; and if by then the load is less, then there wouldn't be any loss for your application. The Amazon DynamoDB client internally uses HttpClient to make the calls, which is quite a popular and reliable implementation. So if you need to handle such cases, this kind of an implementation is a must. In case of batch operations, if any failure occurs, DynamoDB does not fail the complete operation. In case of batch write operations, if a particular operation fails, then DynamoDB returns the unprocessed items, which can be retried. Performing atomic transactions on DynamoDB tables I hope we are all aware that operations in DynamoDB are eventually consistent. Considering this nature it obviously does not support transactions the way we do in RDBMS. A transaction is a group of operations that need to be performed in one go, and they should be handled in an atomic nature. (If one operation fails, the complete transaction should be rolled back.) There might be use cases where you would need to perform transactions in your application. Considering this need, AWS has provided open sources, client-side transaction libraries, which helps us achieve atomic transactions in DynamoDB. In this recipe, we are going to see how to perform transactions on DynamoDB. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… To get started, we will first need to download the source code of the library from GitHub and build the code to generate the JAR file. You can download the code from https://github.com/awslabs/dynamodb-transactions/archive/master.zip. Next, extract the code and run the following command to generate the JAR file: mvn clean install –DskipTests On a successful build, you will see a JAR generated file in the target folder. Add this JAR to the project by choosing a configure build path in Eclipse: Now, let's understand how to use transactions. For this, we need to create the DynamoDB client and help this client to create two helper tables. The first table would be the Transactions table to store the transactions, while the second table would be the TransactionImages table to keep the snapshots of the items modified in the transaction: AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); // Create transaction table TransactionManager.verifyOrCreateTransactionTable(client, "Transactions", 10, 10, (long) (10 * 60)); // Create transaction images table TransactionManager.verifyOrCreateTransactionImagesTable(client, "TransactionImages", 10, 10, (long) (60 * 10)); Next, we need to create a transaction manager by providing the names of the tables we created earlier: TransactionManager txManager = new TransactionManager(client, "Transactions", "TransactionImages"); Now, we create one transaction, and perform the operations you will need to do in one go. Consider our product table where we need to add two new products in one single transaction, and the changes will reflect only if both the operations are successful. We can perform these using transactions, as follows: Transaction t1 = txManager.newTransaction(); Map<String, AttributeValue> product = new HashMap<String, AttributeValue>(); AttributeValue id = new AttributeValue(); id.setN("250"); product.put("id", id); product.put("type", new AttributeValue("phone")); product.put("name", new AttributeValue("MI4")); t1.putItem(new PutItemRequest("productTable", product)); Map<String, AttributeValue> product1 = new HashMap<String, AttributeValue>(); id.setN("350"); product1.put("id", id); product1.put("type", new AttributeValue("phone")); product1.put("name", new AttributeValue("MI3")); t1.putItem(new PutItemRequest("productTable", product1)); t1.commit(); Now, execute the code to see the results. If everything goes fine, you will see two new entries in the product table. In case of an error, none of the entries would be in the table. How it works… The transaction library when invoked, first writes the changes to the Transaction table, and then to the actual table. If we perform any update item operation, then it keeps the old values of that item in the TransactionImages table. It also supports multi-attribute and multi-table transactions. This way, we can use the transaction library and perform atomic writes. It also supports isolated reads. You can refer to the code and examples for more details at https://github.com/awslabs/dynamodb-transactions. Performing asynchronous requests to DynamoDB Till now, we have used a synchronous DynamoDB client to make requests to DynamoDB. Synchronous requests block the thread unless the operation is not performed. Due to network issues, sometimes, it can be difficult for the operation to get completed quickly. In that case, we can go for asynchronous client requests so that we submit the requests and do some other work. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… Asynchronous client is easy to use: First, we need to the AmazonDynamoDBAsync class: AmazonDynamoDBAsync dynamoDBAsync = new AmazonDynamoDBAsyncClient( new ProfileCredentialsProvider()); Next, we need to create the request to be performed in an asynchronous manner. Let's say we need to delete a certain item from our product table. Then, we can create the DeleteItemRequest, as shown in the following code snippet: Map<String, AttributeValue> key = new HashMap<String, AttributeValue>(); AttributeValue id = new AttributeValue(); id.setN("10"); key.put("id", id); key.put("type", new AttributeValue("phone")); DeleteItemRequest deleteItemRequest = new DeleteItemRequest( "productTable", key); Next, invoke the deleteItemAsync method to delete the item. Here, we can optionally define AsyncHandler if we want to use the result of the request we had invoked. Here, I am also printing the messages with time so that we can confirm its asynchronous nature: dynamoDBAsync.deleteItemAsync(deleteItemRequest, new AsyncHandler<DeleteItemRequest, DeleteItemResult>() { public void onSuccess(DeleteItemRequest request, DeleteItemResult result) { System.out.println("Item deleted successfully: "+ System.currentTimeMillis()); } public void onError(Exception exception) { System.out.println("Error deleting item in async way"); } }); System.out.println("Delete item initiated" + System.currentTimeMillis()); How it works Asynchronous clients use AsyncHttpClient to invoke the DynamoDB APIs. This is a wrapper implementation on top of Java asynchronous APIs. Hence, they are quite easy to use and understand. The AsyncHandler is an optional configuration you can do in order to use the results of asynchronous calls. We can also use the Java Future object to handle the response. Summary We have covered various recipes on cost and performance efficient use of DynamoDB. Recipes like error handling and auto retries helps readers in make their application robust. It also highlights use of transaction library in order to implement atomic transaction on DynamoDB. Resources for Article: Further resources on this subject: The EMR Architecture[article] Amazon DynamoDB - Modelling relationships, Error handling[article] Index, Item Sharding, and Projection in DynamoDB [article]
Read more
  • 0
  • 0
  • 23026

article-image-perform-regression-analysis-using-sas
Gebin George
27 Feb 2018
7 min read
Save for later

How to perform regression analysis using SAS

Gebin George
27 Feb 2018
7 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book, Big Data Analysis with SAS written by David Pope. This book will help you leverage the power of SAS for data management, analysis and reporting. It contains practical use-cases and real-world examples on predictive modelling, forecasting, optimizing, and reporting your Big Data analysis using SAS.[/box] Today, we will perform regression analysis using SAS in a step-by-step manner with a practical use-case. Regression analysis is one of the earliest predictive techniques most people learn because it can be applied across a wide variety of problems dealing with data that is related in linear and non-linear ways. Linear data is one of the easier use cases, and as such PROC REG is a well-known and often-used procedure to help predict likely outcomes before they happen. The REG procedure provides extensive capabilities for fitting linear regression models that involve individual numeric independent variables. Many other procedures can also fit regression models, but they focus on more specialized forms of regression, such as robust regression, generalized linear regression, nonlinear regression, nonparametric regression, quantile regression, regression modeling of survey data, regression modeling of survival data, and regression modeling of transformed variables. The SAS/STAT procedures that can fit regression models include the ADAPTIVEREG, CATMOD, GAM, GENMOD, GLIMMIX, GLM, GLMSELECT, LIFEREG, LOESS, LOGISTIC, MIXED, NLIN, NLMIXED, ORTHOREG, PHREG, PLS, PROBIT, QUANTREG, QUANTSELECT, REG, ROBUSTREG, RSREG, SURVEYLOGISTIC, SURVEYPHREG, SURVEYREG, TPSPLINE, and TRANSREG procedures. Several procedures in SAS/ETS software also fit regression models. SAS/STAT14.2 / SAS/STAT User's Guide - Introduction to Regression Procedures - Overview: Regression Procedures (http://documentation.sas.com/?cdcId=statcdccdcVersion=14.2 docsetId=statugdocsetTarget=statug_introreg_sect001.htmlocale=enshowBanner=yes). Regression analysis attempts to model the relationship between a response or output variable and a set of input variables. The response is considered the target variable or the variable that one is trying to predict, while the rest of the input variables make up parameters used as input into the algorithm. They are used to derive the predicted value for the response variable. PROC REG One of the easiest ways to determine if regression analysis is applicable to helping you answer a question is if the type of question being asked has only two answers. For example, should a bank lend an applicant money? Yes or no? This is known as a binary response, and as such, regression analysis can be applied to help determine the answer. In the following example, the reader will use the SASHELP.BASEBALL dataset to create a regression model to predict the value of a baseball player's salary. The SASHELP.BASEBALL dataset contains salary and performance information for Major League. Baseball players who played at least one game in both the 1986 and 1987 seasons, excluding pitchers. The salaries (Sports Illustrated, April 20, 1987) are for the 1987 season and the performance measures are from 1986 (Collier Books, The 1987 Baseball Encyclopedia Update). SAS/STAT® 14.2 / SAS/STAT User's Guide - Example 99: Modeling Salaries of Major League Baseball Players (http://documentation.sas.com/ ?cdcId= statcdc cdcVersion= 14.2 docsetId=statugdocsetTarget= statug_ reg_ examples01.htmlocale= en showBanner= yes). Let's first use PROC UNIVARIATE to learn something about this baseball data by submitting the following code: proc univariate data=sashelp.baseball; quit; While reviewing the results of the output, the reader will notice that the variance associated with logSalary, 0.79066, is much less than the variance associated with the actual target variable Salary, 203508. In this case, it makes better sense to attempt to predict the logSalary value of a player instead of Salary. Write the following code in a SAS Studio program section and submit it: proc reg data=sashelp.baseball; id name team league; model logSalary = nAtBat nHits nHome nRuns nRBI YrMajor CrAtBat CrHits CrHome CrRuns CrRbi; Quit; Notice that there are 59 observations as specified in the first output table with at least one of the input variables with missing values; as such those are not used in the development of the regression model. The Root Mean Squared Error (RMSE) and R-square are statistics that typically inform the analyst how good the model is in predicting the target. These range from 0 to 1.0 with higher values typically indicating a better model. The higher the Rsquared values typically indicate a better performing model but sometimes conditions or the data used to train the model over-fit and don't represent the true value of the prediction power of that particular model. Over-fitting can happen when an analyst doesn't have enough real-life data and chooses data or a sample of data that over-presents the target event, and therefore it will produce a poor performing model when using real-world data as input. Since several of the input values appear to have little predictive power on the target, an analyst may decide to drop these variables, thereby reducing the need for that information to make a decent prediction. In this case, it appears we only need to use four input variables. YrMajor, nHits, nRuns, and nAtBat. Modify the code as follows and submit it again: proc reg data=sashelp.baseball; id name team league; model logSalary = YrMajor nHits nRuns nAtBat; Quit; The p-value associated with each of the input variables provides the analyst with an insight into which variables have the biggest impact on helping to predict the target variable. In this case, the smaller the value, the higher the predictive value of the input variable. Both the RMSE and R-square values for this second model are slightly lower than the original. However, the adjusted R-square value is slightly higher. In this case, an analyst may chose to use the second model since it requires much less data and provides basically the same predictive power. Prior to accepting any model, an analyst should determine whether there are a few observations that may be over-influencing the results by investigating the influence and fit diagnostics. The default output from PROC REG provides this type of visual insight: The top-right corner plot, showing the externally studentized residuals (RStudent) by leverage values, shows that there are a few observations with high leverage that may be overly influencing the fit produced. In order to investigate this further, we will add a plots statement to our PROC REG to produce a labeled version of this plot. Type the following code in a SAS Studio program section and submit: proc reg data=sashelp.baseball plots(only label)=(RStudentByLeverage); id name team league; model logSalary = YrMajor nHits nRuns nAtBat; Quit; Sure enough, there are three to five individuals whose input variables may have excessive influence on fitting this model. Let's remove those points and see if the model improves. Type this code in a SAS Studio program section and submit it: proc reg data=sashelp.baseball plots=(residuals(smooth)); where name NOT IN ("Mattingly, Don", "Henderson, Rickey", "Boggs, Wade", "Davis, Eric", "Rose, Pete"); id name team league; model logSalary = YrMajor nHits nRuns nAtBat; Quit; This change, in itself, has not improved the model but actually made the model worse as can be seen by the R-square, 0.5592. However, the plots residuals(smooth) option gives some insights as it pertains to YrMajor; players at the beginning and the end of their careers tend to be paid less compared to others, as can be seen in Figure 4.12: In order to address this lack of fit, an analyst can use polynomials of degree two for this variable, YrMajor. Type the following code in a SAS Studio program section and submit it: data work.baseball; set sashelp.baseball; where name NOT IN ("Mattingly, Don", "Henderson, Rickey", "Boggs, Wade", "Davis, Eric", "Rose, Pete"); YrMajor2 = YrMajor*YrMajor; run; proc reg data=work.baseball; id name team league; model logSalary = YrMajor YrMajor2 nHits nRuns nAtBat; Quit; After removing some outliers and adjusting for the YrMajor variable, the model's predictive power has improved significantly as can be seen in the much improved R-square value of 0.7149. We saw an effective way of performing regression analysis using SAS platform. If you found our post useful, do check out this book Big Data Analysis with SAS to understand other data analysis models and perform them practically using SAS.    
Read more
  • 0
  • 0
  • 22959
Modal Close icon
Modal Close icon