Artificial Intelligence | 0 articles | Tech News, Tutorials & Expert Insights

article-image-graph-nets-deepminds-library-for-graph-networks-in-tensorflow-and-sonnet

19 Oct 2018

3 min read

Graph Nets – DeepMind's library for graph networks in Tensorflow and Sonnet

19 Oct 2018

Graph Nets is a new DeepMind’s library used for building graph networks in TensorFlow and Sonnet. Last week a paper Relational inductive biases, deep learning, and graph networks was published on arXiv by researchers from DeepMind, Google Brain, MIT and University of Edinburgh. The paper introduces a new machine learning framework called Graph networks which is expected to bring new innovations in artificial general intelligence realm. What are graph networks? Graph networks can generalize and extend various types of neural networks to perform calculations on the graph. It can implement relational inductive bias, a technique used for reasoning about inter-object relations. The graph networks framework is based on graph-to-graph modules. Each graph’s features are represented in three characteristics: Nodes Edges: Relations between the nodes Global attributes: System-level properties The graph network takes a graph as an input, performs the required operations and calculations from the edge, to the node, and to the global attributes, and then returns a new graph as an output. The research paper argues that graph networks can support two critical human-like capabilities: Relational reasoning: Drawing logical conclusions of how different objects and things relate to one another Combinatorial Generalization: Constructing new inferences, behaviors, and predictions from known building blocks To understand and learn more about graph networks you can refer the official research paper. Graph Nets Graph Nets library can be installed from pip. To install the library, run the following command: $ pip install graph_nets The installation is compatible with Linux/Mac OSX, and Python versions 2.7 and 3.4+ The library includes Jupyter notebook demos which allow you to create, manipulate, and train graph networks to perform operations such as shortest path-finding task, a sorting task, and prediction task. Each demo uses the same graph network architecture, thus showing the flexibility of the approach. You can try out various demos in your browser using Colaboratory. In other words, you don’t need to install anything locally when running the demos in the browser (or phone) via cloud Colaboratory backend. You can also run the demos on your local machine by installing the necessary dependencies. What’s ahead? The concept was released with ideas not only based in artificial intelligence research but also from the computer and cognitive sciences. Graph networks are still an early-stage research theory which does not yet offer any convincing experimental results. But it will be very interesting to see how well graph networks live up to the hype as they mature. To try out the open source library, you can visit the official Github page. In order to provide any comments or suggestions, you can contact graph-nets@google.com. Read more 2018 is the year of graph databases. Here’s why. Why Neo4j is the most popular graph database Pytorch.org revamps for Pytorch 1.0 with design changes and added Static graph support

0
0
22592

article-image-4-clustering-algorithms-every-data-scientist-know

Sugandha Lahoti

07 Nov 2017

6 min read

4 Clustering Algorithms every Data Scientist should know

Sugandha Lahoti

07 Nov 2017

6 min read

[box type="note" align="" class="" width=""]This is an excerpt from a book by John R. Hubbard titled Java Data Analysis. In this article, we see the four popular clustering algorithms: hierarchical clustering, k-means clustering, k-medoids clustering, and the affinity propagation algorithms along with their pseudo-codes.[/box] A clustering algorithm is one that identifies groups of data points according to their proximity to each other. These algorithms are similar to classification algorithms in that they also partition a dataset into subsets of similar points. But, in classification, we already have data whose classes have been identified. such as sweet fruit. In clustering, we seek to discover the unknown groups themselves. Hierarchical clustering Of the several clustering algorithms that we will examine in this article, hierarchical clustering is probably the simplest. The trade-off is that it works well only with small datasets in Euclidean space. The general setup is that we have a dataset S of m points in Rn which we want to partition into a given number k of clusters C1 , C2 ,..., Ck , where within each cluster the points are relatively close together. Here is the algorithm: Create a singleton cluster for each of the m data points. Repeat m – k times: Find the two clusters whose centroids are closest Replace those two clusters with a new cluster that contains their points The centroid of a cluster is the point whose coordinates are the averages of the corresponding coordinates of the cluster points. For example, the centroid of the cluster C = {(2, 4), (3, 5), (6, 6), (9, 1)} is the point (5, 4), because (2 + 3 + 6 + 9)/4 = 5 and (4 + 5 + 6 + 1)/4 = 4. This is illustrated in the figure below : K-means clustering A popular alternative to hierarchical clustering is the K-means algorithm. It is related to the K-Nearest Neighbor (KNN) classification algorithm. As with hierarchical clustering, the K-means clustering algorithm requires the number of clusters, k, as input. (This version is also called the K-Means++ algorithm) Here is the algorithm: Select k points from the dataset. Create k clusters, each with one of the initial points as its centroid. For each dataset point x that is not already a centroid: Find the centroid y that is closest to x Add x to that centroid’s cluster Re-compute the centroid for that cluster It also requires k points, one for each cluster, to initialize the algorithm. These initial points can be selected at random, or by some a priori method. One approach is to run hierarchical clustering on a small sample taken from the given dataset and then pick the centroids of those resulting clusters. K-medoids clustering The k-medoids clustering algorithm is similar to the k-means algorithm, except that each cluster center, called its medoid, is one of the data points instead of being the mean of its points. The idea is to minimize the average distances from the medoids to points in their clusters. The Manhattan metric is usually used for these distances. Since those averages will be minimal if and only if the distances are, the algorithm is reduced to minimizing the sum of all distances from the points to their medoids. This sum is called the cost of the configuration. Here is the algorithm: Select k points from the dataset to be medoids. Assign each data point to its closest medoid. This defines the k clusters. For each cluster Cj : Compute the sum s = ∑ j s j , where each sj = ∑{ d (x, yj) : x ∈ Cj } , and change the medoid yj to whatever point in the cluster Cj that minimizes s If the medoid yj was changed, re-assign each x to the cluster whose medoid is closest Repeat step 3 until s is minimal. This is illustrated by the simple example in Figure 8.16. It shows 10 data points in 2 clusters. The two medoids are shown as filled points. In the initial configuration it is: C1 = {(1,1),(2,1),(3,2) (4,2),(2,3)}, with y1 = x1 = (1,1) C2 = {(4,3),(5,3),(2,4) (4,4),(3,5)}, with y2 = x10 = (3,5) The sums are s1 = d (x2,y1) + d (x3,y1) + d (x4,y1) + d (x5,y1) = 1 + 3 + 4 + 3 = 11 s2 = d (x6,y1) + d (x7,y1) + d (x8,y1) + d (x9,y1) = 3 + 4 + 2 + 2 = 11 s = s1 + s2 = 11 + 11 = 22 The algorithm at step 3 first part changes the medoid for C1 to y1 = x3 = (3,2). This causes the clusters to change, at step 3 second part, to: C1 = {(1,1),(2,1),(3,2) (4,2),(2,3),(4,3),(5,3)}, with y1 = x3 = (3,2) C2 = {(2,4),(4,4),(3,5)}, with y2 = x10 = (3,5) This makes the sums: s1 = 3 + 2 + 1 + 2 + 2 + 3 = 13 s2 = 2 + 2 = 4 s = s1 + s2 = 13 + 4 = 17 The resulting configuration is shown in the second panel of the figure below: At step 3 of the algorithm, the process repeats for cluster C2. The resulting configuration is shown in the third panel of the above figure. The computations are: C1 = {(1,1),(2,1),(3,2) (4,2),(4,3),(5,3)}, with y1 = x3 = (3,2) C2 = {(2,3),(2,4),(4,4),(3,5)}, with y2 = x8 = (2,4) s = s1 + s2 = (3 + 2 + 1 + 2 + 3) + (1 + 2 + 2) = 11 + 5 = 16 The algorithm continues with two more changes, finally converging to the minimal configuration shown in the fifth panel of the above figure. This version of k-medoid clustering is also called partitioning around medoids (PAM). Affinity propagation clustering One disadvantage of each of the clustering algorithms previously presented (hierarchical, k-means, k-medoids) is the requirement that the number of clusters k be determined in advance. The affinity propagation clustering algorithm does not have that requirement. Developed in 2007 by Brendan J. Frey and Delbert Dueck at the University of Toronto, it has become one of the most widely-used clustering methods. Like k-medoid clustering, affinity propagation selects cluster center points, called exemplars, from the dataset to represent the clusters. This is done by message-passing between the data points. The algorithm works with three two-dimensional arrays: sij = the similarity between xi and xj rik = responsibility: message from xi to xk on how well-suited xk is as an exemplar for xi aik = availability: message from xk to xi on how well-suited xk is as an exemplar for xi Here is the complete algorithm: Initialize the similarities: sij = –d(xi , xj )2 , for i ≠ j; sii = the average of those other sij values 2. Repeat until convergence: Update the responsibilities: rik = sik − max {aij + s ij : j ≠ k} Update the availabilities: aik = min {0, rkk + ∑j { max {0, rjk } : j ≠ i ∧ j ≠ k }}, for i ≠ k; akk = ∑j { max {0, rjk } : j ≠ k } A point xk will be an exemplar for a point xi if aik + rik = maxj {aij + rij}. If you enjoyed this excerpt from the book Java Data Analysis by John R. Hubbard, check out the book to learn how to implement various machine learning algorithms, data visualization and more in Java.

0
0
22097

Abhishek Jha

01 Nov 2017

3 min read

Sony resurrects robotic pet Aibo with advanced AI

Abhishek Jha

01 Nov 2017

3 min read

A decade back when CEO Howard Stringer decided to discontinue Sony’s iconic entertainment robot AIBO, its progenitor Toshitada Doi had famously staged a mock funeral lamenting, more than Aibo’s disbandment, the death of Sony’s risk-taking spirit. Today as the Japanese firm’s sales have soared to a decade high beating projected estimates, Aibo is back from the dead. The revamped pet looks cuter than ever before, after nearly a decade of hold. And it has been infused with a range of sensors, cameras, microphones and upgraded artificial intelligence features. The new Aibo is an ivory-white, plastic-covered hound which even has the ability to connect to mobile networks. Using actuators, it can move its body remarkably well, while using two OLED panels in eyes to exhibit an array of expressions. Most importantly, it comes with a unique ‘adaptive’ behavior that includes being able to actively recognize its owner and running over to them, learning and interacting in the process – detecting smiles and words of praises – with all those head and back scratches. In short, a dog in real without canine instincts. Priced at around $1,735 (198,000 Yen), Aibo includes a SIM card slot to connect to internet and access Sony’s AI cloud to analyze and learn how other robot dogs are behaving on the network. Sony says it does not intend to replace a digital assistant like Google Home but that Aibo could be a wonderful companion for children and families, forming an “emotional bond” with love, affection, and joy. The cloud service that powers Aibo’s AI is however expensive, and a basic three-year subscription plan is priced at $26 (2,980 Yen) per month. Or you could sign up upfront for three years at around $790 (90,000 Yen). As far as the battery life is concerned, the robot will take three hours to fully charge itself once it gets dissipated after two hours of activity. “It was a difficult decision to stop the project in 2006, but we continued development in AI and robotics,” Sony CEO Kazuo Hirai said speaking at a launch event. “I asked our engineers a year and a half ago to develop Aibo because I strongly believe robots capable of building loving relationships with people help realize Sony’s mission.” When Sony had initially launched AIBO in 1999, it was well ahead of its time. But after the initial euphoria, the product somehow failed to get mainstream buyers as reboots after reboots failed to generate profits. That time clearly Sony had to make a decision as its core electronics business struggled in price wars. Today, times are different – AI fever has gripped the tech world. A plastic bone (‘aibone’) for the robotic dog costs you around 2,980 Yen. And that’s the price you pay for a keeping a robotic buddy around. The word “aibo” literally means a companion after all.

0
0
21738

article-image-apache-spark-2-4-0-released

Amrata Joshi

09 Nov 2018

2 min read

Apache Spark 2.4.0 released

Amrata Joshi

09 Nov 2018

2 min read

Last week, Apache Spark released its latest version, Apache Spark 2.4.0. It is the fifth release in the 2.x line. This release comes with Barrier Execution Mode for better integration with deep learning frameworks. Apache Spark 2.4.0 brings 30+ built-in and higher-order functions to deal with complex data types. These functions work with Scala 2.12 and improve the K8s (Kubernetes) integration. This release also focuses on usability, stability, and polish while resolving around 1100 tickets. What’s new in Apache Spark 2.4.0? Built-in Avro data source Image data source Flexible streaming sinks Elimination of the 2GB block size limitation during transfer Pandas UDF improvements Major changes Apache Spark 2.4.0 supports Barrier Execution Mode in the scheduler, for better integration with deep learning frameworks. One can now build Spark with Scala 2.12 and write Spark applications in Scala 2.12. Apache Spark 2.4.0 supports Spark-Avro package with logical type support for better performance and usability. Some users are SQL experts but aren’t much aware of Scala/Python or R. Thus, this version of Apache comes with support for Pivot. Apache Spark 2.4.0 has added Structured Streaming ForeachWriter for Python. This lets users write ForeachWriter code in Python, that is, they can use the partitionId and the version/batchId/epochId to conditionally process rows. This new release has also introduced Spark data source for the image format. Users can now load images through the Spark source reader interface. Bug fixes: The LookupFunctions are used to check the same function name again and again. This version includes a latest LookupFunctions rule which performs a check for each invocation. A PageRank change in the Apache Spark 2.3 introduced a bug in the ParallelPersonalizedPageRank implementation. This change prevents serialization of a Map which needs to be broadcast to all workers. This issue has been resolved with the release of Apache Spark 2.4.0 Read more about Apache Spark 2.4.0 on the official website of Apache Spark. Building Recommendation System with Scala and Apache Spark [Tutorial] Apache Spark 2.3 now has native Kubernetes support! Implementing Apache Spark K-Means Clustering method on digital breath test data for road safety

0
0
21728

article-image-european-union-fined-google-1-49-billion-euros-for-antitrust-violations-in-online-advertising

Amrata Joshi

22 Mar 2019

3 min read

European Union fined Google 1.49 billion euros for antitrust violations in online advertising

Amrata Joshi

22 Mar 2019

3 min read

On Wednesday, European authorities fined Google 1.49 billion euros for antitrust violations in online advertising and it seems to be the third antitrust fine by the European Union against Google since 2017. As per the regulators, Google had imposed unfair terms on companies that used its search bar on their websites in Europe. Google has been abusing its power in its Android mobile phone operating system, shopping comparison services, and now search adverts. Last year, EU competition commissioner Margrethe Vestager had fined Google €4.34 billion for using its Android mobile operating system for unfairly keeping its rivals away in the mobile phone market. Two years ago, Google was fined 2.4 billion euros for unfairly favoring its own shopping services over those of its rivals. Newspaper websites or blog aggregators usually have a search function embedded to them. When a user searches something on this search function, the website provides search results and search adverts that appear alongside the search result. Google uses AdSense for Search, that provides the search adverts to the owner of the publisher websites. Google acts as an advertising broker, between advertisers and website owners that provide the space. AdSense also works as an online search advertising broker platform. Google has been at the top in online search advertising intermediation in the European Economic Area (EEA), with a market share of more than 70% from 2006 to 2016. Last year Google held nearly 75.8% and this year it’s already 77.8%. There is constant growth happening in Google’s search ad market. And it is impossible for competitors such as Microsoft and Yahoo to sell advertising space in Google's own search engine results pages. So, they need to work with third-party websites to grow their business and compete with Google. In 2006, Google had included exclusivity clauses in its contracts that prohibit the publishers from placing any search adverts from competitors on their search results pages. In March 2009, Google started to replace the exclusivity clauses with “Premium Placement” clauses. According to these clauses, the publishers had to reserve the most profitable space on their search results pages for Google's adverts and further request a minimum number of Google adverts. This, in turn, affected Google's competitors as they got restricted from placing their search adverts in the most visible and clickable parts of the websites' search results pages. It got more difficult for the competitors when Google included the clauses that would require publishers to seek written approval from Google before making any changes to the way in which the rival adverts were displayed. Google has control over how attractive the competing search adverts would be. Google also imposed an exclusive supply obligation, which would prevent competitors from placing any search adverts on the most significant websites. The company gave the most valuable positions to its adverts and also controlled the performance of the rivals’ adverts. European Commission found that Google's conduct harmed competition and consumers, and affected innovation. Google might face civil actions before the courts of the Member States for damages suffered by any person or business because of its anti-competitive behaviour. To know more about this news, check out the official press release. Google announces Stadia, a cloud-based game streaming service, at GDC 2019 Google is planning to bring Node.js support to Fuchsia Google Open-sources Sandboxed API, a tool that helps in automating the process of porting existing C and C++ code

0
0
21704

article-image-lyft-releases-an-autonomous-driving-dataset-level-5-and-sponsors-research-competition

Amrata Joshi

25 Jul 2019

3 min read

Lyft releases an autonomous driving dataset “Level 5” and sponsors research competition

Amrata Joshi

25 Jul 2019

3 min read

This week, the team at Lyft released a subset of their autonomous driving data, the Level 5 Dataset, and will be sponsoring a research competition. The Level 5 Dataset includes over 55,000 human-labelled 3D annotated frames, a drivable surface map, as well as an HD spatial semantic map for contextualizing the data. The team has been perfecting their hardware and autonomy stack for the last two years. As the sensor hardware needs to be built and properly calibrated, there is also the need for a localization stack and an HD semantic map must be created. Only then it is possible to unlock higher-level functionality like 3D perception, prediction, and planning. The dataset allows a broad cross-section of researchers in contributing to downstream research in self-driving technology. The team is iterating on the third generation of Lyft’s self-driving car and has already patented a new sensor array and a proprietary ultra-high dynamic range (100+DB) camera. Since HD mapping is crucial to autonomous vehicles, the teams in Munich and Palo Alto have been working towards building high-quality lidar-based geometric maps and high-definition semantic maps that are used by the autonomy stack. The team is also working towards building high quality and cost-effective geometric maps that would use only a camera phone for capturing the source data. Lyft’s autonomous platform team has been deploying partner vehicles on the Lyft network. Along with their partner Aptiv, the team has successfully provided over 50,000 self-driving rides to Lyft passengers in Las Vegas, which becomes the largest paid commercial self-driving service in operation. Waymo vehicles are also now available on the Lyft network in Arizona that expands the opportunity for our passengers to experience self-driving rides. To advance self-driving vehicles, the team will also be launching a competition for individuals for training algorithms on the dataset. The dataset makes it possible for researchers to work on problems such as prediction of agents over time, scene depth estimation from cameras with lidar as ground truth and many more. The blog post reads, “We have segmented this dataset into training, validation, and testing sets — we will release the validation and testing sets once the competition opens.” It further reads, “There will be $25,000 in prizes, and we’ll be flying the top researchers to the NeurIPS Conference in December, as well as allowing the winners to interview with our team. Stay tuned for specific details of the competition!” To know more about this news, check out the Medium post. Lyft announces Envoy Mobile, an iOS and Android client network library for mobile application networking Uber and Lyft drivers go on strike a day before Uber IPO roll-out Lyft introduces Amundsen; a data discovery and metadata engine for its researchers and data scientists

0
0
21667

article-image-how-deep-neural-networks-can-improve-speech-recognition-and-generation

Sugandha Lahoti

02 Feb 2018

7 min read

How Deep Neural Networks can improve Speech Recognition and generation

Sugandha Lahoti

02 Feb 2018

7 min read

While watching your favorite movie or TV show, you must have found it difficult to sometimes decipher what the characters are saying, especially if they are talking really fast, or well, you’re seeing a show in the language you don’t know. You quickly add subtitles and voila, the problem is solved. But, do you know how these subtitles work? Instead of a person writing them, a computer automatically recognizes speech and the dialogues of the characters and generates scripts. However, this is just a trivial example of what computers and neural networks can do in the field of speech understanding and generation. Today, we’re gonna talk about the achievements of deep neural networks to improve the ability of our computing systems to understand and generate human speech. How traditional speech recognition systems work Traditionally speech recognition models used classification algorithms to arrive at a distribution of possible phonemes for each frame. These classification algorithms were based on highly specialized features such as MFCC. Hidden Markov Models (HMM) were used in the decoding phase. This model was accompanied with a pre-trained language model and was used to find the most likely sequence of phones that can be mapped to output words. With the emergence of deep learning, neural networks were used in many aspects of speech recognition such as phoneme classification, isolated word recognition, audiovisual speech recognition, audio-visual speaker recognition and speaker adaptation. Deep learning enabled the development of Automatic Speech Recognition (ASR) systems. These ASR systems require separate models, namely acoustic model (AM), a pronunciation model (PM) and a language model (LM). The AM is typically trained to recognize context-dependent states or phonemes, by bootstrapping from an existing model which is used for alignment. The PM maps the sequences of phonemes produced by the AM into word sequences. Word sequences are scored using LM trained on large amounts of text data, which estimate probabilities of word sequences. However, training independent components added complexities and was suboptimal compared to training all components jointly. This called for developing end-to-end systems in the ASR community, those which attempt to learn the separate components of an ASR jointly as a single system. A single system Speech recognition model The end-to-end trained neural networks can essentially recognize speech, without using an external pronunciation lexicon, or a separate language model. End-to-end trained systems can directly map the input acoustic speech signal to word sequences. In such sequence-to-sequence models, the AM, PM, and LM are trained jointly in a single system. Since these models directly predict words, the process of decoding utterances is also greatly simplified. The end-to-end ASR systems do not require bootstrapping from decision trees or time alignments generated from a separate system. Thereby making the training of such models simpler than conventional ASR systems. There are several sequence-to-sequence models including connectionist temporal classification (CTC), and recurrent neural network (RNN) transducer, an attention-based model etc. CTC models are used to train end-to-end systems that directly predict grapheme sequences. This model was proposed by Graves et al. as a way of training end-to-end models without requiring a frame-level alignment of the target labels for a training statement. This basic CTC model was extended by Graves to include a separate recurrent LM component, in a model referred to as the recurrent neural network (RNN) transducer. The RNN transducer augments the encoder network from the CTC model architecture with a separate recurrent prediction network over the output symbols. Attention-based models are also a type of end-to-end sequence models. These models consist of an encoder network, which maps the input acoustics into a higher-level representation. They also have an attention-based decoder that predicts the next output symbol based on the previous predictions. A schematic representation of various sequence-to-sequence modeling approaches Google’s Listen-Attend-Spell (LAS) end-to-end architecture is one such attention-based model. Their end-to-end system achieves a word error rate (WER) of 5.6%, which corresponds to a 16% relative improvement over a strong conventional system which achieves a 6.7% WER. Additionally, the end-to-end model used to output the initial word hypothesis, before any hypothesis rescoring, is 18 times smaller than the conventional model. These sequence-to-sequence models are comparable with traditional approaches on dictation test sets. However, the traditional models outperform end-to-end systems on voice-search test sets. Future work is being done on building optimal models for voice-search tests as well. More work is also expected in building multi-dialect and multi-lingual systems. So that data for all dialects/languages can be combined to train one network, without the need for a separate AM, PM, and LM for each dialect/language. Enough with understanding speech. Let’s talk about generating it Text-to-speech (TTS) conversion, i.e generating natural sounding speech from text, or allowing people to converse with machines has been one of the top research goals in the present times. Deep Neural networks have greatly improved the overall development of a TTS system, as well as enhanced individual pieces of such a system. In 2012, Google first used Deep Neural Networks (DNN) instead of Gaussian Mixture Model (GMMs), which were then used as the core technology behind TTS systems. DNNs assessed sounds at every instant in time with increased speech recognition accuracy. Later, better neural network acoustic models were built using CTC and sequence discriminative training techniques based on RNNs. Although being blazingly fast and accurate, these TTS systems were largely based on concatenative TTS, where a very large database of short speech fragments was recorded from a single speaker and then recombined to form complete utterances. This led to the development of parametric TTS, where all the information required to generate the data was stored in the parameters of the model, and the contents and characteristics of the speech were controlled via the inputs to the model. WaveNet further enhanced these parametric models by directly modeling the raw waveform of the audio signal, one sample at a time. WaveNet yielded more natural-sounding speech using raw waveforms and was able to model any kind of audio, including music. Baidu then came with their Deep Voice TTS system constructed entirely from deep neural networks. Their system was able to do audio synthesis in real-time, giving up to 400X speedup over previous WaveNet inference implementations. Google, then released Tacotron, an end-to-end generative TTS model that synthesized speech directly from characters. Tacotron was able to achieve a 3.82 mean opinion score (MOS), outperforming the traditional parametric system in terms of speech naturalness. Tacotron was also considerably faster than sample-level autoregressive methods because of its ability to generate speech at the frame level. Most recently, Google has released Tacotron 2 which took inspiration from past work on Tacotron and WaveNet. It features a tacotron style, recurrent sequence-to-sequence feature prediction network that generates mel spectrograms. Followed by a modified version of WaveNet which generates time-domain waveform samples conditioned on the generated mel spectrogram frames. The model achieved a MOS of 4.53 compared to a MOS of 4.58 for professionally recorded speech. Deep Neural Networks have been a strong force behind the developments of end-to-end speech recognition and generation models. Although these end-to-end models have compared substantially well against the classical approaches, more work is to be done still. As of now, end-to-end speech models cannot process speech in real time. Real-time speech processing is a strong requirement for latency-sensitive applications such as voice search. Hence more progress is expected in such areas. Also, end-to-end models do not give expected results when evaluated on live production data. There is also difficulty in learning proper spellings for rarely used words such as proper nouns. This is done quite easily when a separate PM is used. More efforts will need to be made to address these challenges as well.

0
0
21564

article-image-spark-h2o-sparkling-water-machine-learning-needs

Aarthi Kumaraswamy

15 Nov 2017

3 min read

Spark + H2O = Sparkling water for your machine learning needs

Aarthi Kumaraswamy

15 Nov 2017

3 min read

[box type="note" align="" class="" width=""]The following is an excerpt from the book Mastering Machine Learning with Spark, Chapter 1, Introduction to Large-Scale Machine Learning and Spark written by Alex Tellez, Max Pumperla, and Michal Malohlava. This article introduces Sparkling water - H2O's integration of their platform within the Spark project, which combines the machine learning capabilities of H2O with all the functionality of Spark. [/box] H2O is an open source, machine learning platform that plays extremely well with Spark; in fact, it was one of the first third-party packages deemed "Certified on Spark". Sparkling Water (H2O + Spark) is H2O's integration of their platform within the Spark project, which combines the machine learning capabilities of H2O with all the functionality of Spark. This means that users can run H2O algorithms on Spark RDD/DataFrame for both exploration and deployment purposes. This is made possible because H2O and Spark share the same JVM, which allows for seamless transitions between the two platforms. H2O stores data in the H2O frame, which is a columnar-compressed representation of your dataset that can be created from Spark RDD and/or DataFrame. Throughout much of this book, we will be referencing algorithms from Spark's MLlib library and H2O's platform, showing how to use both the libraries to get the best results possible for a given task. The following is a summary of the features Sparkling Water comes equipped with: Use of H2O algorithms within a Spark workflow Transformations between Spark and H2O data structures Use of Spark RDD and/or DataFrame as inputs to H2O algorithms Use of H2O frames as inputs into MLlib algorithms (will come in handy when we do feature engineering later) Transparent execution of Sparkling Water applications on top of Spark (for example, we can run a Sparkling Water application within a Spark stream) The H2O user interface to explore Spark data Design of Sparkling Water Sparkling Water is designed to be executed as a regular Spark application. Consequently, it is launched inside a Spark executor created after submitting the application. At this point, H2O starts services, including a distributed key-value (K/V) store and memory manager, and orchestrates them into a cloud. The topology of the created cloud follows the topology of the underlying Spark cluster. As stated previously, Sparkling Water enables transformation between different types of RDDs/DataFrames and H2O's frame, and vice versa. When converting from a hex frame to an RDD, a wrapper is created around the hex frame to provide an RDD-like API. In this case, data is not duplicated but served directly from the underlying hex frame. Converting from an RDD/DataFrame to a H2O frame requires data duplication because it transforms data from Spark into H2O-specific storage. However, data stored in an H2O frame is heavily compressed and does not need to be preserved as an RDD anymore: If you enjoyed this excerpt, be sure to check out the book it appears in.

0
0
20966

article-image-automobile-repair-self-diagnosis-and-traffic-light-management-enabled-by-ai-from-ai-trends

Matthew Emerick

15 Oct 2020

5 min read

Automobile Repair Self-Diagnosis and Traffic Light Management Enabled by AI from AI Trends

Matthew Emerick

15 Oct 2020

5 min read

By AI Trends Staff Looking inside and outside, AI is being applied to the self-diagnosis of automobiles and to the connection of vehicles to traffic infrastructure. A data scientist at BMW Group in Munich, while working on his PhD, created a system for self-diagnosis called the Automated Damage Assessment Service, according to an account in Mirage. Milan Koch was completing his studies at the Leiden Institute of Advanced Computer Science in the Netherlands when he got the idea. “It should be a nice experience for customers,” he stated. The system gathers data over time from sensors in different parts of the car. “From scratch, we have developed a service idea that is about detecting damaged parts from low speed accidents,” Koch stated. “The car itself is able to detect the parts that are broken and can estimate the costs and the time of the repair.” Milan Koch, data scientist, BMW Group, Munich Koch developed and compared different multivariate time series methods, based on machine learning, deep learning and also state-of-the-art automated machine learning (AutoML) models. He tested different levels of complexity to find the best way to solve the time series problems. Two of the AutoML methods and his hand-crafted machine learning pipeline showed the best results. The system may have application to other multivariate time series problems, where multiple time-dependent variables must be considered, outside the automotive field. Koch collaborated with researchers from the Leiden University Medical Center (LUMC) to use his hand-crafted pipeline to analyze Electroencephalography (EEG) data. Koch stated, ‘We predicted the cognition of patients based on EEG data, because an accurate assessment of cognitive function is required during the screening process for Deep Brain Stimulation (DBS) surgery. Patients with advanced cognitive deterioration are considered suboptimal candidates for DBS as cognitive function may deteriorate after surgery. However, cognitive function is sometimes difficult to assess accurately, and analysis of EEG patterns may provide additional biomarkers. Our machine learning pipeline was well suited to apply to this problem.” He added, “We developed algorithms for the automotive domain and initially we didn’t have the intention to apply it to the medical domain, but it worked out really well.” His models are now also applied to Electromyography (EMG) data, to distinguish between people with a motor disease and healthy people. Koch intends to continue his work at BMW Group, where he will focus on customer-oriented services, predictive maintenance applications and optimization of vehicle diagnostics. DOE Grant to Research Traffic Management Delays Aims to Reduce Emissions Getting automobiles to talk to the traffic management infrastructure is the goal of research at the University of Tennesse at Chattanooga, which has been awarded $1.89 million from the US Department of Energy to create a new model for traffic intersections that would reduce energy consumption. The UTC Center for Urban Informatics and Progress (CUIP) will leverage its existing “smart corridor” to accommodate the new research. The smart corridor is a 1.25-mile span on a main artery in downtown Chattanooga, used as a test bed for research into smart city development and connected vehicles in a real-world environment. “This project is a huge opportunity for us,” stated Dr. Mina Sartipi, CUIP Director and principal investigator, in a press release. “Collaborating on a project that is future-oriented, novel, and full of potential is exciting. This work will contribute to the existing body of literature and lead the way for future research.” UTC is collaborating with the University of Pittsburgh, the Georgia Institute of Technology, the Oak Ridge National Laboratory, and the City of Chattanooga on the project. Dr. Mina Sartipi, Director, UTC Center for Urban Informatics and Progress In the grant proposal for the DOE, the research team noted that the US transportation sector accounted for more than 69 percent of petroleum consumption, and more than 37 percent of the country’s CO2 emissions. An earlier National Traffic Signal Report Card found that inefficient traffic signals contribute to 295 million vehicle hours of traffic delay, making up to 10 percent of all traffic-related delays. The project intends to leverage the capabilities of connected vehicles and infrastructures to optimize and manage traffic flow. While adaptive traffic control systems (ATCS) have been in use for a half century to improve mobility and traffic efficiency, they were not designed to address fuel consumption and emissions. Inefficient traffic systems increase idling time and stop-and-go traffic. The National Transportation Operations Coalition has graded the state of the nation’s traffic signals as D+. “The next step in the evolution [of intelligent transportation systems] is the merging of these systems through AI,” noted Aleksandar Stevanovic, associate professor of civil and environmental engineering at Pitt’s Swanson School of Engineering and director of the Pittsburgh Intelligent Transportation Systems (PITTS) Lab. “Creation of such a system, especially for dense urban corridors and sprawling exurbs, can greatly improve energy and sustainability impacts. This is critical as our transportation portfolio will continue to have a heavy reliance on gasoline-powered vehicles for some time.” The goal of the three-year project is to develop a dynamic feedback Ecological Automotive Traffic Control System (Eco-ATCS), which reduces fuel consumption and greenhouse gases while maintaining a highly operable and safe transportation environment. The integration of AI will allow additional infrastructure enhancements including emergency vehicle preemption, transit signal priority, and pedestrian safety. The ultimate goal is to reduce corridor-level fuel consumption by 20 percent. Read the source articles and information in Mirage, and in a press release from the UTC Center for Urban Informatics and Progress.

0
0
20817

article-image-breaking-ai-workflow-into-stages-reveals-investment-opportunities-from-ai-trends

Matthew Emerick

08 Oct 2020

6 min read

Breaking AI Workflow Into Stages Reveals Investment Opportunities from AI Trends

Matthew Emerick

08 Oct 2020

6 min read

By John P. Desmond, AI Trends Editor An infrastructure–first approach to AI investing has the potential to yield greater returns with a lower risk profile, suggests a recent account in Forbes. To identify the technologies supporting the AI system, deconstruct the workflow into two steps as a starting point: training and inference. MBA candidate at Columbia Business School, MBA Associate at Primary Venture Partners “Training is the process by which a framework for deep-learning is applied to a dataset,” states Basil Alomary, author of the Forbes account. An MBA candidate at Columbia Business School and MBA Associate at Primary Venture Partners, his background and experience are in early-stage SaaS ventures, as an operator and an investor. “That data needs to be relevant, large enough, and well-labeled to ensure that the system is being trained appropriately. Also, the machine learning models being created need to be validated, to avoid overfitting to the training data and to maintain a level of generalizability. The inference portion is the application of this model and the ongoing monitoring to identify its efficacy.” He identifies these stages in the AI/ML development lifecycle: data acquisition, data preparation, training, inference, and implementation. The stages of acquisition, preparation, and implementation have arguably attracted the least amount of attention from investors. Where to get the data for training the models is a chief concern. If a company is old enough to have historical customer data, it can be helpful. That approach should be inexpensive, but the data needs to be clean and complete enough to help in whatever decisions it works on. Companies without the option of historical data, can try publicly-available datasets, or they can buy the data directly. A new class of suppliers is emerging that primarily focus on selling clean, well-labeled datasets specifically for machine learning applications. One such startup is Narrative, based in New York City. The company sells data tailored to the client’s use case. The OpenML and Amazon Datasets have marketplace characteristics but are entirely open source, which is limiting for those who seek to monetize their own assets. Nick Jordan, CEO and founder, Narrative “Essentially, the idea was to take the best parts of the e-commerce and search models and apply that to a non-consumer offering to find, discover and ultimately buy data,” stated Narrative founder and CEO Nick Jordan in an account in TechCrunch. “The premise is to make it as easy to buy data as it is to buy stuff online.” In a demonstration, Jordan showed how a marketer could browse and search for data using the Narrative tools. The marketer could select the mobile IDs of people who have the Uber Driver app installed on their phone, or the Zoom app, at a price that is often subscription-based. The data selection is added to the shopping cart and checked out, like any online transaction. Founded in 2016, Narrative collects data sellers into its market, vetting each one, working to understand how the data is collected, its quality, and whether it could be useful in a regulated environment. Narrative does not attempt to grade the quality of the data. “Data quality is in the eye of the beholder,” Jordan stated. Buyers are able to conduct their own research into the data quality if so desired. Narrative is working on building a marketplace of third-party applications, which could include scoring of data sets. Data preparation is critical to making the machine learning model effective. Raw data needs to be preprocessed so that machine learning algorithms can produce a model, a structural description of the data. In an image database, for example, the images may have to be labelled, which can be labor-intensive. Automating Data Preparation is an Opportunity Area Platforms are emerging to support the process of data preparation with a layer of automation that seeks to accelerate the process. Startup Labelbox recently raised a $25 million Series B financing round to help grow its data labeling platform for AI model training, according to a recent account in VentureBeat. Founded in 2018 in San Francisco, Labelbox aims to be the data platform that acts as a central hub for data science teams to coordinate with dispersed labeling teams. In April, the company won a contract with the Department of Defense for the US Air Force AFWERX program, which is building out technology partnerships. Manu Sharma, CEO and co-founder, Labelbox A press release issued by Labelbox on the contract award contained some history of the company. “I grew up in a poor family, with limited opportunities and little infrastructure” stated Manu Sharma, CEO and one of Labelbox’s co-founders, who was raised in a village in India near the Himalayas. He said that opportunities afforded by the U.S. have helped him achieve more success in ten years than multiple generations of his family back home. “We’ve made a principled decision to work with the government and support the American system,” he stated. The Labelbox platform is supporting supervised-learning, a branch of AI that uses labeled data to train algorithms to recognize patterns in images, audio, video or text. The platform enables collaboration among team members as well as these functions: rework, rework, quality assurance, model evaluation, audit trails, and model-assisted labeling. “Labelbox is an integrated solution for data science teams to not only create the training data but also to manage it in one place,” stated Sharma. “It’s the foundational infrastructure for customers to build their machine learning pipeline.” Deploying the AI model into the real world requires an ongoing evaluation, a data pipeline that can handle continued training, scaling and managing computing resources, suggests Alomary in Forbes. An example product is Amazon’s Sagemaker, supporting deployment. Amazon offers a managed service that includes human interventions to monitor deployed models. DataRobot of Boston in 2012 saw the opportunity to develop a platform for building, deploying, and managing machine learning models. The company raised a Series E round of $206 million in September and now has $431 million in venture-backed funding to date, according to Crunchbase. Unfortunately DataRobot in March had to shrink its workforce by an undisclosed number of people, according to an account in BOSTINNO. The company employed 250 full-time employees as of October 2019. DataRobot announced recently that it was partnering with Amazon Web Services to provide its enterprise AI platform free of charge to anyone using it to help with the coronavirus response effort. Read the source articles and releases in Forbes, TechCrunch, VentureBeat and BOSTINNO.

0
0
20791

article-image-google-joins-social-coding-colaboratory

Savia Lobo

15 Nov 2017

3 min read

Google joins the social coding movement with CoLaboratory

Savia Lobo

15 Nov 2017

3 min read

Google has made it quite accessible for people to collaborate their documents, spreadsheets, and so on, with the Google Drive feature. What next? If you are one of those data science nerds who love coding, this roll-out from Google would be an amazing experimental ground for you. Google released its coLaboratory project, a new tool, and a boon for data science and analysis. It is designed in a way to make collaborating on data easier; similar to a Google document. This means it is capable of running code and providing simultaneous output within the document itself. Collaboration is what sets coLaboratory apart. It allows an improved collaboration among people having distinct skill sets--one may be great at coding, while the other might be well aware of the front-end or GUI aspects of the project. Just as you store and share a Google document or spreadsheets, you can store and share code with coLaboratory notebooks, in Google Drive. All you have to do is, click on the 'Share' option at the top right of any coLaboratory notebook. You can also look up to the Google Drive file sharing instructions. Thus, it sets new improvements for the ad-hoc workflows without the need of mailing documents back and forth. CoLaboratory includes a Jupyter notebook environment that does not require any setup for using it. With this, one does not need to download, install, or run anything on their computer. All they would need is, just a browser and they can use and share Jupyter notebooks. At present, coLaboratory functions with Python 2.7 on the desktop version of Chrome only. The reason for this is, coLab with Python 2.7 has been an internal tool for Google, for many years. Although, making it available on other browsers and with an added support for other Jupyter Kernels such as R or Scala is on the cards, soon. CoLaboratory’s GitHub repository contains two dependent tools, which one can make use of to leverage the tool onto the browser. First is the coLaboratory Chrome App and the other is coLaboratory with Classic Jupyter Kernels. Both tools can be used for creating and storing notebooks within Google Drive. This allows a collaborative editing within the notebooks. The only difference is that Chrome App executes all the code within its browser using the PNaCl Sandbox. Whereas, the CoLaboratory classic code execution is done using the local Jupyter kernels (IPython kernel) that have a complete access to the host systems and files. The coLaboratory Chrome App aids in setting up a collaborative environment for data analysis. This can be a hurdle at times, as requirements vary among different machines and operating systems. Also, the installation errors can be cryptic too. However, just with a single click, coLaboratory, IPython and a large set of popular scientific python libraries can be installed. Also, because of the Portable Native Client (PNaCl), coLaboratory is secure and runs at local speeds. This allows new users to set out on exploring IPython at a faster speed. Here’s what coLaboratory brings about for the code-lovers: No additional installation required the browser does it all The capabilities of coding now within a document Storing and sharing the notebooks on Google Drive Real-time collaboration possible; no fuss of mailing documents to and fro You can find a detailed explanation of the tool on GitHub.

0
0
20301

article-image-update-pandemic-driving-more-ai-business-researchers-fighting-fraud-cure-posts-from-ai-trends

Matthew Emerick

08 Oct 2020

6 min read

Update: Pandemic Driving More AI Business; Researchers Fighting Fraud ‘Cure’ Posts from AI Trends

Matthew Emerick

08 Oct 2020

6 min read

By AI Trends Staff The impact of the coronavirus pandemic around AI has many shades, from driving higher rates of IT spending on AI, to spurring researchers to fight fraud “cure” claims on social media, and hackers seeking to tap the medical data stream IT leaders are planning to spend more on AI/ML, and the pandemic is increasing demand for people with related job skills, according to the survey of over 100 IT executives with AI initiatives going on at companies spending at least $1 million annually on AI/ML before the pandemic. The survey was conducted in August by Algorithmia, a provider of ML operations and management platforms. Some 50% of respondents reported they are planning to spend more on AI/ML in the coming year, according to an account based on the survey from TechRepublic. A lack of in-house staff with AI/ML skills was the primary challenge for IT leaders before the pandemic, according to 59% of respondents. The most important job skills coming out of the pandemic are going to be security (69%), data management (64%), and systems integration (62%). Diego Oppenheimer, CEO of Algorithmia “When we come through the pandemic, the companies that will emerge the strongest will be those that invested in tools, people, and processes that enable them to scale delivery of AI and ML-based applications to production,” stated Diego Oppenheimer, CEO of Algorithmia, in a press release. “We believe investments in AI/ML operations now will pay off for companies sooner than later. Despite the fact that we’re still dealing with the pandemic, CIOs should be encouraged by the results of our survey.” Researchers Tracking Increase in Fraudulent COVID-19 ‘Cure’ Posts Legitimate businesses are finding opportunities from COVID-19, and so are the scammers. Researchers at UC San Diego are studying the increase of fraudulent posts around COVID-19 “cures” being posted on social media. In a new study published in the Journal of Medical Internet Research Public Health and Surveillance on August 25, 2020, researchers at University of California San Diego School of Medicine found thousands of social media posts on two popular platforms — Twitter and Instagram — tied to financial scams and possible counterfeit goods specific to COVID-19 products and unapproved treatments, according to a release from UC San Diego via EurekAlert “We started this work with the opioid crisis and have been performing research like this for many years in order to detect illicit drug dealers,” stated Timothy Mackey, PhD, associate adjunct professor at UC San Diego School of Medicine and lead author of the study. “We are now using some of those same techniques in this study to identify fake COVID-19 products for sale. From March to May 2020, we have identified nearly 2,000 fraudulent postings likely tied to fake COVID-19 health products, financial scams, and other consumer risk.” The first two waves of fraudulent posts focused on unproven marketing claims for prevention or cures and fake testing kits. The third wave of fake pharmaceutical treatments is now materializing. Prof. Mackey expects it to get worse when public health officials announce development of an effective vaccine or other therapeutic treatments. The research team identified suspect posts through a combination of Natural Language Processing and machine learning. Topic model clusters were transferred into a deep learning algorithm to detect fraudulent posts. The findings were customized to a data dashboard in order to enable public health intelligence and provide reports to authorities, including the World Health Organization and U.S. Food & Drug Administration (FDA). “Criminals seek to take advantage of those in need during times of a crisis,” Mackey stated. Sandia Labs, BioBright Working on a Better Way to Secure Critical Health Data Complementing the scammers, hackers are also seeing opportunity in these pandemic times. Hackers that threaten medical data are of particular concern. One effort to address this is a partnership between Sandia National Laboratories and the Boston firm BioBright to improve the security of synthetic biology data, a new commercial field. Corey Hudson, senior member, technical staff, Sandia Labs “In the past decade, genomics and synthetic biology have grown from principally academic pursuits to a major industry,” said computational biology manager Corey Hudson, senior member of the technical staff at Sandia Labs in a press release. “This shift paves the way toward rapid production of small molecules on demand, precision healthcare, and advanced materials.” BioBright is a scientific lab data automation company, recently acquired by Dotmatics, a UK company working on the Lab of the Future. The two companies are working to develop a better security model since currently, large volumes of data about the health and pharmaceutical information of patients are being handled with security models developed two decades ago, Hudon suggested. The situation potentially leaves open the risk of data theft or targeted attack by hackers to interrupt production of vaccines and therapeutics or the manufacture of controlled, pathogenic, or toxic materials, he suggested. “Modern synthetic biology and pharmaceutical workflows rely on digital tools, instruments, and software that were designed before security was such an important consideration,” stated Charles Fracchia, CEO of BioBright. The new effort seeks to better secure synthetic biology operations and genomic data across industry, government, and academia. The team is using Emulytics, a research initiative developed at Sandia for evaluating realistic threats against critical systems, to help develop countermeasures to the risks. C3.ai Sponsors COVID-19 Grand Challenge Competition with $200,000 in Awards If all else fails, participate in a programming challenge and try to win some money. Enterprise AI software provider C3.ai is inviting data scientists, developers, researchers and creative thinkers to participate in the C3.ai COVID-19 Grand Challenge and win prizes totaling $200,000. The judging panel will prioritize data science projects that help to understand and mitigate the spread of the virus, improve the response capabilities of the medical community, minimize the impact of this disease on society, and help policymakers navigate responses to COVID-19. C3.ai will award one Grand Prize of $100,000, two second-place awards of $25,000 each, and four third-place awards of $12,500 each. “The C3.ai COVID-19 Grand Challenge represents an opportunity to inform decision makers at the local, state, and federal levels and transform the way the world confronts this pandemic,” stated Thomas M. Siebel, CEO of C3.ai, in a press release. “As with the C3.ai COVID-19 Data Lake and the C3.ai Digital Transformation Institute, this initiative will tap our community’s collective IQ to make important strides toward necessary, innovative solutions that will help solve a global crisis.” The competition is now open. Registration ends Oct. 25 and final submissions are due Nov. 18, 2020. By Dec. 9, C3.ai will announce seven competition winners and award $200,000 in cash prizes to honorees. Judges include Michael Callagy, County Manager, County of San Mateo; S. Shankar Sastry, Professor of Electrical Engineering & Computer Science, UC Berkeley; and Zico Kolter, Associate Professor Computer Science, Carnegie Mellon University. Launched in April 2020, the C3.ai COVID-19 Data Lake now consists of 40 unique datasets, said to be among the largest unified, federated image of COVID-19 data in the world. Read the source articles and information at TechRepublic, from UC San Diego via EurekAlert, a press release from Sandia Labs, a press release from C3.ai about the COVID-19 Grand Challenge.

0
0
20242

article-image-dr-brandon-explains-transfer-learning

Shoaib Dabir

15 Nov 2017

5 min read

Dr. Brandon explains 'Transfer Learning' to Jon

Shoaib Dabir

15 Nov 2017

5 min read

[box type="shadow" align="" class="" width=""]Dr. Brandon: Hello and welcome to another episode of 'Date with Data Science'. Today we are going to talk about a topic that is all the rage these days in the data science community: Transfer Learning. Jon: 'Transfer learning' sounds all sci-fi to me. Is it like the thing that Prof. X does in X-men reading other people's minds using that dome-like headset thing in his chamber? Dr. Brandon: If we are going to get X-men involved, what Prof. X does is closer to deep learning. We will talk about that another time. Transfer learning is simpler to explain. It's what you actually do everytime you get into some character, Jon. Say, you are given the role of Jack Sparrow to play. You will probably read a lot about pirates, watch a lot of pirate movies and even Jonny Depp in character and form your own version of Jack Sparrow. Now after that acting assignment is over, say you are given the opportunity to audition for the role of Captain Hook, the famous pirate from Peter Pan. You won't do your research from ground zero this time. You will retain general mannerisms of a Pirate you learned from your previous role, but will only learn the nuances of Captain Hook, like acting one-handed. Jon: That's pretty cool! So you say machines can also learn this way? Dr.Brandon: Of course, that's what transfer learning is all about: learn something, abstract the learning sufficiently, then apply it to another related problem. The following is an excerpt from a book by Kuntal Ganguly titled Learning Generative Adversarial Networks.[/box] Pre-trained models are not optimized for tackling user specific datasets, but they are extremely useful for the task at hand that has similarity with the trained model task. For example, a popular model, InceptionV3, is optimized for classifying images on a broad set of 1000 categories, but our domain might be to classify some dog breeds. A well-known technique used in deep learning that adapts an existing trained model for a similar task to the task at hand is known as Transfer Learning. And this is why Transfer Learning has gained a lot of popularity among deep learning practitioners and in recent years has become the go-to technique in many real-life use cases. It is all about transferring knowledge (or features) among related domain. Purpose of Transfer Learning Let say you have trained a deep neural network to differentiate between fresh mango and rotten mango. During training, the network requires thousands of rotten and fresh mango images and hours of training to learn knowledge like if any fruit is rotten, a liquid will ooze out of the fruit and it produce a bad odor. Now with this training experience the network, can be used for different task/use-case to differentiate between a rotten apple and fresh apple using the knowledge of rotten features learned during training of mango images. The general approach of Transfer Learning is to train a base network and then copy its first n layers to the first n layers of a target network. The remaining layers of the target network are initialized randomly and trained toward the targeted use-case. The main scenarios for using Transfer Learning in your deep learning workflow are as follows: Smaller datasets: When you have a smaller dataset, building a deep learning model from scratch won't work well. Transfer Learning provides the way to apply a pre-trained model to new classes of data. Let's say a pre-trained model built from one million images of ImageNet data will converge to a decent solution (after training on just a fraction of the available smaller training data, for example, CIFAR-10) compared to a deep learning model built with a smaller dataset from scratch. Less resource: Deep learning process (such as convolution) requires a significant amount of resource and time. Deep learning process are well suited to run on high graded GPU-based machines. But with pre-trained models, you can easily train across a full training set (let's say 50000 images) in less than a minute using your laptop/notebook without GPU, since the majority of time a model is modified in the final layer with a simple update of just a classifier or regressor. Various approaches of using pre-trained models Using pre-trained architecture: Instead of transferring weights of the trained model, we can only use the architecture and initialize our own random weights to our new dataset. Feature extractor: A pre-trained model can be used as a feature extraction mechanism just by simply removing the output layer of the network (that gives the probabilities for being in each of the n classes) and then freezing all the previous layers of the network as a fixed feature extractor for the new dataset. Partially freezing the network: Instead of replacing only the final layer and extracting features from all previous layers, sometime we might train our new model partially (that is, to keep the weights of initial layers of the network frozen while retraining only the higher layers). Choice of the number of frozen layers can be considered as one more hyper-parameter. Next, read about how transfer learning is being used in the real world. If you enjoyed the above excerpt, do check out the book it is from.

0
0
20138

article-image-improve-interpretability-machine-learning-systems

Sugandha Lahoti

12 Mar 2018

6 min read

How to improve interpretability of machine learning systems

Sugandha Lahoti

12 Mar 2018

6 min read

Advances in machine learning have greatly improved products, processes, and research, and how people might interact with computers. One of the factors lacking in machine learning processes is the ability to give an explanation for their predictions. The inability to give a proper explanation of results leads to end-users losing their trust over the system, which ultimately acts as a barrier to the adoption of machine learning. Hence, along with the impressive results from machine learning, it is also important to understand why and where it works, and when it won’t. In this article, we will talk about some ways to increase machine learning interpretability and make predictions from machine learning models understandable. 3 interesting methods for interpreting Machine Learning predictions According to Miller, interpretability is the degree to which a human can understand the cause of a decision. Interpretable predictions lead to better trust and provide insight into how the model may be improved. The kind of machine learning developments happening in the present times require a lot of complex models, which lack in interpretability. Simpler models (e.g. linear models), on the other hand, often give a correct interpretation of a prediction model’s output, but they are often less accurate than complex models. Thus creating a tension between accuracy and interpretability. Complex models are less interpretable as their relationships are generally not concisely summarized. However, if we focus on a prediction made on a particular sample, we can describe the relationships more easily. Balancing the trade-off between model complexity and interpretability lies at the heart of the research done in the area of developing interpretable deep learning and machine learning models. We will discuss a few methods to increase the interpretability of complex ML models by summarizing model behavior with respect to a single prediction. LIME or Local Interpretable Model-Agnostic Explanations, is a method developed in the paper Why should I trust you? for interpreting individual model predictions based on locally approximating the model around a given prediction. LIME uses two approaches to explain specific predictions: perturbation and linear approximation. With Perturbation, LIME takes a prediction that requires explanation and systematically perturbs its inputs. These perturbed inputs become new, labeled training data for a simpler approximate model. It then does local linear approximation by fitting a linear model to describe the relationships between the (perturbed) inputs and outputs. Thus a simple linear algorithm approximates the more complex, nonlinear function. DeepLIFT (Deep Learning Important FeaTures) is another method which serves as a recursive prediction explanation method for deep learning. This method decomposes the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT assigns contribution scores based on the difference between activation of each neuron and its ‘reference activation’. DeepLIFT can also reveal dependencies which are missed by other approaches by optionally giving separate consideration to positive and negative contributions. Layer-wise relevance propagation is another method for interpreting the predictions of deep learning models. It determines which features in a particular input vector contribute most strongly to a neural network’s output. It defines a set of constraints to derive a number of different relevance propagation functions. Thus we saw 3 different ways of summarizing model behavior with a single prediction to increase model interpretability. Another important avenue to interpret machine learning models is to understand (and rethink) generalization. What is generalization and how it affects Machine learning interpretability Machine learning algorithms are trained on certain datasets, called training sets. During training, a model learns intrinsic patterns in data and updates its internal parameters to better understand the data. Once training is over, the model is tried upon test data to predict results based on what it has learned. In an ideal scenario, the model would always accurately predict the results for the test data. In reality, what happens is that the model is able to identify all the relevant information in the training data, but sometimes fails when presented with the new data. This difference between “training error” and “test error” is called the generalization error. The ultimate aim of turning a machine learning system to a scalable product is generalization. Every task in ML wants to create a generalized algorithm that acts in the same way for all kind of distributions. And the ability to distinguish models that generalize well from those that do not, will not only help to make ML models more interpretable, but it might also lead to more principled and reliable model architecture design. According to the conventional statistical theory, small generalization error is either due to properties of the model family or because of the regularization techniques used during training. A recent paper at ICLR 2017, Understanding deep learning requires rethinking generalization shows that current machine learning theoretical frameworks fail to explain the impressive results of deep learning approaches and why understanding deep learning requires rethinking generalization. They support their findings through extensive systematic experiments. Developing human understanding through visualizing ML models Interpretability also means creating models that support human understanding of machine learning. Human interpretation is enhanced when visual and interactive diagrams and figures are used for the purpose of explaining the results of ML models. This is why a tight interplay of UX design with Machine learning is essential for increasing Machine learning interpretability. Walking along the lines of Human-centered Machine Learning, researchers at Google, OpenAI, DeepMind, YC Research and others have come up with Distill. This open science journal features articles which have a clear exposition of machine learning concepts using excellent interactive visualization tools. Most of these articles are aimed at understanding the inner working of various machine learning techniques. Some of these include: An article on attention and Augmented Recurrent Neural Networks which has a beautiful visualization of attention distribution in RNN. Another one on feature visualization, which talks about how neural networks build up their understanding of images Google has also launched the PAIR initiative to study and design the most effective ways for people to interact with AI systems. It helps researchers understand ML systems through work on interpretability and expanding the community of developers. R2D3 is another website, which provides an excellent visual introduction to machine learning. Facets is another tool for visualizing and understanding training datasets to provide a human-centered approach to ML engineering. Conclusion Human-Centered Machine Learning is all about increasing machine learning interpretability of ML systems and in developing their human understanding. It is about ML and AI systems understanding how humans reason, communicate and collaborate. As algorithms are used to make decisions in more angles of everyday life, it’s important for data scientists to train them thoughtfully to ensure the models make decisions for the right reasons. As more progress is done in this area, ML systems will not make commonsense errors or violate user expectations or place themselves in situations that can lead to conflict and harm, making such systems safer to use. As research continues in this area, machines will soon be able to completely explain their decisions and their results in the most humane way possible.

0
0
19951

article-image-ai-autonomous-cars-might-have-just-a-four-year-endurance-lifecycle-from-ai-trends

Matthew Emerick

15 Oct 2020

14 min read

AI Autonomous Cars Might Have Just A Four-Year Endurance Lifecycle from AI Trends

Matthew Emerick

15 Oct 2020

14 min read

0
0
19383

Tech News - Artificial Intelligence

Graph Nets – DeepMind's library for graph networks in Tensorflow and Sonnet

4 Clustering Algorithms every Data Scientist should know

Sony resurrects robotic pet Aibo with advanced AI

Apache Spark 2.4.0 released

European Union fined Google 1.49 billion euros for antitrust violations in online advertising

Lyft releases an autonomous driving dataset “Level 5” and sponsors research competition

How Deep Neural Networks can improve Speech Recognition and generation

Spark + H2O = Sparkling water for your machine learning needs

Automobile Repair Self-Diagnosis and Traffic Light Management Enabled by AI from AI Trends

Breaking AI Workflow Into Stages Reveals Investment Opportunities from AI Trends

Trending Topics

Google joins the social coding movement with CoLaboratory

Update: Pandemic Driving More AI Business; Researchers Fighting Fraud ‘Cure’ Posts from AI Trends

Dr. Brandon explains 'Transfer Learning' to Jon

How to improve interpretability of machine learning systems

AI Autonomous Cars Might Have Just A Four-Year Endurance Lifecycle from AI Trends

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access