Artificial Intelligence | 0 articles | Tech News, Tutorials & Expert Insights

article-image-amazon-is-supporting-research-into-conversational-ui-with-alexa-fellowships

03 Sep 2018

3 min read

Amazon is supporting research into conversational AI with Alexa fellowships

03 Sep 2018

Amazon has chosen recipients from all over the world to be awarded the Alexa fellowships. The Alexa Fellowships program is open for PhD and post-doctoral students specializing in conversational AI at select universities. The program was launched last year, when four researchers won awards. Amazon's Alexa Graduate fellowship The Alexa Graduate Fellowship supports conversational AI research by providing funds and mentorship to PhD and postdoctoral students. Faculty Advisors and Alexa Graduate Fellows will also teach conversational AI to undergraduate and graduate students using the Alexa Skills Kit (ASK) and Alexa Voice Services (AVS). The graduate fellowship recipients are selected based on their research interests, planned coursework and existing conversational AI curriculum. This year the institutions include six in the United States, two in the United Kingdom, one in Canada and one in India. The 10 universities are: Carnegie Mellon University, Pittsburgh, PA International Institute of Information Technology, Hyderabad, India Johns Hopkins University, Baltimore, MD MIT App Inventor, Boston, MA University of Cambridge, Cambridge, United Kingdom University of Sheffield, Sheffield, United Kingdom University of Southern California, Los Angeles, CA University of Texas at Austin, Austin, TX University of Washington, Seattle, WA University of Waterloo, Waterloo, Ontario, Canada Amazon's Alexa Innovation Fellowship The Alexa Innovation Fellowship is dedicated to innovations in conversational AI. The program was introduced this year and Amazon has partnered with university entrepreneurship centers to help student-led startups build their innovative conversational interfaces. The fellowship also provides resources to faculty members. This year ten leading entrepreneurship center faculty members were selected as the inaugural class of Alexa Innovation Fellows. They are invited to learn from the Alexa team and network with successful Alexa Fund entrepreneurs. Instructors will receive funding, Alexa devices, hardware kits and regular training, as well as introductions to successful Alexa Fund-backed entrepreneurs. The 10 universities selected to receive the 2018-2019 Alexa Innovation Fellowship include: Arizona State University, Tempe, AZ California State University, Northridge, CA Carnegie Mellon University, Pittsburgh, PA Dartmouth College, Hanover, NH Emerson College, Boston, MA Texas A&M University, College Station, TX University of California, Berkeley, CA University of Illinois, Urbana-Champaign, IL University of Michigan, Ann Arbor, MI University of Southern California, Los Angeles, CA “We want to make it easier and more accessible for smart people outside of the company to get involved with conversational AI. That's why we launched the Alexa Skills Kit (ASK) and Alexa Voice Services (AVS) and allocated $200 million to promising startups innovating with voice via the Alexa Fund.” wrote Kevin Crews, Senior Product Manager for the Amazon Alexa Fellowship, in a blog post. Read more about the 2018-2019 Alexa Fellowship class on the Amazon blog. Read next Cortana and Alexa become best friends: Microsoft and Amazon release a preview of this integration Voice, natural language, and conversations: Are they the next web UI?

0
0
14025

article-image-paper-in-two-minutes-zero-shot-learning-for-visual-imitation

Savia Lobo

02 May 2018

4 min read

Paper in Two minutes: Zero-Shot learning for Visual Imitation

Savia Lobo

02 May 2018

4 min read

The ICLR paper, ‘Zero-Shot learning for Visual Imitation’ is a collaborative effort by Deepak Pathak, Parsa Mahmoudieh, Michael Luo, Pulkit Agrawal, Dian Chen, Fred Shentu, Evan Shelhamer, Jitendra Malik, Alexei A. Efros, and Trevor Darrell. In this article, we will come across one of the main problems with imitation learning, the expense of expert demonstration. The authors here propose a method for sidestepping this issue by using the random exploration of an agent to learn generalizable skills which can then be applied without any specific pretraining on any new task. Reducing the expert demonstration expense with Zero-shot visual imitation What problem is the paper trying to solve? In order to carry out imitation, the expert should be able to simply demonstrate tasks capably without lots of effort, instrumentation, or engineering. Collecting too many demonstrations is time-consuming, exact state-action knowledge is impractical, and reward design is involved and takes more than task expertise. The agent should be able to achieve goals based on the demonstrations without having to devote time learning to do each and every task. To address these issues, the authors recast learning from demonstration into doing from demonstration by (1) Only giving demonstrations during inference and, (2) Restricting demonstrations to visual observations alone rather than full state-actions. Instead of imitation learning, the agent must learn to imitate. This is the goal that the authors are trying to achieve. Paper summary This paper explains how existing approaches to imitation learning distill both what to do (goal) and how to do it (skills), from expert demonstrations. However, this expertise is effective but expensive supervision: it is not always practical to collect many detailed demonstrations. The authors here suggest that if an agent has access to its environment along with the expert, it can learn skills from its own experience and rely on expertise for the goals alone. And so, they have proposed a ‘Zero-shot’ method which does not include any expert actions or demonstrations during learning. The zero-shot imitator has no prior knowledge of the environment and makes no use of the expert during training. It learns from experience to follow experts, for instance, the authors conducted certain experiments such as, navigating an office with a turtlebot, and manipulating rope with a baxter robot. Key takeaways The authors have proposed a method for learning a parametric skill function (PSF) that takes as input a description of the initial state, goal state, parameters of the skill and outputs a sequence of actions (could be of varying length), which take the agent from initial state to goal state. The authors have shown real-world results for office navigation and rope manipulation but make no domain assumptions limiting the method to these problems. Zero-shot imitators learn to follow demonstrations without any expert supervision during learning. This approach learns task priors of representation, goals, and skills from the environment in order to imitate the goals given by the expert during inference. Reviewer comments summary Overall Score: 25/30 Average Score: 8 As per one of the reviewers, the proposed approach is well founded and the experimental evaluations are promising. The paper is well written and easy to follow. The skill function uses a RNN as function approximator and minimizes the sum of two losses i.e. the state mismatch loss over the trajectory (using an explicitly learnt forward model) and the action mismatch loss (using a model-free action prediction module) . This is hard to do in practice due to jointly learning both the forward model as well as the state mismatches. So first they are separately learnt and then fine-tuned together. One Shot Learning: Solution to your low data problem Using Meta-Learning in Nonstationary and Competitive Environments with Pieter Abbeel What is Meta Learning?

0
0
13690

article-image-introducing-googles-tangent

Sugandha Lahoti

14 Nov 2017

3 min read

Introducing Google's Tangent: A Python library with a difference

Sugandha Lahoti

14 Nov 2017

3 min read

The Google Brain team, in a recent blog post, announced the arrival of Tangent, an open source and free Python library for ahead-of-time automatic differentiation. Most machine learning algorithms require the calculation of derivatives and gradients. If we do it manually, it is time-taking as well as error-prone. Automatic differentiation or autodiff is a set of techniques to accurately compute the derivatives of numeric functions expressed as computer programs. Autodiff techniques can run large-scale machine learning models with high-performance and better usability. Tangent uses the Source code transformation (SCT) in Python to perform automatic differentiation. What it basically does is, take the Python source code as input, and then produce new Python functions as its output. The new python function calculates the gradient of the input. This improves readability of the automatic derivative code similar to the rest of the program. In contrast, TensorFlow and Theano, the two most popular machine learning frameworks do not perform autodiff on the Python Code. They instead use Python as a metaprogramming language to define a data flow graph on which SCT is performed. This at times is confusing to the user, considering it involves a separate programming paradigm. Source: https://github.com/google/tangent/blob/master/docs/toolspace.png Tangent has a one-function API: import tangent df = tangent.grad(f) For printing out derivatives: import tangent df = tangent.grad(f, verbose=1) Because it uses SCT, it generates a new python function. This new function follows standard semantics and its source code can be inspected directly. This makes it easy to understand by users, easy to debug, and has no runtime overhead. Another highlighting feature is the fact that it is easily compatible with TensorFlow and NumPy. It is high performing and is built on Python, which has a large and growing community. For processing arrays of numbers, TensorFlow Eager functions are also supported in Tangent. This library also auto-generates derivatives of codes that contain if statements and loops. It also provides easy methods to generate custom gradients. It improves usability by using abstractions for easily inserting logic into the generated gradient code. Tangent provides forward-mode auto differentiation. This is a better alternative than the backpropagation, which fails for cases where the number of outputs exceeds the number of inputs. In contrast, forward-mode auto diff runs in proportion to the input variables. According to the Github repository, “Tangent is useful to researchers and students who not only want to write their models in Python but also read and debug automatically-generated derivative code without sacrificing speed and flexibility.” Currently Tangent does not support classes and closures. Although the developers do plan on incorporating classes. This will enable class definitions of neural networks and parameterized functions. Tangent is still in the experimental stage. In the future, the developers plan to extend it to other numeric libraries and add support for more aspects of the Python language. These include closures, classes, more NumPy and TensorFlow functions etc. They also plan to add more advanced autodiff and compiler functionalities. To summarize, here’s a bullet list of key features of Tangent: Auto differentiation capabilities Code is easy to interpret, debug, and modify Easily compatible Custom Gradients Forward-mode autodiff High performance and optimization You can learn more about the project on their official GitHub.

0
0
13203

article-image-alibaba-fashionai-artificial-intelligence-shopping

Abhishek Jha

15 Nov 2017

3 min read

Digitizing the offline: How Alibaba’s FashionAI can revive the waning retail industry

Abhishek Jha

15 Nov 2017

3 min read

Imagine a visit to a store when you end up forgetting what you wanted to purchase, but the screen at front tells you all about your preferences. Before you struggle to pick a selection, the system tells you what you could possibly try and which items you could end up purchasing. All this is happening in the malls of China, thanks to artificial intelligence. Thanks to the Alibaba Group which is on a mission to revive the offline retail market. Ever since inception in 2009, the Singles Day festival has been considered the biggest opportunity for shopping in China. There was no better occasion for Alibaba to test its artificial intelligence product FashionAI. And this year, the sales zoomed. Alibaba’s gross merchandise volume gained a mammoth 25 billion dollars on Nov. 11, breaking its last year’s gross figure of $17.8 billion by a considerable margin. Majorly attributed to the FashionAI initiative. At a time when offline retail is in decline all across the world, Alibaba’s Fashion AI could single-handedly reinvent the market. It will drag you back to the malls. When the tiny sensors embedded in the cloth you just tried suggest you all ‘matching’ items, you do not mind visiting the stores with an easy to use recognizable interface at front that saves you from all the old drudgeries in retail shopping! The FashionAI screen interface uses machine learning for its suggestions, based on the items being tried on. It extends the information stored into the product tag to generate recommendations. Using the system, a customer can try clothes on, get related ‘smart’ suggestions from the AI, and then finalize a selection on the screen. But most importantly, the AI assistant doesn’t intend to replace humans with robots. It instead integrates them together to deliver better service. So when you want to try something different, you can click a button and a store attendant will be right there. Why are we cutting down on human intervention then? The point is, it is nearly impossible for a human staff to remember the shopping preferences of each customer, whereas an artificial intelligence system can do it with scale. This is why researchers thought to apply deep learning for real world scenarios like these. Unlike the human store attendant who gets irked with your massive shopping tantrums in terms of choices, the AI system has been programmed to rather leverage the big data for making ‘smart’ decisions. What more, it gets better with time, and learns more and more based on the suggested inputs from surroundings. We could say FashionAI is still an experiment. But Alibaba is on course to create history, if the Chinese e-commerce giant succedes in fueling life back into retail. As CEO Daniel Zhang reiterated, Alibaba is going to “digitize the offline retail world.” This is quite a first for such a line of thinking. But then, customers don’t recognize offline and online till it serves their interest.

0
0
12989

article-image-data-science-news-daily-roundup-2nd-april-2018

Packt Editorial Staff

02 Apr 2018

2 min read

Data Science News Daily Roundup – 2nd April 2018

Packt Editorial Staff

02 Apr 2018

2 min read

Apache Releases Trafodion, SAP announces general availability of SAP Predictive Analytics application edition, Pachyderm 1.7, and more in today’s top stories and news around data science, machine learning, and deep learning. Top Data Science news of the Day The 5 biggest announcements from TensorFlow Dev Summit 2018 Other Data Science News at a Glance Apache Releases Trafodion, a webscale SQL-on-Hadoop solution. Apache Trafodion has moved from incubator status to become a high level project. Trafodion enables transactional or operational workloads on Apache Hadoop. Read more on I Programmer SAP has announced general availability of the application edition of SAP Predictive Analytics software, to help enterprise clients harness machine learning. With this, one can create and manage predictive models that deliver powerful data-driven insights to every business user across the enterprise in real-time. Read more on inside SAP IBM’s GPU-Accelerated Semantic Similarity Search at Scale Shows ~30000x Speed Up. The proposed model is a linear-complexity RWMD that avoids wasteful and repetitive computations and reduces the average time complexity to linear. Read more on IBM Research Blog Announcing Pachyderm 1.7, an open source and enterprise data science platform that is enabling reproducible data processing at scale. Read more on Medium Mobodexter announces general availability of Paasmer 2.0, a dockerized version of their IoT Edge software that removes the hardware dependency to run Paasmer Edge Software. Paasmer becomes one of the few IoT software platforms in the world to add the Docker capability on the IoT Edge. Read more on Benzinga Announcing AIRI: Integrated AI-Ready Infrastructure for Deploying Deep Learning at Scale. AIRI is purpose-built to enable data architects, scientists and business leaders to extend the power of the NVIDIA DGX-1 and operationalise AI-at-scale for every enterprise. Read more on Scientific Computing World

0
0
11906

article-image-dr-brandon-explains-word-vectors-word2vec-jon

Aarthi Kumaraswamy

01 Nov 2017

6 min read

Dr. Brandon explains Word Vectors (word2vec) to Jon

Aarthi Kumaraswamy

01 Nov 2017

6 min read

[box type="shadow" align="" class="" width=""]Dr. Brandon: Welcome back to the second episode of 'Date with Data Science'. Last time, we explored natural language processing. Today we talk about one of the most used approaches in NLP: Word Vectors. Jon: Hold on Brandon, when we went over maths 101, didn't you say numbers become vectors when they have a weight and direction attached to them. But numbers and words are Apples and Oranges! I don't understand how words could also become vectors. Unless the words are coming from my movie director and he is yelling at me :) ... What would the point of words having directions be, anyway? Dr. Brandon: Excellent question to kick off today's topic, Jon. On an unrelated note, I am sure your director has his reasons. The following is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava. [/box] Traditional NLP approaches rely on converting individual words--which we created via tokenization--into a format that a computer algorithm can learn (that is, predicting the movie sentiment). Doing this required us to convert a single review of N tokens into a fixed representation by creating a TF-IDF matrix. In doing so, we did two important things behind the scenes: Individual words were assigned an integer ID (for example, a hash). For example, the word friend might be assigned to 39,584, while the word bestie might be assigned to 99,928,472. Cognitively, we know that friend is very similar to bestie; however, any notion of similarity is lost by converting these tokens into integer IDs. By converting each token into an integer ID, we consequently lose the context with which the token was used. This is important because, in order to understand the cognitive meaning of words, and thereby train a computer to learn that friend and bestie are similar, we need to understand how the two tokens are used (for example, their respective contexts). Given this limited functionality of traditional NLP techniques with respect to encoding the semantic and syntactic meaning of words, Tomas Mikolov and other researchers explored methods that employ neural networks to better encode the meaning of words as a vector of N numbers (for example, vector bestie = [0.574, 0.821, 0.756, ... , 0.156]). When calculated properly, we will discover that the vectors for bestie and friend are close in space, whereby closeness is defined as a cosine similarity. It turns out that these vector representations (often referred to as word embeddings) give us the ability to capture a richer understanding of text. Interestingly, using word embeddings also gives us the ability to learn the same semantics across multiple languages despite differences in the written form (for example, Japanese and English). For example, the Japanese word for movie is eiga; therefore, it follows that using word vectors, these two words, should be close in the vector space despite their differences in appearance. Thus, the word embeddings allow for applications to be language-agnostic--yet another reason why this technology is hugely popular! Word2vec explained First things first: word2vec does not represent a single algorithm but rather a family of algorithms that attempt to encode the semantic and syntactic meaning of words as a vector of N numbers (hence, word-to-vector = word2vec). We will explore each of these algorithms in depth in this chapter, while also giving you the opportunity to read/research other areas of vectorization of text, which you may find helpful. What is a word vector? In its simplest form, a word vector is merely a one-hot-encoding, whereby every element in the vector represents a word in our vocabulary, and the given word is encoded with 1 while all the other words elements are encoded with 0. Suppose our vocabulary only has the following movie terms: Popcorn, Candy, Soda, Tickets, and Blockbuster. Following the logic we just explained, we could encode the term Tickets as follows: Using this simplistic form of encoding, which is what we do when we create a bag-of-words matrix, there is no meaningful comparison we can make between words (for example, is Popcorn related to Soda; is Candy similar to Tickets?). Given these obvious limitations, word2vec attempts to remedy this via distributed representations for words. Suppose that for each word, we have a distributed vector of, say, 300 numbers that represent a single word, whereby each word in our vocabulary is also represented by a distribution of weights across those 300 elements. Now, our picture would drastically change to look something like this: Now, given this distributed representation of individual words as 300 numeric values, we can make meaningful comparisons among words using a cosine similarity, for example. That is, using the vectors for Tickets and Soda, we can determine that the two terms are not related, given their vector representations and their cosine similarity to one another. And that's not all we can do! In their ground-breaking paper, Mikolov et. al also performed mathematical functions of word vectors to make some incredible findings; in particular, the authors give the following math problem to their word2vec dictionary: V(King) - V(Man) + V(Woman) ~ V(Queen) It turns out that these distributed vector representations of words are extremely powerful in comparison questions (for example, is A related to B?), which is all the more remarkable when you consider that this semantic and syntactic learned knowledge comes from observing lots of words and their context with no other information necessary. That is, we did not have to tell our machine that Popcorn is a food, noun, singular, and so on. How is this made possible? Word2vec employs the power of neural networks in a supervised fashion to learn the vector representation of words (which is an unsupervised task). The above is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava. To learn more about the word2vec and doc2vec algorithms such as continuous-bag-of-words (CBOW), skip-gram, cosine similarity, distributed memory among other models and to build applications based on these, check out the book.

0
0
11758

article-image-filestack-workflows-comes-with-machine-learning-capabilities-to-help-business-manage-their-digital-images

Sugandha Lahoti

25 Oct 2018

3 min read

Filestack Workflows comes with machine learning capabilities to help business manage their digital images

Sugandha Lahoti

25 Oct 2018

3 min read

Filestack has come up with Filestack Workflows, a machine learning powered solution to help businesses detect, analyze, moderate and curate content in scalable and automated ways. Filestack and Workflows have traditionally been providing tools for companies to handle content as it is uploaded. Their tools checked for NSFW content, cropped photos, performed copyright detection on Word Docs, etc. However, handling content at scale using tools they've built in-house was proving to be difficult. They relied heavily on developers to implement the code or set up a chain of events. This brought them to develop a new interface that allows businesses to upload, moderate, transform and understand content at scale, freeing them to innovate more and manage less. The Filestack Workflows platform is built on a logic-driven intelligence functionality which uses machine learning to provide quick analysis of images and return actionable insights. This includes object recognition and detection, explicit content detection, optical character recognition, and copyright detection. Filestack Workflows provide flexibility for integration either from Filestack’s own API or from a simple user Interface. Workflows also have several new features that extend far beyond simple image transformation: Optical Character Recognition (OCR) allows users to abstract text from any given image. Images of everything from tax documents to street signs can be uploaded through their system, returning a raw text format of all characters in that image. Not Safe for Work (NSFW) Detection for filtering out content that is not appropriate for the workplace. Their image tagging feature can automate content moderations by implementing “safe for work” and a “not safe for work” score. Copyright Detection to determine if a file is an original work. A single API call will display the copyright status of single or multiple images. They have also released a quick demo to highlight the features of Filestack Workflows. This demo creates a Workflow that takes uploaded content (images or documents) and determines a filetype and then curates ‘safe for work’ images. It determines the Filetype using the following logic: If it is an 'Image' then: Determine if the image is 'Safe for Work' If it is 'Safe', then store to a specific storage source. If it is 'Not Safe' then, pixelate the image, and then store to a specific storage source for modified images. If it is a 'Document', then store to a specific storage source for documents. Read more about the news on Filestack’s blog. Facebook introduces Rosetta, a scalable OCR system that understands text on images using Faster-RCNN and CNN How Netflix uses AVA, an Image Discovery tool to find the perfect title image for each of its shows Datasets and deep learning methodologies to extend image-based applications to videos

0
0
10322

article-image-paper-two-minutes-using-mean-field-games-learning-behavior-policy-large-populations

Sugandha Lahoti

20 Feb 2018

4 min read

Paper in Two minutes: Using Mean Field Games for learning behavior policy of large populations

Sugandha Lahoti

20 Feb 2018

4 min read

This ICLR 2018 accepted paper, Deep Mean Field Games for Learning Optimal Behavior Policy of Large Populations, deals with inference in models of collective behavior, specifically at how to infer the parameters of a mean field game (MFG) representation of collective behavior. This paper is authored by Jiachen Yang, Xiaojing Ye, Rakshit Trivedi, Huan Xu, and Hongyuan Zha. The 6th annual ICLR conference is scheduled to happen between April 30 - May 03, 2018. Mean field game theory is the study of decision making in very large populations of small interacting agents. This theory understands the behavior of multiple agents each individually trying to optimize their position in space and time, but with their preferences being partly determined by the choices of all the other agents. Estimating the optimal behavior policy of large populations with Deep Mean Field Games What problem is the paper attempting to solve? The paper considers the problem of representing and learning the behavior of a large population of agents, to construct an effective predictive model of the behavior. For example, a population’s behavior directly affects the ranking of a set of trending topics on social media, represented by the global population distribution over topics. Each user’s observation of this global state influences their choice of the next topic in which to participate, thereby contributing to future population behavior. Classical predictive methods such as time series analysis are also used to build predictive models from data. However, these models do not consider the behavior as the result of optimization of a reward function and so may not provide insight into the motivations that produce a population’s behavior policy. Alternatively, methods that employ the underlying population network structure assume that nodes are only influenced by a local neighborhood and do not include a representation of a global state. Hence, they face difficulty in explaining events as the result of uncontrolled implicit optimization. MFG (mean field games) overcomes the limitations of alternative predictive methods by determining how a system naturally behaves according to its underlying optimal control policy. The paper proposes a novel approach for estimating the parameters of MFG. The main contribution of the paper is in relating the theories of MFG and Reinforcement Learning within the classic context of Markov Decision Processes (MDPs). The method suggested uses inverse RL to learn both the reward function and the forward dynamics of the MFG from data. Paper summary The paper covers the problem in three sections-- theory, algorithm, and experiment. The theoretical contribution begins by transforming a continuous time MFG formulation to a discrete time formulation and then relates the MFG to an associated MDP problem. In the algorithm phase, an RL solution is suggested to the MFG problem. The authors relate solving an optimization problem on an MDP of a single agent with solving the inference problem of the (population-level) MFG. This leads to learning a reward function from demonstrations using a maximum likelihood approach, where the reward is represented using a deep neural network. The policy is learned through an actor-critic algorithm, based on gradient descent with respect to the policy parameters. The algorithm is then compared with previous approaches on toy problems with artificially created reward functions. The authors then demonstrate the algorithm on real-world social data with the aim of recovering the reward function and predicting the future trajectory. Key Takeaways This paper describes a data-driven method to solve a mean field game model of population evolution, by proving a connection between Mean Field Games with Markov Decision Process and building on methods in reinforcement learning. This method is scalable to arbitrarily large populations because the Mean Field Games framework represents population density rather than individual agents. With experiments on real data, Mean Field Games emerges as a powerful framework for learning a reward and policy that can predict trajectories of a real-world population more accurately than alternatives. Reviewer feedback summary Overall Score: 26/30 Average Score: 8.66 The reviewers are unanimous in finding the work in this paper highly novel and significant. According to the reviewers, there is still minimal work at the intersection of machine learning and collective behavior, and this paper could help to stimulate the growth of that intersection. On the flip side, surprisingly, the paper was criticized with the statement “scientific content of the work has critical conceptual flaws”. However, the author refutations persuaded the reviewers that the concerns were largely addressed.

0
0
10249

article-image-why-drive-ai-is-going-to-struggle-to-disrupt-public-transport

Richard Gall

09 May 2018

5 min read

Why Drive.ai is going to struggle to disrupt public transport

Richard Gall

09 May 2018

5 min read

Drive.ai has announced that it is to begin trialling a self-driving car taxi service in Frisco, Texas this Summer. The trial is to last 6 months as the organization works closely with the Frisco authorities to finalize the details of the routes and to 'educate' the public about how they can be used. But although the news has widely been presented as a step forward for the wider adoption of self-driving cars, the story in fact exposes the way in which self-driving car engineers are struggling to properly disrupt. And that's before it has even begun. Drive.ai's announcement comes shortly after a number of high profile incidents involving self-driving cars. In March, a woman was killed by an Uber self-driving car in Arizona. In May, a Waymo van was involved in a collision in Arizona too. This puts a little more pressure on Drive.ai, and means the trial will be watched particularly closely. Any further issues will only do more to make the wider public resistant to autonomous vehicles. The more issues that appear, the more the very concept of self-driving vehicles begins to look like a Silicon Valley pipe dream. It starts looking like a way for tech entrepreneurs to take advantage of underfunded public infrastructure in the name of disruption and innovation. And this is precisely the problem with the forthcoming Drive.ai trial. For the trial to work, Drive.ai are dependent on the support and collaboration of the Frisco authorities. Yes, there are some positives to this - there's an argument that the future of public life depends on a sort of hybrid of entrepreneurialism and state support. But we're not just talking about using machine learning or deep learning to better understand how to deploy resources more effectively, how to target those most in need of support. In this instance, we're talking about a slightly clunky system. It's a system everyone recognises as clunky - after all, that's why public 'education' is needed. Disruption should be frictionless. Self-driving taxis aren't. Whatever you think of Uber and Airbnb, both organisations have managed to disrupt their respective industries by building platforms that make certain transactions and interactions frictionless. However, when it comes to self-driving taxi services, things are much different. They're not frictionless at all. That's why Drive.ai are having to work with the Frisco authorities to sell the idea to the public. Disruptive tech works best when people immediately get the concept. It's the sort of thing that starts with wouldn't it be great if... No one thinks that about self-driving cars. The self-driving bit is immaterial to most users. Provided their Uber drivers are polite and get them to where they want to go, that's enough. Of course, some people might even like having a driver they can interact with (god forbid!). Sure, you might think I'm missing the point. Self driving cars will be more efficient, right? The cost savings will be passed on to end users. Of course it might - but seen in perspective, lots of things have become more efficient or automated. It doesn't mean we're suddenly all feeling the benefits of our savings. More importantly, this isn't really disruption. You're not radically changing the way you do something based on the needs of the people that use it. Instead you're simply trying to shift their expectations to make it easier to automate jobs. In many instances we're seeing power shift from public organizations to those where technical expertise is located. And that's what's happening here. Artificial intelligence needs to be accessible to be impactful Essentially, the technology is stuck inside the Silicon Valley organizations trying to profit from it. We know for a fact the deep learning and artificial intelligence are at their most exciting and interesting when its accessible to a huge range of people. In the case of Drive.ai, the AI is just the kernel around which all these other moving parts depend - the investment, infrastructure, and acceptance of the technology. Artificial intelligence projects work best when they seem to achieve something seamlessly, not when they require a whole operation just to make it work. The initiatives being run by Drive.ai and its competitors are a tired use of AI. It's almost as if we're chasing the dream of taxi cabs that can drive themselves simply because we simply should. And while there's clearly potential for big money to be made by those organizations working hard to make it work, for many of the cities they're working with, it might not be the best option. Public transport does, after all, already exist. Drive.ai needs users to adapt to the technology Perhaps Drive.ai might just make this work. But it's going to be difficult. That's because the problems of self-driving cars are actually a little different to those many software companies face. Typically the challenge is responding to the needs of users and building the technology accordingly. In this instance, the technology is almost there. The problem facing Drive.ai and others is getting users to accept it. What we learned from CES 2018: Self-driving cars and AI chips are the rage! Apple self-driving cars are back! VoxelNet may drive the autonomous vehicles

0
0
8539

article-image-paper-two-minutes-certifiable-distributional-robustness-principled-adversarial-training

Savia Lobo

01 Mar 2018

3 min read

Paper in two minutes: Certifiable Distributional Robustness with Principled Adversarial Training

Savia Lobo

01 Mar 2018

3 min read

Certifiable Distributional Robustness with Principled Adversarial Training, a paper accepted for ICLR 2018, is a collaborative effort of Aman Sinha, Hongseok Namkoong, and John Duchi. In this paper, the authors state the vulnerability of neural networks to adversarial examples and further take the perspective of a distributionally robust optimization which guarantees performance under adversarial input perturbations. Certifiable Distributional Robustness with Applying Principled Adversarial Training What problem is the paper trying to solve? Recent works have shown that neural networks are vulnerable to adversarial examples; seemingly imperceptible perturbations to data can lead to misbehavior of the model, such as misclassifications of the output. Many researchers proposed adversarial attack and defense mechanisms to counter these vulnerabilities. While these works provide an initial foundation for adversarial training, there are no guarantees on whether proposed white-box attacks can find the most adversarial perturbation and whether there is a class of attacks such defenses can successfully prevent. On the other hand, verification of deep networks using SMT (satisfiability modulo theories) solvers provides formal guarantees on robustness but is NP-hard in general. This approach requires prohibitive computational expense even on small networks. The authors take the perspective of distributionally robust optimization and provide an adversarial training procedure with provable guarantees on its computational and statistical performance. Paper summary This paper proposes a principled methodology to induce distributional robustness in trained neural nets with the purpose of mitigating the impact of adversarial examples. The idea is to train the model to perform well not only with respect to the unknown population distribution, but to perform well on the worst-case distribution in a Wasserstein ball around the population distribution. In particular, the authors adopt the Wasserstein distance to define the ambiguity sets. This allows them to use strong duality results from the literature on distributionally robust optimization and express the empirical minimax problem as a regularized ERM (empirical risk minimization) with a different cost. Key takeaways The paper provides a method for efficiently guaranteeing distributional robustness with a simple form of adversarial data perturbation. The method values strong statistical guarantees and fast optimization rates for a large class of problems. Empirical evaluations indicate that the proposed methods are in fact robust to perturbations in the data, and they outperform less-principled adversarial training techniques. The major benefit of this approach is its simplicity and wide applicability across many models and machine-learning scenarios. Reviewer comments summary Overall Score: 27/30 Average Score: 9 The reviewers have strongly accepted this paper and have stated that it is of a great quality and originality. They said that this paper is an interesting attempt, but some of the key claims seem to be inaccurate and miss comparison to proper baselines. Another reviewer said, the paper applies recently developed ideas in the literature of robust optimization, in particular distributionally robust optimization with Wasserstein metric, and showed that under this framework for smooth loss functions when not too much robustness is requested, then the resulting optimization problem is of the same difficulty level as the original one (where the adversarial attack is not concerned). The paper has also received some criticisms but at the end of all it is majorly liked by many of the reviewers.

0
0
7537

article-image-predictive-cybersecurity-company-balbix-secures-20-million-investment

Richard Gall

27 Jun 2018

2 min read

Predictive cybersecurity company Balbix secures $20M investment

Richard Gall

27 Jun 2018

2 min read

High profile security attacks have put cybersecurity high on the agenda. For most companies it's at best a headache and at worst a full-blown crisis. But if you're in the business of solving these problems, it only makes you more valuable. That's what has happened to Balbix. Balbix is a security solution that allows users to "predict & proactively mitigate breaches before they happen." It does this by using predictive analytics and machine learning to identify possible threats. According to TechCrunch, the company has received the series B investment from a number of different sources. This includes Singtel's Innov8 fund (based in Singapore). Balbix is bringing together machine learning and cybersecurity However, the most interesting part of the story is what Balbix is trying to do. The fact that it's seeing early signs of eager investment indicates that it's moving down the right track when it comes to cybersecurity. The company spends some time outlining how the tool works on its website. Balbix's Breach Control product uses "sensors deployed across your entire enterprise network automatically and continuously discover and monitor all devices, apps and users for hundreds of attack vectors." An 'attack vector' is really just a method of attack, like, for example, phishing or social engineering. The product then uses what the company calls the 'Balbix Brain' to analyse risks within the network. The Balbix Brain is an artificial intelligence system that is designed to do a number of things. It assesses how likely different assets and areas of the network are to be compromised, and highlights the potential impact such a compromise might have. This adds an additional level of intelligence that allows organizations that use the product to make informed decisions about how to act and what to prioritize. But Balbix BreachControl also combines chaos engineering and penetration testing by simulating small-scale attacks across a given network. "Every possible breach path is modeled across all attack vectors to calculate breach risk. " Balbix is aiming to exploit the need for improved security at an enterprise level. In an interview with TechCrunch, CEO Gaurav Bhanga said “At enterprise scale, keeping everything up to snuff is very hard,” CEO Bangha told TechCrunch in an interview. “Most organizations have little visibility into attack surfaces, the right decisions aren’t made and projects aren’t secured.”

0
0
7525

article-image-pint-paper-two-mins-making-neural-network-architectures-generalize-via-recursion

Amarabha Banerjee

13 Nov 2017

3 min read

PINT (Paper IN Two mins) - Making Neural Network Architectures generalize via Recursion

Amarabha Banerjee

13 Nov 2017

3 min read

This is a quick summary of the research paper titled Making Neural Programming Architectures Generalize via Recursion by Jonathon Cai, Richard Shin, Dawn Song published on 6th Nov 2016. The idea of solving a common task is central to developing any algorithm or system. The primary challenge in designing any such system is the problem of generalizing the result to a large set of data. Simply put it means that using the same system, we should be able to predict accurate results when the amount of data is vast and varied across different domains. This is where most ANN systems fail. Researchers have claimed that the process of iteration which is inherent in all algorithms if introduced externally, will help us arrive at a system and architecture that can predict accurate results over limitless amounts of data. This technique is called the Recursive Neural Program. For more on this and the different Neural network programs, you can refer to the original research paper. A sample illustration showing a Neural Network Program is shown below: The Problem with Learned Neural Networks The most common technique which was applied till date to was to use Learned Neural Network - a method where a program was given increasingly complex tasks - for example solving the graduate level addition problem, in simpler words, adding two numbers. The problem with this approach was that the program kept on solving correctly as long as the number of digits was less. When the digits increased, the results were chaotic, some were correct and some were not, the reason being the program chose a complex method to solve the problem of increasing complexity. The real reason behind it was actually the architecture, which stayed the same as the complexity of the problem was increased, hence the program could not adapt in the end and gave chaotic response. The Solution of Recursion The essence of recursion is that it helps the system break down the problem into smaller pieces and then it solves these problems separately. This means irrespective of how complex the problem, the recursive process will break it down into standard units, i.e., the solution remains uniform and consistent. Keeping the theory of recursion in mind, a group of researchers have implemented this in their neural network Program and created a recursive architecture called as the Neural Programmer-Interpreter (NPI). This illustration shows the different algorithms and techniques used to create Neural Network based programs. The present system is based on the May 2016 formulation proposed by Reed et al. The system induces a supervised recursion in solving any task, in a way that a particular function stores an output in a particular memory cell, then calls that output value back while checking the actual desired result. This self-calling of the program or the function automatically induces recursion and that itself helps the program to decompose the problem into multiple smaller units and hence the results are more accurate than other techniques. The scientists have successfully applied this technique to solve four common tasks namely Grade School Addition Bubble Sort Topological Sort Quick Sort They have found that the Recursive Neural Network architecture gives 100 percent success rates in predicting correct results in case of all the four above mentioned tasks. The flip- side of this technique is still the amount of supervision required while performing the tasks. These will be subject to further investigation and research. For a more detailed approach and results on the different neural Network programs and their performance, please refer to the original research paper.

0
0
7362

article-image-dr-brandon-explains-nlp-natural-language-processing-jon

Aarthi Kumaraswamy

25 Oct 2017

5 min read

Dr.Brandon explains NLP (Natural Language Processing) to Jon

Aarthi Kumaraswamy

25 Oct 2017

5 min read

[box type="shadow" align="" class="" width=""] Dr.Brandon: Welcome everyone to the first episode of 'Date with data science'. I am Dr. Brandon Hopper, B.S., M.S., Ph.D., Senior Data Scientist at BeingHumanoid and, visiting faculty at Fictional AI University. Jon: And I am just Jon - actor, foodie and Brandon's fun friend. I don't have any letters after my name but I can say the alphabets in reverse order. Pretty cool, huh! Dr.Brandon: Yes, I am sure our readers will find it very amusing Jon. Talking of alphabets, today we discuss NLP. Jon: Wait, what is NLP? Is it that thing Ashley's working on? Dr.Brandon: No. The NLP we are talking about today is Natural Language Processing, not to be confused with Neuro-Linguistic Programming. Jon: Oh alright. I thought we just processed cheese. How do you process language? Don't you start with 'to understand NLP, we must first understand how humans started communicating'! And keep it short and simple, will you? Dr.Brandon: OK I will try my best to do all of the above if you promise not to doze off. The following is an excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava. [/box] NLP helps analyze raw textual data and extract useful information such as sentence structure, sentiment of text, or even translation of text between languages. Since many sources of data contain raw text, (for example, reviews, news articles, and medical records). NLP is getting more and more popular, thanks to providing an insight into the text and helps make automatized decisions easier. Under the hood, NLP is often using machine-learning algorithms to extract and model the structure of text. The power of NLP is much more visible if it is applied in the context of another machine method, where, for example, text can represent one of the input features. NLP - a brief primer Just like artificial neural networks, NLP is a relatively "old" subject, but one that has garnered a massive amount of attention recently due to the rise of computing power and various applications of machine learning algorithms for tasks that include, but are not limited to, the following: Machine translation (MT): In its simplest form, this is the ability of machines to translate one language of words to another language of words. Interestingly, proposals for machine translation systems pre-date the creation of the digital computer. One of the first NLP applications was created during World War II by an American scientist named Warren Weaver whose job was to try and crack German code. Nowadays, we have highly sophisticated applications that can translate a piece of text into any number of different languages we desire!‌ Speech recognition (SR): These methodologies and technologies attempt to recognize and translate spoken words into text using machines. We see these technologies in smartphones nowadays that use SR systems in tasks ranging from helping us find directions to the nearest gas station to querying Google for the weekend's weather forecast. As we speak into our phones, a machine is able to recognize the words we are speaking and then translate these words into text that the computer can recognize and perform some task if need be. Information retrieval (IR): Have you ever read a piece of text, such as an article on a news website, for example, and wanted to see similar news articles like the one you just read? This is but one example of an information retrieval system that takes a piece of text as an "input" and seeks to obtain other relevant pieces of text similar to the input text. Perhaps the easiest and most recognizable example of an IR system is doing a search on a web-based search engine. We give some words that we want to "know" more about (this is the "input"), and the output are the search results, which are hopefully relevant to our input search query. Information extraction (IE): This is the task of extracting structured bits of information from unstructured data such as text, video and pictures. For example, when you read a blog post on some website, often, the post is tagged with a few keywords that describe the general topics about this posting, which can be classified using information extraction systems. One extremely popular avenue of IE is called Visual Information Extraction, which attempts to identify complex entities from the visual layout of a web page, for example, which would not be captured in typical NLP approaches. Text summarization (darn, no acronym here!): This is a hugely popular area of interest. This is the task of taking pieces of text of various length and summarizing them by identifying topics, for example. In the next chapter, we will explore two popular approaches to text summarization via topic models such as Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA). If you enjoyed the above excerpt from the book Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla, and Michal Malohlava, check out the book to learn how to Use Spark streams to cluster tweets online Utilize generated models for off-line/on-line prediction Transfer learning from an ensemble to a simpler Neural Network Use GraphFrames, an extension of DataFrames to graphs, to study graphs using an elegant query language Use K-means algorithm to cluster movie reviews dataset and more

0
0
7270

article-image-2018-prediction-was-reinforcement-learning-applied-to-many-real-world-situations

Prasad Ramesh

27 Feb 2019

4 min read

2018 prediction: Was reinforcement learning applied to many real-world situations?

Prasad Ramesh

27 Feb 2019

4 min read

0
0
7153

article-image-15-most-trending-applications-of-machine-learning-on-twitter

Aarthi Kumaraswamy

13 Oct 2017

2 min read

Top 15 Applications of Machine Learning on Twitter

Aarthi Kumaraswamy

13 Oct 2017

2 min read

0
0
6208

Tech News - Artificial Intelligence

Amazon is supporting research into conversational AI with Alexa fellowships

Paper in Two minutes: Zero-Shot learning for Visual Imitation

Introducing Google's Tangent: A Python library with a difference

Digitizing the offline: How Alibaba’s FashionAI can revive the waning retail industry

Data Science News Daily Roundup – 2nd April 2018

Dr. Brandon explains Word Vectors (word2vec) to Jon

Filestack Workflows comes with machine learning capabilities to help business manage their digital images

Paper in Two minutes: Using Mean Field Games for learning behavior policy of large populations

Why Drive.ai is going to struggle to disrupt public transport

Paper in two minutes: Certifiable Distributional Robustness with Principled Adversarial Training

Trending Topics

Predictive cybersecurity company Balbix secures $20M investment

PINT (Paper IN Two mins) - Making Neural Network Architectures generalize via Recursion

Dr.Brandon explains NLP (Natural Language Processing) to Jon

2018 prediction: Was reinforcement learning applied to many real-world situations?

Top 15 Applications of Machine Learning on Twitter

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access