
Tech News - Data

1209 Articles

OpenAI’s new versatile AI model, GPT-2 can efficiently write convincing fake news from just a few words

Natasha Mathur
15 Feb 2019
3 min read
OpenAI researchers yesterday demonstrated a new AI model, called GPT-2, that is capable of generating coherent paragraphs of text without needing any task-specific training. In other words, give it the first line of a story, and it'll form the rest. Apart from generating articles, it can also perform rudimentary reading comprehension, summarization, machine translation, and question answering.

GPT-2 is an unsupervised language model comprising 1.5 billion parameters, trained on a dataset of 8 million web pages. "GPT-2 is simply trained to predict the next word in 40GB of internet text", says the OpenAI team. The team states that it is superior to other language models trained on specific domains (like Wikipedia, news, or books) as it doesn't need these domain-specific training datasets. For language-related tasks such as question answering, reading comprehension, and summarization, GPT-2 can learn directly from raw text and doesn't require task-specific training data.

The OpenAI team describes the GPT-2 model as 'chameleon-like': it easily adapts to the style and content of the input text. However, the team has observed certain failures in the model, such as repetitive text, world modeling failures, and unnatural topic switching. Finding a good sample depends on how familiar the model is with that sample's context. For instance, when the model is prompted with topics that are 'highly represented in the data', like Miley Cyrus or Lord of the Rings, it is able to generate reasonable samples about 50% of the time. On the other hand, the model performs poorly on highly technical or complex content.

The OpenAI team envisions the use of GPT-2 in the development of AI writing assistants, advanced dialogue agents, unsupervised translation between languages, and enhanced speech recognition systems.
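The "trained to predict the next word" idea can be illustrated at toy scale. The sketch below is a simple bigram model: it shares only the training objective with GPT-2, nothing of its transformer architecture or scale, and the corpus and function names here are invented for illustration.

```python
import random
from collections import defaultdict

# Toy next-word predictor: record, for each word, the words that followed it
# in the training text, then generate by repeatedly sampling a successor.
def train_bigram(corpus):
    model = defaultdict(list)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, seed, length=8, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    out = [seed]
    for _ in range(length):
        candidates = model.get(out[-1])
        if not candidates:  # dead end: the word never had a successor
            break
        out.append(rng.choice(candidates))
    return " ".join(out)

corpus = "the model predicts the next word and the next word follows the model"
model = train_bigram(corpus)
print(generate(model, "the"))
```

Give it a seed word and it forms the rest, which is the same interface, at a vastly smaller scale, that the GPT-2 demos show.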
It has also specified the potential misuses of GPT-2: it can be used to generate misleading news articles and to automate the large-scale production of fake and phishing content on social media. Due to these concerns about the misuse of language-generating models, OpenAI has decided to release only a 'small' version of GPT-2, along with its sampling code and a research paper for researchers to experiment with. The dataset, training code, and GPT-2 model weights have been excluded from the release.

The OpenAI team states that this release strategy will give it and the overall AI community time to discuss the implications of such systems more deeply. It also wants governments to take initiatives to monitor the societal impact of AI technologies and to track the progress of capabilities in these systems. "If pursued, these efforts could yield a better evidence base for decisions by AI labs and governments regarding publication decisions and AI policy more broadly", states the OpenAI team.

Public reaction to the news is largely positive; however, not everyone is okay with OpenAI's release strategy, and some feel the move signals a turn towards 'closed AI' and propagates a 'fear of AI':
https://twitter.com/chipro/status/1096196359403712512
https://twitter.com/ericjang11/status/1096236147720708096
https://twitter.com/SimonRMerton/status/1096104677001842688
https://twitter.com/AnimaAnandkumar/status/1096209990916833280
https://twitter.com/mark_riedl/status/1096129834927964160

For more information, check out the official OpenAI GPT-2 blog post.

OpenAI charter puts safety, standards, and transparency first
OpenAI launches Spinning Up, a learning resource for potential deep learning practitioners
OpenAI builds reinforcement learning based system giving robots human like dexterity


Responsible tech leadership or climate washing? Microsoft hikes its carbon tax and announces new initiatives to tackle climate change

Sugandha Lahoti
17 Apr 2019
5 min read
Microsoft is taking a stand against climate devastation by hiking its internal carbon tax in a new sustainability drive. On Tuesday, the company announced that it is nearly doubling its internal carbon fee, to $15 per metric ton on all carbon emissions. The company introduced the internal carbon tax back in 2012; the fee is charged based on energy use from the company's data centers, offices, and factories, and on emissions from its employees' business air travel. Funds from the higher fee will maintain Microsoft's carbon neutrality and help meet its sustainability goals.
https://twitter.com/satyanadella/status/1118241283133149184

Microsoft is aiming to use 70% renewable energy to power its data centers by 2023. For comparison, Google reached 100% renewable energy for its global operations, including both data centers and offices, in 2017. In April 2018, Apple announced that its global facilities are powered with 100 percent clean energy, an achievement covering retail stores, offices, data centers, and co-located facilities in 43 countries. Amazon has been the slow one in this race: although it announced that it would power its data centers with 100 percent renewable energy, since 2018 it has reportedly slowed its efforts and is using only 50 percent.

Microsoft has started the construction of 17 new buildings at its Washington headquarters. These buildings will run on 100 percent carbon-free electricity. The amount of carbon associated with the construction materials of these buildings will also be reduced by at least 15 percent, with a goal of reaching 30 percent, monitored through the Embodied Carbon Calculator for Construction (EC3), a new tool to track the carbon emissions of raw building materials. What is missing from this plan is a complete transition off of fossil fuels rather than a reliance on carbon offsets.

Microsoft is also joining the Climate Leadership Council (CLC).
CLC is an international policy institute which promotes a national carbon pricing approach. "In addition to our internal carbon tax", Microsoft says, "we supported the recent Washington state ballot measure on pricing carbon and believe it's time for a robust national discussion on carbon pricing to lower emissions in an economically sound way."

Microsoft is also aggregating and hosting environmental data sets on its cloud platform, Azure, and making them publicly available. These data sets, Microsoft notes, "are large government datasets [that] contain satellite and aerial imagery, among other things, and require petabytes of storage. By making them available in our cloud, we will advance and accelerate the work of grantees and researchers around the world." Finally, the company will also scale up the work it does with other nonprofits and companies tackling environmental issues through its own data and artificial intelligence expertise.

Responsible tech leadership or climate washing?

Although Microsoft plans to address quite a number of climate change and sustainability issues, what is missing are commitments to structural, business-goal-level changes. A report by Gizmodo highlights the lengths that Google, Microsoft, Amazon, and other tech companies are going to in order to help the oil industry accelerate the climate crisis, and their continued profits from this process. Per Gizmodo, Bill Gates heads a $1 billion climate action fund and has published his own point-by-point plan for fighting climate change; notably absent from that plan is "Empowering Oil & Gas with AI". Microsoft is two years into a seven-year deal, rumored to be worth over a billion dollars, to help Chevron, one of the world's largest oil companies, better extract and distribute oil. Microsoft Azure has also partnered with Equinor, a multinational energy company, to provide data services in a deal worth hundreds of millions of dollars.
Microsoft has also partnered with ExxonMobil to help it triple oil production in Texas and New Mexico. Instead of profiting from these deals, Microsoft could be prioritizing climate impacts in business decisions, including ending partnerships with fossil fuel companies that accelerate oil and gas exploration and extraction.
https://twitter.com/MsWorkers4/status/1098693994903552000
https://twitter.com/MsWorkers4/status/1118540637899354113

Last week, over 4,520 Amazon employees signed an open letter addressed to Jeff Bezos and the Amazon board of directors asking for a company-wide action plan to address climate change and an end to the company's reliance on dirty energy resources. Their demands: "define public goals and timelines to reduce emissions; complete ban from using fossil fuels; ending partnerships with fossil fuel companies; reducing harm caused by a company's operations to vulnerable communities first; advocacy for local, federal, and international policies to reduce carbon emissions and fair treatment of all employees during extreme weather events linked to climate change."

Microsoft Workers 4 Good, who created their own petition for Microsoft to do better, endorsed the stand taken by Amazon employees and called for all employees to encourage their employers to take action on climate change. Microsoft's closed, employee-only petition was launched in February, asking the company to help align employees' retirement investments with Microsoft's sustainability mission.
https://twitter.com/MsWorkers4/status/1092942849522323456

4,520+ Amazon employees sign an open letter asking for a "company-wide plan that matches the scale and urgency of climate crisis"
Minecraft is serious about global warming, adds a new (spigot) plugin to allow changes in climate mechanics.
Google moving towards data centers with 24/7 carbon-free energy


Introducing Deon, a tool for data scientists to add an ethics checklist

Natasha Mathur
06 Sep 2018
5 min read
DrivenData has come out with a new tool, named Deon, which allows you to easily add an ethics checklist to your data science projects. Deon is aimed at pushing the conversation about ethics in data science, machine learning, and artificial intelligence by providing actionable reminders to data scientists. According to the Deon team, "it's not up to data scientists alone to decide what the ethical course of action is. This has always been a responsibility of organizations that are part of civil society. This checklist is designed to provoke conversations around issues where data scientists have particular responsibility and perspective".

Deon comes with a default checklist, but you can also develop your own custom checklists by removing items and sections, or marking items as N/A, depending on the needs of the project. There are also real-world examples linked to each item in the default checklist. To run Deon on your data science projects, you need Python 3 or greater. Let's now discuss the two types of checklists, default and custom, that come with Deon.

Default checklist

The default checklist comprises sections on Data Collection, Data Storage, Analysis, Modeling, and Deployment.

Data Collection

This section covers Informed consent, Collection bias, and Limit PII exposure. Informed consent includes a mechanism for gathering consent where users have a clear understanding of what they are consenting to. Collection bias checks for sources of bias introduced during data collection and survey design. Lastly, Limit PII exposure covers ways to minimize the exposure of personally identifiable information (PII).

Data Storage

This section covers Data security, Right to be forgotten, and Data retention plan. Data security refers to a plan to protect and secure data. Right to be forgotten includes a mechanism by which an individual can have their personal information removed.
Data retention plan consists of a plan to delete the data when it is no longer needed.

Analysis

This section comprises Missing perspectives, Dataset bias, Honest representation, Privacy in analysis, and Auditability. Missing perspectives addresses blind spots in data analysis via engagement with relevant stakeholders. Dataset bias covers examining the data for possible sources of bias and the steps to mitigate or address them. Honest representation checks whether visualizations, summary statistics, and reports are designed to honestly represent the underlying data. Privacy in analysis ensures that data with PII is not used or displayed unless necessary for the analysis. Auditability refers to producing an analysis that is well documented and reproducible.

Modeling

This section covers Proxy discrimination, Fairness across groups, Metric selection, Explainability, and Communicate bias. Proxy discrimination is about ensuring that the model does not rely on variables or proxies that are discriminatory. Fairness across groups cross-checks whether the model results have been tested for fairness with respect to different affected groups. Metric selection considers the effects of optimizing for the defined metrics as well as additional metrics. Explainability is about explaining the model's decisions in understandable terms. Communicate bias makes sure that the shortcomings, limitations, and biases of the model have been properly communicated to relevant stakeholders.

Deployment

This section covers Redress, Roll back, Concept drift, and Unintended use. Redress covers discussing with the organization a plan for responding if users are harmed by the results. Roll back covers having a way to turn off or roll back the model in production when required. Concept drift refers to the relationship between input and output data in a problem changing over time.
This checklist item reminds the user to test for and monitor concept drift, to ensure that the model remains fair over time. Unintended use prompts the user about the steps to be taken to identify and prevent unintended uses and abuses of the model.

Custom checklists

For projects with particular concerns, it is recommended to create your own checklist.yml file. Custom checklists are required to follow the same schema as checklist.yml: a top-level title, which is a string, and sections, which are a list. Each section in the list must have a title, a section_id, and then a list of lines. Each line must include a line_id, a line_summary, and a line string, which is the content.

When changing the default checklist, keep in mind that Deon's goal is to have checklist items that are actionable. This is why users are advised to avoid suggesting items that are vague (e.g., "do no harm") or extremely specific (e.g., "remove social security numbers from data").

For more information, be sure to check out the official DrivenData blog post.

The Cambridge Analytica scandal and ethics in data science
OpenAI charter puts safety, standards, and transparency first
20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017
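Following the schema just described (a top-level title, a list of sections with a section_id, and lines each carrying a line_id, line_summary, and line), a custom checklist.yml might look like the sketch below. The section and item text here is illustrative, written for this example rather than taken from Deon's defaults:

```yaml
title: Project Ethics Checklist (illustrative example)
sections:
  - title: Data Collection
    section_id: A
    lines:
      - line_id: A.1
        line_summary: Informed consent
        line: >
          If the project involves human subjects, have they given informed
          consent, with a clear understanding of what they are consenting to?
      - line_id: A.2
        line_summary: Limit PII exposure
        line: >
          Have we taken steps to minimize the exposure of personally
          identifiable information (PII) in the collected data?
  - title: Deployment
    section_id: B
    lines:
      - line_id: B.1
        line_summary: Roll back
        line: >
          Is there a way to turn off or roll back the model in production
          if it is required?
```

Each item is phrased as an actionable question, in line with Deon's guidance to avoid entries that are either vague or overly specific.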


4 Clustering Algorithms every Data Scientist should know

Sugandha Lahoti
07 Nov 2017
6 min read
[box type="note" align="" class="" width=""]This is an excerpt from the book Java Data Analysis by John R. Hubbard. In this article, we look at four popular clustering algorithms: hierarchical clustering, k-means clustering, k-medoids clustering, and affinity propagation, along with their pseudo-code.[/box]

A clustering algorithm is one that identifies groups of data points according to their proximity to each other. These algorithms are similar to classification algorithms in that they also partition a dataset into subsets of similar points. But in classification, we already have data whose classes have been identified, such as "sweet fruit"; in clustering, we seek to discover the unknown groups themselves.

Hierarchical clustering

Of the several clustering algorithms that we will examine in this article, hierarchical clustering is probably the simplest. The trade-off is that it works well only with small datasets in Euclidean space. The general setup is that we have a dataset S of m points in R^n which we want to partition into a given number k of clusters C1, C2, ..., Ck, where within each cluster the points are relatively close together. Here is the algorithm:

1. Create a singleton cluster for each of the m data points.
2. Repeat m – k times:
   a. Find the two clusters whose centroids are closest.
   b. Replace those two clusters with a new cluster that contains their points.

The centroid of a cluster is the point whose coordinates are the averages of the corresponding coordinates of the cluster points. For example, the centroid of the cluster C = {(2, 4), (3, 5), (6, 6), (9, 1)} is the point (5, 4), because (2 + 3 + 6 + 9)/4 = 5 and (4 + 5 + 6 + 1)/4 = 4. This is illustrated in the figure below.

K-means clustering

A popular alternative to hierarchical clustering is the K-means algorithm. It is related to the K-Nearest Neighbor (KNN) classification algorithm.
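The centroid computation used by hierarchical clustering can be checked with a short sketch (plain Python here, rather than the book's Java):

```python
# Centroid of a cluster: average each coordinate across the cluster's points.
def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# The example cluster from the text; its centroid should be (5, 4).
C = [(2, 4), (3, 5), (6, 6), (9, 1)]
print(centroid(C))  # -> (5.0, 4.0)
```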
As with hierarchical clustering, the K-means clustering algorithm requires the number of clusters, k, as input. Here is the algorithm (this version, with its particular initialization, is also called the K-means++ algorithm):

1. Select k points from the dataset.
2. Create k clusters, each with one of the initial points as its centroid.
3. For each dataset point x that is not already a centroid:
   a. Find the centroid y that is closest to x.
   b. Add x to that centroid's cluster.
   c. Re-compute the centroid for that cluster.

The algorithm requires k points, one for each cluster, to initialize it. These initial points can be selected at random, or by some a priori method. One approach is to run hierarchical clustering on a small sample taken from the given dataset and then pick the centroids of those resulting clusters.

K-medoids clustering

The k-medoids clustering algorithm is similar to the k-means algorithm, except that each cluster center, called its medoid, is one of the data points instead of being the mean of its points. The idea is to minimize the average distance from the medoids to the points in their clusters. The Manhattan metric is usually used for these distances. Since those averages will be minimal if and only if the distances are, the algorithm reduces to minimizing the sum of all distances from the points to their medoids. This sum is called the cost of the configuration. Here is the algorithm:

1. Select k points from the dataset to be medoids.
2. Assign each data point to its closest medoid. This defines the k clusters.
3. For each cluster Cj:
   a. Compute the sum s = Σj sj, where each sj = Σ { d(x, yj) : x ∈ Cj }, and change the medoid yj to whatever point in the cluster Cj minimizes s.
   b. If the medoid yj was changed, re-assign each x to the cluster whose medoid is closest.
4. Repeat step 3 until s is minimal.

This is illustrated by the simple example in Figure 8.16, which shows 10 data points in 2 clusters. The two medoids are shown as filled points.
In the initial configuration:

C1 = {(1,1), (2,1), (3,2), (4,2), (2,3)}, with y1 = x1 = (1,1)
C2 = {(4,3), (5,3), (2,4), (4,4), (3,5)}, with y2 = x10 = (3,5)

The sums are:

s1 = d(x2,y1) + d(x3,y1) + d(x4,y1) + d(x5,y1) = 1 + 3 + 4 + 3 = 11
s2 = d(x6,y2) + d(x7,y2) + d(x8,y2) + d(x9,y2) = 3 + 4 + 2 + 2 = 11
s = s1 + s2 = 11 + 11 = 22

The first part of step 3 changes the medoid for C1 to y1 = x3 = (3,2). This causes the clusters to change, at the second part of step 3, to:

C1 = {(1,1), (2,1), (3,2), (4,2), (2,3), (4,3), (5,3)}, with y1 = x3 = (3,2)
C2 = {(2,4), (4,4), (3,5)}, with y2 = x10 = (3,5)

This makes the sums:

s1 = 3 + 2 + 1 + 2 + 2 + 3 = 13
s2 = 2 + 2 = 4
s = s1 + s2 = 13 + 4 = 17

The resulting configuration is shown in the second panel of the figure below. At step 3, the process repeats for cluster C2. The resulting configuration is shown in the third panel of the above figure. The computations are:

C1 = {(1,1), (2,1), (3,2), (4,2), (4,3), (5,3)}, with y1 = x3 = (3,2)
C2 = {(2,3), (2,4), (4,4), (3,5)}, with y2 = x8 = (2,4)
s = s1 + s2 = (3 + 2 + 1 + 2 + 3) + (1 + 2 + 2) = 11 + 5 = 16

The algorithm continues with two more changes, finally converging to the minimal configuration shown in the fifth panel of the above figure. This version of k-medoids clustering is also called partitioning around medoids (PAM).

Affinity propagation clustering

One disadvantage of each of the clustering algorithms previously presented (hierarchical, k-means, k-medoids) is the requirement that the number of clusters k be determined in advance. The affinity propagation clustering algorithm does not have that requirement. Developed in 2007 by Brendan J. Frey and Delbert Dueck at the University of Toronto, it has become one of the most widely used clustering methods. Like k-medoids clustering, affinity propagation selects cluster center points, called exemplars, from the dataset to represent the clusters. This is done by message-passing between the data points.
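The cost computations in the k-medoids example above can be verified with a short sketch (plain Python, not the book's Java):

```python
def manhattan(p, q):
    # Manhattan distance between two 2-D points.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def cost(cluster, medoid):
    # Sum of Manhattan distances from the cluster's points to its medoid
    # (the medoid itself contributes zero).
    return sum(manhattan(x, medoid) for x in cluster)

# Initial configuration from the worked example: total cost should be 22.
C1 = [(1, 1), (2, 1), (3, 2), (4, 2), (2, 3)]
C2 = [(4, 3), (5, 3), (2, 4), (4, 4), (3, 5)]
print(cost(C1, (1, 1)) + cost(C2, (3, 5)))  # -> 22

# Third configuration from the worked example: total cost should be 16.
C1 = [(1, 1), (2, 1), (3, 2), (4, 2), (4, 3), (5, 3)]
C2 = [(2, 3), (2, 4), (4, 4), (3, 5)]
print(cost(C1, (3, 2)) + cost(C2, (2, 4)))  # -> 16
```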
The algorithm works with three two-dimensional arrays:

s(i,j) = the similarity between xi and xj
r(i,k) = responsibility: a message from xi to xk on how well-suited xk is as an exemplar for xi
a(i,k) = availability: a message from xk to xi on how well-suited xk is as an exemplar for xi

Here is the complete algorithm:

1. Initialize the similarities:
   s(i,j) = –d(xi, xj)^2, for i ≠ j
   s(i,i) = the average of those other s(i,j) values
2. Repeat until convergence:
   Update the responsibilities: r(i,k) = s(i,k) − max { a(i,j) + s(i,j) : j ≠ k }
   Update the availabilities:
   a(i,k) = min { 0, r(k,k) + Σj { max {0, r(j,k)} : j ≠ i ∧ j ≠ k } }, for i ≠ k
   a(k,k) = Σj { max {0, r(j,k)} : j ≠ k }

A point xk will be an exemplar for a point xi if a(i,k) + r(i,k) = maxj { a(i,j) + r(i,j) }.

If you enjoyed this excerpt from the book Java Data Analysis by John R. Hubbard, check out the book to learn how to implement various machine learning algorithms, data visualization, and more in Java.
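As a rough illustration of the update rules above, here is a minimal pure-Python sketch (not the book's Java implementation). The damping factor, which the original algorithm description omits but Frey and Dueck's method uses to stabilize convergence, and the toy dataset are choices of this sketch:

```python
# Minimal affinity propagation sketch following the updates in the text.
def affinity_propagation(points, iters=200, damping=0.5):
    n = len(points)
    sqdist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    # Similarities: negative squared Euclidean distance off the diagonal;
    # the diagonal is the average of the other similarities, as in the text.
    s = [[-sqdist(points[i], points[j]) for j in range(n)] for i in range(n)]
    off = [s[i][j] for i in range(n) for j in range(n) if i != j]
    pref = sum(off) / len(off)
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Responsibilities: r(i,k) = s(i,k) - max_{j != k} (a(i,j) + s(i,j)),
        # damped against the previous value.
        for i in range(n):
            for k in range(n):
                m = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - m)
        # Availabilities, with the separate diagonal case a(k,k).
        for i in range(n):
            for k in range(n):
                if i == k:
                    new = sum(max(0.0, r[j][k]) for j in range(n) if j != k)
                else:
                    pos = sum(max(0.0, r[j][k]) for j in range(n) if j not in (i, k))
                    new = min(0.0, r[k][k] + pos)
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # x_k is x_i's exemplar where a(i,k) + r(i,k) is maximal.
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]

# Two well-separated pairs should yield two clusters, with no k given.
pts = [(0, 0), (1, 0), (9, 0), (11, 0)]
print(affinity_propagation(pts))
```

In practice you would reach for a library implementation such as scikit-learn's sklearn.cluster.AffinityPropagation rather than this sketch.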


Deepmind’s AlphaZero shows unprecedented growth in AI, masters 3 different games

Sugandha Lahoti
07 Dec 2018
3 min read
Google's DeepMind introduced AlphaZero last year as a reinforcement learning program that masters three different types of board games, chess, shogi, and Go, beating world-champion programs in each case. Yesterday, DeepMind announced that a full evaluation of AlphaZero has been published in the journal Science, confirming and updating the preliminary results.

The research paper describes how DeepMind's AlphaZero learns each game from scratch, without any human intervention and with no built-in domain knowledge beyond the basic rules of the game. Unlike traditional game-playing programs, AlphaZero uses deep neural networks, a general-purpose reinforcement learning algorithm, and a general-purpose tree search algorithm. The program's first plays are completely random; over time, the system uses reinforcement learning to learn from wins, losses, and draws, adjusting the parameters of the neural network. The amount of training varies, taking approximately 9 hours for chess, 12 hours for shogi, and 13 days for Go. For searching, it uses Monte-Carlo Tree Search (MCTS) to select the most promising moves in games.

Testing and Evaluation

DeepMind's AlphaZero was tested against the best engines for chess (Stockfish), shogi (Elmo), and Go (AlphaGo Zero). All matches were played for three hours per game, plus an additional 15 seconds for each move. AlphaZero was able to beat all its opponents in each evaluation. Per DeepMind's blog, in chess AlphaZero defeated the 2016 TCEC (Season 9) world champion Stockfish, winning 155 games and losing just six games out of 1,000. To verify the robustness of AlphaZero, it was also played in a series of matches that started from common human openings. In each opening, AlphaZero defeated Stockfish.
It also played a match that started from the set of opening positions used in the 2016 TCEC world championship, along with a series of additional matches against the most recent development version of Stockfish, and a variant of Stockfish that uses a strong opening book. In all matches, AlphaZero won. In shogi, AlphaZero defeated the 2017 CSA world champion version of Elmo, winning 91.2% of games. In Go, AlphaZero defeated AlphaGo Zero, winning 61% of games.

AlphaZero's ability to master three different complex games is important progress towards building a single AI system that can solve a wide range of real-world problems and generalize to new situations. People on the internet are also highly excited about this new achievement.
https://twitter.com/DanielKingChess/status/1070755986636488704
https://twitter.com/demishassabis/status/1070786070806192129
https://twitter.com/TrevorABranch/status/1070765877669187584
https://twitter.com/LeonWatson/status/1070777729015013376
https://twitter.com/Kasparov63/status/1070775097970094082

DeepMind's AlphaFold is successful in predicting the 3D structure of a protein making major inroads for AI use in healthcare.
Google makes major inroads into healthcare tech by absorbing DeepMind Health.
AlphaZero: The genesis of machine intuition
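The MCTS mentioned above selects moves using a PUCT-style rule: exploit moves with high average value, but boost moves the policy network favors and moves that have rarely been visited. The sketch below illustrates that rule only; the constant c_puct, the move statistics, and the function names are illustrative, not AlphaZero's tuned values.

```python
import math

# PUCT-style selection score: average value Q plus an exploration bonus
# proportional to the policy prior P and inversely related to visit count N.
def puct_score(q, prior, visits, parent_visits, c_puct=1.5):
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

def select_move(stats, c_puct=1.5):
    # stats maps a move to a (q, prior, visits) triple.
    parent_visits = sum(v for _, _, v in stats.values())
    return max(stats, key=lambda m: puct_score(*stats[m], parent_visits, c_puct))

# Invented example statistics for three candidate chess moves.
stats = {"e4": (0.55, 0.50, 200), "d4": (0.40, 0.30, 50), "g4": (0.05, 0.05, 2)}
print(select_move(stats))
```

Note how a rarely visited move still earns a sizeable exploration bonus; this is what lets the search keep probing moves the value estimate currently dislikes.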


A universal bypass tricks Cylance AI antivirus into accepting all top 10 Malware revealing a new attack surface for machine learning based security

Sugandha Lahoti
19 Jul 2019
4 min read
Researchers from Skylight Cyber, an Australian cybersecurity enterprise, have tricked BlackBerry Cylance's AI-based antivirus product. They identified a peculiar bias of the product towards a specific game engine and exploited it to trick the product into accepting malicious files. This discovery means companies working in the field of artificial-intelligence-driven cybersecurity need to rethink their approach to creating new products.

The bypass is not limited to Cylance; the researchers chose it because it is a leading vendor in the field and its product is publicly available. The researchers, Adi Ashkenazy and Shahar Zini from Skylight Cyber, say they can reverse the model of any AI-based EPP (Endpoint Protection Platform) product and find a bias enabling a universal bypass. Essentially, if you could truly understand how a certain model works, and the type of features it uses to reach a decision, you would have the potential to fool it consistently.

How did the researchers trick Cylance into thinking bad is good?

Cylance's machine-learning algorithm has been trained to favor a particular benign file, causing it to ignore malicious code if it sees strings from that benign file attached to a malicious file. The researchers took advantage of this and appended strings from a non-malicious file to a malicious one, tricking the system into thinking the malicious file was safe and avoiding detection. The trick works even if the Cylance engine previously concluded the same file was malicious before the benign strings were appended to it.

The Cylance engine keeps a scoring mechanism ranging from -1000 for the most malicious files to +1000 for the most benign. It also whitelists certain families of executable files to avoid triggering false positives on legitimate software. The researchers suspected that the machine learning model would be biased toward code in those whitelisted files.
So, they extracted strings from an online gaming program that Cylance had whitelisted and appended them to malicious files. The Cylance engine tagged the files benign, shifting their scores from high negative numbers to high positive ones.
https://youtu.be/NE4kgGjhf1Y

The researchers tested the technique against the WannaCry ransomware, SamSam ransomware, the popular Mimikatz hacking tool, and hundreds of other known malicious files. The method proved successful for 100% of the top 10 malware for May 2019, and close to 90% for a larger sample of 384 malware files.

"As far as I know, this is a world-first, proven global attack on the ML [machine learning] mechanism of a security company," said Adi Ashkenazy, CEO of Skylight Cyber, to Motherboard, which first reported the news. "After around four years of super hype [about AI], I think this is a humbling example of how the approach provides a new attack surface that was not possible with legacy [antivirus software]."

Gregory Webb, chief executive officer of malware protection firm Bromium Inc., told SiliconAngle that the news raises doubts about the concept of categorizing code as "good" or "bad". "This exposes the limitations of leaving machines to make decisions on what can and cannot be trusted," Webb said. "Ultimately, AI is not a silver bullet."

Martijn Grooten, a security researcher, also added his views on the Cylance bypass story. He states, "This is why we have good reasons to be concerned about the use of AI/ML in anything involving humans, because it can easily reinforce and amplify existing biases."

The Cylance team has now confirmed the global bypass issue and will release a hotfix in the next few days. "We are aware that a bypass has been publicly disclosed by security researchers. We have verified there is an issue which can be leveraged to bypass the anti-malware component of the product.
Our research and development teams have identified a solution and will release a hotfix automatically to all customers running current versions in the next few days," the team wrote in a blog post.

You can go through the blog post by the Skylight Cyber researchers for additional information.

Microsoft releases security updates: a "wormable" threat similar to WannaCry ransomware discovered
25 million Android devices infected with 'Agent Smith', a new mobile malware
FireEye reports infrastructure-crippling Triton malware linked to Russian government tech institute
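The bias the researchers exploited can be illustrated with a toy linear scorer. This is emphatically not Cylance's actual model: every feature name and weight below is invented, and a real EPP model is far more complex. The sketch only shows how large positive weights on whitelisted-family strings can outweigh malicious indicators when those strings are appended to a file.

```python
# Toy file scorer: negative total = flagged malicious, positive = benign.
# All feature strings and weights are hypothetical, for illustration only.
WEIGHTS = {
    b"CreateRemoteThread": -400,  # hypothetical malicious indicator
    b"encrypt_files":      -500,  # hypothetical malicious indicator
    b"GameEngineV2":       +600,  # hypothetical whitelisted-game string
    b"render_scene":       +450,  # hypothetical whitelisted-game string
}

def score(file_bytes):
    # Sum the weights of every known feature present in the file.
    return sum(w for feat, w in WEIGHTS.items() if feat in file_bytes)

malicious = b"...CreateRemoteThread...encrypt_files..."
benign_strings = b"GameEngineV2 render_scene"

print(score(malicious))                   # negative: flagged as malicious
print(score(malicious + benign_strings))  # positive: the appended benign
                                          # strings outweigh the indicators
```

The malicious indicators are still present in the second file; the score flips only because the whitelisted strings carry enough positive weight, which is the "bad looks good" failure mode the researchers describe.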

Microsoft acquires AI startup Lobe, a no code visual interface tool to build deep learning models easily

Natasha Mathur
14 Sep 2018
4 min read
Microsoft announced yesterday that it has acquired Lobe, a small San Francisco-based AI startup. Lobe is a visual interface tool that allows people to easily create intelligent apps capable of understanding hand gestures, hearing music, reading handwriting, and more, without any coding involved. Lobe is aimed at making deep learning simple, understandable, and accessible to everyone: with its simple visual interface, anyone can develop deep learning and AI models quickly, without having to write any code.

A look at Lobe's features

Drag, drop, learn

Lobe lets you build custom deep learning models, train them, and ship them directly in your app without any coding required. You can start by dragging in a folder of training examples from your desktop; this builds a custom deep learning model and begins its training. Once you're done, you can export the trained model and ship it directly in your app.

Connect together smart lobes

Lobe's smart building blocks, called lobes, can be connected together, allowing you to quickly create custom deep learning models. For instance, you can connect the Hand & Face lobe to find the most prominent hand in an image, then connect the Detect Features lobe to find the important features in the hand, and finally connect the Generate Labels lobe to predict the emoji in the image. You can also refine your model by adjusting each lobe's unique settings or by editing any lobe's sub-layers.

Exploring datasets visually

With Lobe, you can have your entire dataset displayed visually, which helps you browse and sort through all your examples. Select any icon to see how that example performs in your model. Your dataset is automatically split into a Lesson, which teaches your model during training, and a Test set, used to evaluate how your model will perform in the real world on examples it has never seen before.
Real-time training results
Lobe comes with fast cloud training that provides real-time results without slowing down your computer. Interactive charts help you monitor your model's accuracy and understand how the model improves over time. The best-performing version is then automatically selected and saved.

Advanced control over every layer
Lobe is built on top of the deep learning frameworks TensorFlow and Keras, which lets you control every layer of your model. With Lobe, you can tune hyperparameters, add layers, and design new architectures with the help of hundreds of advanced building-block lobes.

Ship it in your application
Once you are done training your model, it can be exported to TensorFlow or CoreML and run directly in your app. There is also an easy-to-use Lobe Developer API, which lets you host your model in the cloud and integrate it into your app.

What could Microsoft's plans be with this acquisition?
Lobe is not the first AI startup acquired by Microsoft. In July, Microsoft acquired Bonsai.ai, a deep reinforcement learning platform, to build machine learning models for autonomous systems of all kinds. Similarly, Microsoft acquired Semantic Machines this May to build a conversational AI center of excellence in Berkeley and advance the state of conversational AI. "Over the last few months, we've made multiple investments in companies to further this (expanding its growth in AI) goal. These are just two recent examples of investments we have made to help us accelerate the current state of AI development," says Kevin Scott, EVP and CTO at Microsoft, in yesterday's announcement on the company's official blog. It looks like Microsoft is set on bringing more AI capabilities to its users. In fact, major tech firms around the world are walking the same path, acquiring as many AI companies as they can.
For instance, Amazon acquired AI cybersecurity startup Sqrrl, Facebook acquired Bloomsbury AI, and Intel acquired Vertex.ai earlier this year. "In many ways though, we're only just beginning to tap into the full potential AI can provide. This in large part is because AI development and building deep learning models are slow and complex processes even for experienced data scientists and developers. To date, many people have been at a disadvantage when it comes to accessing AI, and we're committed to changing that," writes Kevin. For more information, check out the official Microsoft announcement.

Say hello to IBM RXN, a free AI Tool in IBM Cloud for predicting chemical reactions

Google's new What-if tool to analyze Machine Learning models and assess fairness without any coding

GraphQL API is now generally available

Amrata Joshi
17 Jul 2019
3 min read
Last month, the team at Fauna, provider of FaunaDB, the cloud-first database, announced the general availability of its GraphQL API (GraphQL is a query language for APIs). With GraphQL support, FaunaDB now lets developers use the API of their choice to manipulate all their data, making it, the company claims, the only serverless backend with support for universal database access. GraphQL also boosts developer productivity by enabling fast, easy development of serverless applications.

Matt Biilmann, CEO at Netlify, a Fauna partner, said, "Fauna's GraphQL support is being introduced at a perfect time as rich, serverless apps are disrupting traditional development models." Biilmann added, "GraphQL is becoming increasingly important to the entire developer community as they continue to leverage JAMstack and serverless to simplify cloud application development. We applaud Fauna's work as the first company to bring a serverless GraphQL database to market."

GraphQL lets developers specify the shape of the data they need without requiring changes to the backend components that provide that data. The GraphQL API in FaunaDB helps teams collaborate smoothly: back-end teams can focus on security and business logic while front-end teams concentrate on presentation and usability. According to Zion Market Research, the global serverless architecture market was valued at $3.46 billion in 2017 and is expected to reach $18.04 billion by 2024; GraphQL brings that growth and development to serverless, so developers will increasingly look for back-end GraphQL support like the kind found in FaunaDB. GraphQL defines three general operation types (Queries, Mutations, and Subscriptions), and FaunaDB currently supports Queries and Mutations natively. FaunaDB's GraphQL API provides developers with uniform access to transactional consistency, quality of service (QoS), user authorization, data access, and temporal storage.
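As a rough sketch of what a call against the GraphQL API looks like, the snippet below builds a GraphQL mutation payload and prepares (but does not send) an HTTP request. The `Todo` schema, the field names, and the secret placeholder are assumptions for illustration only; the endpoint URL is the one Fauna advertised at the time, but check the current docs before relying on it.

```python
import json
from urllib import request

# Hypothetical example: creating a document through a GraphQL mutation.
# The Todo type and its fields are illustrative, not a documented schema.
FAUNA_GRAPHQL_URL = "https://graphql.fauna.com/graphql"

query = """
mutation CreateTodo {
  createTodo(data: { title: "Ship the release" }) {
    _id
    title
  }
}
"""

payload = json.dumps({"query": query}).encode("utf-8")
req = request.Request(
    FAUNA_GRAPHQL_URL,
    data=payload,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <YOUR_FAUNA_SECRET>",  # placeholder
    },
)
# request.urlopen(req) would actually execute the mutation; omitted here.
print(json.loads(payload)["query"].strip().startswith("mutation"))  # True
```

The same `{"query": ...}` JSON envelope carries queries and mutations alike, which is what lets one HTTP endpoint serve the whole API.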
No limits on data history
FaunaDB is the only database that supports unlimited data history: any API in FaunaDB can return data as of any given point in time.

Consistency
FaunaDB provides the highest consistency levels for its transactions, applied automatically across all APIs.

Authorization
FaunaDB provides row-level access control, which applies to every API, be it GraphQL or FQL.

Shared data access
FaunaDB also features shared data access, so data written through one API (e.g., GraphQL) can be read and modified through another API such as FQL.

To know more about the news, check out the press release.

7 reasons to choose GraphQL APIs over REST for building your APIs

Best practices for RESTful web services: Naming conventions and API Versioning [Tutorial]

Implementing routing with React Router and GraphQL [Tutorial]

DeepMind's AlphaStar AI agent will soon anonymously play with European StarCraft II players

Sugandha Lahoti
11 Jul 2019
4 min read
Earlier this year, DeepMind's AI AlphaStar defeated two professional players at StarCraft II, a real-time strategy video game. Now, European StarCraft II players will get a chance to face off against experimental versions of AlphaStar as part of DeepMind's ongoing AI research.

https://twitter.com/MaxBakerTV/status/1149067938131054593

AlphaStar learns by imitating the basic micro and macro strategies used by players on the StarCraft ladder. A neural network was first trained using supervised learning on anonymised human games released by Blizzard. Once the agents have been trained from human game replays, they are trained against other competitors in the "AlphaStar league". This is where a multi-agent reinforcement learning process starts: new competitors, branched from existing competitors, are added to the league, and each agent then learns from games against the other competitors. This ensures that each competitor performs well against the strongest strategies and does not forget how to defeat earlier ones.

Anyone who wants to participate in the experiment has to opt in to the chance to play against the StarCraft II program, via an option in an in-game pop-up window. Users can change their opt-in selection at any time. To ensure anonymity, all games will be blind test matches: European players who opt in won't know whether they have been matched against AlphaStar. This helps ensure that all games are played under the same conditions, as players may react differently when they know they are facing an AI. A win or a loss against AlphaStar will affect a player's MMR (Matchmaking Rating) like any other game played on the ladder. "DeepMind is currently interested in assessing AlphaStar's performance in matches where players use their usual mix of strategies," Blizzard said in its blog post.
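The league scheme described above, in which new competitors branch from existing ones and every agent keeps training against the rest of the league, can be caricatured with a toy game. This is a deliberately minimal sketch: rock-paper-scissors stands in for StarCraft, and no actual policy update is performed where a real system would learn from each result.

```python
import random

# Toy "league" self-play loop, illustrative only.
random.seed(0)
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def branch(parent):
    # A new competitor starts as a copy of its parent's strategy weights.
    return dict(parent)

def play(a, b):
    move_a = random.choices(list(a), weights=list(a.values()))[0]
    move_b = random.choices(list(b), weights=list(b.values()))[0]
    if BEATS[move_a] == move_b:
        return 1    # a wins
    if BEATS[move_b] == move_a:
        return -1   # b wins
    return 0        # draw

league = [{"rock": 1.0, "paper": 1.0, "scissors": 1.0}]
for _ in range(3):                        # grow the league
    league.append(branch(random.choice(league)))
    for agent in league:                  # everyone plays everyone
        for rival in league:
            if agent is not rival:
                play(agent, rival)        # result would drive a policy update

print(len(league))  # 4
```

The point of the round-robin is the one stated in the article: an agent is rewarded for beating the whole population, not just the latest rival, so it cannot forget how to defeat earlier strategies.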
"Having AlphaStar play anonymously helps ensure that it is a controlled test, so that the experimental versions of the agent experience gameplay as close to a normal 1v1 ladder match as possible. It also helps ensure all games are played under the same conditions from match to match."

Some people have appreciated the anonymous testing. A Hacker News user commented, "Of course the anonymous nature of the testing is interesting as well. Big contrast to OpenAI's public play test. I guess it will prevent people from learning to exploit the bot's weaknesses, as they won't know they are playing a bot at all. I hope they eventually do a public test without the anonymity so we can see how its strategies hold up under focused attack." Others find it interesting to imagine what would happen if players knew they were playing against AlphaStar.

https://twitter.com/hardmaru/status/1149104231967842304

AlphaStar will play as all three of StarCraft's in-universe races (Terran, Zerg, and Protoss). Pairings on the ladder will be decided according to normal matchmaking rules, which depend on how many players are online while AlphaStar is playing. It will not learn from the games it plays on the ladder, having been trained from human replays and self-play. AlphaStar will also use a camera interface and more restricted APMs. Per the blog post, "AlphaStar has built-in restrictions, which cap its effective actions per minute and per second. These caps, including the agents' peak APM, are more restrictive than DeepMind's demonstration matches back in January, and have been applied in consultation with pro players."

https://twitter.com/Eric_Wallace_/status/1148999440121749504

https://twitter.com/Liquid_MaNa/status/1148992401157054464

DeepMind will benchmark the performance of a number of experimental versions of AlphaStar so that it can gather a broad set of results during the testing period.
DeepMind will use a player's replays and game data (skill level, MMR, map played, race played, time/date played, and game duration) to assess and describe the performance of the AlphaStar system. However, DeepMind will remove identifying details from the replays, including usernames, user IDs, and chat histories; other identifying details will be removed to the extent possible without compromising the research DeepMind is pursuing. For now, AlphaStar agents will play only in Europe. The research results will be released in a peer-reviewed scientific paper, along with replays of AlphaStar's matches.

Google DeepMind's AI AlphaStar beats StarCraft II pros TLO and MaNa; wins 10-1 against the gamers

DeepMind's AlphaZero shows unprecedented growth in AI, masters 3 different games

DeepMind's AlphaFold is successful in predicting the 3D structure of a protein making major inroads for AI use in healthcare

Sony resurrects robotic pet Aibo with advanced AI

Abhishek Jha
01 Nov 2017
3 min read
A decade back, when CEO Howard Stringer decided to discontinue Sony's iconic entertainment robot AIBO, its progenitor Toshitada Doi famously staged a mock funeral, lamenting not just Aibo's disbandment but the death of Sony's risk-taking spirit. Today, as the Japanese firm's sales soar to a decade high, beating projected estimates, Aibo is back from the dead. The revamped pet looks cuter than ever after nearly a decade on hold, and it is packed with sensors, cameras, microphones, and upgraded artificial intelligence features.

The new Aibo is an ivory-white, plastic-covered hound that can even connect to mobile networks. Using actuators, it moves its body remarkably well, while two OLED panels in its eyes display an array of expressions. Most importantly, it comes with a unique "adaptive" behavior: it can recognize its owner and run over to them, learning and interacting in the process, detecting smiles and words of praise, along with all those head and back scratches. In short, a real dog without canine instincts.

Priced at around $1,735 (198,000 yen), Aibo includes a SIM card slot to connect to the internet and access Sony's AI cloud, where it analyzes and learns how other robot dogs are behaving on the network. Sony says it does not intend Aibo to replace a digital assistant like Google Home, but that it could be a wonderful companion for children and families, forming an "emotional bond" built on love, affection, and joy. The cloud service that powers Aibo's AI is, however, expensive: a basic three-year subscription plan is priced at $26 (2,980 yen) per month, or around $790 (90,000 yen) if you sign up for three years upfront. As for battery life, the robot takes three hours to fully recharge after about two hours of activity.
"It was a difficult decision to stop the project in 2006, but we continued development in AI and robotics," Sony CEO Kazuo Hirai said at a launch event. "I asked our engineers a year and a half ago to develop Aibo because I strongly believe robots capable of building loving relationships with people help realize Sony's mission."

When Sony initially launched AIBO in 1999, it was well ahead of its time. But after the initial euphoria, the product failed to reach mainstream buyers, as reboot after reboot failed to generate profits. At the time, Sony had to make a hard decision as its core electronics business struggled in price wars. Today, times are different: AI fever has gripped the tech world. A plastic bone (an "aibone") for the robotic dog costs around 2,980 yen. That is the price you pay for keeping a robotic buddy around. The word "aibo," after all, literally means companion.

Apache Spark 2.4.0 released

Amrata Joshi
09 Nov 2018
2 min read
Last week, Apache Spark released its latest version, Apache Spark 2.4.0, the fifth release in the 2.x line. This release adds Barrier Execution Mode for better integration with deep learning frameworks, brings 30+ built-in higher-order functions for dealing with complex data types, adds experimental Scala 2.12 support, and improves the Kubernetes (K8s) integration. The release also focuses on usability, stability, and polish, resolving around 1,100 tickets.

What's new in Apache Spark 2.4.0?

Built-in Avro data source
Image data source
Flexible streaming sinks
Elimination of the 2GB block size limitation during transfer
Pandas UDF improvements

Major changes

Apache Spark 2.4.0 supports Barrier Execution Mode in the scheduler, for better integration with deep learning frameworks. One can now build Spark with Scala 2.12 and write Spark applications in Scala 2.12. Apache Spark 2.4.0 ships a built-in Spark-Avro package with logical type support, for better performance and usability. Some users are SQL experts but are not familiar with Scala, Python, or R; for them, this version adds support for Pivot in SQL. Apache Spark 2.4.0 also adds the Structured Streaming ForeachWriter for Python, letting users write ForeachWriter code in Python and use the partitionId and the version/batchId/epochId to conditionally process rows. This new release further introduces a Spark data source for the image format, so users can load images through the Spark data source reader interface.

Bug fixes

Previously, the LookupFunctions rule checked the same function name repeatedly; this version includes an updated LookupFunctions rule that performs only one check per invocation. A PageRank change in Apache Spark 2.3 introduced a bug in the ParallelPersonalizedPageRank implementation: the change prevented serialization of a Map that needs to be broadcast to all workers.
This issue has been resolved with the release of Apache Spark 2.4.0. Read more about Apache Spark 2.4.0 on the official website of Apache Spark.

Building Recommendation System with Scala and Apache Spark [Tutorial]

Apache Spark 2.3 now has native Kubernetes support!

Implementing Apache Spark K-Means Clustering method on digital breath test data for road safety
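To see what the pivot support mentioned above does for SQL users, here is a minimal plain-Python approximation of the operation (the revenue figures are made up for illustration): rows keyed by (year, quarter) become one row per year with one column per quarter.

```python
from collections import defaultdict

# Plain-Python sketch of what a SQL PIVOT does, illustrative only.
rows = [
    ("2018", "Q1", 100),
    ("2018", "Q2", 150),
    ("2019", "Q1", 120),
]

pivoted = defaultdict(dict)
for year, quarter, revenue in rows:
    pivoted[year][quarter] = revenue

print(dict(pivoted))
# {'2018': {'Q1': 100, 'Q2': 150}, '2019': {'Q1': 120}}
```

In Spark itself the same reshaping is expressed declaratively in SQL, with the engine handling grouping and aggregation at scale.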

OpenAI charter puts safety, standards, and transparency first

Richard Gall
10 Apr 2018
3 min read
OpenAI, the non-profit that promotes the development of artificial intelligence, has released a charter outlining the core principles it believes should govern the development and management of artificial intelligence. The OpenAI charter represents an important step in initiating a broader discussion around the ethical considerations of AI. Revealed in a short blog post, the charter is described as a summation of the development of the organization's strategy over the last two years. Its mission remains central, however: ensuring that the development of artificial intelligence benefits all of humanity.

What's inside the OpenAI charter?

The charter is broken down into four areas:

Broadly-distributed benefits - OpenAI claims its primary duty is to humanity
Long-term safety
Technical leadership - OpenAI places itself at the cutting edge of the technology that will drive AI forward
Cooperative orientation - working with policy-makers and institutions

Core concerns the OpenAI charter aims to address

A number of core concerns lie at the heart of the charter. One of the most prominent is what OpenAI sees as a competitive race to create AGI "without time for adequate safety precautions". It is because of this that OpenAI seeks cooperation with "other research and policy institutions", essentially ensuring that AI does not become a secretive corporate arms race. Clearly, for OpenAI, transparency will be key to creating artificial intelligence that is safe. OpenAI also says it will publish most of its AI research. Perhaps even more interestingly, the charter goes on to say that "we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research." There appears to be a tacit recognition of a tension between AI innovation and the ethics around such innovation.
A question nevertheless remains over how easy it is for an organization to stay at the cutting edge of AI technology while taking part in conversations around safety and ethics. As the last decade of technical development has shown, innovation and standards can seem diametrically opposed rather than mutually supporting. The charter might be important in moving beyond that apparent opposition.

'If tech is building the future, let's make that future inclusive and representative of all of society' – An interview with Charlotte Jee

What your organization needs to know about GDPR

20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017

Mark Zuckerberg's Congressional testimony: 5 things we learned

The Cambridge Analytica scandal and ethics in data science

European Union fined Google 1.49 billion euros for antitrust violations in online advertising

Amrata Joshi
22 Mar 2019
3 min read
On Wednesday, European authorities fined Google 1.49 billion euros for antitrust violations in online advertising, the third antitrust fine imposed on Google by the European Union since 2017. According to the regulators, Google imposed unfair terms on companies that used its search bar on their websites in Europe. Google has now been found abusing its power in its Android mobile phone operating system, in shopping comparison services, and in search adverts. Last year, EU competition commissioner Margrethe Vestager fined Google 4.34 billion euros for using its Android mobile operating system to unfairly keep rivals out of the mobile phone market. Two years ago, Google was fined 2.4 billion euros for unfairly favoring its own shopping services over those of its rivals.

Newspaper websites and blog aggregators usually have a search function embedded in them. When a user searches via this function, the website returns search results along with search adverts that appear alongside them. Google's AdSense for Search supplies these search adverts to the owners of publisher websites: Google acts as an advertising broker between advertisers and the website owners providing the space, making AdSense an online search advertising intermediation platform. Google has been at the top of online search advertising intermediation in the European Economic Area (EEA), with a market share of more than 70% from 2006 to 2016; last year Google held nearly 75.8%, and this year it is already at 77.8%, with the search ad market growing constantly. Since it is effectively impossible for competitors such as Microsoft and Yahoo to sell advertising space in Google's own search engine results pages, they need to work with third-party websites to grow their business and compete with Google.
In 2006, Google included exclusivity clauses in its contracts, prohibiting publishers from placing any search adverts from competitors on their search results pages. In March 2009, Google began replacing the exclusivity clauses with "Premium Placement" clauses, under which publishers had to reserve the most profitable space on their search results pages for Google's adverts and request a minimum number of Google adverts. This restricted Google's competitors from placing their search adverts in the most visible and clicked parts of publishers' search results pages. Things got harder still for competitors when Google added clauses requiring publishers to seek written approval from Google before changing the way rival adverts were displayed, giving Google control over how attractive competing search adverts could be. Google also imposed an exclusive supply obligation, preventing competitors from placing any search adverts on the most significant websites, while the company gave the most valuable positions to its own adverts and controlled the performance of rivals' adverts. The European Commission found that Google's conduct harmed competition and consumers and stifled innovation. Google may now face civil actions before the courts of the Member States for damages suffered by any person or business because of its anti-competitive behaviour. To know more about this news, check out the official press release.

Google announces Stadia, a cloud-based game streaming service, at GDC 2019

Google is planning to bring Node.js support to Fuchsia

Google open-sources Sandboxed API, a tool that helps in automating the process of porting existing C and C++ code

This AI generated animation can dress like humans using deep reinforcement learning

Prasad Ramesh
02 Nov 2018
4 min read
In a paper published yesterday, titled "Learning to Dress: Synthesizing Human Dressing Motion via Deep Reinforcement Learning", the human motion of putting on clothes is synthesized in animation with reinforcement learning. The team behind it comprises two Ph.D. students and two professors from the Georgia Institute of Technology, plus a researcher from Google Brain.

Understanding the dressing problem

Dressing, putting on a t-shirt or a jacket, is something we do every day, yet it is a computationally costly and complex task for a machine to perform or for computers to simulate. The paper combines techniques from physics simulation and machine learning to produce the animation: a physics engine simulates character motion and cloth motion, while deep reinforcement learning on a neural network produces the character motion.

Physics engine and reinforcement learning on a neural network

The authors introduce a salient representation of haptic information to guide the dressing process; this haptic information is then used in the reward function to provide learning signals when training the network. As the task is too complex to perform in one go, the dressing task is separated into several subtasks for better control, and a policy sequencing algorithm is introduced to match the distribution of output states from one task to the input distribution of the next. The same approach is used to produce character controllers for various dressing tasks, such as wearing a t-shirt, wearing a jacket, and robot-assisted dressing of a sleeve.

Dressing is complex, split into several subtasks

The approach splits the dressing task into a sequence of subtasks, with a state machine guiding the transitions between them. Dressing a jacket, for example, consists of four subtasks:

Pulling the sleeve over the first arm.
Moving the second arm behind the back to get in position for the second sleeve.
Putting the hand into the second sleeve.
Finally, returning the body to a rest position.

A separate reinforcement learning problem is formulated for each subtask in order to learn a control policy. The policy sequencing algorithm ensures that these individual control policies, executed sequentially, lead to a successful dressing sequence: it matches the initial state of one subtask with the final state of the previous subtask in the sequence. A variety of successful dressing motions can be produced by applying the resulting control policies. Each subtask is formulated as a partially observable Markov decision process (POMDP). Character dynamics are simulated with the Dynamic Animation and Robotics Toolkit (DART) and cloth dynamics with NVIDIA PhysX.

Conclusion and room for improvement

Using deep reinforcement learning and physics simulation, the authors successfully created a system that learns to animate a character putting on clothing: it learns each subtask individually, then connects the subtasks with a state machine. They found that carefully selecting the cloth observations and the reward functions was important to the success of the approach. The system currently performs only upper-body dressing; handling the lower body would require adding balance control to the controller. Using a control policy architecture with memory might reduce the number of subtasks needed and allow greater generalization of the learned skills. You can read the research paper at the Georgia Institute of Technology website.

Facebook launches Horizon, its first open source reinforcement learning platform for large-scale products and services

Deep reinforcement learning – trick or treat?

Google open sources Active Question Answering (ActiveQA), a Reinforcement Learning based Q&A system
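The subtask sequencing for the jacket example can be sketched as a simple chain in which each subtask's policy hands its final state to the next subtask. This is an illustrative sketch with stubbed policies, not the paper's implementation: in the real system each policy is a trained neural network, and the policy sequencing algorithm matches state distributions rather than passing a single state along.

```python
# Stub "policies" for the four jacket subtasks; each appends the
# milestone it is responsible for reaching.
def pull_first_sleeve(state):
    return state + ["first sleeve on"]

def move_second_arm(state):
    return state + ["second arm behind back"]

def insert_second_hand(state):
    return state + ["second sleeve on"]

def return_to_rest(state):
    return state + ["rest pose"]

SUBTASKS = [pull_first_sleeve, move_second_arm,
            insert_second_hand, return_to_rest]

def dress(initial_state):
    # Run the subtasks in order, each starting from the previous
    # subtask's final state.
    state = initial_state
    for policy in SUBTASKS:
        state = policy(state)
    return state

print(dress([]))
# ['first sleeve on', 'second arm behind back', 'second sleeve on', 'rest pose']
```

The fixed ordering is what the paper's state machine provides; the hard part, learning each policy so its end states are valid start states for the next, is exactly what the policy sequencing algorithm addresses.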

How Deep Neural Networks can improve Speech Recognition and generation

Sugandha Lahoti
02 Feb 2018
7 min read
While watching your favorite movie or TV show, you must sometimes have found it difficult to decipher what the characters are saying, especially if they talk really fast or the show is in a language you don't know. You quickly add subtitles and, voila, the problem is solved. But do you know how these subtitles work? Instead of a person writing them, a computer can automatically recognize the characters' speech and generate the script. That, however, is just a trivial example of what computers and neural networks can do in the field of speech understanding and generation. Today, we're gonna talk about the achievements of deep neural networks in improving the ability of our computing systems to understand and generate human speech.

How traditional speech recognition systems work

Traditional speech recognition models used classification algorithms to arrive at a distribution of possible phonemes for each frame. These classification algorithms were based on highly specialized features such as MFCCs (mel-frequency cepstral coefficients). Hidden Markov Models (HMMs) were used in the decoding phase, accompanied by a pre-trained language model, to find the most likely sequence of phones that could be mapped to output words. With the emergence of deep learning, neural networks came to be used in many aspects of speech recognition, such as phoneme classification, isolated word recognition, audiovisual speech recognition, audio-visual speaker recognition, and speaker adaptation.

Deep learning enabled the development of Automatic Speech Recognition (ASR) systems. These ASR systems require separate models, namely an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM is typically trained to recognize context-dependent states or phonemes, bootstrapping from an existing model that is used for alignment. The PM maps the sequences of phonemes produced by the AM into word sequences.
Word sequences are scored using an LM trained on large amounts of text data, which estimates the probabilities of word sequences. However, training independent components adds complexity and is suboptimal compared to training all components jointly. This motivated the ASR community to develop end-to-end systems, which attempt to learn the separate components of an ASR system jointly, as a single system.

A single-system speech recognition model

End-to-end trained neural networks can essentially recognize speech without using an external pronunciation lexicon or a separate language model; they can directly map the input acoustic speech signal to word sequences. In such sequence-to-sequence models, the AM, PM, and LM are trained jointly in a single system. Since these models directly predict words, the process of decoding utterances is also greatly simplified. End-to-end ASR systems do not require bootstrapping from decision trees or time alignments generated by a separate system, making their training simpler than that of conventional ASR systems.

There are several sequence-to-sequence models, including connectionist temporal classification (CTC), the recurrent neural network (RNN) transducer, and attention-based models. CTC models are used to train end-to-end systems that directly predict grapheme sequences. The basic CTC model was proposed by Graves et al. as a way of training end-to-end models without requiring a frame-level alignment of the target labels for a training utterance. Graves later extended this basic CTC model to include a separate recurrent LM component, in a model referred to as the RNN transducer. The RNN transducer augments the encoder network from the CTC model architecture with a separate recurrent prediction network over the output symbols. Attention-based models are another type of end-to-end sequence model.
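The rule CTC uses to turn a frame-level label sequence into an output string (collapse consecutive repeats, then drop the blank symbol) is easy to sketch. This shows only the final collapsing step on a toy example, not the CTC loss or beam search; "-" stands for the CTC blank.

```python
def ctc_collapse(frames, blank="-"):
    """Collapse repeated labels, then remove blanks (standard CTC rule)."""
    out = []
    prev = None
    for label in frames:
        # Keep a label only if it differs from the previous frame's label
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

print(ctc_collapse(list("hh-e-ll-llo")))  # hello
```

The blank is what lets CTC emit genuinely doubled letters: the "ll" in "hello" survives because a blank separates the two runs of "l".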
These attention-based models consist of an encoder network, which maps the input acoustics into a higher-level representation, and an attention-based decoder, which predicts the next output symbol based on the previous predictions.

A schematic representation of various sequence-to-sequence modeling approaches

Google's Listen-Attend-Spell (LAS) end-to-end architecture is one such attention-based model. This end-to-end system achieves a word error rate (WER) of 5.6%, a 16% relative improvement over a strong conventional system that achieves a 6.7% WER. Additionally, the end-to-end model used to output the initial word hypothesis, before any hypothesis rescoring, is 18 times smaller than the conventional model. These sequence-to-sequence models are comparable with traditional approaches on dictation test sets; however, traditional models still outperform end-to-end systems on voice-search test sets. Future work is being done on building models that perform well on voice-search tests too, and on multi-dialect and multi-lingual systems, so that data for all dialects and languages can be combined to train one network, without the need for a separate AM, PM, and LM for each dialect or language.

Enough with understanding speech. Let's talk about generating it

Text-to-speech (TTS) conversion, i.e., generating natural-sounding speech from text and allowing people to converse with machines, has been one of the top research goals of recent times. Deep neural networks have greatly improved the overall development of TTS systems, as well as individual pieces of such systems. In 2012, Google first used deep neural networks (DNNs) instead of Gaussian Mixture Models (GMMs), which until then were the core technology behind speech systems. DNNs assessed sounds at every instant in time, increasing speech recognition accuracy.
Later, better neural network acoustic models were built using CTC and sequence-discriminative training techniques based on RNNs. Although blazingly fast and accurate, these TTS systems were largely based on concatenative TTS, in which a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances. This led to the development of parametric TTS, where all the information required to generate the audio is stored in the parameters of the model, and the contents and characteristics of the speech are controlled via the model's inputs.

WaveNet further enhanced these parametric models by directly modeling the raw waveform of the audio signal, one sample at a time. Working on raw waveforms, WaveNet yielded more natural-sounding speech and was able to model any kind of audio, including music. Baidu then came up with its Deep Voice TTS system, constructed entirely from deep neural networks. Their system performed audio synthesis in real time, giving up to a 400x speedup over previous WaveNet inference implementations.

Google then released Tacotron, an end-to-end generative TTS model that synthesizes speech directly from characters. Tacotron achieved a 3.82 mean opinion score (MOS), outperforming the traditional parametric system in terms of speech naturalness. It was also considerably faster than sample-level autoregressive methods because it generates speech at the frame level.

Most recently, Google released Tacotron 2, which draws on past work on both Tacotron and WaveNet. It features a Tacotron-style recurrent sequence-to-sequence feature prediction network that generates mel spectrograms, followed by a modified version of WaveNet that generates time-domain waveform samples conditioned on the generated mel spectrogram frames. The model achieved a MOS of 4.53, compared to 4.58 for professionally recorded speech.
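Tacotron 2's intermediate representation, the mel spectrogram, warps linear frequency onto the perceptual mel scale, so that spectrogram bands are spaced the way human hearing resolves pitch. A hedged sketch of the commonly used HTK-style conversion formula (this is the standard textbook formula, not Tacotron 2's actual feature-extraction code; the band count and frequency range below are illustrative):

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse conversion: mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Mel filterbank center frequencies are spaced evenly in mel space and
# then mapped back to Hz; e.g. 80 bands between 0 Hz and 8 kHz:
n_mels, f_max = 80, 8000.0
centers_hz = [mel_to_hz(i * hz_to_mel(f_max) / (n_mels + 1))
              for i in range(1, n_mels + 1)]
```

Because the mel scale is logarithmic at high frequencies, the resulting centers cluster densely at low frequencies and spread out toward 8 kHz.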
Deep neural networks have been a strong force behind the development of end-to-end speech recognition and generation models. Although these end-to-end models compare well with classical approaches, more work remains to be done. As of now, end-to-end speech models cannot process speech in real time, yet real-time processing is a strong requirement for latency-sensitive applications such as voice search, so further progress is expected in this area. End-to-end models also fall short of expectations when evaluated on live production data, and they have difficulty learning the correct spellings of rarely used words such as proper nouns, something a separate PM handles quite easily. More effort will be needed to address these challenges as well.