
Tech News - Data

1209 Articles
Salesforce Einstein team open sources TransmogrifAI, their automated machine learning library

Sugandha Lahoti
17 Aug 2018
2 min read
Salesforce has open sourced TransmogrifAI, its end-to-end automated machine learning library for structured data. The library is currently used in production to help power the Salesforce Einstein AI platform, where it enables data scientists at Salesforce to transform customer data into meaningful, actionable predictions. Salesforce has now open-sourced the project so that other developers and data scientists can build machine learning solutions at scale, fast.

TransmogrifAI is built on Scala and SparkML and automates data cleansing, feature engineering, and model selection to arrive at a performant model. It encapsulates five main components of the machine learning process:

Feature Inference: TransmogrifAI allows users to specify a schema for their data and automatically extracts the raw predictor and response signals as "Features". In addition to allowing user-specified types, TransmogrifAI also performs type inference of its own. The strongly typed features allow developers to catch a majority of errors at compile time rather than at run time.

Transmogrification (automated feature engineering): TransmogrifAI comes with a myriad of techniques for all the supported feature types, ranging from phone numbers, email addresses, and geo-location to text data. It also optimizes the transformations to make it easier for machine learning algorithms to learn from the data.

Automated Feature Validation: TransmogrifAI has algorithms that perform automatic feature validation to remove features with little to no predictive power. These algorithms are useful when working with high-dimensional and unknown data. They apply statistical tests based on feature types and, additionally, make use of feature lineage to detect and discard bias.

Automated Model Selection: The TransmogrifAI Model Selector runs several different machine learning algorithms on the data and uses the average validation error to automatically choose the best one. It also automatically deals with the problem of imbalanced data by appropriately sampling the data and recalibrating predictions to match true priors.

Hyperparameter Optimization: TransmogrifAI automatically tunes hyperparameters and offers advanced tuning techniques.

This large-scale automation has brought down the total time taken to train models from weeks and months to a few hours, with just a few lines of code. You can check out the project to get started with TransmogrifAI. For detailed information, read the Salesforce Engineering blog.

Salesforce Spring 18 – New features to be excited about in this release!
How to secure data in Salesforce Einstein Analytics
How to create and prepare your first dataset in Salesforce Einstein

YouTube bans dangerous pranks and challenges

Prasad Ramesh
17 Jan 2019
2 min read
YouTube has updated its policies to ban dangerous pranks and challenges that can harm the victim of a prank or encourage people to partake in dangerous behavior. Pranks and challenges have been around on YouTube for a long time. Many of the pranks are entertaining and harmless, while some challenges are potentially unsafe, like extreme food-eating challenges. Recently, the "Bird Box Challenge", inspired by the Netflix movie Bird Box, has been popular. The challenge is to perform difficult tasks, like driving a car, blindfolded. It has received media coverage not for its entertainment value but for the dangers involved, and it has caused many accidents among people who take it on.

What is banned on YouTube?

In light of this challenge being harmful and dangerous to lives, YouTube has banned certain content by updating its policies page. Primarily, it has banned three kinds of pranks:

Challenges that can cause serious danger to life or cause death
Pranks that lead the victims to believe that they're in serious physical danger
Any pranks that cause severe emotional distress in children

YouTube states on its policies page: "YouTube is home to many beloved viral challenges and pranks, but we need to make sure what's funny doesn't cross the line into also being harmful or dangerous."

What are the terms?

Other than the points listed above, there is no clear or exhaustive list of the kinds of activities that are banned; YouTube moderators may take a call to remove a video. Over the next two months, YouTube will be removing any existing content that falls within this scope; however, content creators will not receive a strike. Going forward, any new content that contains objectionable material as per the policies will earn the channel a 'strike'. Three strikes in the span of three months will lead to the channel's termination. Questionable content includes custom thumbnails or external links that display pornographic, graphically violent, malware, or spam content. So you are now less likely to see videos of driving blindfolded or eating Tide Pods.

Google Chrome announces an update on its Autoplay policy and its existing YouTube video annotations
Is the YouTube algorithm's promoting of #AlternativeFacts like Flat Earth having a real-world impact?
Worldwide Outage: YouTube, Facebook, and Google Cloud goes down affecting thousands of users

Microsoft showcases its edgy AI toolkit at Connect(); 2017

Sugandha Lahoti
17 Nov 2017
3 min read
At the ongoing Microsoft Connect(); 2017, Microsoft unveiled its latest innovations in AI development platforms. The Connect(); conference this year is all about new tools and cloud services that help developers seize the growing opportunity around artificial intelligence and machine learning. Microsoft has made two major announcements to capture the AI market.

Visual Studio Tools for AI

Microsoft has announced new tools for its Visual Studio IDE specifically for building AI applications. Visual Studio Tools for AI is currently in beta and is an extension to Visual Studio 2017. It allows developers, data scientists, and machine learning engineers to embed deep learning models into applications, and it has built-in support for popular machine learning frameworks such as Microsoft Cognitive Toolkit (CNTK), Google TensorFlow, Caffe2, and MXNet. It also comes packed with features such as custom metrics, history tracking, enterprise-ready collaboration, and data science reproducibility and auditing. Visual Studio Tools for AI allows interactive debugging of deep learning applications with built-in features like syntax highlighting, IntelliSense, and text auto-formatting. Training of AI models in the cloud is also possible using the integration with Azure Machine Learning, which also allows deploying a model into production. Visualization and monitoring of AI models is available using TensorBoard, an integrated open tool that can run both locally and in remote VMs.

Azure IoT Edge

Microsoft sees IoT as a mission-critical business asset. With this in mind, it has developed a product for IoT solutions. Termed Azure IoT Edge, it enables developers to run cloud intelligence on the edge of IoT devices. Azure IoT Edge can operate on Windows and Linux as well as on multiple hardware architectures (x64 and ARM). Developers can work in languages such as C#, C, and Python to deploy models on Azure IoT Edge. Azure IoT Edge is a bundle of multiple components. With the AI Toolkit, developers can start building AI applications. With Azure Machine Learning, AI applications can be created, deployed, and managed on any framework; Azure Machine Learning also includes a set of pre-built AI models for common tasks. In addition, using Azure IoT Hub, developers can deploy Edge modules to multiple IoT Edge devices. Using a combination of Azure Machine Learning, Azure Stream Analytics, Azure Functions, and any third-party code, a complex data pipeline can be created to build and test container-based workloads. This pipeline can be managed using Azure IoT Hub.

Customer reviews of Azure IoT Edge have been positive so far. Here's what Matt Boujonnier, Analytics Application Architect at Schneider Electric, says: "Azure IoT Edge provided an easy way to package and deploy our Machine Learning applications. Traditionally, machine learning is something that has only run in the cloud, but for many IoT scenarios that isn't good enough, because you want to run your application as close as possible to any events. Now we have the flexibility to run it in the cloud or at the edge—wherever we need it to be."

With the launch of these two new tools, Microsoft is catching up quickly with the likes of Google and IBM to capture the AI market and provide developers with an intelligent edge.

Sherin Thomas explains how to build a pipeline in PyTorch for deep learning workflows

Packt Editorial Staff
09 May 2019
8 min read
A typical deep learning workflow starts with ideation and research around a problem statement, which is where architectural design and model decisions come into play. Following this, the theoretical model is tested with prototypes. This includes trying out different models or techniques, such as skip connections, and deciding what not to try out. PyTorch started as a research framework built by a Facebook intern, and it has grown into a framework used both for research and prototyping and for writing efficient models with serving modules. The PyTorch deep learning workflow is broadly equivalent to the workflow implemented by almost everyone in the industry, even for highly sophisticated implementations, with slight variations. In this article, we explain the core of ideation and planning, and design and experimentation, in the PyTorch deep learning workflow. This article is an excerpt from the book PyTorch Deep Learning Hands-On by Sherin Thomas and Sudhanshu Passi. The book provides an entirely practical introduction to PyTorch, with numerous examples and dynamic AI applications, and demonstrates the simplicity and efficiency of the PyTorch approach to machine intelligence and deep learning.

Ideation and planning

Usually, in an organization, the product team comes up with a problem statement for the engineering team, to determine whether they can solve it or not. This is the start of the ideation phase. In academia, this could be the decision phase where candidates have to find a problem for their thesis. In the ideation phase, engineers brainstorm and find the theoretical implementations that could potentially solve the problem. In addition to converting the problem statement to a theoretical solution, the ideation phase is where we decide what the data types are and what dataset we should use to build the proof of concept (POC) of the minimum viable product (MVP). This is also the stage where the team decides which framework to go with by analyzing the behavior of the problem statement, available implementations, available pretrained models, and so on. This stage is very common in the industry, and I have come across numerous examples where a well-planned ideation phase helped the team roll out a reliable product on time, while a poorly planned ideation phase destroyed the whole product creation.

Design and experimentation

The crucial part of design and experimentation lies in the dataset and its preprocessing. For any data science project, the major share of time is spent on data cleaning and preprocessing, and deep learning is no exception. Data preprocessing is one of the vital parts of building a deep learning pipeline. Real-world datasets are usually not cleaned or formatted for a neural network to process; conversion to floats or integers, normalization, and so on are required before further processing. Building a data processing pipeline is also a non-trivial task, which consists of writing a lot of boilerplate code. To make it much easier, dataset builders and DataLoader pipeline packages are built into the core of PyTorch.

The dataset and DataLoader classes

Different types of deep learning problems require different types of datasets, and each of them might require different types of preprocessing depending on the neural network architecture we use. This is one of the core problems in deep learning pipeline building.
Although the community has made datasets for different tasks available for free, writing a preprocessing script is almost always painful. PyTorch solves this problem by providing abstract classes for writing custom datasets and data loaders. The example given here is a simple dataset class for loading the fizzbuzz dataset, but extending it to handle any type of dataset is fairly straightforward. PyTorch's official documentation uses a similar approach to preprocess an image dataset before passing it to a complex convolutional neural network (CNN) architecture.

A dataset class in PyTorch is a high-level abstraction that handles almost everything required by the data loaders. The custom dataset class defined by the user needs to override the __len__ and __getitem__ functions of the parent class: __len__ is used by the data loaders to determine the length of the dataset, and __getitem__ is used by the data loaders to fetch an item. The __getitem__ function expects the index as an argument and returns the item that resides at that index:

from dataclasses import dataclass
from torch.utils.data import Dataset, DataLoader

@dataclass(eq=False)
class FizBuzDataset(Dataset):
    input_size: int
    start: int = 0
    end: int = 1000

    def encoder(self, num):
        # binary-encode the integer and left-pad with zeros to input_size
        ret = [int(i) for i in '{0:b}'.format(num)]
        return [0] * (self.input_size - len(ret)) + ret

    def __getitem__(self, idx):
        x = self.encoder(idx)
        # one-hot label: fizzbuzz, buzz, fizz, or neither
        if idx % 15 == 0:
            y = [1, 0, 0, 0]
        elif idx % 5 == 0:
            y = [0, 1, 0, 0]
        elif idx % 3 == 0:
            y = [0, 0, 1, 0]
        else:
            y = [0, 0, 0, 1]
        return x, y

    def __len__(self):
        return self.end - self.start

The implementation of the custom dataset uses the brand-new dataclasses from Python 3.7. dataclasses help eliminate boilerplate code for Python magic functions, such as __init__, using dynamic code generation. The class needs to be type-hinted, and that is what the first three lines inside the class are for. You can read more about dataclasses in the official Python documentation (https://docs.python.org/3/library/dataclasses.html).

The __len__ function returns the difference between the end and start values passed to the class. In the fizzbuzz dataset, the data is generated by the program. The implementation of the data generation is inside the __getitem__ function, where the class instance generates the data based on the index passed by DataLoader. PyTorch made the class abstraction as generic as possible, so the user can define what the data loader should return for each id. In this particular case, the class instance returns the input and output for each index, where the input, x, is the binary-encoded version of the index itself and the output is a one-hot encoded vector with four states. The four states represent whether the next number is a multiple of three (fizz), a multiple of five (buzz), a multiple of both three and five (fizzbuzz), or not a multiple of either three or five.

Note: For Python newbies, the way the dataset works can be understood by looking first at the loop that iterates over the integers from zero to the length of the dataset (the length is returned by the __len__ function when len(object) is called).
The following snippet shows the simple loop:

dataset = FizBuzDataset()
for i in range(len(dataset)):
    x, y = dataset[i]

dataloader = DataLoader(dataset, batch_size=10, shuffle=True, num_workers=4)
for batch in dataloader:
    print(batch)

The DataLoader class accepts a dataset class that inherits from torch.utils.data.Dataset. DataLoader takes the dataset and performs non-trivial operations such as mini-batching, multithreading, shuffling, and so on, to fetch the data from the dataset. It accepts a dataset instance from the user and uses a sampler strategy to sample data as mini-batches. The num_workers argument decides how many parallel workers should be fetching the data. This helps avoid a CPU bottleneck so that the CPU can keep up with the GPU's parallel operations. Data loaders also let users specify whether to use pinned CUDA memory, which copies the data tensors into CUDA's pinned memory before returning them to the user. Using pinned memory is key to fast data transfers between devices, since the data is loaded into pinned memory by the data loader itself, using multiple CPU cores anyway.

Most often, especially while prototyping, custom datasets might not be available to developers, and in such cases they have to rely on existing open datasets. The good thing about working with open datasets is that most of them are free from licensing burdens, and thousands of people have already tried preprocessing them, so the community will help out. PyTorch provides utility packages for all three types of datasets, with pretrained models, preprocessed datasets, and utility functions to work with them.

This article is about how to build a basic pipeline for deep learning development. The system we defined here is a very common, general approach followed by different sorts of companies, with slight changes. The benefit of starting with a generic workflow like this is that you can build a really complex workflow on top of it as your team or project grows.

Build deep learning workflows and take deep learning models from prototyping to production with PyTorch Deep Learning Hands-On written by Sherin Thomas and Sudhanshu Passi.

F8 PyTorch announcements: PyTorch 1.1 releases with new AI tools, open sourcing BoTorch and Ax, and more
Facebook AI open-sources PyTorch-BigGraph for faster embeddings in large graphs
Top 10 deep learning frameworks

Google Bristlecone: A New Quantum processor by Google’s Quantum AI lab

Sugandha Lahoti
06 Mar 2018
2 min read
The quest to conquer the quantum world is rapidly advancing, and another contender in this conquest is Google, which has launched a preview of Bristlecone, a new quantum processor. Google's Bristlecone was unveiled at the annual American Physical Society meeting in Los Angeles on March 5, 2018. According to Google, "Bristlecone would be a compelling proof-of-principle for building larger scale quantum computers." The purpose of this quantum processor is to provide a testbed for research into system error rates and the scalability of Google's qubit technology, along with applications in quantum simulation, optimization, and machine learning.

(Figure: a preview of Bristlecone, Google's new quantum processor, alongside a cartoon of the device in which each "X" represents a qubit with nearest-neighbor connectivity.)

Google Bristlecone uses a new architecture that allows 72 quantum bits on a single array, with an overlapping design that puts two different grids together. Google has optimized Bristlecone for the lowest possible error rate using a specialized process called quantum error correction. Google's previous 9-qubit linear quantum computers demonstrated error rates of 1% for readout, 0.1% for single-qubit gates, and 0.6% for two-qubit gates. Google Bristlecone uses the same scheme for coupling, control, and readout, but is scaled to a square array of 72 qubits. Google researchers chose a device of this size to demonstrate quantum supremacy in the future, to investigate first- and second-order error correction using the surface code, and to facilitate quantum algorithm development on actual hardware.

The intended research direction of the Quantum AI Lab is to access near-term applications on the road to building an error-corrected quantum computer. This, Google says, "would require harmony between a full stack of technology ranging from software and control electronics to the processor itself. Getting this right requires careful systems engineering over several iterations." More information about Google Bristlecone is available on the Google Research blog.

Google TVCs write an open letter to Google's CEO, demanding equal benefits and treatment

Natasha Mathur
06 Dec 2018
4 min read
Google contractors (often referred to as Google's "shadow workforce") published an open letter on Medium to Sundar Pichai, Google's CEO, yesterday, demanding that he address their call for better conditions and equal benefits for contractors, who make up more than half of the company's total staff. Contractors (vendors, temps, TVCs) are workers employed by different outside agencies within Google for all kinds of jobs (coders, managers, marketers, janitors, waiters, and so on).

https://twitter.com/GoogleWalkout/status/1070327480601509888

It was just last month that 20,000 Google employees, along with TVCs, temps, vendors, and contractors, walked out to protest Google's handling of sexual harassment and discrimination within the workplace. As part of the walkout, Google employees made five demands urging Google to bring about structural changes within the workplace. One of those demands was a "commitment to ending pay and opportunity inequity" at all levels of the organization, including contract workers and sub-contract workers. However, Google has not addressed any of the issues surrounding TVCs so far. "As TVCs who took equal part in the walkout, your silence has been deafening. Google routinely denies TVCs access to information that is relevant to our jobs and our lives," reads the letter.

An example mentioned in the letter is the tragic shooting at YouTube headquarters in April this year, where Google sent security-related updates to its employees in real time, "leaving TVCs defenseless in the line of fire". Moreover, TVCs were not even invited to the post-shooting town hall meeting the following day. Similarly, TVCs were excluded from the town hall meeting conducted six days after the walkout. "The exclusion of TVCs from important communications and fair treatment is part of a system of institutional racism, sexism, and discrimination. TVCs are disproportionately people from marginalized groups who are treated as less deserving of compensation, opportunities, workplace protections, and respect," reads the letter.

The letter also points out that contractors wear different colored badges from full-time employees, get low wages despite doing the same work as full-time employees, and are offered minimal benefits in comparison. "Google has the power — and the money — to ensure that we are treated equitably, with respect and dignity. However, it is clear that we will continue to be mistreated and ignored if we stay silent. We need transparency, accountability, and structural change to ensure equity for all Google workers," reads the letter.

Contractors have now reiterated the demands of the walkout:

An end to pay and opportunity inequity for TVCs. This includes better pay and the same benefits for contractors as for full-time employees, such as high-quality healthcare, paid vacations, paid sick days, holiday pay, family leave, and bonuses. There should also be a consistent and transparent conversion process to full-time employment, along with a single badge color for all workers.

Access to company-wide information on the same terms as full-time employees. This includes access to town hall discussions; communications about safety, discrimination, and sexual misconduct; access to internal forums like Google Groups; and career growth, classes, and counseling opportunities similar to the ones offered to full-time employees.
Public response to the letter has been largely positive, with people supporting contractors for speaking out:

https://twitter.com/andytliu/status/1070504767674245121
https://twitter.com/mer__edith/status/1070345492406644737
https://twitter.com/ireneista/status/1070375529650372608
https://twitter.com/techworkersco/status/1070337882714365952
https://twitter.com/spoonboy42/status/1070479331196059648

Google hasn't responded yet regarding the demands, and for now we can only wait and see if and when these demands get addressed by Google.

Recode Decode #GoogleWalkout interview shows why data and evidence don't always lead to right decisions in even the world's most data-driven company
Google bypassed its own security and privacy teams for Project Dragonfly reveals Intercept
Google employees join hands with Amnesty International urging Google to drop Project Dragonfly
Paper in Two minutes: Attention Is All You Need

Sugandha Lahoti
05 Apr 2018
4 min read
A paper on a new, simple network architecture, the Transformer, based solely on attention mechanisms.

The NIPS 2017 accepted paper, Attention Is All You Need, introduces the Transformer, a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. The paper is authored by members of the Google research team: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.

What problem is the paper attempting to solve?

Recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated RNNs are the popular approaches used for sequence modelling tasks such as machine translation and language modeling. However, RNNs handle sequences word by word in a sequential fashion, and this sequentiality is an obstacle to parallelizing the process. Moreover, when such sequences are too long, the model is prone to forgetting the content of distant positions in the sequence or mixing it with the content of following positions. Recent works have achieved significant improvements in computational efficiency and model performance through factorization tricks and conditional computation, but they are not enough to eliminate the fundamental constraint of sequential computation.

Attention mechanisms are one solution to the problem of model forgetting, because they allow dependencies to be modelled without regard to their distance in the input or output sequences. Due to this property, they have become an integral part of sequence modeling and transduction models. However, in most cases attention mechanisms are used in conjunction with a recurrent network.

Paper summary

The Transformer proposed in this paper is a model architecture that relies entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and tremendously improves translation quality after being trained for as little as twelve hours on eight P100 GPUs. Neural sequence transduction models generally have an encoder-decoder structure: the encoder maps an input sequence of symbol representations to a sequence of continuous representations, and the decoder then generates an output sequence of symbols, one element at a time. The Transformer follows this overall architecture using stacked self-attention and point-wise, fully connected layers for both the encoder and decoder.

The authors are motivated to use self-attention by three criteria: the total computational complexity per layer; the amount of computation that can be parallelized, as measured by the minimum number of sequential operations required; and the path length between long-range dependencies in the network. The Transformer uses two different types of attention functions:

Scaled Dot-Product Attention computes the attention function on a set of queries simultaneously, packed together into a matrix.
Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions.

A self-attention layer connects all positions with a constant number of sequentially executed operations, whereas a recurrent layer requires O(n) sequential operations.
In terms of computational complexity, self-attention layers are faster than recurrent layers when the sequence length is smaller than the representation dimensionality, which is often the case in machine translation.

Key Takeaways

This work introduces the Transformer, a novel sequence transduction model based entirely on attention mechanisms. It replaces the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. The Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers for translation tasks. On both the WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, the model achieves a new state of the art; in the former task the model outperforms all previously reported ensembles.

Future Goals

The Transformer has only been applied to transduction tasks so far. In the near future, the authors plan to use it for problems involving input and output modalities other than text, and to apply attention mechanisms to efficiently handle large inputs and outputs such as images, audio, and video. The Transformer architecture from this paper has gained major traction since its release because of major improvements in translation quality and other NLP tasks. Recently, the NLP research group at Harvard released a post presenting an annotated version of the paper in the form of a line-by-line implementation. It is accompanied by 400 lines of library code, written in PyTorch in the form of a notebook, accessible from GitHub or on Google Colab with free GPUs.
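To make the scaled dot-product operation described above concrete, here is a minimal NumPy sketch of the formula Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V from the paper; the matrix shapes and random example values are illustrative only, not taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k) arrays; V: (seq_len, d_v) array."""
    d_k = Q.shape[-1]
    # similarity of every query to every key, scaled by sqrt(d_k)
    # to keep the softmax in a well-behaved range for large d_k
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # each output position is a weighted sum of the value vectors
    return weights @ V

# illustrative shapes: 5 positions, 8-dimensional queries/keys/values
rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```

Multi-head attention simply runs several such attention functions in parallel on learned linear projections of Q, K, and V and concatenates the results.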

Introducing Spleeter, a TensorFlow-based Python library that extracts voice and sound from any music track

Sugandha Lahoti
05 Nov 2019
2 min read
On Monday, Deezer, a French online music streaming service, released Spleeter, a music source separation engine. It comes in the form of a Python library based on TensorFlow. Stating the reason behind Spleeter, the researchers write, "We release Spleeter to help the Music Information Retrieval (MIR) community leverage the power of source separation in various MIR tasks, such as vocal lyrics analysis from audio, music transcription, any type of multilabel classification or vocal melody extraction."

Spleeter comes with pre-trained models for 2, 4, and 5 track separation:

Vocals (singing voice) / accompaniment separation (2 stems)
Vocals / drums / bass / other separation (4 stems)
Vocals / drums / bass / piano / other separation (5 stems)

It can also train source separation models, or fine-tune pre-trained ones, with TensorFlow if you have a dataset of isolated sources. Deezer benchmarked Spleeter against Open-Unmix, another recently released open-source model, and reported slightly better performance with increased speed: it can separate audio files into 4 stems 100x faster than real time when running on a GPU.

You can use Spleeter straight from the command line as well as directly in your own development pipeline as a Python library (a short usage sketch follows at the end of this article). It can be installed with Conda or pip, or used with Docker. Spleeter's creators mention a number of potential applications of a source separation engine, including remixes, upmixing, active listening, educational purposes, and pre-processing for other tasks such as transcription.

Spleeter received mostly positive feedback on Twitter, as people experimented with separating vocals from music.

https://twitter.com/lokijota/status/1191580903518228480
https://twitter.com/bertboerland/status/1191110395370586113
https://twitter.com/CholericCleric/status/1190822694469734401

Wavy.org also ran several songs through the two-stem filter and evaluated them in a blog post. They tried a variety of soundtracks across multiple genres. The results were much better than expected, though vocals sometimes felt robotically autotuned. The amount of bleed was shockingly low relative to other solutions and surpassed any available free tool and rival commercial plugins and services.

https://twitter.com/waxpancake/status/1191435104788238336

Spleeter will be presented and live-demoed at the 2019 ISMIR conference in Delft. For more details, refer to the official announcement.

DeepMind AI's AlphaStar achieves Grandmaster level in StarCraft II with 99.8% efficiency
Google AI introduces Snap, a microkernel approach to 'Host Networking'
Firefox 70 released with better security, CSS, and JavaScript improvements
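As a rough illustration of the Python-library route mentioned above, the sketch below drives separation from Python using Spleeter's Separator class with the pre-trained 2-stems model. The file paths are placeholders, and the exact API surface may vary between Spleeter releases, so treat this as a sketch rather than a definitive usage guide.

```python
# Sketch of library usage (assumes `pip install spleeter` and the 2-stems model).
# Paths are placeholders; the API may differ slightly across Spleeter versions.
from spleeter.separator import Separator

separator = Separator('spleeter:2stems')            # vocals + accompaniment
separator.separate_to_file('song.mp3', 'output')    # writes vocals/accompaniment under output/song/

# Roughly equivalent command-line invocation:
#   spleeter separate -i song.mp3 -p spleeter:2stems -o output
```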

What we learned from Qlik Qonnections 2018

Amey Varangaonkar
09 May 2018
4 min read
Qlik's new CEO Mike Capone keynoted the recently held Qlik Qonnections 2018, with some interesting feature rollouts and announcements. He also shed light on the evolution of Qlik's two premium products, QlikView and Qlik Sense, and shared the roadmap for the coming year. Close to 4,000 developers and business intelligence professionals were in attendance and were very receptive to the announcements made in the keynote. Let us take a quick look at some of the important ones.

Qlik continues to be the market leader

Capone began the keynote by sharing some interesting performance metrics from the past year, which have led to Qlik being listed as a 'Leader' in the Gartner Magic Quadrant 2017. One of the most impressive achievements is the customer base that Qlik boasts, including:

9 out of the 10 major banks
8 out of the 10 major insurance companies
11 out of the 15 major global investment and securities companies

With an impressive retention rate of 94%, Qlik has also managed to add close to 4,000 new customers over the last year and has doubled its developer community to over 25,000 members. These numbers mean only one thing: Qlik will continue to dominate.

Migration from QlikView to Qlik Sense

There has been a lot of talk (and confusion) of late about Qlik supposedly looking to shift its focus from QlikView to Qlik Sense. In the keynote, Capone gave much-needed clarity on the licensing and migration options for those looking to move from QlikView's guided analytics features to Qlik Sense's self-service analytics. These are some of the important announcements in this regard:

Migration from QlikView to Qlik Sense is optional. Acknowledging the loyal customers who don't want to move away from QlikView, Capone said that migration from QlikView to Qlik Sense is optional. For those who do want to migrate, Qlik has assured that the transition will be made as smooth as possible, and that it will be treated as a priority.

A single license to use both QlikView and Qlik Sense. Qlik has made it possible for customers to get the most out of both products without having to buy multiple licenses. With just an additional maintenance fee, they will be able to enjoy the premium features of both tools seamlessly.

Qlik ventures into cognitive analytics

One of the most notable announcements of the conference was the incorporation of artificial intelligence into the business intelligence capabilities of the Qlik products. Qlik is aiming to improve the core associative engine so that it works with the available data smartly. Not just that, Qlik also announced the Insight Advisor feature, which auto-generates the best possible visualizations and reports.

Hybrid and multi-cloud support added

Qlik's vision going forward is quite simple and straightforward: to support deployment of its applications and services in hybrid-cloud or multi-cloud environments. Going forward, users will be able to run their Qlik Sense applications, which use a microservices-based architecture on Linux, in either public or private clouds, and to self-manage these applications with the support features provided by Qlik.

New tools for Qlik developers

Qonnections 2018 saw two important announcements made to make the lives of Qlik developers easier.
Along with Qlik Branch, a platform to collaborate on projects and share innovations and new developments, Qlik also announced a new platform for developers called Qlik Core. This new platform will allow Qlik developers to leverage IoT, edge analytics, and more to design and drive innovative business models and strategies. Qlik Core is currently in beta and is expected to be generally available very soon.

Interesting times ahead for Qlik

In recent times, Qlik has faced stiff competition from other popular business intelligence tools such as Tableau, Spotfire, and Microsoft's own Power BI, apart from the freely available tools that give customers fast, effective business intelligence. With all the tools delivering on a similar promise and none coming out with groundbreaking blue-ocean features, it will be interesting to see how Qlik's new offerings fare against these sharks. The recent restructuring of Qlik's management and the downsizing of the past few years can make one wonder if the company is struggling to keep up. However, the announcements at Qonnections 2018 indicate the company is moving in a positive direction with its products, and should restore public faith and dispel any doubts Qlik's customers may have.

How Qlik Sense is driving self-service Business Intelligence
Overview of a Qlik Sense® Application's Life Cycle
QlikView Tips and Tricks

Introducing PostgREST, a REST API for any PostgreSQL database written in Haskell

Bhagyashree R
04 Nov 2019
3 min read
Written in Haskell, PostgREST is a standalone web server that turns your existing PostgreSQL database into a RESTful API. It offers a much "cleaner, more standards-compliant, faster API than you are likely to write from scratch." The PostgREST documentation describes it as an "alternative to manual CRUD programming." Explaining the motivation behind the tool, the documentation reads, "Writing business logic often duplicates, ignores or hobbles database structure. Object-relational mapping is a leaky abstraction leading to slow imperative code. The PostgREST philosophy establishes a single declarative source of truth: the data itself." (A brief sketch of querying a PostgREST endpoint follows at the end of this article.)

Performant by design

In terms of performance, PostgREST shows subsecond response times for up to 2000 requests/sec on the Heroku free tier. The main contributor to this impressive performance is its Haskell implementation using the Warp HTTP server. To keep response times fast, it delegates as much calculation as possible to the database, including serializing JSON responses directly in SQL, data validation, and more. It also uses the Hasql library to work with the database efficiently.

A single declarative source of truth for security

PostgREST handles authentication via JSON Web Tokens (JWTs), and you can build other forms of authentication on top of the JWT primitive. It delegates authorization to the role information defined in the database, ensuring there is a single declarative source of truth for security.

Data integrity

PostgREST does not rely on an object-relational mapper (ORM) or custom imperative coding. Instead, developers put declarative constraints directly into their database, preventing any kind of data corruption.

In a Hacker News discussion, many users praised the tool. "I think PostgREST is the first big tool written in Haskell that I've used in production. From my experience, it's flawless. Kudos to the team," one user commented. Others noted that using the tool for systems in production can complicate things. A user added, "Somebody in our team put this on production. I guess this solution has some merits if you need something quick, but in the long run it turned out to be painful. It's basically SQL over REST. Additionally, your DB schema becomes your API schema and that either means you force one for the purposes of the other or you build DB views to fix that."

You can read about PostgREST on its official website. Also, check out its GitHub repository.

After PostgreSQL, DigitalOcean now adds MySQL and Redis to its managed databases' offering
Amazon Aurora makes PostgreSQL Serverless generally available
PostgreSQL 12 progress update
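To give a feel for the kind of API PostgREST exposes, the sketch below queries a hypothetical films table through a locally running PostgREST instance (default port 3000) using Python's requests library. The table, columns, and values are placeholders of my own; the column=operator.value filter style follows the PostgREST documentation.

```python
# Sketch: talking to a hypothetical `films` table exposed by a local PostgREST server.
# Table name, columns, and the JWT placeholder are illustrative assumptions.
import requests

base = "http://localhost:3000"

# GET /films?year=gte.2000&order=title.asc -> rows as JSON, filtered inside the database
resp = requests.get(f"{base}/films", params={"year": "gte.2000", "order": "title.asc"})
resp.raise_for_status()
for film in resp.json():
    print(film["title"], film["year"])

# POST a new row; authorization is carried as a JWT bearer token (placeholder here)
headers = {"Authorization": "Bearer <jwt-token>"}
requests.post(f"{base}/films", json={"title": "Arrival", "year": 2016}, headers=headers)
```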
Google researchers introduce JAX: A TensorFlow-like framework for generating high-performance code from Python and NumPy machine learning programs

Bhagyashree R
11 Dec 2018
2 min read
Google researchers have built a tool called JAX, a domain-specific tracing JIT compiler that generates high-performance accelerator code from pure Python and NumPy machine learning programs. It combines Autograd and XLA for high-performance machine learning research. At its core, it is an extensible system for transforming numerical functions.

Autograd helps JAX automatically differentiate native Python and NumPy code. It can handle a large subset of Python features such as loops, branches, recursion, and closures. It supports both reverse-mode (backpropagation) and forward-mode differentiation, and the two can be composed arbitrarily in any order. XLA, or Accelerated Linear Algebra, is a linear algebra compiler used for optimizing TensorFlow computations. To run NumPy programs on GPUs and TPUs, JAX uses XLA: library calls are compiled and executed just in time. JAX also allows compiling your own Python functions just-in-time into XLA-optimized kernels using a one-function API, jit (a small sketch follows at the end of this article).

How JAX works

The basic function of JAX is to specialize and translate high-level Python and NumPy functions into a representation that can be transformed and then lifted back into a Python function. It traces Python functions by monitoring all the basic operations applied to their inputs, and records these operations and the data flow between them in a directed acyclic graph (DAG). To trace functions, it wraps primitive operations so that, when called, they add themselves to a list of operations performed along with their inputs and outputs. To keep track of the data flow between these primitive operations, the values being tracked are wrapped in Tracer class instances.

The team is working on expanding the project to support cloud TPUs, multi-GPU, and multi-TPU setups. In the future, it will gain full NumPy coverage, some SciPy coverage, and more. As this is still a research project, bugs are to be expected, and it is not recommended for production use. To read more and contribute to the project, head over to GitHub.

Google AdaNet, a TensorFlow-based AutoML framework
Graph Nets – DeepMind's library for graph networks in Tensorflow and Sonnet
Dopamine: A Tensorflow-based framework for flexible and reproducible Reinforcement Learning research by Google
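Here is a minimal sketch of composing the differentiation and jit transformations described above, assuming JAX is installed; the toy linear-model loss and the example data are illustrative, not taken from the JAX repository.

```python
# Minimal sketch of composing autodiff and JIT compilation in JAX.
# The loss function and data are toy examples for illustration.
import jax.numpy as jnp
from jax import grad, jit

def loss(w, x, y):
    pred = x @ w                      # simple linear model
    return jnp.mean((pred - y) ** 2)  # mean squared error

grad_loss = jit(grad(loss))           # reverse-mode gradient, compiled via XLA

w = jnp.zeros(3)
x = jnp.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = jnp.array([1.0, 2.0])
print(grad_loss(w, x, y))             # gradient of the loss with respect to w
```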

NumPy drops Python 2 support. Now you need Python 3.5 or later.

Prasad Ramesh
17 Dec 2018
2 min read
In a GitHub pull request last week, the NumPy community decided to remove support for Python 2.7. Python 3.4 support will also be dropped with this pull request. So, to use NumPy 1.17 and newer versions, you will need Python 3.5 or later. NumPy had been supporting both Python versions since 2010. This move doesn't come as a surprise, with the Python core team itself dropping support for Python 2 in 2020. The NumPy team had noted that the move comes because "Python 2 is an increasing burden on our limited resources". The discussion to drop Python 2 support in NumPy started almost a year ago.

Running pip install numpy on Python 2 will still install the last working version, but from here on it may not contain the latest features released for Python 3.5 or higher (an illustrative pinning example follows at the end of this article). NumPy on Python 2 will still be supported until December 31, 2019; after January 1, 2020, it may not contain the newest bug fixes.

The Twitter audience sees this as a welcome move:

https://twitter.com/TarasNovak/status/1073262599750459392
https://twitter.com/esc___/status/1073193736178462720

A comment on Hacker News reads: "Let's hope this move helps with the transitioning to Python 3. I'm not a Python programmer myself, but I'm tired of things getting hairy on Linux dependencies written in Python. It almost seems like I always got to have a Python 2 and a Python 3 version of some packages so my system doesn't break." Another one reads: "I've said it before, I'll say it again. I don't care for everything-is-unicode-by-default. You can take my Python 2 when you pry it from my cold dead hands."

Some researchers who use NumPy and SciPy stick to Python 2, so this move from the NumPy team will help get everyone working on a single version. One single supported version will certainly help with fragmentation: Python developers often find themselves in a situation where they have one version installed while a specific module is only available or working properly in another. Some also argue about stability, claiming that Python 2 is more stable or has this or that feature, but the general sentiment is supportive of adopting Python 3.

Introducing numpywren, a system for linear algebra built on a serverless architecture
NumPy 1.15.0 release is out!
Implementing matrix operations using SciPy and NumPy
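For projects that must keep supporting a legacy Python 2 interpreter alongside Python 3, one common approach is to pin the last compatible NumPy line on old interpreters and float the version elsewhere. The snippet below is an illustrative requirements.txt sketch using standard pip environment markers, not anything NumPy-specific.

```text
# requirements.txt (illustrative): take the last Python-2-compatible NumPy line
# on old interpreters, and current releases on Python 3.5+.
numpy<1.17 ; python_version < "3.5"
numpy ; python_version >= "3.5"
```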

Google workers demand fair treatment for contractors; company rolls out mandatory benefits in response to improve working conditions

Natasha Mathur
03 Apr 2019
4 min read
Over 900 Google workers signed a letter yesterday urging Google to treat its contract workers fairly. Contract workers make up nearly 54% of Google's workforce. The letter was published on Medium by the Google Walkout For Real Change group. It states that on 8th March, about 82% of Google's 43-member 'Personality team' were informed that their existing contract term had been shortened and that they would be terminated by 5th April. The Personality team describes itself as an international contract team responsible for the voice of the Google Assistant across the world: "We are the human labor that makes the Google Assistant relevant, funny, and relatable in more than 50 languages," reads the letter.

Given that the contract team consists of expats from around the world, many would have to make big changes in their personal lives and move back home, without any financial support. The letter states that contractors were assured by their leads that the contract would be respected; however, the onset of layoffs across Google offices globally seemed to belie that assurance. Moreover, Google did not describe the layoffs to the contractors as such, terming them a "change in strategy".

The letter also sheds light on the discriminatory environment within Google towards its TVCs (temps, vendors, and contractors). For instance, contractors are offered neither paid holidays nor health care. Moreover, during the layoff process, Google asked managers and full-time employees to distance themselves from the contractors and not offer them any support, so that Google would not come under legal obligations. The letter condemns the fact that Google boasts of its ability to scale up and down with agility, stating, "the whole team thrown into financial uncertainty is what scaling down quickly looks like for Google workers. This is the human cost of agility".

The group has laid down three demands in the letter:

Google should respect and uphold the existing contract. If contracts are shortened, payment should be made for the remaining length of the contract.
Google should respect the work of contractors and convert them to full-time employees.
Google should respect humanity. A policy should be implemented that allows FTEs (full-time employees) to openly empathize with TVCs, and FTEs should be able to thank TVCs for the work they've done.

Google's response to the letter

Google responded to the letter yesterday, stating that it is improving the working conditions of TVCs. Under the new changes, by 2022, all contractors who work at least 33 hours per week for Google will receive full benefits, including:

comprehensive health care
paid parental leave
a $15 minimum wage
a minimum of eight days of sick leave
$5,000 per year in tuition reimbursement for workers wanting to learn new skills and take courses

"These changes are significant and we're inspired by the thousands of full-time employees and TVCs who came together to make this happen," reads the letter. However, the Personality team is still waiting to hear back from Google on whether the company will respect the current contracts or convert them into full-time positions.

https://twitter.com/GoogleWalkout/status/1113206052957433856

Eileen Naughton, VP of people operations at Google, told The Hill: "These are meaningful changes, and we're starting in the U.S., where comprehensive healthcare and paid parental leave are not mandated by U.S. law. As we learn from our implementation here, we'll identify and address areas of potential improvement in other areas of the world."

Check out the official letter by Google workers here.

#GooglePayoutsForAll: A digital protest against Google's $135 million execs payout for misconduct
Google confirms it paid $135 million as exit packages to senior execs accused of sexual harassment
Google finally ends Forced arbitration for all its employees
YouTube promises to reduce recommendations of 'conspiracy theory' videos. Ex-Googler explains why this is a 'historic victory'

Sugandha Lahoti
12 Feb 2019
4 min read
Talk of the harms caused by AI algorithms, including addiction, radicalization, political abuse and conspiracies, disgusting kids' videos, and the danger of AI propaganda, is everywhere. Last month, YouTube announced an update to its recommendations aiming to reduce recommendations of videos that promote misinformation (for example, conspiracy videos, false claims about historical events, flat Earth videos, and so on). In a historic move, YouTube changed its artificial intelligence algorithm rather than favoring another solution that might have cost it fewer resources, less time, and less money.

Last Friday, an ex-Googler who helped build the YouTube algorithm, Guillaume Chaslot, welcomed this change, calling it "a great victory" that will help keep thousands of viewers from falling down the rabbit hole of misinformation and false conspiracy theories. In a Twitter thread, he presented his views as someone who has experience working on YouTube's AI.

Recently, there has been a trend of YouTube promoting conspiracy videos such as flat Earth theories. In a blog post, Guillaume Chaslot explains, "Flat Earth is not a 'small bug'. It reveals that there is a structural problem in Google's AIs and they exploit weaknesses of the most vulnerable people, to make them believe the darnedest things."

YouTube recognized this problem and has amended its algorithm. "It's just another step in an ongoing process, but it reflects our commitment and sense of responsibility to improve the recommendations experience on YouTube. To be clear, this will only affect recommendations of what videos to watch, not whether a video is available on YouTube. As always, people can still access all videos that comply with our Community Guidelines," states the YouTube team in a blog post. Chaslot appreciated this in his Twitter thread, noting that although YouTube had the option to 'make people spend more time on round earth videos', it chose the hard way of tweaking its AI algorithm.

AI algorithms also often get biased by tiny groups of hyperactive users. As Chaslot notes, people who spend their lives on YouTube affect recommendations more: the content they watch gets more views, which leads YouTubers to notice it and create more of it, making people spend even more time on that content. This is because YouTube optimizes for things you might watch, not things you might like. As a Hacker News user observed, "The problem was that pathological/excessive users were overly skewing the recommendations algorithms. These users tend to watch things that might be unhealthy in various ways, which then tend to get over-promoted and lead to the creation of more content in that vein. Not a good cycle to encourage."

The new change to YouTube's AI uses machine learning along with human evaluators and experts from all over the United States to train the machine learning systems responsible for generating recommendations. Evaluators are trained using public guidelines and offer their input on the quality of a video. Currently, the change applies only to a small set of videos in the US, as the machine learning systems are not yet very accurate. The update will roll out to other countries once the systems become more accurate.

However, there is another problem lurking around that is probably even bigger than conspiracy videos: addiction to spending more and more time online.
AI engines used by major social platforms, including but not limited to YouTube, Netflix, and Facebook, all want people to spend as much time on them as possible. A Hacker News user commented, "This is just addiction peddling. Nothing more. I think we have no idea how much damage this is doing to us. It's as if someone invented cocaine for the first time and we have no social norms or legal framework to confront it."

Nevertheless, YouTube updating its AI engine was generally received positively by netizens. As Chaslot concluded in his Twitter thread, "YouTube's announcement is a great victory which will save thousands. It's only the beginning of a more humane technology. Technology that empowers all of us, instead of deceiving the most vulnerable." It is now up to YouTube to strike a balance between maintaining a platform for free speech and living up to its responsibility to users.

Is the YouTube algorithm's promoting of #AlternativeFacts like Flat Earth having a real-world impact?
YouTube to reduce recommendations of 'conspiracy theory' videos that misinform users in the US
YouTube bans dangerous pranks and challenges
Is YouTube's AI Algorithm evil?

New updates to Microsoft Azure services for SQL Server, MySQL, and PostgreSQL

Sugandha Lahoti
09 Mar 2018
3 min read
Microsoft has announced multiple updates to its Microsoft Azure cloud platform today. These updates are meant to help companies migrate database workloads to its data centers and make it easier to run them in Azure. SQL Server customers can now try the previews of SQL Database Managed Instance, the Azure Hybrid Benefit for SQL Server, and the Azure Database Migration Service for Managed Instance. Additionally, Microsoft has announced a preview of Apache Tomcat® support in Azure App Service and the general availability of Azure Database for MySQL and PostgreSQL in the coming weeks, making it easier to bring open source powered applications to Azure.

Microsoft SQL Database Managed Instance

Azure SQL Database Managed Instance allows seamless movement of any SQL Server application to Azure without application changes. Managed Instance offers full engine compatibility with existing SQL Server deployments, including capabilities like SQL Agent, DBMail, and Change Data Capture, to name a few.

Microsoft Azure Database Migration Service

The Azure Database Migration Service is designed as an end-to-end solution to help customers move databases from on-premises SQL Server instances to SQL Database Managed Instances.

Microsoft Azure Hybrid Benefit program

With the Azure Hybrid Benefit program, customers can now move their on-premises SQL Server licenses with active Software Assurance to Managed Instance, and soon their SQL Server Integration Services licenses to Azure Data Factory, with up to 30% discounted pricing.

Apache Tomcat® support in Microsoft Azure App Service

Microsoft also announced a preview of built-in support for Apache Tomcat and OpenJDK 8 in Azure App Service. This will help Java developers easily deploy web applications and APIs to Azure's market-leading PaaS. Once deployed, customers can extend them with the Azure SDK for Java to work with various Azure services such as Storage, Azure Database for MySQL, and Azure Database for PostgreSQL.

General availability of Azure database services for MySQL and PostgreSQL

Azure Database for MySQL and Azure Database for PostgreSQL provide customers with fully managed homes for their open source databases in Microsoft's cloud, reducing the time a company spends managing things like database scaling and patching.

SQL Information Protection preview

SQL Information Protection lets organizations discover, classify, label, and protect potentially sensitive data stored in a database management system, either in Microsoft's cloud or in an organization's data centers. The service can be used with the Azure SQL Database service or with SQL Server on premises.

More information about these updates is available on the Microsoft Azure blog.