
Tech News - Data


Has IBM edged past Google in the battle for Quantum Supremacy?

Abhishek Jha
14 Nov 2017
3 min read
Last month, when researchers at Google unveiled a blueprint for quantum supremacy, little did they know that rival IBM was about to snatch the pole position. IBM has announced the development of a quantum computer capable of handling 50 qubits (quantum bits), in what could be the largest and most sophisticated quantum computer built to date. Big Blue also announced a 20-qubit processor that will be made available through the IBM Q cloud by the end of the year.

"Our 20-qubit machine has double the coherence time, at an average of 90 microseconds, compared to previous generations of quantum processors with an average of 50 microseconds. It is also designed to scale; the 50-qubit prototype has similar performance," said Dario Gil, who leads IBM's quantum computing and artificial intelligence research division, in his blog post.

IBM's progress in this space has been rapid. After launching a 5-qubit system in May 2016, it followed with a 15-qubit machine this year, then upgraded the IBM Q experience to 20 qubits, with 50 qubits in line. That is quite a leap in 18 months.

As a technology, quantum computing is a difficult area to understand because information is processed differently. Unlike the bits of normal computers, which are either a 0 or a 1, qubits can exist in multiple states at once, opening up entirely new programming possibilities. Add to that the limited coherence time, which makes it very difficult for programmers to build a quantum algorithm.

While the company did not divulge technical details about how its engineers could simultaneously expand the number of qubits and increase coherence times, it did attribute the improvements to better "superconducting qubit design, connectivity and packaging," adding that the 50-qubit prototype is a "natural extension" of the 20-qubit technology and that both exhibit "similar performance metrics."
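The superposition behind those qubit counts can be illustrated with a few lines of linear algebra. This is a minimal sketch (a plain NumPy state-vector simulation, not IBM's hardware or SDK): applying a Hadamard gate to a qubit in state |0⟩ produces an equal superposition, so a measurement yields 0 or 1 with probability 0.5 each.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])                    # the |0> basis state
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate

state = H @ ket0                                # (|0> + |1>) / sqrt(2)
probs = np.abs(state) ** 2                      # Born rule: measurement probabilities
print(probs)                                    # [0.5 0.5]
```

Each extra qubit doubles the size of this state vector, which is why simulating ~50 qubits classically becomes intractable and hardware starts to matter.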
The major goal, though, is to create a fault-tolerant universal system capable of correcting errors automatically while maintaining high coherence. "The holy grail is fault-tolerant universal quantum computing. Today, we are creating approximate universal, meaning it can perform arbitrary operations and programs, but it's approximating so that I have to live with errors and a limited window of time to perform the operations," Gil said.

The good news is that an ecosystem is building up. Through the IBM Q experience, more than 60,000 users have run over 1.7 million quantum experiments and generated over 35 third-party research publications. That the beta testers included 1,500 universities, 300 high schools, and 300 private-sector participants suggests quantum computing is moving closer to real-world implementation in areas like medicine, drug discovery, and materials science. "Quantum computing will open up new doors in the fields of chemistry, optimisation, and machine learning in the coming years," Gil added. "We should savor this period in the history of quantum information technology, in which we are truly in the process of rebooting computing."

All eyes are now on Google, IBM's nearest rival in quantum computing at this stage. While IBM's 50-qubit processor has taken some of the shine off Google's soon-to-be-announced 49-qubit system, more surprises may be in the offing, as Google has so far managed to keep its entire quantum computing machinery behind closed doors.


AmoebaNets: Google’s new evolutionary AutoML

Savia Lobo
16 Mar 2018
2 min read
Artificial neural networks that can detect objects within an image require careful design by experts over years of difficult research, and each one addresses a specific task, such as finding what's in a photograph, calling a genetic variant, or helping diagnose a disease. Google believes one way to generate these ANN architectures is through evolutionary algorithms. So today Google introduced AmoebaNets, an evolutionary approach that achieves state-of-the-art results on datasets such as ImageNet and CIFAR-10.

Google offers AmoebaNets as an answer to questions such as: by using computational resources to programmatically evolve image classifiers at unprecedented scale, can one achieve solutions with minimal expert participation? How good can today's artificially evolved neural networks be? These questions were addressed in two papers:

"Large-Scale Evolution of Image Classifiers," presented at ICML 2017. In this paper, the authors set up an evolutionary process with simple building blocks and trivial initial conditions. The idea was to "sit back" and let evolution at scale do the work of constructing the architecture.

"Regularized Evolution for Image Classifier Architecture Search" (2018). This paper scaled up the computation using Google's new TPUv2 chips. The combination of modern hardware, expert knowledge, and evolution produced state-of-the-art models on CIFAR-10 and ImageNet, two popular benchmarks for image classification.

One important feature of the evolutionary algorithm used in the second paper is a form of regularization: instead of letting the worst neural networks die, the oldest ones are removed, regardless of how good they are. This improves robustness to changes in the task being optimized and tends to produce more accurate networks in the end. Since weight inheritance is not allowed, all networks must train from scratch, so this form of regularization selects for networks that remain good when they are retrained.

These models achieve state-of-the-art results on CIFAR-10 (mean test error = 2.13%), mobile-size ImageNet (top-1 accuracy = 75.1% with 5.1M parameters), and ImageNet (top-1 accuracy = 83.1%). Read more about AmoebaNets on the Google Research Blog.
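The regularized ("aging") evolution loop described above is simple enough to sketch. This is a toy illustration, not Google's implementation: a made-up bit-string fitness stands in for training a network, mutation flips one architecture choice, and each cycle removes the oldest candidate rather than the worst.

```python
import random
from collections import deque

def fitness(arch):
    # Stand-in for validation accuracy of a trained network.
    return sum(arch)

def regularized_evolution(pop_size=20, cycles=200, arch_len=16, sample_size=5, seed=0):
    rng = random.Random(seed)
    population = deque()  # left end = oldest member
    for _ in range(pop_size):
        population.append([rng.randint(0, 1) for _ in range(arch_len)])
    for _ in range(cycles):
        sample = rng.sample(list(population), sample_size)  # tournament
        parent = max(sample, key=fitness)
        child = list(parent)
        i = rng.randrange(arch_len)
        child[i] ^= 1                 # mutate one "architecture choice"
        population.append(child)
        population.popleft()          # age-based removal, not worst-based
    return max(population, key=fitness)

best = regularized_evolution()
print(fitness(best))
```

Because removal is by age, even a top-scoring network eventually dies; only lineages whose children keep re-earning high fitness survive, which is the retraining-robustness effect the paper describes.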


Build custom maps the easy way with multiple map layers in Tableau from What's New

Anonymous
22 Dec 2020
5 min read
Ashwin Kumar, Senior Product Manager | December 22, 2020

The Tableau 2020.4 release comes fully loaded with tons of great features, including several key updates to boost your geospatial analysis. In particular, the new multiple marks layers feature lets you add an unlimited number of layers to the map. This means you can visualize multiple sets of location data in the context of one another, with no need for external tools to build custom background maps.

Drag and drop map layers—yes, it's just that easy

Spend less time preparing spatial datasets and more time analyzing your data with drag-and-drop map layers across Tableau Online, Server, and Desktop in Tableau 2020.4. Getting started is easy! Once you've connected to a data source that contains location data and created a map, simply drag any geographic field onto the Add a Marks Layers drop target, and Tableau will instantly draw the new layer of marks on the map. For each layer that you create, Tableau provides a new marks card, so you can encode each layer's data by size, shape, and color. What's more, you can control the formatting of each layer independently, giving you maximum flexibility over the appearance of your map.

But that's not all. While drawing an unlimited number of customized map layers is a powerful capability in its own right, the multiple map layers feature gives you even more tools to supercharge your analytics. First up: the ability to toggle the visibility of each layer, so you can show or hide each layer at will and visualize only the layers relevant to the question at hand. To use this feature, hover over a layer's name in the marks card to reveal the interactive eye icon.

Sometimes you may want only some of your layers to be interactive, with the remaining layers simply part of the background.
Luckily, the multiple map layers feature gives you exactly this type of control. Hovering over a layer's name in the marks card reveals a dropdown arrow; clicking it, you can select the first option in the context menu: Disable Selection. With this option, you can customize the end-user experience, ensuring that background contextual layers do not produce tooltips or other interactive elements when not required.

Finally, you also have fine-grained control over the drawing order, or z-order, of layers on your map, so you can ensure that background layers that might obscure other map features are drawn on the bottom. To adjust the z-order, either drag to reorder your layers in the marks card or use the Move Up and Move Down options in each layer's dropdown context menu.

Drawing an unlimited number of map layers is critical to building authoritative, context-appropriate maps for your organization, and it helps in a wide variety of use cases across industries and businesses. Check out some examples:

A national coffee chain might want to visualize stores, competitor locations, and win/loss metrics by sales area to understand competitive pressures.

In the oil and gas industry, visualizing drilling rigs, block leases, and nautical boundaries could help devise exploration and investment strategies.

A disaster relief NGO may decide to map out hurricane paths, at-risk hospitals, and first-responder bases to deploy rescue teams to those in need.

Essentially, you can use this feature to build rich context into your maps and support easy analysis and exploration for any scenario!

Plus, spatial updates across the stack: Tableau Prep, Amazon Redshift, and offline maps

The 2020.4 release also includes other maps features to help you take location intelligence to the next level.
In this release, we're including support for spatial data in Tableau Prep, so you can clean and transform your location data without a third-party tool. We're also adding support for spatial data from Amazon Redshift databases, and offline maps for Tableau Server, so you can use Tableau maps in any environment and connect to your location data directly from more data sources.

Want to know what else we released with Tableau 2020.4? Learn about Tableau Prep in the browser, web authoring and predictive modeling enhancements, and more in our launch announcement.

We'd love your feedback. Can you think of additional features you need to take your mapping in Tableau to greater heights? Submit your request on the Tableau Ideas Forum today. Every idea is considered by our Product Management team, and we value your input in making decisions about what to build next.

Want a sneak peek at the latest and greatest in Tableau? Visit our Coming Soon page to learn more about what we're working on next. Happy mapping!


Netflix open-sources Metaflow, its Python framework for building and managing data science projects

Fatema Patrawala
04 Dec 2019
5 min read
Yesterday, the Netflix team announced that it has open-sourced Metaflow, a Python library that helps scientists and engineers build and manage real-life data science projects. The Netflix team writes, "Over the past two years, Metaflow has been used internally at Netflix to build and manage hundreds of data-science projects from natural language processing to operations research."

Metaflow was developed by Netflix to boost the productivity of data scientists who work on a wide variety of projects, from classical statistics to deep learning. It provides a unified API to the infrastructure stack required to execute data science projects, from prototype to production.

Metaflow integrates with Netflix's data science infrastructure stack

Models are only a small part of an end-to-end data science project. Production-grade projects rely on a thick stack of infrastructure. At a minimum, projects need data and a way to perform computation on it. In a typical data science project in a business environment like Netflix's, the team touches all the layers of this stack (diagram: Netflix website). Data is accessed from a data warehouse, which can be a folder of files, a database, or a multi-petabyte data lake. The modeling code that crunches the data is executed in a compute environment, and a job scheduler orchestrates multiple units of work. The team then architects the code by structuring it as an object hierarchy, Python modules, or packages, and versions the code, the input data, and the ML models produced. After the model has been deployed to production, the team faces pertinent questions about model operations: how to keep the code running reliably in production, how to monitor its performance, and how to deploy new versions of the code to run in parallel with the previous version.
Additionally, at the very top of the stack there are other questions, like how to produce features for your models, or how to develop models in the first place using off-the-shelf libraries. Here Metaflow provides a unified approach to navigating the stack. Metaflow is more prescriptive about the lower levels of the stack but less opinionated about the actual data science at the top, so developers can use Metaflow with their favorite machine learning or data science libraries, such as PyTorch, TensorFlow, or scikit-learn. Metaflow allows you to write models and business logic as idiomatic Python code. Internally, Metaflow leverages existing infrastructure where feasible; its core value proposition is its integrated full-stack, human-centric API, rather than reinventing the stack itself.

Metaflow on Amazon Web Services

Metaflow is a cloud-native framework that leverages the elasticity of the cloud by design, both for compute and storage. Netflix is one of the largest users of Amazon Web Services (AWS) and has accumulated plenty of operational experience and expertise in dealing with the cloud. For this open-source release, Netflix partnered with AWS to provide a seamless integration between Metaflow and various AWS services. Metaflow comes with built-in capability to snapshot all code and data in Amazon S3 automatically, a key value proposition of the internal Metaflow setup. This provides data science teams with a comprehensive solution for versioning and experiment tracking without any user intervention, the core of any production-grade machine learning infrastructure. In addition, Metaflow comes bundled with a high-performance S3 client, which can load data at up to 10 Gbps. Metaflow also provides a first-class local development experience, letting data scientists develop and test code quickly on laptops, just like any Python script.
If the workflow supports parallelism, Metaflow takes advantage of all CPU cores available on the development machine.

How is Metaflow different from existing Python frameworks?

On Hacker News, developers discussed how Metaflow differs from existing tools and workflows. One of them comments, "I don't like to criticise new frameworks / tools without first understanding them, but I like to know what some key differences are without the marketing/PR fluff before giving one a go. For instance, this tutorial example here does not look substantially different to what I could achieve just as easily in R, or other Python data wrangling frameworks. Is the main feature the fact I can quickly put my workflows into the cloud?"

Someone from the Metaflow team responds on the thread, "Here are some key features: - Metaflow snapshots your code, data, and dependencies automatically in a content-addressed datastore, which is typically backed by S3, although local filesystem is supported too. This allows you to resume workflows, reproduce past results, and inspect anything about the workflow e.g. in a notebook. This is a core feature of Metaflow. - Metaflow is designed to work well with a cloud backend. We support AWS today but technically other clouds could be supported too. There's quite a bit of engineering that has gone into building this integration. For instance, using the Metaflow's built-in S3 client, you can pull over 10Gbps, which is more than you can get with e.g. aws CLI today easily. - We have spent time and effort in keeping the API surface area clean and highly usable. YMMV but it has been an appealing feature to many users this far."

Developers can find the project home page here and its code on GitHub.
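The content-addressed datastore mentioned in that reply is easy to illustrate in miniature. This is a hedged sketch of the general idea only, not Metaflow's actual internals or API: each artifact is stored under the hash of its own bytes, so identical code or data deduplicates automatically and any past result can be fetched again by its address.

```python
import hashlib
import json

class ContentAddressedStore:
    """Toy stand-in for an S3 bucket or local datastore keyed by content hash."""

    def __init__(self):
        self._objects = {}

    def put(self, obj) -> str:
        blob = json.dumps(obj, sort_keys=True).encode()
        key = hashlib.sha256(blob).hexdigest()
        self._objects[key] = blob   # idempotent: same content, same key
        return key

    def get(self, key):
        return json.loads(self._objects[key])

store = ContentAddressedStore()
k1 = store.put({"step": "train", "params": {"lr": 0.01}})
k2 = store.put({"step": "train", "params": {"lr": 0.01}})
assert k1 == k2                       # identical content -> identical address
print(store.get(k1)["params"]["lr"])  # 0.01
```

Because addresses are derived from content rather than assigned, reproducing a past run is just a matter of looking up the same keys, which is what makes resuming and inspecting old workflows cheap.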
Netflix open sources Polynote, an IDE-like polyglot notebook with Scala support, Apache Spark integration, multi-language interoperability, and more
Tesla Software Version 10.0 adds Smart Summon, in-car karaoke, Netflix, Hulu, and Spotify streaming
Netflix security engineers report several TCP networking vulnerabilities in FreeBSD and Linux kernels


Baidu announces ClariNet, a neural network for text-to-speech synthesis

Sugandha Lahoti
23 Jul 2018
2 min read
Text-to-speech synthesis has been a booming research area, with Google, Facebook, DeepMind, and other tech giants showcasing interesting research and trying to build better TTS models. Now Baidu has stolen the show with ClariNet, the first fully end-to-end TTS model, which directly converts text to a speech waveform in a single neural network. Classical TTS pipelines such as DeepMind's WaveNet usually have separate text-to-spectrogram and waveform synthesis models, and having two models may result in suboptimal performance. ClariNet combines the two into one fully convolutional neural network, and Baidu claims its text-to-wave model significantly outperforms the previous separate TTS models.

Baidu's ClariNet consists of four components:

Encoder, which encodes textual features into an internal hidden representation.

Decoder, which decodes the encoder representation into the log-mel spectrogram in an autoregressive manner.

Bridge-net, an intermediate processing block that processes the hidden representation from the decoder and predicts the log-linear spectrogram. It also upsamples the hidden representation from frame level to sample level.

Vocoder, a Gaussian autoregressive WaveNet that synthesizes the waveform, conditioned on the upsampled hidden representation from the bridge-net.

(Figure: ClariNet's architecture)

Baidu has also proposed a new parallel wave generation method based on the Gaussian inverse autoregressive flow (IAF). This mechanism generates all samples of an audio waveform in parallel, speeding up waveform synthesis dramatically compared to traditional autoregressive methods. To teach a parallel waveform synthesizer, they use a Gaussian autoregressive WaveNet as the teacher-net and the Gaussian IAF as the student-net. The Gaussian autoregressive WaveNet is trained with maximum likelihood estimation (MLE).
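This teacher-student distillation hinges on the KL divergence between Gaussian output distributions, which for univariate Gaussians has a closed form; that tractability is part of what makes the Gaussian choice convenient. A small sketch with illustrative numbers (not Baidu's training code):

```python
import math

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    # Closed-form KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) )
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

print(kl_gaussian(0.0, 1.0, 0.0, 1.0))           # 0.0: identical distributions
print(round(kl_gaussian(0.0, 1.0, 1.0, 2.0), 4)) # 0.4431
```

During distillation, the student's predicted mean and scale at each timestep would be pushed toward the teacher's by minimizing exactly this kind of term.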
The Gaussian IAF is distilled from the autoregressive WaveNet by minimizing the KL divergence between their peaked output distributions, which stabilizes the training process. For more details on ClariNet, you can check out Baidu's paper and audio samples.

How Deep Neural Networks can improve Speech Recognition and generation
AI learns to talk naturally with Google's Tacotron 2


Baidu open sources ERNIE 2.0, a continual pre-training NLP model that outperforms BERT and XLNet on 16 NLP tasks

Fatema Patrawala
30 Jul 2019
3 min read
Today Baidu released ERNIE 2.0, a continual natural language processing pre-training framework. ERNIE stands for Enhanced Representation through kNowledge IntEgration. Baidu claims in its research paper that ERNIE 2.0 outperforms BERT and the recent XLNet on 16 NLP tasks in Chinese and English, and it has open sourced the ERNIE 2.0 model.

In March, Baidu announced the release of ERNIE 1.0, its pre-trained model based on PaddlePaddle, Baidu's open deep learning platform. According to Baidu, ERNIE 1.0 outperformed BERT in all Chinese language understanding tasks.

The pre-training procedures of models such as BERT, XLNet, and ERNIE 1.0 are mainly based on a few simple tasks modeling the co-occurrence of words or sentences, the paper highlights. For example, BERT constructed a bidirectional language model task and a next-sentence prediction task to capture the co-occurrence information of words and sentences, while XLNet constructed a permutation language model task to capture the co-occurrence information of words.

Besides co-occurrence information, however, there is much richer lexical, syntactic, and semantic information in training corpora. For example, named entities, such as person names, place names, and organization names, contain concept information; sentence order and sentence proximity information can enable models to learn structure-aware representations; and semantic similarity at the document level or discourse relations among sentences can enable models to learn semantic-aware representations. So is it possible to further improve performance if the model is trained to constantly learn more kinds of tasks?

(Source: ERNIE 2.0 research paper)

Based on this idea, Baidu has proposed a continual pre-training framework for language understanding in which pre-training tasks can be incrementally built and learned through multi-task learning in a continual way.
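The continual multi-task scheme can be sketched in a few lines. This is a toy illustration under stated assumptions (a made-up ContinualTrainer class, not Baidu's implementation): tasks are registered incrementally, and every training step samples across all tasks seen so far, so earlier tasks keep receiving updates instead of being forgotten.

```python
import random

class ContinualTrainer:
    def __init__(self, seed=0):
        self.tasks = []
        self.steps_per_task = {}
        self.rng = random.Random(seed)

    def add_task(self, name):
        # A new pre-training task can be introduced at any time.
        self.tasks.append(name)
        self.steps_per_task.setdefault(name, 0)

    def train(self, steps):
        for _ in range(steps):
            task = self.rng.choice(self.tasks)  # multi-task sampling
            self.steps_per_task[task] += 1      # one update on this task

trainer = ContinualTrainer()
trainer.add_task("masked-lm")
trainer.train(100)
trainer.add_task("sentence-order")  # a new task arrives later...
trainer.train(100)                  # ...yet the old task is still trained
print(trainer.steps_per_task)
```

In the real framework the shared encoder parameters carry over across tasks; the point of this sketch is only the scheduling: old tasks stay in the sampling pool when new ones arrive.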
According to Baidu, in this framework different customized tasks can be incrementally introduced at any time, and these tasks are trained through multi-task learning, which enables the encoding of lexical, syntactic, and semantic information across tasks. Whenever a new task arrives, the framework can incrementally train the distributed representations without forgetting the previously trained parameters.

The Structure of the Released ERNIE 2.0 Model (Source: ERNIE 2.0 research paper)

ERNIE is a continual pre-training framework that provides a feasible scheme for developers to build their own NLP models. The fine-tuning source code of ERNIE 2.0 and the pre-trained English-version models can be downloaded from the GitHub page.

The team at Baidu compared the performance of the ERNIE 2.0 model with existing pre-training models on the English GLUE dataset and 9 popular Chinese datasets separately. The results show that ERNIE 2.0 outperforms BERT and XLNet on 7 GLUE language understanding tasks and outperforms BERT on all 9 Chinese NLP tasks, such as DuReader machine reading comprehension, sentiment analysis, and question answering. Specifically, according to the experimental results on the GLUE datasets, ERNIE 2.0 almost comprehensively outperforms BERT and XLNet on English tasks, whether as a base model or a large model. Furthermore, the research paper shows that the ERNIE 2.0 large model achieves the best performance and sets new state-of-the-art results on the Chinese NLP tasks. (Source: ERNIE 2.0 research paper)

To know more about ERNIE 2.0, read the research paper and check out the official blog on Baidu's website.

DeepMind's AI uses reinforcement learning to defeat humans in multiplayer games
CMU and Google researchers present XLNet: a new pre-training method for language modeling that outperforms BERT on 20 tasks
Transformer-XL: A Google architecture with 80% longer dependency than RNNs

Can a production ready Pytorch 1.0 give TensorFlow a tough time?

Sunith Shetty
03 May 2018
5 min read
PyTorch has announced a preview of the blueprint for PyTorch 1.0, the next major release of the framework. This version is expected to bring more stability, integration support, and complete production backing, allowing developers to move from core research to production without having to deal with migration challenges.

PyTorch is an open-source, Python-based scientific computing package which provides powerful GPU acceleration. PyTorch is known for advanced indexing and functions, imperative style, integration support, and API simplicity. This is one of the key reasons why developers prefer PyTorch for research and hackability. To know more about how Facebook-backed PyTorch competes with Google's TensorFlow, read our take on this deep learning war.

Some of the noteworthy changes in the roadmap for PyTorch 1.0 are:

Production support

One of the biggest challenges developers face with PyTorch is production support: there are a number of issues in running models efficiently in production environments. Even though PyTorch provides excellent simplicity and flexibility, its tight coupling to Python makes performance at production scale a challenge.

To counter these challenges, the PyTorch team has decided to bring PyTorch and Caffe2 together to give developers production-scale readiness. However, adding production support brings complexity and configurable options for models into the API. The PyTorch team will stick to the goal of keeping the platform a favorable choice for researchers and developers. Hence, they are introducing a new just-in-time (JIT) compiler, named torch.jit. The torch.jit compiler rewrites PyTorch models during runtime in order to achieve scalability and efficiency in production environments. It can also export PyTorch models to run in a C++ environment (a runtime based on Caffe2 bits).

Note: In PyTorch version 1.0, your existing code will continue to work as-is.
Let's go through how the JIT compiler can be used to export models to a Python-less environment in order to improve their performance.

torch.jit: The go-to compiler for your PyTorch models

Building models in Python code undoubtedly gives maximum productivity and makes PyTorch simple and easy to use. However, it also means PyTorch has a hard time knowing which operation you will run next. This is frustrating during model export and automatic performance optimizations, which need to know what the computation will look like before it is even executed. To deal with this, PyTorch provides two ways of recovering information from the Python code; each is useful in different contexts, and you can mix them as needed:

Tracing the native Python code

Compiling a subset of the Python language

Tracing mode: The torch.jit.trace function allows you to record the native PyTorch operations performed, along with the data dependencies between them. PyTorch version 0.3 already had a tracer, used to export models through ONNX. The new version uses a high-performance C++ runtime that allows PyTorch to re-execute programs for you. The key advantage of this method is that it doesn't have to deal with how your Python code is structured, since only native PyTorch operations are traced.

Script mode: The PyTorch team has come up with a solution called script mode, made especially for models such as RNNs which make use of control flow. You will have to write a regular Python function (avoiding complex language features), and to get the function compiled, you apply the @script decorator. This converts your Python function directly into high-performance C++ during runtime.
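To make the tracing idea concrete, here is a toy sketch of how a tracer can work in principle (this is illustrative only, not torch.jit): proxy values overload arithmetic so that running ordinary Python code records the operations actually executed, producing a graph that could later be replayed without the original Python.

```python
class Proxy:
    """A placeholder value that records every operation applied to it."""

    def __init__(self, name, graph):
        self.name, self.graph = name, graph

    def _record(self, op, other):
        out = Proxy(f"t{len(self.graph)}", self.graph)
        self.graph.append((op, self.name, getattr(other, "name", other), out.name))
        return out

    def __add__(self, other): return self._record("add", other)
    def __mul__(self, other): return self._record("mul", other)

def trace(fn, arg_names):
    graph = []
    args = [Proxy(n, graph) for n in arg_names]
    result = fn(*args)          # run the function once on proxy inputs
    return graph, result.name

def model(x, w):
    return x * w + 1            # ordinary Python...

graph, out = trace(model, ["x", "w"])
for op in graph:
    print(op)                   # ...captured as a graph of operations
```

Note the limitation this sketch shares with real tracing: only operations that actually execute are recorded, so Python control flow that depends on input values is invisible to the trace, which is exactly why script mode exists.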
Advantages in optimization and export techniques

Whether you use the trace or the script function, the technique allows you to optimize and export the model for use in production environments (i.e., a Python-free representation of the model). You can now lift bigger segments of the model into an intermediate representation to work with sophisticated models, and you can use the high-performance backends available in Caffe2 to run the models efficiently.

Usability

If you don't need to export or optimize your model, you do not need to use these new features. These modes will be included in the core of the PyTorch ecosystem, allowing you to mix and match them with your existing code seamlessly as needed.

Additional changes and improvements

In addition to the major update in production support for 1.0, the PyTorch team will continue optimizing, stabilizing the interface, and fixing other modules in the PyTorch ecosystem. PyTorch 1.0 will also see some changes on the backend side which might affect user-written C and C++ extensions: in order to incorporate new features and optimization techniques from Caffe2, the team is replacing (optimizing) the backend ATen library. The PyTorch team is planning to release 1.0 during the summer. For a detailed preview of the roadmap, you can refer to the official PyTorch blog.

Top 10 deep learning frameworks
The Deep Learning Framework Showdown: TensorFlow vs CNTK
Why you should use Keras for deep learning


Neo4j Enterprise Edition is now available under a commercial license

Amrata Joshi
21 Nov 2018
3 min read
Last week, the Neo4j community announced that Neo4j Enterprise Edition will be available under a commercial license, with source code available only for the Neo4j Community Edition. The Neo4j Community Edition will continue to be provided under an open source GPLv3 license.

According to the Neo4j community, this change won't affect any Neo4j open source projects, nor will it impact customers, partners, or OEM users operating under a Neo4j subscription license. Neo4j Desktop users using Neo4j Enterprise Edition under the free development license are also unaffected, as are members of the Neo4j Startup program.

The reason for choosing an open core licensing model

The idea behind putting Neo4j Enterprise Edition under a commercial license was to clarify and simplify the licensing model, removing ambiguity about what the company sells, what it open sources, and what options it offers. The Enterprise Edition source and object code were initially available under multiple licenses, which led to multiple interpretations and ultimately created confusion in the open source community, among buyers, and even in legal reviewers' minds.

According to the Neo4j blog, ">99% of Neo4j Enterprise Edition code was written by individuals on Neo4j's payroll – employed or contracted by Neo4j-the-company. As for the fractional <1%... that code is still available in older versions. We're not removing it. And we have reached out to the few who make up the fractional <1% to affirm their contributions are given proper due."

Developers can use the Enterprise Edition for free via Neo4j Desktop for desktop-based development. Startups can benefit from the startup license offered by Neo4j, now available to startups with up to 20 employees.
Data journalists, such as those at the ICIJ and NBC News, can use the Enterprise Edition for free via the Data Journalism Accelerator Program, and Neo4j also offers a free license to universities for teaching and learning. To know more about this news, check out Neo4j’s blog.

Neo4j rewarded with $80M Series E, plans to expand company
Why Neo4j is the most popular graph database
Neo4j 3.4 aims to make connected data even more accessible

Google Employees Protest against the use of Artificial Intelligence in Military

Amey Varangaonkar
06 Apr 2018
3 min read
Thousands of Google employees have raised concerns about the use of artificial intelligence for military purposes. The employees, who include many senior engineers, have signed a petition requesting that Google CEO Sundar Pichai pull Google out of Project Maven, a Pentagon-backed project harnessing AI to improve military technology. Pichai was also urged to establish and enforce strict policies that keep Google and its subsidiaries out of ‘the business of war’.

What does the petition say?

The letter, signed by over 3,000 Google employees, argues that collaborating with the government on military projects is strictly against Google’s core ideology that technology must be used for the welfare, not the destruction, of mankind. It argues that backing the military could backfire tremendously by creating a negative image of Google in the minds of customers, and could also affect recruitment. The concerned employees believe that since Google is engaged in serious competition with many other companies to hire the best possible talent, some candidates could be put off by Google’s military connections with the government.

What is Project Maven?

Project Maven is a Pentagon-backed initiative announced in May 2017. Its main purpose is to integrate artificial intelligence into various defense programs to make them smarter. Backed by Google’s technology, the program aims to improve the image and video processing capabilities of drones so that they can accurately pick out human targets for strikes while identifying innocent civilians, to reduce or prevent accidental killings. Google has declared its participation in the program in a ‘non-offensive capacity’, and has maintained that its products and technology will not be used to create autonomous weapons that operate without human intervention.
Connections with the Pentagon

It is also interesting to note that some of Google’s top executives are connected to the Pentagon in some capacity. Eric Schmidt, the former executive chairman of Google who is still a member of the executive board of Google’s parent company Alphabet, serves on the Defense Innovation Board, a Pentagon advisory body. Milo Medin, Vice President of Access Services, Google Capital, is also a part of this body.

What about Amazon and Microsoft?

When it comes to connections with the Pentagon, Google isn’t the only company involved. Amazon has collaborated with the Department of Defense through the Amazon Rekognition API for image recognition, and Microsoft has announced a collaboration with the US government to provide IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) capabilities that meet the government’s data storage and security needs.

The dispute and the subsequent petition were initially reported by Gizmodo earlier in March. With the project expected to cost close to $70 million in its first year, the petitioners aim to discourage Google from entering more lucrative contracts as the demand for AI in defense and military applications grows.

Spark + H2O = Sparkling water for your machine learning needs

Aarthi Kumaraswamy
15 Nov 2017
3 min read
Note: The following is an excerpt from the book Mastering Machine Learning with Spark, Chapter 1, "Introduction to Large-Scale Machine Learning and Spark," written by Alex Tellez, Max Pumperla, and Michal Malohlava. This article introduces Sparkling Water, H2O's integration of its platform within the Spark project, which combines the machine learning capabilities of H2O with all the functionality of Spark.

H2O is an open source machine learning platform that plays extremely well with Spark; in fact, it was one of the first third-party packages deemed "Certified on Spark". Sparkling Water (H2O + Spark) is H2O's integration of its platform within the Spark project, which combines the machine learning capabilities of H2O with all the functionality of Spark. This means that users can run H2O algorithms on Spark RDDs/DataFrames for both exploration and deployment purposes. This is made possible because H2O and Spark share the same JVM, which allows for seamless transitions between the two platforms. H2O stores data in the H2O frame, a columnar-compressed representation of your dataset that can be created from Spark RDDs and/or DataFrames. Throughout much of this book, we will be referencing algorithms from Spark's MLlib library and H2O's platform, showing how to use both libraries to get the best possible results for a given task.
The following is a summary of the features Sparkling Water comes equipped with:

- Use of H2O algorithms within a Spark workflow
- Transformations between Spark and H2O data structures
- Use of Spark RDDs and/or DataFrames as inputs to H2O algorithms
- Use of H2O frames as inputs to MLlib algorithms (this will come in handy when we do feature engineering later)
- Transparent execution of Sparkling Water applications on top of Spark (for example, we can run a Sparkling Water application within a Spark stream)
- The H2O user interface to explore Spark data

Design of Sparkling Water

Sparkling Water is designed to be executed as a regular Spark application. Consequently, it is launched inside a Spark executor created after submitting the application. At this point, H2O starts its services, including a distributed key-value (K/V) store and memory manager, and orchestrates them into a cloud. The topology of the created cloud follows the topology of the underlying Spark cluster.

As stated previously, Sparkling Water enables transformation between different types of RDDs/DataFrames and H2O's frame, and vice versa. When converting from a hex frame to an RDD, a wrapper is created around the hex frame to provide an RDD-like API. In this case, data is not duplicated but served directly from the underlying hex frame. Converting from an RDD/DataFrame to an H2O frame requires data duplication because it transforms data from Spark into H2O-specific storage. However, data stored in an H2O frame is heavily compressed and does not need to be preserved as an RDD anymore.

If you enjoyed this excerpt, be sure to check out the book it appears in.
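The asymmetry between the two conversion directions can be sketched in plain Python. This is a toy illustration only, not H2O's real API: `HexFrame`, `RDDView`, and `rdd_to_hex` are hypothetical names standing in for the columnar frame, the zero-copy RDD wrapper, and the duplicating conversion described above.

```python
class HexFrame:
    """Stands in for H2O's columnar-compressed frame: data lives column-wise."""
    def __init__(self, columns):
        self.columns = columns  # dict: column name -> list of values

class RDDView:
    """RDD-like wrapper over a HexFrame. No data is copied; rows are
    assembled on demand directly from the underlying frame's columns."""
    def __init__(self, frame):
        self._frame = frame

    def collect(self):
        names = list(self._frame.columns)
        n_rows = len(next(iter(self._frame.columns.values())))
        return [tuple(self._frame.columns[name][i] for name in names)
                for i in range(n_rows)]

def rdd_to_hex(rows, names):
    """Spark -> H2O direction: data IS duplicated, pivoted into columnar storage."""
    return HexFrame({n: [row[i] for row in rows] for i, n in enumerate(names)})

rows = [(1, "a"), (2, "b")]
hex_frame = rdd_to_hex(rows, ["id", "label"])  # copy into columnar frame
view = RDDView(hex_frame)                      # wrapper only, no copy
print(view.collect())                          # [(1, 'a'), (2, 'b')]
```

The point of the sketch is the direction of the copy: hex frame to RDD is a view, while RDD to hex frame must materialize (and compress) a second representation.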
Pytorch.org revamps for Pytorch 1.0 with design changes and added Static graph support

Natasha Mathur
21 Sep 2018
2 min read
The PyTorch team updated their official website, Pytorch.org, for PyTorch 1.0 yesterday. The update comprises minor changes to the overall look and feel of the website. In addition, more information has been added under the tutorials section on converting PyTorch models to a static graph.

PyTorch is a Python-based scientific computing package that uses the power of graphics processing units. It is also one of the preferred deep learning research platforms, built to offer maximum flexibility and speed.

Key updates

Design changes

The layout of the webpage is still the same, but color changes have been made and additional tabs have been included at the top of the page. Previously, there were only five tabs: Get Started, About, Support, Discuss, and Docs. Now there are eight: Get Started, Features, Ecosystem, Blog, Tutorials, Docs, Resources, and Github.

Updated tutorials

With the new Tutorials tab, additional information has been provided to help users convert their models into a static graph, a feature of the upcoming PyTorch 1.0 release.

Added static graph support

One of the main differences between TensorFlow and PyTorch is that TensorFlow uses static computational graphs while PyTorch uses dynamic computational graphs. In TensorFlow, we first set up the computational graph, then execute the same graph many times. There is now an additional section under Tutorials on static graphs; its example uses basic TensorFlow operations to set up a computational graph, then executes the graph many times to train a fully-connected ReLU network.

For more details on the changes, visit the official PyTorch website.

What is PyTorch and how does it work?
Can a production ready PyTorch 1.0 give TensorFlow a tough time?
PyTorch 0.3.0 releases, ending stochastic functions
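The static-versus-dynamic distinction can be sketched without installing either framework. This is a toy define-then-run versus define-by-run illustration in plain Python, not real TensorFlow or PyTorch code:

```python
# "Static graph" style (TensorFlow-like): describe the computation once
# as data, then execute that same description many times.
graph = [("mul", 2), ("add", 3)]  # y = x * 2 + 3, defined before any data exists

def run(graph, x):
    ops = {"mul": lambda a, b: a * b, "add": lambda a, b: a + b}
    for op, arg in graph:
        x = ops[op](x, arg)
    return x

static_results = [run(graph, x) for x in range(3)]   # [3, 5, 7]

# "Dynamic graph" style (PyTorch-like): the graph *is* the Python code and is
# rebuilt on every call, so control flow can depend on the data itself.
def forward(x):
    y = x * 2
    if y > 2:  # data-dependent branch, awkward to express in a static graph
        y = y + 3
    return y

dynamic_results = [forward(x) for x in range(3)]     # [0, 2, 7]
```

The trade-off the tutorial section addresses goes the other way: a static graph, once defined, can be optimized and executed repeatedly without re-tracing the Python code.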


Intel acquires Vertex.ai to join it under their artificial intelligence unit

Prasad Ramesh
17 Aug 2018
2 min read
After acquiring Nervana, Mobileye, and Movidius, Intel has now bought Vertex.ai and is merging it into its artificial intelligence group. Vertex.ai is a Seattle-based startup with the vision of developing deep learning for every platform through its PlaidML deep learning engine. The terms of the deal are undisclosed, but the seven-person Vertex.ai team, including founders Choong Ng, Jeremy Bruestle, and Brian Retford, will become part of Movidius in Intel's Artificial Intelligence Products Group. Vertex.ai was founded in 2015 and was initially funded by Curious Capital and Creative Destruction Lab, among others.

Intel said in a statement, "With this acquisition, Intel gained an experienced team and IP (intellectual property) to further enable flexible deep learning at the edge." The chipmaker intends to continue developing PlaidML as an open source project and will shortly transition it from its existing AGPLv3 license to Apache 2.0. The priority for PlaidML will continue to be an engine that supports a variety of hardware, with an Intel nGraph backend.

"There's a large gap between the capabilities neural networks show in research and the practical challenges in actually getting them to run on the platforms where most applications run," Ng stated at Vertex.ai's launch in 2016. "Making these algorithms work in your app requires fast enough hardware paired with precisely tuned software compatible with your platform and language. Efficient plus compatible plus portable is a huge challenge—we can help."

Intel is among many giants in the tech industry making heavy investments in AI. Its AI chip business currently brings in $1 billion a year, while its PC/chip business makes $8.8 billion and its data-centric business $7.2 billion. "After 50 years, this is the biggest opportunity for the company," Navin Shenoy, executive vice president, said at Intel's 2018 Data Centric Innovation Summit this year.
"We have 20 percent of this market today… Our strategy is to drive a new era of data center technology." The official announcement is available on the Vertex.ai website.

Intel acquires eASIC, a custom chip (FPGA) maker for IoT, cloud and 5G environments
Intel's Spectre variant 4 patch impacts CPU performance
SingularityNET and Mindfire unite talents to explore artificial intelligence


Patreon speaks out against the protests over its banning Sargon of Akkad for violating its rules on hate speech

Natasha Mathur
19 Dec 2018
3 min read
Patreon, a popular crowdfunding platform, published a post yesterday defending its removal last week of Sargon of Akkad, a.k.a. Carl Benjamin, an English YouTuber famous for his anti-feminist content, over concerns that he had violated its policies on hate speech. Patreon has been receiving backlash ever since from users and patrons of the site who are calling for a boycott.

"Patreon does not and will not condone hate speech in any of its forms. We stand by our policies against hate speech. We believe it's essential for Patreon to have strong policies against hate speech to build a safe community for our creators and their patrons," says the Patreon team.

Patreon mentioned that it reviews the creations posted by content creators on other platforms that are funded via Patreon. Since Benjamin is well known for his collaborations with other creators, Patreon's community guidelines, which strictly prohibit hate speech, also apply to those collaborations. According to the guidelines, "Hate speech includes serious attacks, or even negative generalizations, of people based on their race [and] sexual orientation." In an interview on another YouTuber's channel, Benjamin used racial slurs linked with "negative generalizations of behavior" quite contrary to how people of those races actually act, in order to insult others. He also used slurs related to sexual orientation, which violates Patreon's community guidelines.

However, a lot of people are not happy with Patreon's decision. For instance, Sam Harris, a popular American author, podcast host, and neuroscientist who had one of the top-grossing accounts on the platform (with nearly 9,000 paying patrons at the end of November), deleted his account earlier this week, accusing the platform of "political bias." He wrote, "the crowdfunding site Patreon has banned several prominent content creators from its platform.
While the company insists that each was in violation of its terms of service, these recent expulsions seem more readily explained by political bias. I consider it no longer tenable to expose any part of my podcast funding to the whims of Patreon's 'Trust and Safety' committee."

https://twitter.com/SamHarrisOrg/status/1074504882210562048

Apart from banning Carl Benjamin, Patreon earlier this month also banned Milo Yiannopoulos, a British public speaker and YouTuber with over 839,286 subscribers, over his association with the Proud Boys, which Patreon has classified as a hate group.

https://twitter.com/Patreon/status/1070446085787668480

James Allsup, an alt-right political commentator and associate of Yiannopoulos', was also banned from Patreon last month for his association with hate groups.

Amid this controversy, some top Patreon creators, such as Jordan Peterson, a popular Canadian clinical psychologist whose YouTube channel has over 1.6M subscribers, and Dave Rubin, an American libertarian political commentator, announced plans earlier this week to start an alternative to Patreon. Peterson said the new platform will work on a subscriber model similar to Patreon's, with a few additional features.

https://www.youtube.com/watch?v=GWz1RDVoqw4

"We understand some people don't believe in the concept of hate speech and don't agree with Patreon removing creators on the grounds of violating our Community Guidelines for using hate speech. We have a different view," says the Patreon team.

Emmanuel Macron teams up with Facebook in a bid to fight hate speech on social media
Twitter takes action towards dehumanizing speech with its new policy
How IRA hacked American democracy using social media and meme warfare to promote disinformation and polarization: A new report to Senate Intelligence Committee
MongoDB switches to Server Side Public License (SSPL) to prevent cloud providers from exploiting its open source code

Natasha Mathur
17 Oct 2018
3 min read
MongoDB, a leading free and open source general-purpose database platform, announced yesterday that it has issued a new software license, the Server Side Public License (SSPL), for the MongoDB community server. The new license will apply to all new releases and versions of the MongoDB community server, including patch fixes for prior versions.

"The market is increasingly consuming software as a service, creating an incredible opportunity to foster a new wave of great open source server-side software. Unfortunately, once an open source project becomes interesting, it is too easy for cloud vendors who have not developed the software to capture all of the value while contributing little back to the community," said Eliot Horowitz, CTO and co-founder of MongoDB.

Previously, MongoDB was licensed under the GNU AGPLv3 (AGPL). That license allowed companies to modify and run MongoDB as a publicly available service, but only if they open sourced their software or acquired a commercial license from MongoDB. However, as MongoDB's popularity grew, some cloud providers started taking its open source code to offer hosted commercial versions of the database to their users without abiding by the open source rules. This is why MongoDB decided to switch to the SSPL.

"We have greatly contributed to, and benefited from, open source, and are in a unique position to lead on an issue impacting many organizations. We hope this new license will help inspire more projects and protect open source innovation," said Horowitz.

The SSPL is not very different from the AGPL; it simply spells out the conditions for providing open source software as a service. In fact, the new license offers the open source community the same level of freedom as the AGPL.
Companies still have the freedom to use, review, modify, and redistribute the software, but to offer MongoDB as a service they need to open source the software that they're using. This does not apply to customers who have purchased a commercial license from MongoDB.

"We are big believers in open source. It leads to more valuable, robust and secure software. However, it is important that open source licenses evolve to keep pace with the changes in our industry. With the added protection of the SSPL, we can continue to invest in R&D and further drive innovation and value for the community," said Dev Ittycheria, President & CEO, MongoDB.

For more information, check out the official MongoDB announcement.

MongoDB acquires mLab to transform the global cloud database market and scale MongoDB Atlas
MongoDB Sharding: Sharding clusters and choosing the right shard key [Tutorial]
MongoDB 4.0 now generally available with support for multi-platform, mobile, ACID transactions and more

Introducing Watermelon DB: A new relational database to make your React and React Native apps highly scalable

Bhagyashree R
11 Sep 2018
2 min read
Now you can store your data in Watermelon! Yesterday, Nozbe released Watermelon DB v0.6.1-1, a new addition to the database world. It aims to help you build powerful React and React Native apps that scale to a large number of records and remain fast.

Watermelon's architecture is database-agnostic, making it cross-platform. It is a high-level layer for dealing with data that can be plugged into any underlying database, depending on platform needs.

Why choose Watermelon DB?

Watermelon DB is optimized for building complex React and React Native applications. The following factors help ensure high application speed:

- It makes your application highly scalable by using lazy loading, which means Watermelon DB loads data only when it is requested.
- Most queries resolve in less than 1 ms, even with 10,000 records, as all querying is done on the SQLite database on a separate thread.
- You can launch your app instantly, irrespective of how much data you have.
- It is supported on iOS, Android, and the web.
- It is statically typed, designed with Flow, a static type checker for JavaScript, in mind.
- It is fast, asynchronous, multi-threaded, and highly cached.
- It is designed to be used with a synchronization engine that keeps the local database up to date with a remote database.

Currently, Watermelon DB is in active development and cannot be used in production. Its roadmap states that migrations will soon be added to allow production use. Schema migrations are the mechanism by which you can add new tables and columns to the database in a backward-compatible way.

To learn how to install it and try a few examples, check out Watermelon DB on GitHub.

React Native 0.57 coming soon with new iOS WebViews
What's in the upcoming SQLite 3.25.0 release: windows functions, better query optimizer and more
React 16.5.0 is now out with a new package for scheduling, support for DevTools, and more!
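The lazy-loading idea — a query holds only its criteria and touches no data until something actually asks for the results — can be sketched in a few lines of Python. This is a toy illustration; `LazyQuery` is a hypothetical stand-in, not Watermelon DB's actual JavaScript API:

```python
class LazyQuery:
    """A query that defers loading until results are first requested."""
    def __init__(self, table, predicate):
        self._table = table          # full dataset (stands in for SQLite)
        self._predicate = predicate  # the query's criteria
        self._cache = None           # nothing loaded yet
        self.loads = 0               # how many times we hit the "database"

    def fetch(self):
        if self._cache is None:      # load on first access only
            self.loads += 1
            self._cache = [r for r in self._table if self._predicate(r)]
        return self._cache           # later calls are served from cache

posts = [{"id": i, "starred": i % 2 == 0} for i in range(10_000)]
q = LazyQuery(posts, lambda r: r["starred"])

assert q.loads == 0        # creating the query touches no data at all
first = q.fetch()          # data is loaded here, once
again = q.fetch()          # served from cache, no second load
assert q.loads == 1 and first is again
```

This is why an app can launch instantly regardless of database size: constructing screens full of unevaluated queries costs almost nothing until a component actually observes one.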