Tech News

07 Mar 2018

2 min read

Google strides forward in deep learning: open sources Google Lucid to answer how neural networks make decisions

In an effort to improve neural network interpretability, Google has released Lucid, a neural network visualization library, and published an article, “The Building Blocks of Interpretability”, which tackles one of the most popular questions in deep learning: how do neural networks make decisions?

Google Lucid builds on Google’s work on DeepDream. You may remember DeepDream as Google’s earlier attempt to visualize how neural networks understand images, which led to the creation of psychedelic images. Lucid adds feature visualizations to create more artistic DeepDream images. It is essentially a collection of infrastructure and tools for research in neural network interpretability. In particular, it provides state-of-the-art implementations of feature visualization techniques, along with flexible abstractions that make it easy to explore new research directions.

To make the library easier to use, Google is also releasing Colab notebooks. With these notebooks, you can reproduce Lucid visualizations by opening a notebook and clicking a button to run the code, without worrying about setup requirements.

Google’s new Distill article, “The Building Blocks of Interpretability”, shows how feature visualization, combined with other interpretability techniques, gives a clearer view of the inside of a neural network. This helps show how a network arrives at intermediate decisions and how those decisions influence the final output. For example, Google says, “we can see things like how a network detects a floppy ear, and then that increases the probability it gives to the image being a ‘Labrador retriever’ or ‘beagle’.”

The article explores techniques for understanding which neurons fire in the network by attaching visualizations to each neuron, almost a kind of MRI for neural networks. It can also zoom out and show how the entire image was perceived at different layers, from detectors of very simple combinations of edges, to rich textures and 3D structure, to high-level structures.

The purpose of this research, Google says, is to “address one of the most exciting questions in Deep Learning: how do neural networks do what they do?” However, it adds, “This work only scratches the surface of the kind of interfaces that we think it’s possible to build for understanding neural networks. We’re excited to see what the community will do.” You can read the entire article on Distill.

Savia Lobo

07 Mar 2018

2 min read

Apache Spark 2.3 now has native Kubernetes support!

Two of the leading open-source projects, Apache Spark and Kubernetes, now work together: Apache Spark 2.3 has native Kubernetes support.

Kubernetes: A natural fit for Apache Spark

Apache Spark is a framework for large-scale data processing and an important tool for data scientists. It offers a robust platform for major tasks such as data transformation, analytics, and machine learning. Recently, data scientists have been adopting containers to improve their workflows, since containers make it possible to package dependencies and create reproducible artifacts. This is where Kubernetes, an open-source system for automating the deployment, scaling, and management of containerized applications, comes in. It enables Spark to run as a containerized application on a Kubernetes cluster.

This combination has two benefits. First, data scientists keep their principal tool, Apache Spark, and its ability to manage distributed data processing tasks. Second, they can work with containers through the Kubernetes API.

With Apache Spark 2.3, users can run Spark workloads in an existing Kubernetes 1.7+ cluster. Spark workloads can therefore make direct use of Kubernetes clusters for multi-tenancy and sharing through Namespaces and Quotas, as well as administrative features such as Pluggable Authorization and Logging. Spark workloads also require no changes or new installations on the Kubernetes cluster. One simply has to create a container image and set up the right RBAC roles for the Spark application.

The native Kubernetes support offers fine-grained management of Spark applications, improved elasticity, and seamless integration with logging and monitoring solutions. The community is also exploring advanced use cases such as managing streaming workloads and leveraging service meshes like Istio. Visit the Databricks blog to read more on this topic.
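
As a rough illustration of what submitting a Spark 2.3 job to Kubernetes looks like, the sketch below assembles a spark-submit command line in Python. The API server address, image name, jar path, namespace, and service account are hypothetical placeholders, and the helper function itself is not part of Spark; only the spark-submit options and spark.kubernetes.* property names come from Spark's Kubernetes documentation.

```python
from typing import List


def build_spark_submit(api_server: str, image: str, app_jar: str,
                       main_class: str, executors: int = 2,
                       namespace: str = "default",
                       service_account: str = "spark") -> List[str]:
    """Return a spark-submit argument list that targets a Kubernetes cluster.

    This is an illustrative helper, not a Spark API.
    """
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return [
        "spark-submit",
        # The k8s:// prefix tells Spark to use the Kubernetes scheduler backend.
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--name", "spark-on-k8s-example",
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        # The container image that holds Spark and the application.
        "--conf", f"spark.kubernetes.container.image={image}",
        # The service account whose RBAC roles let the driver create executor pods.
        "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName="
                  + service_account,
        app_jar,
    ]


cmd = build_spark_submit(
    api_server="https://kubernetes.example.com:6443",  # hypothetical address
    image="registry.example.com/spark:2.3.0",          # hypothetical image
    app_jar="local:///opt/spark/examples/jars/spark-examples.jar",  # hypothetical path
    main_class="org.apache.spark.examples.SparkPi",
)
print(" ".join(cmd))
```

The printed command could then be run from a machine with a Spark 2.3 distribution and access to the cluster. The local:// scheme indicates that the jar is already present inside the container image.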

Savia Lobo

07 Mar 2018

2 min read

Alteryx Analytics 2018.1 is here: The analytics platform for enterprises

Alteryx, one of the leading providers of self-service analytics, has released Alteryx Analytics 2018.1. This collaborative platform helps data scientists and business analysts discover, prepare, and blend data from more sources, and easily operationalize models. Here is a quick look at the features of the new 2018.1 platform.

Collaborative Insights: These features give users quick access to the right data at the right time in a governed manner. Using the Alteryx Connect Loaders, one can directly access metadata stored in DB2, HDFS, and SAP HANA, and evaluate and display analytic assets. Global search in Alteryx Designer can now find, view, and launch assets stored in Alteryx Connect. One can also establish Alteryx Connect lineage from Designer workflows that use In-Database tools.

Analytic Flexibility: The 2018.1 platform helps users harness the full value of their existing architecture and emerging data assets. It expands data connections with new connectors for AWS Athena and Redshift Spectrum, and an enhanced integration with Excel. New Tableau support lets an Alteryx workflow output directly to Tableau Hyper. With the new code tool for Apache Spark, one can execute R, Python, or Scala code directly against a Spark cluster. Fast and accurate suggestions, error notifications, and auto-completion of expressions are now available.

Operationalize Models: One can easily deploy predictive models (built in Alteryx, R, or Python) into production with Alteryx Promote. The new version also supports managing models from development to production to ensure they deliver the best impact on the business, and monitoring model performance and health to determine whether a model needs to be retrained or discarded.

Read more about this topic in detail on the Alteryx Community Blog.

Packt Editorial Staff

07 Mar 2018

2 min read

Data Science News Daily Roundup – 7th March 2018

Google’s Lucid for ML interpretability, native Kubernetes support in Apache Spark 2.3, Cloudera Altus with SDX, and more in today’s top stories around machine learning, deep learning, and data science news.

Top Data Science News Stories of the Day

Google strides forward in deep learning: open sources Lucid to answer how neural networks make decisions.
Apache Spark 2.3 now has native Kubernetes support.
Alteryx Analytics 2018.1 is here: The analytics platform for enterprises.

Other Data Science News at a Glance

1. Cloudera announces Cloudera Altus with SDX, the first machine learning and analytics Platform-as-a-Service (PaaS), for simplifying multi-function big data analytics. Read more on PR Newswire.
2. Prisma Cloud, a new GraphQL database platform, has launched. It offers powerful data workflows such as exploring and editing data in an intuitive data browser, as well as automatic rollbacks. Further features include team collaboration, performance metrics, health checks, and more. Read more on the Graphcool blog.
3. Pentagon uses AI to analyze drone footage. Google is reportedly working on a pilot project with the US Defense Department to use TensorFlow APIs to assist in object recognition on unclassified data. Read more on ZDNet.
4. CRAN gets a new addition, textfeatures, a simple package for extracting useful features from character objects. Read more on CRAN-project.
5. Allscripts Healthcare Solutions unveiled Avenel, its new electronic health record, which uses machine learning to reduce time for clinical documentation and is designed to work like an app. Read more on Nasdaq.

Packt Editorial Staff

06 Mar 2018

3 min read

Data Science News Daily Roundup – 6th March 2018

Microsoft Cloud AI Research Challenge, Neblio (a next-generation blockchain network), Salesforce Einstein Analytics’ ‘Conversational Queries’, and more in today’s top stories around machine learning, deep learning, and data science news.

Top Data Science News Stories of the Day

Google Bristlecone: A New Quantum processor by Google’s Quantum AI lab
Google-Landmarks, a novel dataset for instance-level image recognition
Pandas on Ray: Make Pandas faster by replacing one line of your code

Other Data Science News at a Glance

1. The Microsoft Cloud AI Research Challenge invites any researcher, from students to academics to employees of public and private organizations, to build AI applications on Microsoft AI services, and the two best will be awarded USD 25,000. Read more at Microsoft Research.
2. Neblio is a next-generation blockchain network that aims to make enterprise integration seamless, simple, and cost-effective. It offers a suite of solutions intended to streamline the process of integrating blockchain technology. Read more at CryptoSlate.
3. Big-data company HVR Software B.V. today launched its real-time data integration platform. The new architecture moves data continuously using log-based change data capture, a low-impact way of moving data from a variety of sources into target systems. Read more at SiliconANGLE.
4. Google open sources a protocol buffer implementation of the Fast Healthcare Interoperability Resources (FHIR) standard to make healthcare data work better with machine learning. To enable large-scale machine learning, the protocol buffer implementation brings a few additions, such as implementations in various programming languages, an efficient way to serialize large amounts of data to disk, and a representation that allows analyses of large datasets. Read more at Google Research.
5. MXNet is now faster and more scalable with the 1.1.0 release. With this release, MXNet makes it easier for developers to build vocabularies and load pre-trained word embeddings through a new experimental API. It also includes improved batching for GEMM/TRSM operators with large matrices on GPU, which makes model training faster. Read more at The Apache Blog.
6. Paxata announced the general availability of Spring ’18, the next major release of the company’s Adaptive Information Platform. The latest offering accelerates how business consumers prepare enterprise data volumes and create high-quality information for analysis and collaboration across global organizations, with new enhancements that include one-click profiling, rapid data onboarding, and multi-tenancy capabilities. Read more at Paxata Press releases.
7. Axoni announces AxLang, a new programming language based on Scala that supports functional programming and enables formal verification of smart contracts for Ethereum-compatible networks. Read more on Medium.
8. Salesforce introduced a new feature to Einstein Analytics today called ‘Conversational Queries’. With Conversational Queries, users can type phrases related to their data (such as “show me top accounts by annual revenue” or “rank accounts decreasing by annual revenue and billing country”) and instantly view answers in automatically configured dynamic charts. Read more on Techcrunch.

Sugandha Lahoti

06 Mar 2018

2 min read

Google-Landmarks, a novel dataset for instance-level image recognition

Image retrieval and image recognition are fundamental problems in machine learning and computer vision. Image classification technology has shown remarkable progress over the past few years. An obstacle to this research, however, is the lack of large annotated datasets. Google has attempted to address this by introducing Google-Landmarks, a worldwide dataset for recognition of human-made and natural landmarks.

The dataset was built to tackle fine-grained and instance-level recognition problems, such as identifying important landmarks in images (Eiffel Tower, Mount Fuji, Taj Mahal, etc.), which account for a large portion of what people like to photograph. Landmark recognition predicts landmark labels directly from image pixels, helping people better understand and organize their photo collections. The Google-Landmarks dataset contains more than 2 million images depicting 30 thousand unique landmarks from across the world, a number of classes almost 30x larger than what is available in commonly used datasets.

Figure: Geographic distribution of landmarks in the Landmark dataset.

To advance research in this area, Google has also open-sourced Deep Local Features (DELF), an attentive local feature descriptor that is useful for large-scale instance-level image recognition. DELF detects and describes semantic local features that can be geometrically verified between images showing the same object instance. It is also optimized for landmark recognition.

Google-Landmarks is being released as part of the Landmark Recognition and Landmark Retrieval Kaggle challenges. The recognition challenge calls for developers to build models that recognize the correct landmark (if any) in a dataset of challenging test images. In the retrieval challenge, developers are given query images and, for each query, are expected to retrieve all database images containing the same landmark (if any).

Participants are encouraged to compete in both challenges, as the test set is the same for both problems. Participants may also use the training data from the recognition challenge to train models that could be useful for the retrieval challenge. However, there are no landmarks in common between the training/index sets of the two challenges.

This challenge is the focal point of the CVPR’18 Landmarks workshop. More details of the challenge and the dataset can be found on the Google research blog.
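
To make the retrieval task concrete, here is a minimal sketch of ranking database images against a query by descriptor similarity. It is not DELF, which matches local features and verifies them geometrically. It assumes each image has already been reduced to a single hypothetical global descriptor vector, and the image names, vectors, and threshold are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float], index: Dict[str, Sequence[float]],
             threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Return (image_id, score) pairs at or above the threshold, best match first."""
    scored = [(image_id, cosine_similarity(query, descriptor))
              for image_id, descriptor in index.items()]
    hits = [(image_id, score) for image_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy index with made-up three-dimensional descriptors.
index = {
    "eiffel_01": [0.9, 0.1, 0.0],
    "eiffel_02": [0.8, 0.2, 0.1],
    "fuji_01": [0.0, 0.1, 0.95],
}
results = retrieve([1.0, 0.1, 0.0], index)
print([image_id for image_id, _ in results])
```

In this toy index the two Eiffel Tower descriptors point in nearly the same direction as the query and are returned, while the Mount Fuji descriptor falls below the threshold. A real system would use learned descriptors and an approximate nearest-neighbor index rather than a linear scan.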

Sugandha Lahoti

06 Mar 2018

2 min read

Google Bristlecone: A New Quantum processor by Google’s Quantum AI lab

The quest to conquer the quantum world is rapidly advancing! Another contender in this race is Google, which has previewed Bristlecone, a new quantum processor. Bristlecone was unveiled at the annual American Physical Society meeting in Los Angeles on March 5, 2018. According to Google, “Bristlecone would be a compelling proof-of-principle for building larger scale quantum computers.” The purpose of this quantum processor is to provide a testbed for research into system error rates and scalability of Google’s qubit technology, along with applications in quantum simulation, optimization, and machine learning.

Figure: A preview of Bristlecone, Google’s new quantum processor. On the right is a cartoon of the device: each “X” represents a qubit, with nearest-neighbor connectivity.

Google Bristlecone uses a new architecture that places 72 quantum bits on a single array, with an overlapping design that puts two different grids together. Google has optimized Bristlecone for the lowest possible error rate using a specialized process called Quantum Error Correction. Google’s previous 9-qubit linear quantum computers demonstrated error rates of 1% for readout, 0.1% for single-qubit gates, and 0.6% for two-qubit gates. Bristlecone uses the same scheme for coupling, control, and readout, but is scaled to a square array of 72 qubits. Google researchers chose a device of this size to demonstrate quantum supremacy in the future, to investigate first- and second-order error correction using the surface code, and to facilitate quantum algorithm development on actual hardware.

The intended research direction of the Quantum AI Lab is to access near-term applications on the road to building an error-corrected quantum computer. This, Google says, “would require harmony between a full stack of technology ranging from software and control electronics to the processor itself. Getting this right requires careful systems engineering over several iterations.” More information about Google Bristlecone is available on the Google research blog.
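
To get a feel for why the error rates quoted above matter, the sketch below estimates the chance that a circuit runs without any error, using the 9-qubit device's figures as defaults. It relies on a simplifying assumption that is not from Google's announcement: every operation fails independently with its quoted probability. The circuit sizes in the loop are hypothetical.

```python
def no_error_probability(single_qubit_gates: int, two_qubit_gates: int,
                         readouts: int, e_single: float = 0.001,
                         e_two: float = 0.006,
                         e_readout: float = 0.01) -> float:
    """Probability that no operation fails, assuming independent errors.

    Defaults are the error rates quoted for the 9-qubit device:
    0.1% single-qubit gates, 0.6% two-qubit gates, 1% readout.
    """
    return ((1 - e_single) ** single_qubit_gates
            * (1 - e_two) ** two_qubit_gates
            * (1 - e_readout) ** readouts)


# Hypothetical circuit sizes, chosen only to show the trend.
for gates in (10, 100, 1000):
    p = no_error_probability(gates, gates, readouts=9)
    print(f"{gates} single-qubit and {gates} two-qubit gates: {p:.3f}")
```

Under this model the success probability shrinks geometrically with circuit size, which is one way to see why low error rates and error correction are central to scaling up.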

Savia Lobo

06 Mar 2018

3 min read

Pandas on Ray: Make Pandas faster by replacing one line of your code

Pandas on Ray is the latest development in the Ray framework. It is a DataFrame library that wraps Pandas and transparently distributes data and computation. Pandas on Ray is aimed at existing Pandas users who want better performance and faster runtimes without having to switch to another API. It accelerates Pandas queries by about 4 times on an 8-core machine, and requires users to change just a single line of code in their notebooks.

Ray: A machine learning substitute for Apache Spark

Developed by two Ph.D. students, Philipp Moritz and Robert Nishihara, at the RISELab, Ray is a distributed execution framework for AI applications and a potential replacement for Apache Spark. RISELab is the successor to the U.C. Berkeley group that created Apache Spark. Apache Spark was designed to be faster than its forerunner, MapReduce, but some of its design decisions make it difficult to write applications with complex task dependencies, mainly because of Spark’s internal synchronization mechanisms.

Ray is designed to provide better speeds than even Apache Spark. It is written in C++ and aims to accelerate the execution of machine learning algorithms developed in Python. It uses an immutable object model: objects that can be made immutable don’t need to be synchronized across the cluster, which saves a lot of time. Ray also maintains the state of the computation across the nodes in the cluster, which in turn improves robustness.

Additional features include the following. Ray can handle heterogeneous hardware (where some application workloads run on CPUs and others on GPUs), as it has a number of schedulers that can bring both CPUs and GPUs together. It also borrows task-dependency attributes from MPI, the low-level distributed programming environment. Ray is also useful for building applications that require fast decision-making on real-world data, such as autonomous driving.

Pandas on Ray

Comparing Pandas with Pandas on Ray gave the following results:

Pandas on Ray: 100 loops, best of 3: 4.14 ms per loop
Pandas: The slowest run took 32.21 times longer than the fastest. This could mean that an intermediate result is being cached. 1 loop, best of 3: 17.3 ms per loop

These results show that Pandas on Ray is about 4 times faster than Pandas. The benchmark was run on a machine with eight cores, so the speedup falls short of the ideal 8x because of overheads. No special optimizations were done for Pandas on Ray; only the default settings were used in this experiment. Also, Pandas on Ray uses eager execution, so it cannot do query planning or know in advance the best way to compute a given workflow.

To know more about Ray in detail, visit its GitHub repository. To learn more about Pandas on Ray, visit the RISELab blog.
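
The "about 4 times faster" figure follows directly from the two timings quoted above. The short calculation below reproduces it and also derives a parallel efficiency, meaning the speedup divided by the core count. The efficiency is a derived illustration rather than a number reported in the original benchmark, and the function name is made up for this sketch.

```python
from typing import Tuple


def speedup_and_efficiency(baseline_ms: float, parallel_ms: float,
                           cores: int) -> Tuple[float, float]:
    """Return (speedup, efficiency) for a parallel run compared to a baseline."""
    speedup = baseline_ms / parallel_ms
    return speedup, speedup / cores


# Timings quoted in the article: Pandas 17.3 ms, Pandas on Ray 4.14 ms, 8 cores.
speedup, efficiency = speedup_and_efficiency(17.3, 4.14, cores=8)
print(f"speedup: {speedup:.2f}x")                        # speedup: 4.18x
print(f"efficiency on 8 cores: {efficiency:.0%}")         # efficiency on 8 cores: 52%
```

An efficiency of roughly half is consistent with the article's remark that overheads keep the speedup below the ideal for eight cores.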

Savia Lobo

05 Mar 2018

2 min read

TensorFlow 1.6.0 is here!

After a sneak peek at TensorFlow’s release candidates 1.6.0-rc0 and 1.6.0-rc1, the 1.6.0 release is finally here! TensorFlow 1.6.0 includes two breaking changes, feature improvements, and bug fixes. The previous version, TensorFlow 1.5, introduced jaw-dropping additions such as the TensorFlow Lite developer preview and TensorFlow Eager Execution. Let’s look at what’s in store in TensorFlow 1.6.0.

The two most important changes are:
The prebuilt binaries are now built against CUDA 9.0 and cuDNN 7.
The prebuilt binaries now use AVX instructions, which may break TensorFlow on older CPUs.

Major feature improvements:
A new optimizer internal API for non-slot variables.
tf.estimator.{FinalExporter,LatestExporter} can now export stripped SavedModels, which improves forward compatibility of the SavedModel.
FFT support has been added to XLA CPU/GPU.
Android TF can now be built with CUDA acceleration on compatible Tegra devices.

A few API changes in 1.6.0:
A prepare_variance boolean has been introduced, defaulting to False for backward compatibility.
layers_dense_variational_impl.py has been moved to layers_dense_variational.py.

Minor bug fixes include:
Addition of a client-side throttle in the Google Cloud Storage (GCS) client.
Addition of a FlushCaches() method to the FileSystem interface, with an implementation for GcsFileSystem.

In addition, TensorFlow 1.6.0 includes a second version of the Getting Started guide, aimed at newcomers to machine learning, and adds documentation for TPUs. Other changes are listed on the GitHub release page.
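
Because the prebuilt 1.6.0 binaries rely on AVX, it can be worth checking a machine's CPU before upgrading. The sketch below is one way to do that on Linux, under the assumption that /proc/cpuinfo is available and lists CPU features on a line starting with "flags". It is an illustrative helper, not something shipped with TensorFlow.

```python
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx(cpuinfo_text: str) -> bool:
    """True if the 'avx' flag is listed for at least one processor."""
    return "avx" in cpu_flags(cpuinfo_text)


cpuinfo = Path("/proc/cpuinfo")  # Linux-only assumption
if cpuinfo.exists():
    print("AVX supported:", supports_avx(cpuinfo.read_text()))
else:
    print("/proc/cpuinfo not found; check CPU features another way.")
```

On a CPU without AVX, the options are to stay on an earlier TensorFlow release or to build TensorFlow from source without AVX.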

Packt Editorial Staff

05 Mar 2018

3 min read

5th March 2018 – Data Science News Daily Roundup

TensorFlow 1.6.0, Pandas on Ray, Google-Landmarks, Microsoft’s Custom Vision and Face API, and more in today’s top stories around machine learning, deep learning, and data science news.

1. TensorFlow v1.6.0 finally makes its debut

TensorFlow 1.6.0 has finally been released after two release candidates. The breaking changes, major features, and improvements include:
Pre-built binaries are now built against CUDA 9.0 and cuDNN 7.
Pre-built binaries will use AVX instructions.
New Optimizer internal API for non-slot variables.
tf.estimator.{FinalExporter,LatestExporter} now export stripped SavedModels.
FFT support added to XLA CPU/GPU.
Android TF can now be built with CUDA acceleration on compatible Tegra devices.
To learn about bug fixes and other changes, visit the GitHub repo.

2. Pandas on Ray, a DataFrame library for making Pandas faster

A team at UC Berkeley is developing a DataFrame library that wraps Pandas and transparently distributes the data and computation. The early-stage library, Pandas on Ray, can accelerate Pandas queries by 4x on an 8-core machine, only requiring users to change a single line of code in their notebooks. Pandas on Ray is aimed at existing Pandas users who want to improve performance and see faster runtimes without having to switch to another API. The ultimate goal of this project is to be able to use Pandas in a cloud setting.

3. Google launches Landmarks, a new dataset for landmark recognition

Google has released Google-Landmarks, the largest worldwide dataset for recognition of human-made and natural landmarks. The dataset contains more than 2 million images depicting 30 thousand unique landmarks from across the world, a number of classes that is ~30x larger than what is available in commonly used datasets. Google is also open-sourcing Deep Local Features (DELF), an attentive local feature descriptor, and has launched two Kaggle challenges. The recognition track challenges participants to build models that recognize the correct landmark in a dataset of challenging test images, while the retrieval track challenges participants to retrieve images containing the same landmark.

4. Microsoft Azure adds computer vision and image processing capabilities

Microsoft has updated its Azure platform with computer vision capabilities by launching Custom Vision, a service that lets developers train models for processing specific kinds of images. Alongside Custom Vision, the company also made its Face API service for face and emotion detection generally available. The major improvement in Face API is a scalability boost that enables the service to recognize up to a million different individuals within images. Microsoft also launched Bing Entity Search, which allows developers to harness Microsoft’s search engine to help users find needed information within their application.

5. Intela launches Farrago, an online tool that uses machine learning to clean up dirty data

Data science company Intela AI has launched Farrago, a machine learning tool to clean up dirty data. The tool can automate the manual work of identifying and removing duplicate records from databases. It can also analyze a company’s data and intelligently recommend the best way to organize, clean, and transform it. According to Intela CEO Asa Cox, “Farrago could save a company, client or programme, hundreds of man-hours of time spent manually (or semi-manually) cleaning data.” An online demonstration of Farrago is available.

Packt

05 Mar 2018

2 min read

Learn Azure serverless computing for free - Download a free eBook from Microsoft

There has been a lot of noise around serverless computing over the last couple of years. There have been arguments that it’s going to put the container revolution to bed, and while that’s highly unlikely (containers and serverless are simply different solutions that are appropriate in different contexts), it’s significant that a trend like serverless could emerge so quickly to capture the attention of engineers and architects. It says a lot about the rapidly changing nature of software infrastructures and the increased demands for agility, scalability, and power. Azure is a cloud solution that’s only going to help drive serverless adoption further. But we know there’s always some trepidation among tech decision makers when choosing to implement something new or use a new platform. That’s why we’re delighted to be partnering with Microsoft Azure to give the world free access to Azure Serverless Computing Cookbook. Packed with more than 50 Azure serverless tutorials and recipes to help solve common and not so common challenges, this 325-page eBook is both a useful introduction to Azure’s serverless capabilities and a useful resource for anyone already acquainted with it. Simply click here to go to Microsoft Azure to download the eBook for free.

Packt Editorial Staff

01 Mar 2018

4 min read

1st March 2018 – Data Science News Daily Roundup

Apache Spark 2.3 now on Databricks Runtime 4.0 Beta, Twitter donates Heron to Apache Software Foundation, new Blockchain-based platform to build AI apps, and more in today’s top stories around machine learning, deep learning, and data science news.

1. Apache Spark 2.3 Now Available on Databricks Runtime 4.0 Beta

Databricks announced the availability of Apache Spark 2.3.0 on Databricks as part of its Databricks Runtime 4.0 beta. Spark 2.3:
Marks a major milestone for Structured Streaming by introducing low-latency continuous processing and stream-to-stream joins.
Boosts PySpark by improving performance with pandas UDFs.
Runs on Kubernetes clusters by providing native support for Apache Spark applications.
The release extends new functionality to SparkR, Python, MLlib, and GraphX. It also focuses on usability, stability, and refinement, resolving over 1400 tickets. For additional features and other information, read the Spark 2.3 release notes.

2. New Blockchain-Based Platform to Collectively Build AI Apps

Dbrain, a new project built on the Ethereum blockchain, leverages smart contracts to provide a simple tool that allows everyone to label and validate data in exchange for cryptocurrency. The platform targets businesses and data scientists that need data to develop AI solutions. By building smart contracts on Ethereum, Dbrain plans to use its internal protocols to solve fundamental challenges in AI development, execution, and adoption, which include dataset quality, trust and security, and infrastructure costs. It aims to have a complete AI production line integrated with its platform, combining labeling functionality with transparent payment validation. It also aims to provide customized AI solutions within a single product. To know more about the challenges in detail, visit Dbrain’s blogpost.

3. Now on GitHub: The Autonomous Driving Cookbook from Microsoft as Jupyter Notebooks

The Autonomous Driving Cookbook from Microsoft is now available on GitHub. The cookbook is an open source collection of scenarios, tutorials, and demos to help you quickly get started with various aspects of the autonomous driving pipeline. It is an ongoing project developed and maintained by the Deep Learning and Robotics chapter of Microsoft Garage, the team that helped develop the recent expansion of AirSim to include cars for autonomous driving research. Tutorials in the cookbook are presented as Jupyter notebooks, making it easy to download the instructions and get started without a lot of setup time. Wherever needed, tutorials also come with their own datasets, helper scripts, and binaries. Read more at the Microsoft + Open Source blog.

4. Tenet Partners Launches Data Analytics Platform

Tenet Partners announced CoreBrand® Data Science, a new business unit leveraging the power of predictive analytics and data science to transform how corporations and capital markets can generate value from their brands. Tenet Partners helps the C-suite drive positive business outcomes by combining the research that underpins the CoreBrand® Index with advanced analytics. Read the official press release for detailed information about this launch.

5. Twitter donates Heron to Apache Software Foundation

Twitter announced that it is donating Heron to the Apache Incubator, where the community will continue to grow and thrive under the guidance of the Apache Software Foundation. Heron is a real-time analytics platform developed by Twitter to reliably process billions of events generated at Twitter every day. It is a next-generation distributed streaming engine built to be backwards compatible with Apache Storm, which Twitter open-sourced in 2011. It was built to improve Twitter’s developer and operational experiences with Storm, and introduced a wide array of architectural improvements and native support for Apache Aurora. Heron has become Twitter’s primary streaming system, reliably powering all of Twitter’s real-time analytics and running hundreds of development and production topologies deployed on thousands of nodes. For more, read Twitter’s official announcement.


Savia Lobo

01 Mar 2018

3 min read

Paper in two minutes: Certifiable Distributional Robustness with Principled Adversarial Training


Certifiable Distributional Robustness with Principled Adversarial Training, a paper accepted for ICLR 2018, is a collaborative effort of Aman Sinha, Hongseok Namkoong, and John Duchi. In this paper, the authors address the vulnerability of neural networks to adversarial examples and take the perspective of distributionally robust optimization, which guarantees performance under adversarial input perturbations.

Certifiable Distributional Robustness by Applying Principled Adversarial Training

What problem is the paper trying to solve?

Recent works have shown that neural networks are vulnerable to adversarial examples: seemingly imperceptible perturbations to the data can cause the model to misbehave, for example by misclassifying its input. Many researchers have proposed adversarial attack and defense mechanisms to counter these vulnerabilities. While these works provide an initial foundation for adversarial training, there are no guarantees on whether the proposed white-box attacks can find the most adversarial perturbation, or whether there is a class of attacks that such defenses can successfully prevent.

On the other hand, verification of deep networks using SMT (satisfiability modulo theories) solvers provides formal guarantees on robustness but is NP-hard in general. This approach is prohibitively expensive to compute even on small networks. The authors take the perspective of distributionally robust optimization and provide an adversarial training procedure with provable guarantees on its computational and statistical performance.

Paper summary

This paper proposes a principled methodology to induce distributional robustness in trained neural nets, with the purpose of mitigating the impact of adversarial examples. The idea is to train the model to perform well not only with respect to the unknown population distribution, but also on the worst-case distribution in a Wasserstein ball around the population distribution. In particular, the authors adopt the Wasserstein distance to define the ambiguity sets. This allows them to use strong duality results from the literature on distributionally robust optimization and express the empirical minimax problem as a regularized ERM (empirical risk minimization) with a different cost.

Key takeaways

The paper provides a method for efficiently guaranteeing distributional robustness with a simple form of adversarial data perturbation.
The method achieves strong statistical guarantees and fast optimization rates for a large class of problems.
Empirical evaluations indicate that the proposed methods are in fact robust to perturbations in the data, and that they outperform less-principled adversarial training techniques.
The major benefit of this approach is its simplicity and wide applicability across many models and machine-learning scenarios.

Reviewer comments summary

Overall Score: 27/30
Average Score: 9

The reviewers strongly accepted this paper and stated that it is of great quality and originality. The reviews were not uniformly positive, however: the paper was also described as an interesting attempt in which some of the key claims seem to be inaccurate and comparisons to proper baselines are missing. Another reviewer said that the paper applies recently developed ideas from the robust optimization literature, in particular distributionally robust optimization with the Wasserstein metric, and shows that, for smooth loss functions and when not too much robustness is requested, the resulting optimization problem is of the same difficulty as the original one (where adversarial attacks are not considered). The paper drew some criticism, but on the whole most of the reviewers liked it.
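The training idea in the paper summary can be illustrated with a small sketch. This is not the authors' code: it swaps the neural network for a two-feature logistic model, assumes a squared Euclidean transport cost, and uses hypothetical helper names (`perturb`, `train`, `TOY_DATA`) and made-up toy data. For each training point it first nudges the input to increase the loss minus a penalty `gamma` times the squared distance moved, then takes an ordinary gradient step on the model parameters at that perturbed point.

```python
import math
import random


def sigmoid_neg(margin):
    """Return 1 / (1 + exp(margin)), computed without overflow."""
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def logistic_loss(theta, x, y):
    """Logistic loss for a label y in {-1, +1} and a linear score theta . x."""
    margin = y * sum(t * xi for t, xi in zip(theta, x))
    # log(1 + exp(-margin)), split by sign so exp never overflows
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def penalized_objective(theta, x, x0, y, gamma):
    """Loss at x minus gamma times the squared distance from the clean input x0."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, x0))
    return logistic_loss(theta, x, y) - gamma * dist2


def perturb(theta, x0, y, gamma, steps=15):
    """Approximate the inner maximization by gradient ascent on the input.

    Assumption for this sketch: gamma is large relative to the curvature of
    the loss, so the penalized objective is concave in x.
    """
    x = list(x0)
    step = 1.0 / (2.0 * gamma)
    for _ in range(steps):
        margin = y * sum(t * xi for t, xi in zip(theta, x))
        s = sigmoid_neg(margin)
        # Gradient in x: -y * s * theta - 2 * gamma * (x - x0)
        x = [xi + step * (-y * s * t - 2.0 * gamma * (xi - x0i))
             for xi, t, x0i in zip(x, theta, x0)]
    return x


def train(data, gamma=2.0, epochs=50, lr=0.1, seed=0):
    """Stochastic gradient descent on the loss evaluated at perturbed inputs."""
    rng = random.Random(seed)
    theta = [0.0] * len(data[0][0])
    order = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(order)
        for x0, y in order:
            x_adv = perturb(theta, x0, y, gamma)
            margin = y * sum(t * xi for t, xi in zip(theta, x_adv))
            s = sigmoid_neg(margin)
            # Gradient in theta at the perturbed point is -y * s * x_adv
            theta = [t + lr * y * s * xi for t, xi in zip(theta, x_adv)]
    return theta


# Made-up, linearly separable toy data: (features, label)
TOY_DATA = [
    ([2.0, 1.5], 1), ([1.5, 2.0], 1), ([2.5, 2.0], 1),
    ([-2.0, -1.5], -1), ([-1.5, -2.0], -1), ([-2.5, -2.0], -1),
]

if __name__ == "__main__":
    print(train(TOY_DATA))
```

A larger `gamma` keeps the perturbed inputs closer to the originals, so training behaves more like ordinary empirical risk minimization; a smaller `gamma` asks for more robustness at the cost of a harder inner problem.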


Savia Lobo

28 Feb 2018

3 min read

Is JupyterLab all set to phase out Jupyter Notebooks?


In keeping with Project Jupyter’s goal of developing open-source software, open standards, and services for interactive computing across programming languages, the project released the JupyterLab beta to users this month. JupyterLab is tagged as the next generation UI for Project Jupyter, and is a successor to Jupyter Notebook, a successful and widely adopted Project Jupyter application.

Saying hello to JupyterLab

Jupyter Notebook is an open-source web application that allows users to create and share documents that contain live code, visualizations, narrative text, and equations. Jupyter notebooks are used for tasks such as data cleaning, data transformation, numerical simulation, machine learning, and many more. It is now well established that the data science community loves using Jupyter Notebooks for interactive computing. However, users face certain barriers that make working with Jupyter Notebook a little less than ideal. Some of the cons include:

Moving between the different building blocks within a workflow is difficult.
Real-time collaboration on notebooks through Dropbox or Google Drive is not possible with Jupyter Notebooks.
Too much space is wasted on the left and right of the notebook.

These are some of the issues with Jupyter Notebooks that the brand new JupyterLab takes care of.

A swift move to JupyterLab

JupyterLab has complete support for Jupyter Notebooks, so you won’t miss working with notebooks but can do a lot more. JupyterLab is an interactive environment which allows you to work with notebooks, code, and data, all under one roof. The most important feature of JupyterLab is real-time collaboration with several people on a single project. An add-on to this is its user-friendly interface, which makes it all the easier to use.

JupyterLab also offers a high level of integration between notebooks. This means you can drag and drop notebook cells and copy them between notebooks. You can also run code blocks from text files with .py, .R, and .tex extensions. JupyterLab can also multi-task: you can open notebooks, text editors, terminals, and other components, and view and edit them in different tabs simultaneously.

JupyterLab offers an entire range of extensions which can be used to enhance parts of JupyterLab. You can choose from a variety of themes, editors, and renderers for rich outputs on notebooks. JupyterLab extensions are npm packages (the standard package format in JavaScript development). There are also many community-developed extensions being built on GitHub. To find extensions, you can search GitHub for jupyterlab-extension. You can also check out the developer documentation guide for information on developing extensions.

Some additional features of JupyterLab include:

JupyterLab is more about development, unlike Jupyter Notebook, which focuses on presentation.
Developers can perform syntax completion using the Tab key and object tool-tip inspection using the Shift-Tab keys.
Files can be opened in a variety of formats.
Developers can run their code interactively inside 'consoles' and not only in notebooks, which supports an imperative style of working.
JupyterLab accommodates notebooks in multiple languages, provided the kernels for those languages are installed.
Browsers such as Chrome, Firefox, and Safari are compatible with JupyterLab.

The Jupyter community plans to release version 1.0 of JupyterLab some time later this year. Version 1.0 will replace the classic Jupyter Notebook. However, the notebook document format will be supported by both the classic notebook and JupyterLab. For more detail on the JupyterLab beta, see Project Jupyter’s official blog post.
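Because both the classic notebook and JupyterLab read the same notebook document format, it can help to see what such a document looks like. The sketch below builds a minimal notebook as plain JSON using only the Python standard library. The helper name `make_notebook` and the cell contents are illustrative assumptions; the field names follow version 4 of the notebook format as best understood here, and the `kernelspec` entry is where the kernel (and therefore the language) is recorded.

```python
import json


def make_notebook(cells, kernel_name="python3", display_name="Python 3"):
    """Build a minimal version-4 notebook document as a plain dict.

    `cells` is a list of (cell_type, source_text) pairs, where cell_type
    is "markdown" or "code".
    """
    nb_cells = []
    for cell_type, source in cells:
        cell = {
            "cell_type": cell_type,
            "metadata": {},
            # Source is stored as a list of lines, newlines kept
            "source": source.splitlines(keepends=True),
        }
        if cell_type == "code":
            # Code cells also carry an execution counter and their outputs
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {
        "cells": nb_cells,
        "metadata": {
            "kernelspec": {
                "name": kernel_name,
                "display_name": display_name,
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 2,
    }


if __name__ == "__main__":
    notebook = make_notebook([
        ("markdown", "# Hello JupyterLab"),
        ("code", "total = 1 + 2\nprint(total)"),
    ])
    # Writing this text to a file ending in .ipynb gives a notebook document
    print(json.dumps(notebook, indent=1))
```

Saving the printed JSON as a `.ipynb` file should give a document that either interface can open, provided a kernel matching the `kernelspec` name is installed.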


Packt Editorial Staff

28 Feb 2018

4 min read

28th Feb 2018 – Data Science News Daily Roundup


Algorithmia’s AI smart contract, Microsoft’s ML Server 9.3, PostgreSQL 10 supported in Amazon RDS, Bitcoin Core 0.16.0, and more in today’s top stories around machine learning, blockchain, and data science news.

1. Algorithmia has developed an AI smart contract with a neural network running on the Ethereum blockchain

Algorithmia Inc, the AI and ML algorithm marketplace provider, has created the first ever AI smart contract with a neural network running on the Ethereum blockchain. The contract offers a bounty for developers to create an AI model that can determine voter preferences based on their latitude and longitude. The smart contract will use the blockchain to automatically validate the solution. Here’s how the contract works:

The buyer creates a new contract.
The contract is published to the Ethereum blockchain.
Machine learning engineers download the data and train an AI/ML model.
The model is submitted and run on the Ethereum blockchain using the data set from the contract.
If the model fulfils the criteria of the contract, the model is sent to the buyer and payment is sent to the ML engineer.

2. Microsoft Machine Learning Server 9.3 releases

Microsoft Machine Learning Server 9.3 has been released. Key areas of change in the 9.3 release include:

Set-up and configuration of Operationalization.
Platform upgrades, better-together with Azure ML.
Support for local Spark.
Improved revoscalepy library.
Linux R-Client support for SQL Server compute context.
More partnerships and solution templates.

Microsoft Machine Learning Server 9.3 can be downloaded from Visual Studio Dev Essentials, or via ML Server VMs in Azure. It comes packed with the power of the open source R and Python engines, making both R and Python ready for enterprise-class ML and advanced analytics.

3. PostgreSQL 10 is now supported in Amazon RDS

Amazon RDS for PostgreSQL now supports PostgreSQL major version 10. Amazon RDS for PostgreSQL makes it easy to set up, operate, and scale PostgreSQL deployments in the cloud. To use the new version, users can create an Amazon RDS for PostgreSQL database instance with just a few clicks in the AWS Management Console, or upgrade an existing instance using point-and-click upgrades. PostgreSQL 10 includes new features such as native table partitioning, improved parallelism in query execution, ICU collation support, column group statistics, an enhanced postgres_fdw extension, and many more.

4. Bitcoin Core 0.16.0 is now released

Bitcoin Core version 0.16.0 is now available. This is a new major version release, including new features, various bug fixes, performance improvements, and updated translations.

Bitcoin Core 0.16.0 introduces full support for segwit in the wallet and user interfaces.
Version 0.16.0 will only create hierarchical deterministic (HD) wallets.
There is now more flexibility in where the wallets directory can be located.
The minimum version of the GCC compiler required to compile Bitcoin Core is now 4.8.
Pruned nodes can now signal BIP159's NODE_NETWORK_LIMITED using service bits, in preparation for full BIP159 support in later versions.
A new RPC ‘rescanblockchain’ has been added to manually invoke a blockchain rescan.
Safe mode is now disabled by default and must be manually enabled.
The `validateaddress` RPC output has been extended with a few new fields, and with support for segwit addresses.

The detailed report is available in the change log.

5. Introducing Draw.io JupyterLab extension, a Diagram Editor for JupyterLab

The Draw.io JupyterLab extension is a diagram editor for JupyterLab that offers an easy way to create diagrams, flow charts, and figures. The extension takes advantage of the JupyterLab architecture: it registers a new file type (.dio) with the file explorer to open files, and adds a launcher button and menu items. It also provides multiple synchronized views of the same diagram, displayed at the same time, so a user can view the same content at different zoom levels or in a bare text editor. The entire code is available on GitHub.
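The five-step bounty workflow from the Algorithmia story (item 1 above) can be mimicked off-chain to make the sequence concrete. The sketch below is a plain Python simulation under loud assumptions: it is not Solidity, it does not touch Ethereum, and it is not Algorithmia's contract. The class `BountyContract`, its fields, the accuracy threshold, and the coordinate data are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]          # (latitude, longitude)
Sample = Tuple[Point, int]           # (point, label)
Model = Callable[[Point], int]       # a trained model maps a point to a label


@dataclass
class BountyContract:
    """Hypothetical stand-in for a bounty contract; all names are assumptions."""
    buyer: str
    reward: int
    min_accuracy: float
    test_set: List[Sample]
    published: bool = False
    winner: Optional[str] = None
    payouts: Dict[str, int] = field(default_factory=dict)

    def publish(self) -> None:
        # Step 2: the contract becomes visible to ML engineers
        self.published = True

    def accuracy(self, model: Model) -> float:
        # Step 4: the submitted model is run on the contract's data set
        correct = sum(1 for point, label in self.test_set if model(point) == label)
        return correct / len(self.test_set)

    def submit(self, engineer: str, model: Model) -> bool:
        # Reject submissions before publication or after the bounty is claimed
        if not self.published or self.winner is not None:
            return False
        # Step 5: pay out only if the contract's criteria are met
        if self.accuracy(model) >= self.min_accuracy:
            self.winner = engineer
            self.payouts[engineer] = self.reward
            return True
        return False


if __name__ == "__main__":
    # Step 1: the buyer creates a contract (toy data, made up for this sketch)
    contract = BountyContract(
        buyer="buyer",
        reward=100,
        min_accuracy=0.75,
        test_set=[((47.6, -122.3), 1), ((34.0, -118.2), 0),
                  ((40.7, -74.0), 1), ((29.8, -95.4), 0)],
    )
    contract.publish()
    # Step 3: an engineer trains a model offline; here a trivial threshold rule
    print(contract.submit("engineer", lambda p: 1 if p[0] > 40.0 else 0))
```

In a real smart contract the evaluation and payment would be enforced by the blockchain rather than by trusted Python code, which is the point of the automatic validation described in the story.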


