
Tech News - Data


Cloudflare raises $150M with Franklin Templeton leading the latest round of funding

Amrata Joshi
13 Mar 2019
4 min read
After a long break from fundraising, Cloudflare, a U.S.-based company that provides content delivery network and Internet security services, announced yesterday that it has raised $150 million in funding. The company also announced two additions to its board of directors: Stan Meresman, board member and chair of the Audit Committee of Guardant Health (GH), and Maria Eitel, founder and co-chair of the Nike Foundation.

In 2014, Cloudflare raised around $110 million, and the company has raised more than $330 million to date from investors including New Enterprise Associates, Union Square Ventures, Microsoft, and Baidu. In the latest round, Franklin Templeton, an investment management company, joined these investors, further extending support for Cloudflare's growth.

Matthew Prince, co-founder and CEO of Cloudflare, said, "I'm honored to welcome Maria and Stan to our board of directors. Both of them bring a wealth of knowledge and experience to our board and know what it takes to propel companies forward. Our entire board looks forward to working with them as we continue to help build a better Internet."

Eitel previously ran European corporate affairs for Microsoft, worked in media affairs at the White House, and served as an assistant to President George H.W. Bush. Eitel said, "My career has been focused on creating global change, and the Internet is a huge part of that. The Internet has the ability to unleash human potential, and I believe that Cloudflare is one of the major players able to drive the change that's necessary for the world and Internet community."

Stan Meresman was previously CFO of Silicon Graphics (SGI) and Cypress Semiconductor (CY). He said, "Cloudflare's technologies, customer base, and global network have helped propel the company to a position of leadership in the Internet ecosystem. I look forward to lending my skills and expertise to Cloudflare's board in order to continue this growth and make even more of an impact."

According to a report by Reuters, Cloudflare was last year considering an IPO in the first half of 2019 that could have valued the company at more than $3.5 billion. This latest funding round suggests the company isn't headed for the public markets just yet, but Cloudflare is growing, and a public offering could well be the next big step.

Some users expect the company to go public this year and are happy that it is moving in a good direction. One user commented on Hacker News, "I do wonder how people feel about this internally though. There's a lot of expectation that the company would go public this year (and some even expected it would go public last year). Hopefully, no one needs the money they put in to early exercise any time soon!"

Another comment reads, "Cloudflare is undergoing a lot of big projects to break away from the image that they are "just a CDN". Raising a round now instead of going public allows them to invest more on those projects instead of focusing on quarter to quarter results. Also, avoiding brain-drains post-IPO while they need those talents the most."

Others think the company might start monetizing the data flowing through its network. One user commented, "Doesn't raising this kind of money scream that you're eventually going to start to monetize the data flowing through your network (e.g. telecoms selling location data to bounty hunters)?"

To know more about this news, check out the official announcement.
Read next:
- Cloudflare takes a step towards transparency by expanding its government warrant canaries
- workers.dev will soon allow users to deploy their Cloudflare Workers to a subdomain of their choice
- Cloudflare's 1.1.1.1 DNS service is now available as a mobile app for iOS and Android


Introducing Google's Tangent: A Python library with a difference

Sugandha Lahoti
14 Nov 2017
3 min read
The Google Brain team, in a recent blog post, announced the arrival of Tangent, a free, open source Python library for ahead-of-time automatic differentiation. Most machine learning algorithms require the calculation of derivatives and gradients; doing this manually is both time-consuming and error-prone. Automatic differentiation, or autodiff, is a set of techniques to accurately compute the derivatives of numeric functions expressed as computer programs. Autodiff techniques allow large-scale machine learning models to run with high performance and better usability.

Tangent uses source code transformation (SCT) in Python to perform automatic differentiation. It takes Python source code as input and produces new Python functions as output; each new function calculates the gradient of the input function. This makes the automatic derivative code as readable as the rest of the program. In contrast, TensorFlow and Theano, the two most popular machine learning frameworks, do not perform autodiff on Python code. They instead use Python as a metaprogramming language to define a data flow graph on which SCT is performed, which can be confusing to the user since it involves a separate programming paradigm.

Tangent has a one-function API:

import tangent
df = tangent.grad(f)

For printing out derivatives:

import tangent
df = tangent.grad(f, verbose=1)

Because it uses SCT, Tangent generates a new Python function that follows standard semantics and whose source code can be inspected directly. This makes it easy for users to understand and debug, and it incurs no runtime overhead. Another highlight is that Tangent is easily compatible with TensorFlow and NumPy. It is high performing and built on Python, which has a large and growing community. TensorFlow Eager functions are also supported for processing arrays of numbers. The library also auto-generates derivatives of code containing if statements and loops, and provides easy methods for generating custom gradients. It improves usability through abstractions for easily inserting logic into the generated gradient code.

Tangent provides forward-mode automatic differentiation. This is a better alternative to backpropagation (reverse mode) in cases where the number of outputs exceeds the number of inputs, since forward mode's cost scales with the number of input variables. According to the GitHub repository, "Tangent is useful to researchers and students who not only want to write their models in Python but also read and debug automatically-generated derivative code without sacrificing speed and flexibility."

Tangent currently does not support classes and closures, although the developers do plan to incorporate classes, which will enable class definitions of neural networks and parameterized functions. Tangent is still in the experimental stage. In the future, the developers plan to extend it to other numeric libraries and to support more aspects of the Python language, including closures, classes, and more NumPy and TensorFlow functions. They also plan to add more advanced autodiff and compiler functionality.
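For the curious, here is what that one-function API looks like end to end, as a minimal runnable sketch. It assumes tangent is installed; the function f below is our own toy example, not one from the announcement:

```python
# Minimal sketch of Tangent's one-function API. Assumes `pip install tangent`;
# the function f is a toy example of our own.
import tangent

def f(x):
    return x * x + 3.0 * x

# tangent.grad transforms f's source code and returns a new Python function
# that computes df/dx.
df = tangent.grad(f)
print(df(2.0))  # d/dx (x^2 + 3x) at x=2 is 2*2 + 3 = 7.0
```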
To summarize, here's a bullet list of Tangent's key features:
- Automatic differentiation capabilities
- Code that is easy to interpret, debug, and modify
- Easy compatibility with TensorFlow and NumPy
- Custom gradients
- Forward-mode autodiff
- High performance and optimization

You can learn more about the project on their official GitHub.


AWS introduces Amazon DocumentDB featuring compatibility with MongoDB, scalability and much more

Amrata Joshi
10 Jan 2019
4 min read
Today, Amazon Web Services (AWS) introduced Amazon DocumentDB, a MongoDB-compatible document database designed to provide the performance, scalability, and availability needed to operate mission-critical MongoDB workloads. Customers use MongoDB as a document database to store, retrieve, and manage semi-structured data, but building performant, highly available applications that can quickly scale to multiple terabytes and thousands of reads and writes per second is difficult because of the complexity of setting up MongoDB clusters at scale.

https://twitter.com/nathankpeck/status/1083144657591255043

Amazon DocumentDB uses a fault-tolerant, distributed, self-healing storage system that auto-scales up to 64 TB per database cluster. With the AWS Database Migration Service (DMS), users can migrate their MongoDB databases, whether on-premises or on Amazon EC2, to Amazon DocumentDB for free for six months, with no downtime.

Features of Amazon DocumentDB

Compatibility: Amazon DocumentDB is compatible with version 3.6 of MongoDB and implements the Apache 2.0 open source MongoDB 3.6 API. It does this by emulating the responses a MongoDB client expects from a MongoDB server, allowing users to keep their existing MongoDB drivers and tools (a hypothetical connection sketch appears after the links at the end of this piece).

Scalability: Storage in Amazon DocumentDB can be scaled from 10 GB to 64 TB in increments of 10 GB, and users don't have to preallocate storage or monitor free space. Users can choose between six instance sizes (15.25 GiB to 488 GiB of memory) and create up to 15 read replicas. Storage and compute are decoupled, so each can be scaled independently as needed.

Performance: Amazon DocumentDB stores database changes as a log stream, which allows users to process millions of reads per second with millisecond latency. This storage model increases performance without compromising data durability, further enhancing overall scalability.

Reliability: Amazon DocumentDB's six-way storage replication provides high availability. It can fail over from the primary to a replica within 30 seconds, and it supports MongoDB replica set emulation so that applications can handle system failure quickly.

Fully managed: Amazon DocumentDB is fully managed, with fault detection, built-in monitoring, and failover. Users can set up daily snapshot backups, take manual snapshots, or use either one to create a fresh cluster if necessary. It integrates with Amazon CloudWatch, so users can monitor over 20 key operational metrics for their database instances via the AWS Management Console.

Secure: Users can encrypt their active data, snapshots, and replicas with a KMS (Key Management Service) key when creating Amazon DocumentDB clusters. Authentication is enabled by default, and the service uses network isolation via Amazon VPC.

According to InfoWorld, the news has given rise to some speculation, as AWS isn't promising that its managed service will work with all applications that use MongoDB. The move has also set up a new rivalry. MongoDB CEO and president Dev Ittycheria told TechCrunch, "Imitation is the sincerest form of flattery, so it's not surprising that Amazon would try to capitalize on the popularity and momentum of MongoDB's document model. However, developers are technically savvy enough to distinguish between the real thing and a poor imitation. MongoDB will continue to outperform any impersonations in the market."

As reported by GeekWire and TechCrunch, Amazon DocumentDB's compatibility with MongoDB is unlikely to require commercial licensing from MongoDB.

https://twitter.com/tomkrazit/status/1083165858891915264

To know more, check out the Amazon DocumentDB page.

Read next:
- US government privately advised by top Amazon executive on web portal worth billions to the Amazon; The Guardian reports
- Amazon Rekognition faces more scrutiny from Democrats and German antitrust probe
- Amazon re:Invent Day 3: Lambda Layers, Lambda Runtime API and other exciting announcements!
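As promised above, here is what the compatibility claim means in practice: because DocumentDB emulates the MongoDB 3.6 API, a stock MongoDB driver should connect unchanged. A hypothetical pymongo sketch; the endpoint, credentials, and CA bundle path are placeholders, not real values:

```python
# Hypothetical connection sketch: Amazon DocumentDB emulates the MongoDB 3.6
# API, so an ordinary pymongo client should work. The endpoint, credentials,
# and CA file below are placeholders.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://user:password@mycluster.node.us-east-1.docdb.amazonaws.com:27017",
    tls=True,
    tlsCAFile="rds-combined-ca-bundle.pem",  # CA bundle provided by AWS
    replicaSet="rs0",
)
db = client["appdb"]
db.items.insert_one({"sku": "abc-123", "qty": 5})
print(db.items.find_one({"sku": "abc-123"}))
```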


OpenAI announces Block sparse GPU kernels for accelerating neural networks

Savia Lobo
08 Dec 2017
3 min read
OpenAI, an artificial intelligence research firm, has released block-sparse GPU kernels: software programs optimized to build sparse networks on Nvidia hardware. These help in building faster, more efficient neural networks, without eating up much memory on your GPUs.

Neural networks are built using layers of connected nodes, but their processing power is restricted by the architecture of the GPUs they run on: GPUs have lacked an efficient implementation for sparse linear operations. Researchers at OpenAI say that it is now possible to make neural networks highly efficient by bringing sparse matrices into their design.

How sparse matrices help GPUs

A sparse matrix is simply a mathematical matrix in which many entries have the value zero. Such zero-valued elements can be compressed and skipped within matrix multiplications, which saves computation time and takes up less memory on GPUs. The saved computational power can then be used to train deep neural networks more efficiently; networks can perform inference and run algorithms simultaneously, up to 10 times faster than with regular matrices. The problem OpenAI faced with sparse matrices is that Nvidia, the biggest manufacturer of GPUs for neural networks, does not support sparse matrix models in its hardware. Enter block-sparse GPU kernels.

Block-sparse GPU kernels: sparse matrices get an upgrade

To work around the lack of hardware support for sparsity, a team of researchers at OpenAI developed the block-sparse GPU kernels. Key points to note:

- They are written in Nvidia's CUDA programming language.
- At present, they are only compatible with TensorFlow.
- They only support Nvidia's GPUs.

OpenAI is sharing its block-sparse GPU kernels with the wider research community so they can be put to use in other developments, and the kernels will be expanded to support other hardware and frameworks.

OpenAI used neural networks enhanced with the block-sparse GPU kernels to carry out sentiment analysis on reviews from IMDB and Amazon. The result: the sparse models beat the dense models on all sentiment datasets (source: https://s3-us-west-2.amazonaws.com/openai-assets/blocksparse/blocksparsepaper.pdf). OpenAI also reported that its sparse model improved the state of the art on the IMDB dataset from 5.91% error to 5.01%, a promising improvement over previous results that performed extremely well on shorter sentence-level datasets.

Although these new kernels seem very promising, the OpenAI research team does not yet have a definitive view on when and where they will help, and promises to explore this space further. To learn how to install and develop block-sparse GPU kernels, click on the GitHub link here.
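To see why block sparsity pays off, here is a toy illustration using SciPy's Block Sparse Row format. This is not OpenAI's kernel (theirs is CUDA code running on the GPU), just a CPU-side sketch of the same idea: only nonzero blocks are stored and multiplied.

```python
# Toy illustration (not OpenAI's kernels): a block-sparse matrix stores and
# multiplies only its nonzero blocks, so work scales with density.
import numpy as np
from scipy.sparse import bsr_matrix

rng = np.random.default_rng(0)
dense = np.zeros((8, 8))
dense[0:2, 0:2] = rng.standard_normal((2, 2))   # only two 2x2 blocks
dense[4:6, 6:8] = rng.standard_normal((2, 2))   # are actually populated

sparse = bsr_matrix(dense, blocksize=(2, 2))    # Block Sparse Row format
x = rng.standard_normal(8)

# Both products agree, but the BSR multiply touches only the nonzero blocks.
print(np.allclose(sparse @ x, dense @ x))       # True
print(sparse.nnz, "stored values vs", dense.size, "dense entries")
```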


Linux Foundation introduces strict telemetry data collection and usage policy for all its projects

Fatema Patrawala
31 Oct 2019
3 min read
Last week, the Linux Foundation introduced a new policy around the collection and usage of telemetry data. As per this new policy, any Linux Foundation project must obtain permission from the Linux Foundation before using a telemetry data collection mechanism, and the proposed mechanism will undergo a detailed review.

The Linux Foundation's announcement follows closely after GitLab's telemetry data collection plan came to a halt. Last week, GitLab announced that it would begin collecting new data by inserting JavaScript snippets that would interact with both GitLab and a third-party SaaS telemetry service. However, after receiving severe backlash from users, the company reversed its decision.

The official statement from the Linux Foundation reads: "Any Linux Foundation project is required to obtain permission from the Linux Foundation before using a mechanism to collect Telemetry Data from an open source project. In reviewing a proposal to collect Telemetry Data, the Linux Foundation will review a number of factors and considerations."

The Linux Foundation also notes that software sometimes includes functionality to collect telemetry data through a "phone home" mechanism built into the software, with the end user deploying the software typically presented with an option to opt in to sharing this data with the developers. In doing so, certain personal and sensitive information of users might get shared without them realizing it. To address such data exposure, and to adhere to recent data privacy legislation such as the GDPR, the Linux Foundation has introduced this stringent telemetry data policy. Dan Lopez, a representative of the Linux Foundation, states, "by default, projects of the Linux Foundation should not collect Telemetry Data from users of open source software that is distributed on behalf of the project."

New policy for telemetry data

As per the new policy, if a project community wants to collect telemetry data, it must first coordinate with the Linux Foundation's legal team to undergo a detailed review of the proposed telemetry data and collection mechanism. The review will include an analysis of:

- the specific data proposed to be collected, demonstrating that the data is fully anonymized and does not contain any sensitive or confidential information of users
- the manner in which users of the software are (1) notified of all relevant details of the telemetry data collection, use, and distribution; and (2) required to consent prior to any telemetry data collection being initiated
- the manner in which the collected telemetry data is stored and used by the project community
- the security mechanisms used to ensure that collection of telemetry data will not result in (1) unintentional collection of data; or (2) security vulnerabilities resulting from the "phone home" functionality

The Linux Foundation has also emphasized that telemetry data should not be collected unless and until the legal team approves the proposed collection. Additionally, any telemetry data collection approved by the Linux Foundation must be fully documented, must make the collected data available to all participants in the project community, and must at all times comply with the Linux Foundation's Privacy Policy.
Read next:
- A recap of the Linux Plumbers Conference 2019
- IBM open-sources Power ISA and other chips; brings OpenPOWER Foundation under the Linux Foundation
- Introducing kdevops, a modern DevOps framework for Linux kernel development
- GitLab retracts its privacy invasion policy after backlash from community


Facebook introduces a fully convolutional speech recognition approach and open sources wav2letter++ and flashlight

Bhagyashree R
24 Dec 2018
3 min read
Last week, the Facebook AI Research (FAIR) speech team introduced the first fully convolutional speech recognition approach. They have also open-sourced flashlight, a C++ library for machine learning, and wav2letter++, a fast and simple system for developing end-to-end speech recognizers.

Fully convolutional speech recognition approach

Current state-of-the-art speech recognition systems are built on RNNs for acoustic or language modeling. Facebook's newly introduced system provides an alternative approach based solely on convolutional neural networks. It eliminates the feature extraction step altogether, as it is trained end-to-end to predict characters from the raw waveform, and it uses an external convolutional language model to decode words. The architecture of this CNN-based system has four parts (diagram source: Facebook):

- Learnable frontend: First applies a convolution of width 2 that emulates the pre-emphasis step, followed by a complex convolution of width 25 ms. After calculating the squared absolute value, a low-pass filter and stride perform the decimation. The frontend finally applies log-compression and per-channel mean-variance normalization.
- Acoustic model: A CNN with gated linear units (GLUs; a toy GLU sketch follows at the end of this piece), fed with the output of the learnable frontend. The acoustic model is trained to predict letters directly using the Auto Segmentation Criterion.
- Language model: The convolutional language model (LM) contains 14 convolutional residual blocks and uses GLUs as the activation function. It is used to score candidate transcriptions in addition to the acoustic model in the beam-search decoder.
- Beam-search decoder: Generates word sequences given the output from the acoustic model.

Apart from this CNN-based approach, Facebook released the wav2letter++ and flashlight frameworks to complement the approach and enable reproducibility. flashlight is a standalone C++ library for machine learning. It uses the ArrayFire tensor library and features just-in-time compilation with modern C++, targeting both CPU and GPU backends for maximum efficiency and scale. The wav2letter++ toolkit is built on top of flashlight and written entirely in C++. It also uses ArrayFire as its primary library for tensor operations; ArrayFire is a highly optimized tensor library that can execute on multiple backends, including CUDA GPUs and CPUs. wav2letter++ supports multiple audio file formats, such as wav and flac, and several feature types, including raw audio, a linearly scaled power spectrum, log-Mels (MFSC), and MFCCs.

To read more, check out Facebook's official announcement.

Read next:
- Facebook halted its project 'Common Ground' after Joel Kaplan, VP, public policy, raised concerns over potential bias allegations
- Facebook releases DeepFocus, an AI-powered rendering system to make virtual reality more real
- The district of Columbia files a lawsuit against Facebook for the Cambridge Analytica scandal
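For readers unfamiliar with gated linear units, here is the toy sketch promised above. It is purely illustrative NumPy, unrelated to the wav2letter++ codebase: half of the channels gate the other half.

```python
# Minimal sketch of a gated linear unit (GLU), the activation used in the
# acoustic and language models described above. Pure NumPy, for illustration.
import numpy as np

def glu(x):
    """Split the last axis in two: GLU(a, b) = a * sigmoid(b)."""
    a, b = np.split(x, 2, axis=-1)
    return a * (1.0 / (1.0 + np.exp(-b)))

x = np.random.randn(4, 16)   # e.g. 4 time steps, 16 conv output channels
print(glu(x).shape)          # (4, 8) -- GLU halves the channel dimension
```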

Introducing Amazon Neptune: A graph database service for your applications

Savia Lobo
04 Dec 2017
4 min read
Last week was lined up with many exhilarating product releases from Amazon at AWS re:Invent. Releases pertaining to machine learning, IoT, cloud services, databases, and much more were unveiled. Amidst all these, Amazon Web Services announced a fast and reliable graph database built exclusively for the cloud: presenting Amazon Neptune!

No, Amazon isn't entering our solar system. Amazon Neptune is a fully managed graph database that makes building and deploying applications a cakewalk, and it allows organizations to identify hidden relationships within highly connected datasets. Let's explore some of the benefits:

- It is built to deliver high performance for storing billions of relationships and running graph queries with millisecond latency.
- Neptune supports the popular graph models Property Graph and W3C's Resource Description Framework (RDF), along with their corresponding query languages, Apache TinkerPop Gremlin and SPARQL. This lets customers build queries with ease and steer them efficiently through highly connected datasets.
- It offers availability of more than 99.99%. Neptune continuously monitors data and backs it up to Amazon S3, enabling point-in-time recovery from physical storage failures.
- Neptune is fault-tolerant and includes self-healing storage in the cloud, replicating six copies of data across three Availability Zones.
- It offers scalable database deployment, with instance types ranging from small to large as per your needs.
- Neptune is highly secure, with different levels of security for each database. It makes use of Amazon VPC for network isolation, AWS Key Management Service (KMS) for encryption at rest, and TLS for encryption in transit.
- Lastly, being fully managed, Neptune handles database management tasks such as software patching, hardware provisioning, configuration, and backups. One can also monitor database performance using Amazon CloudWatch.

Neptune in action: possible use cases

Social networks: With Amazon Neptune, one can easily set up large-scale processing of user profiles and interactions to build social networking applications. Neptune offers highly interactive graph queries with high throughput to bring social features into any application; for instance, notifying a user of the latest updates from family or close friends.

Recommendation engines: As a highly available graph database, Neptune can store relationships between information such as customer interests and purchase history, and a query can produce personalized, relevant recommendations; for instance, friend recommendations based on mutual friends.

Fraud detection: A graph query can easily detect relationship patterns such as multiple people using the same email address, or people using the same IP address. Neptune thus provides a fully managed service that helps detect possible fraud by surfacing buyers who use fraudulent email and IP addresses.

Knowledge graphs: Neptune allows you to store information in a graph model and use graph queries to let customers easily navigate through it. For instance, a person interested in the Great Wall of China can also learn about the other wonders of the world and where each is located, and Neptune can additionally recommend other places to visit in China. Thus, with a knowledge graph, one can surface additional information across varied topics.

Network/IT operations: By matching a graph pattern, Neptune can track the origin of a malicious file, i.e., the host that spread it and the hosts that downloaded it.

Though in its infancy, Amazon Neptune could shoot up to great heights as it is adopted by more organizations. It has many competitors, but it will be exciting to see how it paves a way amidst them all and shines as the brightest 'graph database' planet.
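Since Neptune speaks Apache TinkerPop Gremlin, a query like the friend-recommendation example above could be issued from Python with the gremlinpython driver. A hypothetical sketch; the endpoint and vertex id are placeholders, not a real cluster:

```python
# Hypothetical friend-of-friend query against Amazon Neptune using the
# Apache TinkerPop gremlinpython driver. Endpoint and ids are placeholders.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection(
    "wss://my-neptune-cluster.us-east-1.neptune.amazonaws.com:8182/gremlin", "g"
)
g = traversal().withRemote(conn)

# Friends of friends of one user: the shape of query a recommendation
# feature might run.
names = (
    g.V("user-123").out("friend").out("friend")
     .dedup().values("name").limit(10).toList()
)
print(names)
conn.close()
```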


Anaconda Enterprise version 5.1.1 released!

Savia Lobo
14 Mar 2018
2 min read
Anaconda, a Python-based tool for encapsulating, running, and reproducing data science projects, has released Enterprise version 5.1.1. This release includes both administrator-facing and user-facing changes.

Administrator-facing changes

- Ability to specify a custom UID for the service account at install time (default UID: 1000).
- Added pre-flight checks for kernel modules, kernel settings, and filesystem options when installing or adding nodes.
- Improved consistency between GUI- and CLI-based installation paths, plus improved security and isolation of the internal database from user sessions and deployments.
- Added capability to configure a custom trust store and LDAPS certificate validation.
- Simplified installer packaging, using a single tarball and consistent naming.
- Updated documentation for system requirements, including XFS filesystem requirements and kernel modules/settings.
- Added documentation for configuring AE to point to online Anaconda repositories and for securing the internal database, plus updated documentation for mirroring packages from channels.
- Added documentation for configuring RBAC, role mapping, and access control, as well as LDAP federation and identity management.
- Fixed issues related to deleting related versions of custom Anaconda parcels, the default admin role (ae-admin), using special characters with AE Ops Center accounts/passwords, the Administrator Console link in the menu, and more.
- Added a command to remove channel permissions.

User-facing changes

- Improvements to the collaborative workflow: notification of changes made to a project, plus the ability to pull changes and resolve conflicting changes when saving or pulling changes into a project.
- Additional documentation and examples for connecting to remote data and compute sources: Spark, Hive, Impala, and HDFS.
- Optimized startup time for Spark and SAS project templates.
- Improved initial startup time for project creation, sessions, and deployments by pre-pulling images after installation.
- Increased the project upload limit from 100 MB to 1 GB.
- Added the capability to sudo yum install system packages from within project sessions.
- Fixed the R kernel in the R project template, as well as issues related to loading sparklyr in Spark projects and to displaying kernel names and Spark project icons.
- Improved performance when rendering large numbers of projects, packages, etc.
- Improved rendering of long version names in environments and projects.
- Full names are now rendered when sharing projects and deployments with collaborators.

Read more on these and other changes in the Anaconda Enterprise documentation.


Facebook is investigating data analytics firm Crimson Hexagon over misuse of data

Richard Gall
23 Jul 2018
2 min read
Facebook has suspended the Boston-based data analytics firm Crimson Hexagon over concerns that the company has misused data. The decision was made after the Wall Street Journal reported that the company has contracts with government agencies and "a Russian nonprofit with ties to the Kremlin." Back in March 2017, Facebook banned the use of its data to develop surveillance tools, and it is this ruling for which Crimson Hexagon is being investigated. A Facebook spokesperson, speaking to CNN Money on Friday, said: "We don't allow developers to build surveillance tools using information from Facebook or Instagram... We take these allegations seriously, and we have suspended these apps while we investigate."

Crimson Hexagon CTO responds with a blog post

Crimson Hexagon hasn't explicitly responded to the suspension, but CTO Chris Bingham did write a blog post, "Understanding the Role of Public Online Data in Society." He writes that "the real conversation is not about a particular social media analytics provider, or even a particular social network like Facebook. It is about the broader role and use of public online data in the modern world." Although the investigation is ongoing, it's worth noting, as TechCrunch has, that Crimson Hexagon isn't as opaque in its relationships and operations as Cambridge Analytica. It has, for example, done data analytics projects for the likes of Adidas, the BBC, and Samsung.

Read next:
- Google, Microsoft, Twitter, and Facebook team up for Data Transfer Project
- Is Facebook planning to spy on you through your mobile's microphones?
- Did Facebook just have another security scare?


What’s new in Jupyter Notebook 5.3.0

Sugandha Lahoti
18 Jan 2018
2 min read
Buckle up: Jupyter Notebook version 5.3.0 is here! Jupyter Notebook, the popular language-agnostic HTML notebook application for Project Jupyter, is now available in version 5.3.0. The Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. It can be used for data cleaning and transformation, data visualization, and machine learning, to name a few.

The new version includes a myriad of bug fixes and changes, most notably terminal support for Windows. It also includes support for the OS trash: files deleted from the notebook dashboard are now moved to the OS trash rather than deleted permanently. Other changes include:

- A restart-and-run-all button in the toolbar.
- Programmatic copy to clipboard is now allowed.
- The DOM History API can be used for navigating between directories in the file browser.
- Translated files can now be added to a folder (docs-translations).
- Token-authenticated cross-origin requests are allowed by default.
- A "close" button is displayed on load-notebook errors.
- An action was added to the command palette to run CodeMirror's indentAuto on a selection.
- A new option to specify extra services.
- Shutdown trans loss is now fixed.
- Finding available kernelspecs is now more efficient.
- The new version uses requirejs vs. require, and fixes some UI bugs in Firefox.
- Non-specific language codes are now compared when choosing to use Arabic numerals.
- The save-script deprecation is fixed.
- Moment locales are now included in package_data.
- The /files prefix is now used for pdf-like files.
- Users can now set a password when logging in via token.

Other minor changes can be found in the changelog. Users can upgrade to the latest release with pip install notebook --upgrade or conda upgrade notebook. It is recommended to upgrade to version 9+ of pip before upgrading notebook.

Fun fact: Jupyter is a loose acronym meaning Julia, Python, and R. These programming languages were the first target languages of the Jupyter application, but nowadays the notebook also supports many other languages.

TensorFlow announces TensorFlow Data Validation (TFDV) to automate and scale data analysis, validation, and monitoring

Bhagyashree R
11 Sep 2018
2 min read
Today the TensorFlow team announced the launch of TensorFlow Data Validation (TFDV), an open-source library that enables developers to understand, validate, and monitor their machine learning data at scale.

Why was TensorFlow Data Validation introduced?

When building machine learning algorithms, a lot of attention is paid to improving their performance; however, if the input data is wrong, all of this optimization effort goes to waste. Understanding and validating a small amount of data is easy and can even be done manually, but in the real world this is rarely the case: data in production is huge and often arrives continuously and in big chunks. This is why it is necessary to automate and scale the tasks of data analysis, validation, and monitoring.

What are some features of TFDV?

TFDV is part of the TensorFlow Extended (TFX) platform, a TensorFlow-based general-purpose machine learning platform, and it is already used by Google every day to analyze and validate petabytes of data. TFDV provides the following features:

- It can compute descriptive statistics that provide a quick overview of the data in terms of the features present and the shapes of their value distributions.
- It includes tools such as Facets Overview, which visualizes the computed statistics for easy browsing.
- A data schema can be generated automatically to describe expectations about the data, such as required values, ranges, and vocabularies. Since writing a schema can be tedious for datasets with many features, TFDV can generate an initial version based on the descriptive statistics, and the schema can be inspected with the schema viewer.
- Anomaly detection identifies problems such as missing features, out-of-range values, or wrong feature types.
- An anomalies viewer shows which features have anomalies so you can learn more and correct them.

To learn more about how it is used in production, read the official announcement by TensorFlow on Medium and check out TFDV's GitHub repository; a minimal code sketch of the workflow follows the links below.

Read next:
- Why TensorFlow always tops machine learning and artificial intelligence tool surveys
- TensorFlow 2.0 is coming. Here's what we can expect.
- Can a production ready Pytorch 1.0 give TensorFlow a tough time?
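As promised, here is a minimal sketch of the statistics, schema, and validation loop described above, using TFDV's documented entry points; the CSV paths are placeholders:

```python
# Sketch of the TFDV workflow: compute statistics, infer a schema, then
# validate new data against it. File paths are placeholders.
import tensorflow_data_validation as tfdv

# 1. Descriptive statistics over the training data.
train_stats = tfdv.generate_statistics_from_csv(data_location="train.csv")

# 2. Infer an initial schema (required values, ranges, vocabularies).
schema = tfdv.infer_schema(statistics=train_stats)

# 3. Validate fresh serving data against the schema to surface anomalies
#    such as missing features or out-of-range values.
serving_stats = tfdv.generate_statistics_from_csv(data_location="serving.csv")
anomalies = tfdv.validate_statistics(statistics=serving_stats, schema=schema)
tfdv.display_anomalies(anomalies)  # renders a summary table in a notebook
```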


Apple’s CEO, Tim Cook calls for new federal privacy law while attacking the ‘shadow economy’ in an interview with TIME

Amrata Joshi
18 Jan 2019
4 min read
Last year saw some major data breaches and top companies compromising user data; this year, sentiment is leaning strongly toward protecting users' data privacy. Just two days ago, a U.S. Senator introduced a bill titled the 'American Data Dissemination (ADD) Act' to create federal standards of privacy protection for big companies including Google, Amazon, and Facebook. The U.S. Congress is yet to pass this bill. Yesterday, Tim Cook, CEO of Apple, asked the U.S. Congress to introduce a national privacy law to secure users' personal data, while attacking the 'shadow economy' that trades users' data without their consent.

https://twitter.com/guardian/status/1085847219419267073

In a statement to TIME magazine, Mr. Cook said, "Last year, before a global body of privacy regulators, I laid out four principles that I believe should guide legislation":

- The right to have personal data minimized: companies should challenge themselves to de-identify customer data or avoid collecting it in the first place.
- The right to knowledge: the right to know what data is being collected and why.
- The right to access: companies should make it easy for users to access, correct, and delete their personal data.
- The right to data security, without which trust is not possible.

According to Cook, companies that sell data would have to register with the Federal Trade Commission. Users and lawmakers are largely unaware of the secondary markets that trade in users' personal information and make up the shadow economy. He pointed out that some companies trade user data and that most users are unaware of it: "One of the biggest challenges in protecting privacy is that many of the violations are invisible. For example, you might have bought a product from an online retailer – something most of us have done. But what the retailer doesn't tell you is that it then turned around and sold or transferred information about your purchase to a 'data broker' – a company that exists purely to collect your information, package it and sell it to yet another buyer."

In November, the campaign group Privacy International filed complaints asking regulators to investigate whether the basis of these businesses runs afoul of the GDPR, the European privacy regulation. Following this, top data brokers such as Experian, Acxiom, Oracle, and Criteo came under scrutiny in Europe. Ailidh Callander, Privacy International's legal officer, said in a press release, "The data broker and ad-tech industries are premised on exploiting people's data. Most people have likely never heard of these companies, and yet they are amassing as much data about us as they can and building intricate profiles about our lives. GDPR sets clear limits on the abuse of personal data."

Tim Cook called for comprehensive federal privacy legislation in the US that would establish a registry of data brokers, letting consumers check what data of theirs is being sold and giving them the right to easily remove their data from that market. He writes in TIME, "I and others are calling on the US Congress to pass comprehensive federal privacy legislation - a landmark package of reforms that protect and empower the consumer." And further: "Let's be clear: you never signed up for that. We think every user should have the chance to say, Wait a minute. That's my information that you're selling, and I didn't consent."

Cook said companies should minimize the amount of data they collect and make it easier for users to delete it. He seems to have struck a chord with the public with this call.

https://twitter.com/antoniogm/status/1085968094730674180
https://twitter.com/SecurityBeat/status/1086022312015642625

One user commented on Twitter, "You have a first-party relationship with FB/TWTR/etc. They show you ads on their service, you manage your data on it (which can be deleted or de-activated). They have to face whatever user outrage they cause." Users won't tolerate their data being compromised and are agitated by platforms like Facebook; some are even thinking of deactivating their Facebook accounts.

Read next:
- A new privacy bill was introduced for creating federal standards for privacy protection aimed at big tech firms like Facebook, Google and Amazon
- Project Erasmus: Former Apple engineer builds a user interface that responds to environment light
- Cyber security researcher withdraws public talk on hacking Apple's Face ID from Black Hat Conference 2019: Reuters report


NVIDIA makes its new “brain for autonomous AI machines”, Jetson AGX Xavier Module, available for purchase

Natasha Mathur
17 Dec 2018
3 min read
NVIDIA last week made the Jetson AGX Xavier module, its new "powerful brain" for autonomous AI machines, available for purchase worldwide, with volume pricing starting at $1,099 for batches of 1,000 units or more.

The Jetson AGX Xavier module is the newest addition to the family that includes the Jetson TX2 and TX1 developer kits. It is aimed at providing high-end performance and will allow companies to go into volume production with applications developed on the Jetson AGX Xavier developer kit, which was released back in September.

The module consumes as little as 10 watts of power while delivering 32 trillion operations per second (TOPS). It is powered by a 512-core Volta GPU with Tensor Cores and an 8-core ARM v8.2 64-bit CPU, and it comes with two NVDLA deep learning chips as well as dedicated image, video, and vision processors. It is supported by NVIDIA's JetPack and DeepStream software development kits: JetPack is NVIDIA's SDK for autonomous machines, with support for AI, computer vision, multimedia, and more, while the DeepStream SDK enables streaming analytics, letting developers build multi-camera and multi-sensor applications that detect and identify objects such as vehicles, pedestrians, and cyclists.

"These SDKs save developers and companies time and money while making it easy to add new features and functionality to machines to improve performance. With this combination of new hardware and software, it's now possible to deploy AI-powered robots, drones, intelligent video analytics applications and other intelligent devices at scale," says the NVIDIA team.

The Jetson AGX Xavier module has already been put to use by Oxford Nanopore, a U.K. medical technology startup, where it handles DNA sequencing in real time in the MinION, a powerful handheld DNA sequencer. Japan's DENSO, a global auto parts maker, believes Jetson AGX Xavier will be a key platform for introducing AI to its auto parts manufacturing factories, where it will help boost productivity and efficiency.

"Developers can use Jetson AGX Xavier to build the autonomous machines that will solve some of the world's toughest problems, and help transform a broad range of industries. Millions are expected to come onto the market in the years ahead," says the NVIDIA team.

Read next:
- NVIDIA open sources its game physics simulation engine, PhysX, and unveils PhysX SDK 4.0
- NVIDIA leads the AI hardware race. But which of its GPUs should you use for deep learning?
- NVIDIA shows off GeForce RTX, real-time raytracing GPUs, as the holy grail of computer graphics to gamers

Jack Ma defends the extreme “996 work culture” in Chinese tech firms

Natasha Mathur
16 Apr 2019
3 min read
It was just last month that Chinese developers protested the "996 work schedule" on GitHub. "996" refers to an unofficial work schedule that requires employees to work from 9 am to 9 pm, 6 days a week, totaling up to 60 hours of work per week. The site, 996icu, which went viral last month, called out companies such as Youzan and Jingdong that follow the 996 rule. In one example, a Jingdong PR posted on their Maimai (a Chinese business social network) account that "(Our culture is to devote ourselves with all our hearts (to achieve the business objectives))," defending the 996 work culture.

Jack Ma, co-founder and executive chairman of the Alibaba Group, is also among those promoting the 12-hour, 6-day working week. Ma defended the Chinese 996 tech life on Alibaba's WeChat account last week and chastised people wanting a balanced, typical eight-hour shift. "In this world, everyone wants success, wants a nice life, wants to be respected. Let me ask everyone, if you don't put out more time and energy than others, how can you achieve the success you want? Compared to them, up to this day, I still feel lucky, I don't regret (working 12 hour days), I would never change this part of me," states Ma.

Among others supporting Ma is JD.com Inc., whose chief executive, Richard Liu, said that although he wouldn't force people to work 996, people who slack off are not his "brothers," reports Bloomberg.

"Many people just want to buy a car and buy a house for Mom and Dad. This is very important, but I think you should have this ideal... some things that you are willing to do can help your child to be more blessed and help you to be blessed. Isn't it a good thing? This requires 996," states Ma.

Jack Ma's remarks in defense of the 996 schedule have reignited uproar among people who strongly disapprove of this extreme overtime work culture in Chinese tech companies. Many have pointed out that promoting "996" in the name of "devotion" and "hard work" blatantly ignores the consequences of such a gladiatorial work environment on employees: unbearable pressure and unrealistic demands often leave workers depressed, over-fatigued, burned out, and in the worst cases, suicidal. For instance, in February 2015, Foxconn forced its employees to work overtime, which resulted in occasional karoshi (death due to job-related exhaustion) and suicide.

Public reaction to Jack Ma's approval of 996 has been largely negative, with people condemning Ma for his ignorance of the issue:

https://twitter.com/lindydonna/status/1116870541967683584
https://twitter.com/limeytim/status/1116879825799610368
https://twitter.com/yanni02141/status/1117049673339015168
https://twitter.com/RogerDara/status/1117319069139656704
https://twitter.com/designer_dick/status/1117798646202941441

Read next:
- Chinese tech companies don't want to hire employees over 30 years of age
- Alibaba's Singles Day sale hit record $30 billion in 24 hours
- Alibaba Cloud released Mars, a tensor-based framework for large-scale data computation


New quantum computing language, Intel's self-learning chip Loihi - 25th Sept’ 17 Headlines

Packt Editorial Staff
25 Sep 2017
4 min read
A new quantum computing language, a self-learning chip called Loihi, and more in today's data science news.

Microsoft gifts developers a programming language for quantum computing

Quantum computing, considered the next wave of the computing revolution, came one step closer to reality. On day 1 of the ongoing Ignite conference, Microsoft CEO Satya Nadella announced the launch of a programming language and toolkit for quantum computing. The programming language will be "deeply integrated into Visual Studio," the tech giant said. The system includes a number of tutorials and libraries to help developers experiment with this new paradigm. The language itself has elements of C#, Python, and F#, along with new features specific to quantum computing. While developers can use the language on classical computers to try their hand at developing quantum apps, in the future they will be writing programs that actually run on topological quantum computers, Microsoft added.

Fun fact: A qubit, the quantum equivalent of the binary bit, is a very small particle that exists in a state of uncertainty (1 and 0) until it is measured into a state of certainty (1 or 0).

DEEP LEARNING IN NEWS

Intel launches a self-learning chip called Loihi

In what could redefine artificial intelligence, Intel Labs has developed a self-learning neuromorphic chip codenamed "Loihi." It imitates how the human brain operates based on the different types of feedback it receives from its environment. This energy-efficient chip has been developed using a first-of-its-kind approach to computing with asynchronous spiking, and it does not require traditional training to learn and make inferences. Just like the brain, Loihi uses data to understand and then make its inferences, and it gets smarter over time based on those inferences, Intel said.

Fun fact: Loihi (meaning 'long' or 'tall' in Hawaiian) is an active submarine volcano in the Hawaiian chain.

NVIDIA releases TensorRT 3 AI inference software

NVIDIA developers can now get the TensorRT 3 release candidate. Claimed to be 40x faster than CPUs at one-tenth the cost, TensorRT 3 comes with an easy-to-use Python API and improved performance, the company said on its official website. It can run TensorFlow model inference 18 times faster than the TensorFlow framework on Tesla V100.

DATABASES IN NEWS

Microsoft launches SQL Server 2017 with Linux support

On day 1 of the ongoing Ignite conference, Microsoft announced the general availability of SQL Server 2017, which runs on both Windows and Linux. This is seen by many as a significant move by Microsoft towards open source. "SQL Server on Linux is an engineering feat," Microsoft Principal Program Manager Travis Wright said. "The database engine binaries you install on Windows and Linux are literally the same exact files down to the byte. I can attest that even features like Active Directory authentication, backup, and restore all work just the same as on Windows." The news comes only a year after Microsoft released SQL Server 2016, and the price and licensing model stay exactly the same. SQL Server 2017 has several enhanced features, such as automatic tuning and the much-needed graph database capabilities. There is also added support for the Python programming language, which is great news for data science professionals.

Oracle makes available the MySQL 8.0 first release candidate

MySQL, the popular open-source RDBMS, may get a makeover. Oracle, which gained the MySQL platform after acquiring Sun Microsystems in 2010, has been stressing a "mobile-first" approach for modern applications. The new release candidate for MySQL 8.0, which has features like improved JSON support and Unicode 9.0 support, has been built with the requirements of most modern apps in mind. Developers can test MySQL 8.0 RC1 after downloading the source code from GitHub.