Data | 0 articles | Tech News, Tutorials & Expert Insights

article-image-how-serverless-computing-is-making-ai-development-easier

12 Sep 2018

5 min read

How Serverless computing is making AI development easier

12 Sep 2018

AI has been around for quite some time, enabling developers to build intelligent apps that cater to the needs of their users. Not only app developers, businesses are also using AI to gain insights from their data such as their customers’ buying behaviours, the busiest time of the year, and so on. While AI is all cool and fascinating, developing an AI-powered app is not that easy. Developers and data scientists have to invest a lot of their time in collecting and preparing the data, building and training the model, and finally deploying it in production. Machine learning, which is a subset of AI, feels difficult because the traditional development process is complicated and slow. While creating machine learning models we need different tools for different functionalities, which means we should have knowledge of them all. This is certainly not practical. The following factors make the current situation even more difficult: Scaling the inferencing logic Addressing continuous development Making it highly available Deployment Testing Operation This is where serverless computing comes into picture. Let’s dive into what exactly serverless computing is and how it can help in easing AI development. What is serverless computing? Serverless computing is the concept of building and running applications in which the computing resources are provided as scalable cloud services. It is a deployment model where applications, as bundle of functions, are uploaded to a cloud platform and then executed. Serverless computing does not mean that servers are no longer required to host and run code. Of course we need servers, but server management for the applications is taken care of by the cloud provider. This also does not implies that operations engineers are no longer required. In fact, it means that with serverless computing, consumers no longer need to spend time and resources on server provisioning, maintenance, updates, scaling, and capacity planning. Instead, all of these tasks and capabilities are handled by a serverless platform and are completely abstracted away from the developers and IT/operations teams. This allows developers to focus on writing their business logic and operations engineers to elevate their focus to more business critical tasks. Serverless computing is the union of two ideas: Backend as a Service (BaaS): BaaS provides developers a way to link their application with third-party backend cloud storage. It includes services such as, authentication, access to database, and messaging, which are supplied through physical or virtual servers located in the cloud. Function as a Service (FaaS): FaaS allows users to run a specific task or function remotely and after the function is complete, the function results return back to the user. The applications run in stateless compute containers that are event-triggered and fully managed by a third party. AWS Lambda, Google Cloud Function, Azure Functions, and IBM Cloud Functions, are some of the serverless computing providers which enable us to upload a function and the rest is taken care for us automatically. Read also: Modern Cloud Native architectures: Microservices, Containers, and Serverless – Part 2 Why serverless is a good choice for AI development? Along with the obvious advantage of hassle free server management, let’s see what else it has to offer for your artificial intelligence project development: Focus on core tasks Managing servers and deploying a machine learning model is not a good skill match for a data scientist or even for a machine learning engineer. With serverless computing, servers will conveniently vanish from your development and deployment workflow. Auto-scalability This is one of the key benefits of using serverless computing. As long as your model is correctly deployed on the serverless platform, you don’t have to worry about making it scale when your workload raises. Serverless computing gives all businesses, big and small, the ability to use what they need and scale without worrying about complex and time-consuming data migrations. Never pay for idle In traditional application deployment models, users need to pay a fixed and recurring cost for compute resources, regardless of the amount of computing work that is actually being performed by the server. In serverless computing deployment, you only have to pay for service usage. You are only charged for the number of executions and the corresponding duration. Reduces interdependence You can think of machine learning models as functions in serverless, which can be invoked, updated, and deleted. You can do this any time without having any side effect on the rest of the system. Different teams can work independently to develop, deploy, and scale their microservices. This greatly simplifies the orchestration of timelines by Product and Dev Managers. Abstraction from the users Your machine learning model will be exposed as a service to the users with the help of API Gateway. This makes it easier to decentralize your backend, isolate failure on a per-model level, and hide every implementation details from the final user. High availability Serverless applications have built-in availability and fault tolerance. You don't need to architect for these capabilities since the services running the application provide them by default. Serverless computing can facilitate a simpler approach to artificial intelligence by removing the baggage of server maintenance from developers and data scientists. But nothing is perfect, right? It also comes with some drawbacks, number one being, vendor lock-in. Serverless features varies from one vendor to another, which makes it difficult to switch vendors. Another disadvantage is decreased transparency. Your infrastructure is managed by someone else, so understanding the entire system becomes a little bit difficult. Serverless is not an answer to every problem but it is definitely improving each day making AI development easier. What’s new in Google Cloud Functions serverless platform Serverless computing wars: AWS Lambdas vs Azure Functions Google’s event-driven serverless platform, Cloud Function, is now generally available

0
0
36512

article-image-tools-for-reinforcement-learning

Pravin Dhandre

21 May 2018

4 min read

Top 5 tools for reinforcement learning

Pravin Dhandre

21 May 2018

4 min read

0
0
36351

article-image-5-javascript-machine-learning-libraries-you-need-to-know

Pravin Dhandre

08 Jun 2018

3 min read

5 JavaScript machine learning libraries you need to know

Pravin Dhandre

08 Jun 2018

3 min read

Technologies like machine learning, predictive analytics, natural language processing and artificial intelligence are the most trending and innovative technologies of 21st century. Whether it is an enterprise software or a simple photo editing application, they all are backed and rooted in machine learning technology making them smart enough to be a friend to humans. Until now, the tools and frameworks that were capable of running machine learning were majorly developed in languages like Python, R and Java. However, recently the web ecosystem has picked up machine learning into its fold and is achieving transformation in web applications. Today in this article, we will look at the most useful and popular libraries to perform machine learning in your browser without the need of softwares, compilers, installations and GPUs. TensorFlow.js GitHub: 7.5k+ stars With the growing popularity of TensorFlow among machine learning and deep learning enthusiasts, Google recently released TensorFlowjs, the JavaScript version of TensorFlow. With this library, JavaScript developers can train and deploy their machine learning models faster in browser without much hassle. This library is speedy, tensile, scalable and a great start to practically experience the taste of machine learning. With TensorFlow.js, importing existing models and retraining pretrained model is a piece of cake. To check out examples on tensorflow.js, visit GitHub repository. ConvNetJS GitHub: 9k+ stars ConvNetJS provides neural networks implementation in JavaScript with numerous demos of neural networks available on GitHub repository. The framework has a good number of active followers who are programmers and coders. The library provides support to various neural network modules, and popular machine learning techniques like Classification and Regression. Developers who are interested in getting reinforcement learning onto the browser or in training complex convolutional networks, can visit the ConvNetJS official page. Brain.js GitHub: 8k+ stars Brain.js is another addition to the web development ecosystem that brings smart features onto the browser with just a few lines of code. Using Brain.js, one can easily create simple neural networks and can develop smart functionality in their browser applications without much of the complexity. It is already preferred by web developers for client side applications like in-browser games or placement of Ads, or for character recognition. You can checkout its GitHub repository to see a complete demonstration of approximating XOR function using brain.js. Synaptic GitHub: 6k+ stars Synaptic is a well-liked machine learning library for training recurrent neural networks as it has in-built architecture-free generalized algorithm. Few of the in-built architectures include multilayer perceptrons, LSTM networks and Hopfield networks. With Synaptic, you can develop various in-browser applications such as Paint an Image, Learn Image Filters, Self-Organizing Map or Reading from Wikipedia. Neurojs GitHub: 4k+ stars Another recently developed framework especially for reinforcement learning tasks in your browser, is neurojs. It mainly focuses on Q-learning, but can be used for any type of neural network based task whether it is for building a browser game or an autonomous driving application. Some of the exciting features this library has to offer are full-stack neural network implementation, extended support to reinforcement learning tasks, import/export of weight configurations and many more. To see the complete list of features, visit the GitHub page. How should web developers learn machine learning? NVIDIA open sources NVVL, library for machine learning training Build a foodie bot with JavaScript

0
0
35939

article-image-two-popular-data-analytics-methodologies-every-data-professional-should-know-tdsp-crisp-dm

Amarabha Banerjee

21 Dec 2017

7 min read

Two popular Data Analytics methodologies every data professional should know: TDSP & CRISP-DM

Amarabha Banerjee

21 Dec 2017

7 min read

0
0
35288

article-image-best-machine-learning-datasets-for-beginners

Natasha Mathur

19 Sep 2018

13 min read

Best Machine Learning Datasets for beginners

Natasha Mathur

19 Sep 2018

13 min read

“It’s not who has the best algorithm that wins. It’s who has the most data” ~ Andrew Ng If you would look at the way algorithms were trained in Machine Learning, five or ten years ago, you would notice one huge difference. Training algorithms in Machine Learning are much better and efficient today than it used to be a few years ago. All credit goes to the hefty amount of data that is available to us today. But, how does Machine Learning make use of this data? Let’s have a look at the definition of Machine Learning. “Machine Learning provides computers or machines the ability to automatically learn from experience without being explicitly programmed”. Machines “learn from experience” when they’re trained, this is where data comes into the picture. How’re they trained? Datasets! This is why it is so crucial that you feed these machines with the right data for whatever problem it is that you want these machines to solve. Why datasets matter in Machine Learning? The simple answer is because Machines too like humans are capable of learning once they see relevant data. But where they vary from humans is the amount of data they need to learn from. You need to feed your machines with enough data in order for them to do anything useful for you. This why Machines are trained using massive datasets. We can think of machine learning data like a survey data, meaning the larger and more complete your sample data size is, the more reliable your conclusions will be. If the data sample isn’t large enough then it won’t be able to capture all the variations making your machine reach inaccurate conclusions, learn patterns that don’t really exist, or not recognize patterns that do. Datasets help bring the data to you. Datasets train the model for performing various actions. They model the algorithms to uncover relationships, detect patterns, understand complex problems as well as make decisions. Apart from using datasets, it is equally important to make sure that you are using the right dataset, which is in a useful format and comprises all the meaningful features, and variations. After all, the system will ultimately do what it learns from the data. Feeding right data into your machines also assures that the machine will work effectively and produce accurate results without any human interference required. For instance, training a speech recognition system with a textbook English dataset will result in your machine struggling to understand anything but textbook English. So, any loose grammar, foreign accents, or speech disorders would get missed out. For such a system, using a dataset comprising all the infinite variations in a spoken language among speakers of different genders, ages, and dialects would be a right option. So keep in mind that it is important that the quality, variety, and quantity of your training data is not compromised as all these factors help determine the success of your machine learning models. Top Machine Learning Datasets for Beginners Now, there are a lot of datasets available today for use in your ML applications. It can be confusing, especially for a beginner to determine which dataset is the right one for your project. It is better to use a dataset which can be downloaded quickly and doesn’t take much to adapt to the models. Further, always use standard datasets that are well understood and widely used. This lets you compare your results with others who have used the same dataset to see if you are making progress. You can pick the dataset you want to use depending on the type of your Machine Learning application. Here’s a rundown of easy and the most commonly used datasets available for training Machine Learning applications across popular problem areas from image processing to video analysis to text recognition to autonomous systems. Image Processing There are many image datasets to choose from depending on what it is that you want your application to do. Image processing in Machine Learning is used to train the Machine to process the images to extract useful information from it. For instance, if you’re working on a basic facial recognition application then you can train it using a dataset that has thousands of images of human faces. This is how Facebook knows people in group pictures. This is also how image search works in Google and in other visual search based product sites. Dataset Name Brief Description 10k US Adult Faces Database This database consists of 10,168 natural face photographs and several measures for 2,222 of the faces, including memorability scores, computer vision, and psychological attributes. The face images are JPEGs with 72 pixels/in resolution and 256-pixel height. Google's Open Images Open Images is a dataset of 9 million URLs to images which have been annotated with labels spanning over 6000 categories. These labels cover more real-life entities and the images are listed as having a Creative Commons Attribution license. Visual Genome This is a dataset of over 100k images densely annotated with numerous region descriptions ( girl feeding elephant), objects (elephants), attributes(large), and relationships (feeding). Labeled Faces in the Wild This database comprises more than 13,000 images of faces collected from the web. Each face is labeled with the name of the person pictured. Fun and easy ML application ideas for beginners using image datasets: Cat vs Dogs: Using Cat and Stanford Dogs dataset to classify whether an image contains a dog or a cat. Iris Flower classification: You can build an ML project using Iris flower dataset where you classify the flowers in any of the three species. What you learn from this toy project will help you learn to classify physical attributes based content to build some fun real-world projects like fraud detection, criminal identification, pain management ( eg; ePAT which detects facial hints of pain using facial recognition technology), and so on. Hot dog - Not hot dog: Use the Food 101 dataset, to distinguish different food types as a hot dog or not. Who knows, you could end up becoming the next Emmy award nominee! Sentiment Analysis As a beginner, you can create some really fun applications using Sentiment Analysis dataset. Sentiment Analysis in Machine Learning applications is used to train machines to analyze and predict the emotion or sentiment associated with a sentence, word, or a piece of text. This is used in movie or product reviews often. If you are creative enough, you could even identify topics that will generate the most discussions using sentiment analysis as a key tool. Dataset Name Brief Description Sentiment140 A popular dataset, which uses 160,000 tweets with emoticons pre-removed Yelp Reviews An open dataset released by Yelp, contains more than 5 million reviews on Restaurants, Shopping, Nightlife, Food, Entertainment, etc. Twitter US Airline Sentiment Twitter data on US airlines starting from February 2015, labeled as positive, negative, and neutral tweets. Amazon reviews This dataset contains over 35 million reviews from Amazon spanning 18 years. Data include information on products, user ratings, and the plaintext review. Easy and Fun Application ideas using Sentiment Analysis Dataset: Positive or Negative: Using Sentiment140 dataset in a model to classify whether given tweets are negative or positive. Happy or unhappy: Using Yelp Reviews dataset in your project to help machine figure out whether the person posting the review is happy or unhappy. Good or Bad: Using Amazon Reviews dataset, you can train a machine to figure out whether a given review is good or bad. Natural Language Processing Natural language processing deals with training machines to process and analyze large amounts of natural language data. This is how search engines like Google know what you are looking for when you type in your search query. Use these datasets to make a basic and fun NLP application in Machine Learning: Dataset Name Brief Description Speech Accent Archive This dataset comprises 2140 speech samples from different talkers reading the same reading passage. These Talkers come from 177 countries and have 214 different native languages. Each talker is speaking in English. Wikipedia Links data This dataset consists of almost 1.9 billion words from more than 4 million articles. Search is possible by word, phrase or part of a paragraph itself. Blogger Corpus A dataset comprising 681,288 blog posts gathered from blogger.com. Each blog consists of minimum 200 occurrences of commonly used English words. Fun Application ideas using NLP datasets: Spam or not: Using Spambase dataset, you can enable your application to figure out whether a given email is spam or not. Video Processing Video Processing datasets are used to teach machines to analyze and detect different settings, objects, emotions, or actions and interactions in videos. You’ll have to feed your machine with a lot of data on different actions, objects, and activities. Dataset Name Brief Description UCF101 - Action Recognition Data Set This dataset comes with 13,320 videos from 101 action categories. Youtube 8M YouTube-8M is a large-scale labeled video dataset. It contains millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. Fun Application ideas using video processing dataset: Action detection: Using UCF101 - Action Recognition DataSet, or Youtube 8M, you can train your application to detect the actions such as walking, running etc, in a video. Speech Recognition Speech recognition is the ability of a machine to analyze or identify words and phrases in a spoken language. Feed your machine with the right and good amount of data, and it will help it in the process of recognizing speech. Combine speech recognition with natural language processing, and get Alexa who knows what you need. Dataset Name Brief Description Gender Recognition by Voice and speech analysis This database identifies a voice as male or female, depending on the acoustic properties of voice and speech. The dataset contains 3,168 recorded voice samples, collected from male and female speakers. Human Activity Recognition w/Smartphone Human Activity Recognition database consists of recordings of 30 subjects performing activities of daily living (ADL) while carrying a smartphone ( Samsung Galaxy S2 ) on the waist. TIMIT TIMIT provides speech data for acoustic-phonetic studies and for the development of automatic speech recognition systems. It comprises broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences, phonetic and word transcriptions. Speech Accent Archive This dataset contains 2140 speech samples, each from a different talker reading the same reading passage. Talkers come from 177 countries and have 214 different native languages. Each talker is speaking in English. Fun Application ideas using Speech Recognition dataset: Accent detection: Use Speech Accent Archive dataset, to make your application identify different accents from a given sample of accents. Identify the activity: Use Human Activity Recognition w/Smartphone dataset to help your application detect the human activity. Natural Language Generation Natural Language generation refers to the ability of machines to simulate the human speech. It can be used to translate written information into aural information or assist the vision-impaired by reading out aloud the contents of a display screen. This is how Alexa or Siri respond to you. Dataset Name Brief Description Common Voice by Mozilla Common Voice dataset contains speech data read by users on the Common Voice website from a number of public sources like user-submitted blog posts, old books, movies, etc. LibriSpeech This dataset consists of nearly 500 hours of clean speech of various audiobooks read by multiple speakers, organized by chapters of the book with both the text and the speech. Fun Application ideas using Natural Language Generation dataset: Converting text into Audio: Using Blogger Corpus dataset, you can train your application to read out loud the posts on blogger. Autonomous Driving Build some basic self-driving Machine Learning Applications. These Self-driving datasets will help you train your machine to sense its environment and navigate accordingly without any human interference. Autonomous cars, drones, warehouse robots, and others use these algorithms to navigate correctly and safely in the real world. Datasets are even more important here as the stakes are higher and the cost of a mistake could be a human life. Dataset Name Brief Description Berkeley DeepDrive BDD100k This is one of the largest datasets for self-driving AI currently. It comprises over 100,000 videos of over 1,100-hour driving experiences across different times of the day and weather conditions. Baidu Apolloscapes Large dataset consisting of 26 different semantic items such as cars, bicycles, pedestrians, buildings, street lights, etc. Comma.ai This dataset consists of more than 7 hours of highway driving. It includes details on car’s speed, acceleration, steering angle, and GPS coordinates. Cityscape Dataset This is a large dataset that contains recordings of urban street scenes in 50 different cities. nuScenes This dataset consists of more than 1000 scenes with around 1.4 million image, 400,000 sweeps of lidars (laser-based systems that detect the distance between objects), and 1.1 million 3D bounding boxes ( detects objects with a combination of RGB cameras, radar, and lidar). Fun Application ideas using Autonomous Driving dataset: A basic self-driving application: Use any of the self-driving datasets mentioned above to train your application with different driving experiences for different times and weather conditions. IoT Machine Learning in building IoT applications is on the rise these days. Now, as a beginner in Machine Learning, you may not have advanced knowledge on how to build these high-performance IoT applications using Machine Learning, but you certainly can start off with some basic datasets to explore this exciting space. Dataset Name Brief Description Wayfinding, Path Planning, and Navigation Dataset This dataset consists of samples of trajectories in an indoor building (Waldo Library at Western Michigan University) for navigation and wayfinding applications. ARAS Human Activity Dataset This dataset is a Human activity recognition Dataset collected from two real houses. It involves over 26 millions of sensor readings and over 3000 activity occurrences. Fun Application ideas using IoT dataset: Wearable device to track human activity: Use the ARAS Human Activity Dataset to train a wearable device to identify human activity. Read Also: 25 Datasets for Deep Learning in IoT Once you’re done going through this list, it’s important to not feel restricted. These are not the only datasets which you can use in your Machine Learning Applications. You can find a lot many online which might work best for the type of Machine Learning Project that you’re working on. Some popular sources of a wide range of datasets are Kaggle, UCI Machine Learning Repository, KDnuggets, Awesome Public Datasets, and Reddit Datasets Subreddit. With all this information, it is now time to use these datasets in your project. In case you’re completely new to Machine Learning, you will find reading, ‘A nonprogrammer’s guide to learning Machine learning’quite helpful. Regardless of whether you’re a beginner or not, always remember to pick a dataset which is widely used, and can be downloaded quickly from a reliable source. How to create and prepare your first dataset in Salesforce Einstein Google launches a Dataset Search Engine for finding Datasets on the Internet Why learn machine learning as a non-techie?

0
0
35243

article-image-how-rolls-royce-is-applying-ai-and-robotics-for-smart-engine-maintenance

Sugandha Lahoti

20 Jul 2018

5 min read

How Rolls Royce is applying AI and robotics for smart engine maintenance

Sugandha Lahoti

20 Jul 2018

5 min read

Rolls Royce has been working in the civil aviation domain for quite some time now, to build what they call as ‘intelligent engines’. The IntelligentEngine vision was first announced at the Singapore Airshow in February 2018. The idea was built around how robotics could be used to revolutionise the future of engine maintenance. Rolls Royce aims to build engines which are: Connected, using cloud based nodes and IoT devices with other engines of the fleet, as well as with the customers and operators. Contextually aware, of its operations, constraints, and customers, with modern data analysis and big data mining techniques. Comprehending, of its own experiences and other engines in the fleet using state-of-the-art machine learning and recommendation algorithms. The company has been demonstrating steady progress and showing off their rapidly developing digital capabilities. Using tiny SWARM robots for engine maintenance Their latest inventions are, tiny roach-sized ‘SWARM’ robots, capable of crawling inside airplane engines and fix them. They look like they’ve just crawled straight out of a Transformers movie. This small robot, almost 10mm in size can perform a visual inspection of hard to reach airplane engine parts. The devices will be mounted with tiny cameras providing a live video feed to allow engineers to see what’s going on inside an engine without having to take it apart. These swarm robots will be deposited on the engine via another invention, the ‘snake’ robots. Officially called FLARE, these snake robots are flexible enough to travel through an engine, like an endoscope. Source Another group of robots, the INSPECT robots is a network of periscopes permanently embedded within the engine. These bots can inspect engines using periscope cameras to spot and report any maintenance requirements. Current prototypes of these bots are much larger than the desired size and not quite ready for intricate repairs. They may be production ready in almost two years. Reducing flight delays with data analysis R2 Data Labs (Rolls Royce data science department) offers technical insight capabilities to their Airline Support Teams (ASTs). ASTs generally assess incident reports, submitted after disruption events or maintenance is undertaken. The Technical Insight platform will help ASTs easily capture, categorize and collate report data in a single place. This platform builds a bank of high-quality data (almost 10 times the size of the database ASTs had access to previously), and then analyze it to identify trends and common issues for more insightful analytics. The technical insight platform has so far shown positive results and has been critical to achieving the company’s IntelligentEngine vision. According to their blog, it was able to avoid delays and cancellations in a particular operator’s 757 fleet by 30%, worth £1.5m per year. The social network for engines In May 2018, the company launched an engine network app. This app was designed to bring all of the engine data under a single hood, much like how Facebook brings all your friends on a single platform. In this app, all the crucial information regarding all the engines in a fleet is available in a single place. Much like Facebook, each engine has a ‘profile’, which shows data on how it’s been operated, the aircraft it has been paired with, the parts it contains, and how much service life is left in each component. It also has a ‘Timeline’ which shows the complete story of the engine’s operational history. In fact, you also have a ‘newsfeed’ to display the most important insights from across the fleet. Source The engine also has an in-built recommendation algorithm which suggests future maintenance work for individual engines, based on what it learns from other similar engines in the fleet. As Juan Carlos Cabrejas, Technical Product Manager, R2 Data Labs writes, “This capability is essential to our IntelligentEngine vision, as it underpins our ability to build a frictionless data ecosystem across our fleets.” Transforming Engine Health Management Rolls-Royce is taking Engine Health Management (EHM) to a new level of connectivity. Their latest EHM system can measure thousands of parameters and monitor entirely new parts of the engine. And interestingly, the EHM has a ‘talk back’ feature. An operational center can ask the system to focus on one particular part or parameter of the engine. The system listens and responds back with hundreds of hours of information specifically tailored to that request. Axel Voege, Rolls-Royce, Head of Digital Operations, Germany, says” By getting that greater level of detail, instantly, our engineering teams can work out a solution much more quickly.” This new system will go into service next year making it their most IntelligentEngine yet. As IntelligentEngine makes rapid progress, the company sees itself designing, testing, and managing engines entirely through their digital twin in the near future. You can read more about the IntelligentEngine vision and other stories to discover new products and updates at the Rolls Royce site. Unity announces a new automotive division and two-day Unity AutoTech Summit Apollo 11 source code: A small step for a woman, and a huge leap for ‘software engineering’

0
0
35109

article-image-7-ai-tools-mobile-developers-need-to-know

Bhagyashree R

20 Sep 2018

11 min read

7 AI tools mobile developers need to know

Bhagyashree R

20 Sep 2018

11 min read

Advancements in artificial intelligence (AI) and machine learning has enabled the evolution of mobile applications that we see today. With AI, apps are now capable of recognizing speech, images, and gestures, and translate voices with extraordinary success rates. With a number of apps hitting the app stores, it is crucial that they stand apart from competitors by meeting the rising standards of consumers. To stay relevant it is important that mobile developers keep up with these advancements in artificial intelligence. As AI and machine learning become increasingly popular, there is a growing selection of tools and software available for developers to build their apps with. These cloud-based and device-based artificial intelligence tools provide developers a way to power their apps with unique features. In this article, we will look at some of these tools and how app developers are using them in their apps. Caffe2 - A flexible deep learning framework Source: Qualcomm Caffe2 is a lightweight, modular, scalable deep learning framework developed by Facebook. It is a successor of Caffe, a project started at the University of California, Berkeley. It is primarily built for production use cases and mobile development and offers developers greater flexibility for building high-performance products. Caffe2 aims to provide an easy way to experiment with deep learning and leverage community contributions of new models and algorithms. It is cross-platform and integrates with Visual Studio, Android Studio, and Xcode for mobile development. Its core C++ libraries provide speed and portability, while its Python and C++ APIs make it easy for you to prototype, train, and deploy your models. It utilizes GPUs when they are available. It is fine-tuned to take full advantage of the NVIDIA GPU deep learning platform. To deliver high performance, Caffe2 uses some of the deep learning SDK libraries by NVIDIA such as cuDNN, cuBLAS, and NCCL. Functionalities Enable automation Image processing Perform object detection Statistical and mathematical operations Supports distributed training enabling quick scaling up or down Applications Facebook is using Caffe2 to help their developers and researchers train large machine learning models and deliver AI on mobile devices. Using Caffe2, they significantly improved the efficiency and quality of machine translation systems. As a result, all machine translation models at Facebook have been transitioned from phrase-based systems to neural models for all languages. OpenCV - Give the power of vision to your apps Source: AndroidPub OpenCV short for Open Source Computer Vision Library is a collection of programming functions for real-time computer vision and machine learning. It has C++, Python, and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. It also supports the deep learning frameworks TensorFlow and PyTorch. Written natively in C/C++, the library can take advantage of multi-core processing. OpenCV aims to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products. The library consists of more than 2500 optimized algorithms including both classic and state-of-the-art computer vision algorithms. Functionalities These algorithms can be used for the following: To detect and recognize faces Identify objects Classify human actions in videos Track camera movements and moving objects Extract 3D models of objects Produce 3D point clouds from stereo cameras Stitch images together to produce a high-resolution image of an entire scene Find similar images from an image database Applications Plickers is an assessment tool, that lets you poll your class for free, without the need for student devices. It uses OpenCV as its graphics and video SDK. You just have to give each student a card called a paper clicker, and use your iPhone/iPad to scan them to do instant checks-for-understanding, exit tickets, and impromptu polls. Also check out FastCV BoofCV TensorFlow Lite and Mobile - An Open Source Machine Learning Framework for Everyone Source: YouTube TensorFlow is an open source software library for building machine learning models. Its flexible architecture allows easy model deployment across a variety of platforms ranging from desktops to mobile and edge devices. Currently, TensorFlow provides two solutions for deploying machine learning models on mobile devices: TensorFlow Mobile and TensorFlow Lite. TensorFlow Lite is an improved version of TensorFlow Mobile, offering better performance and smaller app size. Additionally, it has very few dependencies as compared to TensorFlow Mobile, so it can be built and hosted on simpler, more constrained device scenarios. TensorFlow Lite also supports hardware acceleration with the Android Neural Networks API. But the catch here is that TensorFlow Lite is currently in developer preview and only has coverage to a limited set of operators. So, to develop production-ready mobile TensorFlow apps, it is recommended to use TensorFlow Mobile. Also, TensorFlow Mobile supports customization to add new operators not supported by TensorFlow Mobile by default, which is a requirement for most of the models of different AI apps. Although TensorFlow Lite is in developer preview, its future releases “will greatly simplify the developer experience of targeting a model for small devices”. It is also likely to replace TensorFlow Mobile, or at least overcome its current limitations. Functionalities Speech recognition Image recognition Object localization Gesture recognition Optical character recognition Translation Text classification Voice synthesis Applications The Alibaba tech team is using TensorFlow Lite to implement and optimize speaker recognition on the client side. This addresses many of the common issues of the server-side model, such as poor network connectivity, extended latency, and poor user experience. Google uses TensorFlow for advanced machine learning models including Google Translate and RankBrain. Core ML - Integrate machine learning in your iOS apps Source: AppleToolBox Core ML is a machine learning framework which can be used to integrate machine learning model in your iOS apps. It supports Vision for image analysis, Natural Language for natural language processing, and GameplayKit for evaluating learned decision trees. Core ML is built on top of the following low-level APIs, providing a simple higher level abstraction to these: Accelerate optimizes large-scale mathematical computations and image calculations for high performance. Basic neural network subroutines (BNNS) provides a collection of functions using which you can implement and run neural networks trained with previously obtained data. Metal Performance Shaders is a collection of highly optimized compute and graphic shaders that are designed to integrate easily and efficiently into your Metal app. To train and deploy custom models you can also use the Create ML framework. It is a machine learning framework in Swift, which can be used to train models using native Apple technologies like Swift, Xcode, and Other Apple frameworks. Functionalities Face and face landmark detection Text detection Barcode recognition Image registration Language and script identification Design games with functional and reusable architecture Applications Lumina is a camera designed in Swift for easily integrating Core ML models - as well as image streaming, QR/Barcode detection, and many other features. ML Kit by Google - Seamlessly build machine learning into your apps Source: Google ML Kit is a cross-platform suite of machine learning tools for its Firebase mobile development platform. It comprises of Google's ML technologies, such as the Google Cloud Vision API, TensorFlow Lite, and the Android Neural Networks API together in a single SDK enabling you to apply ML techniques to your apps easily. You can leverage its ready-to-use APIs for common mobile use cases such as recognizing text, detecting faces, identifying landmarks, scanning barcodes, and labeling images. If these APIs don't cover your machine learning problem, you can use your own existing TensorFlow Lite models. You just have to upload your model on Firebase and ML Kit will take care of the hosting and serving. These APIs can run on-device or in the cloud. Its on-device APIs process your data quickly and work even when there’s no network connection. Its cloud-based APIs leverage the power of Google Cloud Platform's machine learning technology to give you an even higher level of accuracy. Functionalities Automate tedious data entry for credit cards, receipts, and business cards, or help organize photos. Extract text from documents, which you can use to increase accessibility or translate documents. Real-time face detection can be used in applications like video chat or games that respond to the player's expressions. Using image labeling you can add capabilities such as content moderation and automatic metadata generation. Applications A popular calorie counter app, Lose It! uses Google ML Kit Text Recognition API to quickly capture nutrition information to ensure it’s easy to record and extremely accurate. PicsArt uses ML Kit custom model APIs to provide TensorFlow–powered 1000+ effects to enable millions of users to create amazing images with their mobile phones. Dialogflow - Give users new ways to interact with your product Source: Medium Dialogflow is a Natural Language Understanding (NLU) platform that makes it easy for developers to design and integrate conversational user interfaces into mobile apps, web applications, devices, and bots. You can integrate it on Alexa, Cortana, Facebook Messenger, and other platforms your users are on. With Dialogflow you can build interfaces, such as chatbots and conversational IVR that enable natural and rich interactions between your users and your business. It provides this human-like interaction with the help of agents. Agents can understand the vast and varied nuances of human language and translate that to standard and structured meaning that your apps and services can understand. It comes in two types: Dialogflow Standard Edition and Dialogflow Enterprise Edition. Dialogflow Enterprise Edition users have access to Google Cloud Support and a service level agreement (SLA) for production deployments. Functionalities Provide customer support One-click integration on 14+ platforms Supports multilingual responses Improve NLU quality by training with negative examples Debug using more insights and diagnostics Applications Domino’s simplified the process of ordering pizza using Dialogflow’s conversational technology. Domino's leveraged large customer service knowledge and Dialogflow's NLU capabilities to build both simple customer interactions and increasingly complex ordering scenarios. Also check out Wit.ai Rasa NLU Microsoft Cognitive Services - Make your apps see, hear, speak, understand and interpret your user needs Source: Neel Bhatt Cognitive Services is a collection of APIs, SDKs, and services to enable developers easily add cognitive features to their applications such as emotion and video detection, facial, speech, and vision recognition, among others. You need not be an expert in data science to make your systems more intelligent and engaging. The pre-built services come with high-quality RESTful intelligent APIs for the following: Vision: Make your apps identify and analyze content within images and videos. Provides capabilities such as image classification, optical character recognition in images, face detection, person identification, and emotion identification. Speech: Integrate speech processing capabilities into your app or services such as text-to-speech, speech-to-text, speaker recognition, and speech translation. Language: Your application or service will understand the meaning of the unstructured text or the intent behind a speaker's utterances. It comes with capabilities such as text sentiment analysis, key phrase extraction, automated and customizable text translation. Knowledge: Create knowledge-rich resources that can be integrated into apps and services. It provides features such as QnA extraction from unstructured text, knowledge base creation from collections of Q&As, and semantic matching for knowledge bases. Search: Using Search API you can find exactly what you are looking for across billions of web pages. It provides features like ad-free, safe, location-aware web search, Bing visual search, custom search engine creation, and many more. Applications To safeguard against fraud, Uber uses the Face API, part of Microsoft Cognitive Services, to help ensure the driver using the app matches the account on file. Cardinal Blue developed an app called PicCollage, a popular mobile app that allows users to combine photos, videos, captions, stickers, and special effects to create unique collages. Also check out AWS machine learning services IBM Watson These were some of the tools that will help you integrate intelligence into your apps. These libraries make it easier to add capabilities like speech recognition, natural language processing, computer vision, and many others, giving users the wow moment of accomplishing something that wasn’t quite possible before. Along with choosing the right AI tool, you must also consider other factors that can affect your app performance. These factors include the accuracy of your machine learning model, which can be affected by bias and variance, using correct datasets for training, seamless user interaction, and resource-optimization, among others. While building any intelligent app it is also important to keep in mind that the AI in your app is solving a problem and it doesn’t exist because it is cool. Thinking from the user’s perspective will allow you to assess the importance of a particular problem. A great AI app will not just help users do something faster, but enable them to do something they couldn’t do before. With the growing popularity and the need to speed up the development of intelligent apps, many companies ranging from huge tech giants to startups are providing AI solutions. In the future we will definitely see more developer tools coming into the market, making AI in apps a norm. 6 most commonly used Java Machine learning libraries 5 ways artificial intelligence is upgrading software engineering Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence

0
0
34787

article-image-how-ai-is-going-to-transform-the-data-center

Melisha Dsouza

13 Sep 2018

7 min read

How AI is going to transform the Data Center

Melisha Dsouza

13 Sep 2018

7 min read

According to Gartner analyst Dave Cappuccio, 80% of enterprises will have shut down their traditional data centers by 2025, compared to just 10% today. The figures are fitting considering the host of problems faced by traditional data centers. The solution to all these problems lies right in front of us- Incorporating Intelligence in traditional data centers. To support this claim, Gartner also predicts that by 2020, more than 30 percent of data centers that fail to implement AI and Machine Learning will cease to be operationally and economically viable. Across the globe, Data science and AI are influencing the design and development of the modern data centers. With the surge in the amount of data everyday, traditional data centers will eventually get slow and result in an inefficient output. Utilizing AI in ingenious ways, data center operators can drive efficiencies up and costs down. A fitting example of this is the tier-two automated control system implemented at Google to cool its data centers autonomously. The system makes all the cooling-plant tweaks on its own, continuously, in real-time- thus saving up to 30% of the plant’s energy annually. Source: DataCenter Knowledge AI has enabled data center operators to add more workloads on the same physical silicon architecture. They can aggregate and analyze data quickly and generate productive outputs, which is specifically beneficial to companies that deal with immense amounts of data like hospitals, genomic systems, airports, and media companies. How is AI facilitating data centers Let's look at some of the ways that Intelligent data centers will serve as a solution to issues faced by traditionally operated data centers. #1 Energy Efficiency The Delta Airlines data center outage in 2016, that was attributed to electrical-system failure over a three day period, cost the airlines around $150 million, grounding about 2,000 flights. This situation could have been easily averted had the data centers used Machine Learning for their workings. As data centers get ever bigger, more complex and increasingly connected to the cloud, artificial intelligence is becoming an essential tool for keeping things from overheating and saving power at the same time. According to the Energy Department’s U.S. Data Center Energy Usage Report, the power usage of data centers in the United States has grown at about a 4 percent rate annually since 2010 and is expected to hit 73 billion kilowatt-hours by 2020, more than 1.8 percent of the country’s total electricity use. Data centers also contribute about 2 percent of the world’s greenhouse gas emissions, AI techniques can do a lot to make processes more efficient, more secure and less expensive. One of the keys to better efficiency is keeping things cool, a necessity in any area of computing. Google and DeepMind (Alphabet Inc.’s AI division)use of AI to directly control its data center has reduced energy use for cooling by about 30 percent. #2 Server optimization Data centers have to maintain physical servers and storage equipment. AI-based predictive analysis can help data centers distribute workloads across the many servers in the firm. Data center loads can become more predictable and more easily manageable. Latest load balancing tools with built-in AI capabilities are able to learn from past data and run load distribution more efficiently. Companies will be able to better track server performance, disk utilization, and network congestions. Optimizing server storage systems, finding possible fault points in the system, improve processing times and reducing risk factors will become faster. These will in turn facilitate maximum possible server optimization. #3 Failure prediction / troubleshooting Unplanned downtime in a datacenter can lead to money loss. Datacenter operators need to quickly identify the root case of the failure, so they can prioritize troubleshooting and get the datacenter up and running before any data loss or business impact take place. Self managing datacenters make use of AI based deep learning (DL) applications to predict failures ahead of time. Using ML based recommendation systems, appropriate fixes can be inferred upon the system in time. Take for instance the HPE artificial intelligence predictive engine that identifies and solves trouble in the data center. Signatures are built to identify other users that might be affected. Rules are then developed to instigate a solution, which can be automated. The AI-machine learning solution, can quickly interject through the entire system and stop others from inheriting the same issue. #4 Intelligent Monitoring and storing of Data Incorporating machine learning, AI can take over the mundane job of monitoring huge amounts of data and make IT professionals more efficient in terms of the quality of tasks they handle. Litbit has developed the first AI-powered, data center operator, Dac. It uses a human-to-machine learning interface that combines existing human employee knowledge with real-time data. Incorporating over 11,000 pieces of innate knowledge, Dac has the potential to hear when a machine is close to failing, feel vibration patterns that are bad for HDD I/O, and spot intruders. Dac is proof of how AI can help monitor networks efficiently. Along with monitoring of data, it is also necessary to be able to store vast amounts of data securely. AI holds the potential to make more intelligent decisions on - storage optimization or tiering. This will help transform storage management by learning IO patterns and data lifecycles, helping storage solutions etc. Mixed views on the future of AI in data centers? Let’s face the truth, the complexity that comes with huge masses of data is often difficult to handle. Humans ar not as scalable as an automated solution to handle data with precision and efficiency. Take Cisco’s M5 Unified Computing or HPE’s InfoSight as examples. They are trying to alleviate the fact that humans are increasingly unable to deal with the complexity of a modern data center. One of the consequences of using automated systems is that there is always a possibility of humans losing their jobs and being replaced by machines at varying degrees depending on the nature of job roles. AI is predicted to open its doors to robots and automated machines that will soon perform repetitive tasks in the datacenters. On the bright side, organizations could allow employees, freed from repetitive and mundane tasks, to invest their time in more productive, and creative aspects of running a data center. In addition to new jobs, the capital involved in setting up and maintaining a data center is huge. Now add AI to the Datacenter and you have to invest double or maybe triple the amount of money to keep everything running smoothly. Managing and storing all of the operational log data for analysis also comes with its own set of issues. The log data that acts as the input to these ML systems becomes a larger data set than the application data itself. Hence firms need a proper plan in place to manage all of this data. Embracing AI in data centers would mean greater financial benefits from the outset while attracting more customers. It would be interesting to see the turnout of tech companies following Google’s footsteps and implementing AI in their data centers. Tech companies should definitely watch this space to take their data center operations up a notch. 5 ways artificial intelligence is upgrading software engineering Intelligent Edge Analytics: 7 ways machine learning is driving edge computing adoption in 2018 15 millions jobs in Britain at stake with Artificial Intelligence robots set to replace humans at workforce

0
0
34511

article-image-nvidia-leads-the-ai-hardware-race-but-which-of-its-gpus-should-you-use-for-deep-learning

Prasad Ramesh

29 Aug 2018

8 min read

NVIDIA leads the AI hardware race. But which of its GPUs should you use for deep learning?

Prasad Ramesh

29 Aug 2018

8 min read

For readers who are new to deep learning and who might be wondering what a GPU is, let’s start there. To make it simple, consider deep learning as nothing more than a set of calculations - complex calculations, yes, but calculations nonetheless. To run these calculations, you need hardware. Ordinarily, you might just use a normal processor like the CPU inside your laptop. However, this isn’t powerful enough to process at the speed at which deep learning computations need to happen. GPUs, however, can. This is because while a conventional CPU has only a few complex cores, a GPU can have thousands of simple cores. With a GPU, training a deep learning data set can take just hours instead of days. However, although it’s clear that GPUs have significant advantages over CPUs, there is nevertheless a range of GPUs available, each having their own individual differences. Selecting one is ultimately a matter of knowing what your needs are. Let’s dig deeper and find out how to go about shopping for GPUs… What to look for before choosing a GPU? There are a few specifications to consider before picking a GPU. Memory bandwidth: This determines the capacity of a GPU to handle large amounts of data. It is the most important performance metric, as with faster memory bandwidth more data can be processed at higher speeds. Number of cores: This indicates how fast a GPU can process data. A large number of CUDA cores can handle large datasets well. CUDA cores are parallel processors similar to cores in a CPU but their number is in thousands and are not suited for complex calculations that a CPU core can perform. Memory size: For computer vision projects, it is crucial for memory size to be as much as you can afford. But with natural language processing, memory size does not play such an important role. Our pick of GPU devices to choose from The go to choice here is NVIDIA; they have standard libraries that make it simple to set things up. Other graphics cards are not very friendly in terms of the libraries supported for deep learning. NVIDIA CUDA Deep Neural Network library also has a good development community. “Is NVIDIA Unstoppable In AI?” -Forbes “Nvidia beats forecasts as sales of graphics chips for AI keep booming” -SiliconANGLE AMD GPUs are powerful too but lack library support to get things running smoothly. It would be really nice to see some AMD libraries being developed to break the monopoly and give more options to the consumers. NVIDIA RTX 2080 Ti: The RTX line of GPUs are to be released in September 2018. The RTX 2080 Ti will be twice as fast as the 1080 Ti. Price listed on NVIDIA website for founder’s edition is $1,199. RAM: 11 GB Memory bandwidth: 616 GBs/second Cores: 4352 cores @ 1545 MHz NVIDIA RTX 2080: This is more cost efficient than the 2080 Ti at a listed price of $799 on NVIDIA website for the founder’s edition. RAM: 8 GB Memory bandwidth: 448 GBs/second Cores: 2944 cores @ 1710 MHz NVIDIA RTX 2070: This is more cost efficient than the 2080 Ti at a listed price of $599 on NVIDIA website. Note that the other versions of the RTX cards will likely be cheaper than the founder’s edition around a $100 difference. RAM: 8 GB Memory bandwidth: 448 GBs/second Cores: 2304 cores @ 1620 MHz NVIDIA GTX 1080 Ti: Priced at $650 on Amazon. This is a higher end option but offers great value for money, and can also do well in Kaggle competitions. If you need more memory but cannot afford the RTX 2080 Ti go for this. RAM: 11 GB Memory bandwidth: 484 GBs/second Cores: 3584 cores @ 1582 MHz NVIDIA GTX 1080: Priced at $584 on Amazon. This is a mid-high end option only slightly behind the 1080Ti. VRAM: 8 GB Memory bandwidth: 320 GBs/second Processing power: 2560 cores @ 1733 MHz NVIDIA GTX 1070 Ti: Priced at around $450 on Amazon. This is slightly less performant than the GTX 1080 but $100 cheaper. VRAM: 8 GB Memory bandwidth: 256 GBs/second Processing power: 2438 cores @ 1683 MHz NVIDIA GTX 1070: Priced at $380 on Amazon is currently the bestseller because of crypto miners. Somewhat slower than the 1080 GPUs but cheaper. VRAM: 8 GB Memory bandwidth: 256 GBs/second Processing power: 1920 cores @ 1683 MHz NVIDIA GTX 1060 6GB: Priced at around $290 on Amazon. Pretty cheap but the 6 GB VRAM limits you. Should be good for NLP but you’ll find the performance lacking in computer vision. VRAM: 6 GB Memory bandwidth: 216 GBs/second Processing power: 1280 cores @ 1708 MHz NVIDIA GTX 1050 Ti: Priced at around $200 on Amazon. This is the cheapest workable option. Good to get started with deep learning and explore if you’re new. VRAM: 4 GB Memory bandwidth: 112 GBs/second Processing power: 768 cores @ 1392 MHz NVIDIA Titan XP: The Titan XP is also an option but gives only a marginally better performance while being almost twice as expensive as the GTX 1080 Ti, it has 12 GB memory, 547.7 GB/s bandwidth and 3840 cores @ 1582 MHz. On a side note, NVIDIA Quadro GPUs are pretty expensive and don’t really help in deep learning they are more of use in CAD and working with heavy graphics production tasks. The graph below does a pretty good job of visualizing how all the GPUs above compare: Source: Slav Ivanov Blog, processing power is calculated as CUDA cores times the clock frequency Does the number of GPUs matter? Yes, it does. But how many do you really need? What’s going to suit the scale of your project without breaking your budget? 2 GPUs will always yield better results than just one - but it’s only really worth it if you need the extra power. There are two options you can take with multi-GPU deep learning. On the one hand, you can train several different models at once across your GPUs, or, alternatively distribute one single training model across multiple GPUs known as “multi-GPU training”. The latter approach is compatible with TensorFlow, CNTK, and PyTorch. Both of these approaches have advantages. Ultimately, it depends on how many projects you’re working on and, again, what your needs are. Another important point to bear in mind is that if you’re using multiple GPUs, the processor and hard disk need to be fast enough to feed data continuously - otherwise the multi-GPU approach is pointless. Source: NVIDIA website It boils down to your needs and budget, GPUs aren’t exactly cheap. Other heavy devices There are also other large machines apart from GPUs. These include the specialized supercomputer from NVIDIA, the DGX-2, and Tensor processing units (TPUs) from Google. The NVIDIA DGX-2 If you thought GPUs were expensive, let me introduce you to NVIDIA DGX-2, the successor to the NVIDIA DGX-1. It’s a highly specialized workstation; consider it a supercomputer that has been specially designed to tackle deep learning. The price of the DGX-2 is (*gasp*) $399,000. Wait, what? I could buy some new hot wheels for that, or Dual Intel Xeon Platinum 8168, 2.7 GHz, 24-cores, 16 NVIDIA GPUs, 1.5 terabytes of RAM, and nearly 32 terabytes of SSD storage! The performance here is 2 petaFLOPS. Let’s be real: many of us probably won’t be able to afford it. However, NVIDIA does have leasing options, should you choose to try it. Practically speaking, this kind of beast finds its use in research work. In fact, the first DGX-1 was gifted to OpenAI by NVIDIA to promote AI research. Visit the NVIDIA website for more on these monster machines. There are also personal solutions available like the NVIDIA DGX Workstation. TPUs Now that you’ve caught your breath after reading about AI dream machines, let’s look at TPUs. Unlike the DGX machines, TPUs run on the cloud. A TPU is what’s referred to as an application-specific integrated circuit (ASIC) that has been designed specifically for machine learning and deep learning by Google. Here’s the key stats: Cloud TPUs can provide up to 11.5 petaflops of performance in a single pod. If you want to learn more, go to Google’s website. When choosing GPUs you need to weigh up your options The GTX 1080 Ti is most commonly used by researchers and competitively for Kaggle, as it gives good value for money. Go for this if you are sure about what you want to do with deep learning. The GTX 1080 and GTX 1070 Ti are cheaper with less computing power, a more budget friendly option if you cannot afford the 1080 Ti. GTX 1070 saves you some more money but is slower. The GTX 1060 6GB and GTX 1050 Ti are good if you’re just starting off in the world of deep learning without burning a hole in your pockets. If you must have the absolute best GPU irrespective of the cost then the RTX 2080 Ti is your choice. It offers twice the performance for almost twice the cost of a 1080 Ti. Nvidia unveils a new Turing architecture: “The world’s first ray tracing GPU” Nvidia GPUs offer Kubernetes for accelerated deployments of Artificial Intelligence workloads Nvidia’s Volta Tensor Core GPU hits performance milestones. But is it the best?

0
0
34320

article-image-8-machine-learning-best-practices

Melisha Dsouza

02 Sep 2018

9 min read

8 Machine learning best practices [Tutorial]

Melisha Dsouza

02 Sep 2018

9 min read

Machine Learning introduces a huge potential to reduce costs and generate new revenue in an enterprise. Application of machine learning effectively helps in solving practical problems smartly within an organization. Machine learning automates tasks that would otherwise need to be performed by a live agent. It has made drastic improvements in the past few years, but many a time, a machine needs the assistance of a human to complete its task. This is why it is necessary for organizations to learn best practices in machine learning which you will learn in this article today. This article is an excerpt from a book written by Chiheb Chebbi titled Mastering Machine Learning for Penetration Testing Feature engineering in machine learning Feature engineering and feature selection are essential to every modern data science product, especially machine learning based projects. According to research, over 50% of the time spent building the model is occupied by cleaning, processing, and selecting the data required to train the model. It is your responsibility to design, represent, and select the features. Most machine learning algorithms cannot work on raw data. They are not smart enough to do so. Thus, feature engineering is needed, to transform data in its raw status into data that can be understood and consumed by algorithms. Professor Andrew Ng once said: "Coming up with features is difficult, time-consuming, requires expert knowledge. 'Applied machine learning' is basically feature engineering." Feature engineering is a process in the data preparation phase, according to the cross-industry standard process for data mining: The term Feature Engineering itself is not a formally defined term. It groups together all of the tasks for designing features to build intelligent systems. It plays an important role in the system. If you check data science competitions, I bet you have noticed that the competitors all use the same algorithms, but the winners perform the best feature engineering. If you want to enhance your data science and machine learning skills, I highly recommend that you visit and compete at www.kaggle.com: When searching for machine learning resources, you will face many different terminologies. To avoid any confusion, we need to distinguish between feature selection and feature engineering. Feature engineering transforms raw data into suitable features, while feature selection extracts necessary features from the engineered data. Featuring engineering is selecting the subset of all features, without including redundant or irrelevant features. Machine learning best practices Feature engineering enhances the performance of our machine learning system. We discuss some tips and best practices to build robust intelligent systems. Let's explore some of the best practices in the different aspects of machine learning projects. Information security datasets Data is a vital part of every machine learning model. To train models, we need to feed them datasets. While reading the earlier chapters, you will have noticed that to build an accurate and efficient machine learning model, you need a huge volume of data, even after cleaning data. Big companies with great amounts of available data use their internal datasets to build models, but small organizations, like startups, often struggle to acquire such a volume of data. International rules and regulations are making the mission harder because data privacy is an important aspect of information security. Every modern business must protect its users' data. To solve this problem, many institutions and organizations are delivering publicly available datasets, so that others can download them and build their models for educational or commercial use. Some information security datasets are as follows: The Controller Area Network (CAN) dataset for intrusion detection (OTIDS): http://ocslab.hksecurity.net/Dataset/CAN-intrusion-dataset The car-hacking dataset for intrusion detection: http://ocslab.hksecurity.net/Datasets/CAN-intrusion-dataset The web-hacking dataset for cyber criminal profiling: http://ocslab.hksecurity.net/Datasets/web-hacking-profiling The API-based malware detection system (APIMDS) dataset: http://ocslab.hksecurity.net/apimds-dataset The intrusion detection evaluation dataset (CICIDS2017): http://www.unb.ca/cic/datasets/ids-2017.html The Tor-nonTor dataset: http://www.unb.ca/cic/datasets/tor.html The Android adware and general malware dataset: http://www.unb.ca/cic/datasets/android-adware.html Use Project Jupyter The Jupyter Notebook is an open source web application used to create and share coding documents. I highly recommend it, especially for novice data scientists, for many reasons. It will give you the ability to code and visualize output directly. It is great for discovering and playing with data; exploring data is an important step to building machine learning models. Jupyter's official website is http://jupyter.org/: To install it using pip, simply type the following: python -m pip install --upgrade pip python -m pip install jupyter Speed up training with GPUs As you know, even with good feature engineering, training in machine learning is computationally expensive. The quickest way to train learning algorithms is to use graphics processing units (GPUs). Generally, though not in all cases, using GPUs is a wise decision for training models. In order to overcome CPU performance bottlenecks, the gather/scatter GPU architecture is best, performing parallel operations to speed up computing. TensorFlow supports the use of GPUs to train machine learning models. Hence, the devices are represented as strings; following is an example: "/device:GPU:0" : Your device GPU "/device:GPU:1" : 2nd GPU device on your Machine To use a GPU device in TensorFlow, you can add the following line: with tf.device('/device:GPU:0'): <What to Do Here> You can use a single GPU or multiple GPUs. Don't forget to install the CUDA toolkit, by using the following commands: Wget "http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.44-1_amd64.deb" sudo dpkg -i cuda-repo-ubuntu1604_8.0.44-1_amd64.deb sudo apt-get update sudo apt-get install cuda Install cuDNN as follows: sudo tar -xvf cudnn-8.0-linux-x64-v5.1.tgz -C /usr/local export PATH=/usr/local/cuda/bin:$PATH export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64" export CUDA_HOME=/usr/local/cuda Selecting models and learning curves To improve the performance of machine learning models, there are many hyper parameters to adjust. The more data that is used, the more errors that can happen. To work on these parameters, there is a method called GridSearchCV. It performs searches on predefined parameter values, through iterations. GridSearchCV uses the score() function, by default. To use it in scikit-learn, import it by using this line: from sklearn.grid_search import GridSearchCV Learning curves are used to understand the performance of a machine learning model. To use a learning curve in scikit-learn, import it to your Python project, as follows: from sklearn.learning_curve import learning_curve Machine learning architecture In the real world, data scientists do not find data to be as clean as the publicly available datasets. Real world data is stored by different means, and the data itself is shaped in different categories. Thus, machine learning practitioners need to build their own systems and pipelines to achieve their goals and train the models. A typical machine learning project respects the following architecture: Coding Good coding skills are very important to data science and machine learning. In addition to using effective linear algebra, statistics, and mathematics, data scientists should learn how to code properly. As a data scientist, you can choose from many programming languages, like Python, R, Java, and so on. Respecting coding's best practices is very helpful and highly recommended. Writing elegant, clean, and understandable code can be done through these tips: Comments are very important to understandable code. So, don't forget to comment your code, all of the time. Choose the right names for variables, functions, methods, packages, and modules. Use four spaces per indentation level. Structure your repository properly. Follow common style guidelines. If you use Python, you can follow this great aphorism, called the The Zen of Python, written by the legend, Tim Peters: "Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those!" Data handling Good data handling leads to successfully building machine learning projects. After loading a dataset, please make sure that all of the data has loaded properly, and that the reading process is performing correctly. After performing any operation on the dataset, check over the resulting dataset. Business contexts An intelligent system is highly connected to business aspects because, after all, you are using data science and machine learning to solve a business issue or to build a commercial product, or for getting useful insights from the data that is acquired, to make good decisions. Identifying the right problems and asking the right questions are important when building your machine learning model, in order to solve business issues. In this tutorial, we had a look at somes tips and best practices to build intelligent systems using Machine Learning. To become a master at penetration testing using machine learning with Python, check out this book Mastering Machine Learning for Penetration Testing Why TensorFlow always tops machine learning and artificial intelligence tool surveys Intelligent Edge Analytics: 7 ways machine learning is driving edge computing adoption in 2018 Tackle trolls with Machine Learning bots: Filtering out inappropriate content just got easy

0
0
34311

Akram Hussain

31 Oct 2014

3 min read

Python Data Stack

Akram Hussain

31 Oct 2014

3 min read

The Python programming language has grown significantly in popularity and importance, both as a general programming language and as one of the most advanced providers of data science tools. There are 6 key libraries every Python analyst should be aware of, and they are: 1 - NumPY NumPY: Also known as Numerical Python, NumPY is an open source Python library used for scientific computing. NumPy gives both speed and higher productivity using arrays and metrics. This basically means it's super useful when analyzing basic mathematical data and calculations. This was one of the first libraries to push the boundaries for Python in big data. The benefit of using something like NumPY is that it takes care of all your mathematical problems with useful functions that are cleaner and faster to write than normal Python code. This is all thanks to its similarities with the C language. 2 - SciPY SciPY: Also known as Scientific Python, is built on top of NumPy. SciPy takes scientific computing to another level. It’s an advanced form of NumPy and allows users to carry out functions such as differential equation solvers, special functions, optimizers, and integrations. SciPY can be viewed as a library that saves time and has predefined complex algorithms that are fast and efficient. However, there are a plethora of SciPY tools that might confuse users more than help them. 3 - Pandas Pandas is a key data manipulation and analysis library in Python. Pandas strengths lie in its ability to provide rich data functions that work amazingly well with structured data. There have been a lot of comparisons between pandas and R packages due to their similarities in data analysis, but the general consensus is that it is very easy for anyone using R to migrate to pandas as it supposedly executes the best features of R and Python programming all in one. 4 - Matplotlib Matplotlib is a visualization powerhouse for Python programming, and it offers a large library of customizable tools to help visualize complex datasets. Providing appealing visuals is vital in the fields of research and data analysis. Python’s 2D plotting library is used to produce plots and make them interactive with just a few lines of code. The plotting library additionally offers a range of graphs including histograms, bar charts, error charts, scatter plots, and much more. 5 - scikit-learn scikit-learn is Python’s most comprehensive machine learning library and is built on top of NumPy and SciPy. One of the advantages of scikit-learn is the all in one resource approach it takes, which contains various tools to carry out machine learning tasks, such as supervised and unsupervised learning. 6 - IPython IPython makes life easier for Python developers working with data. It’s a great interactive web notebook that provides an environment for exploration with prewritten Python programs and equations. The ultimate goal behind IPython is improved efficiency thanks to high performance, by allowing scientific computation and data analysis to happen concurrently using multiple third-party libraries. Continue learning Python with a fun (and potentially lucrative!) way to use decision trees. Read on to find out more.

0
0
34151

article-image-how-machine-learning-as-a-service-transforming-cloud

Vijin Boricha

18 Apr 2018

4 min read

How machine learning as a service is transforming cloud

Vijin Boricha

18 Apr 2018

4 min read

Machine learning as a service (MLaaS) is an innovation that is growing out of 2 of the most important tech trends - cloud and machine learning. It's significant because it enhances both. It makes cloud an even more compelling proposition for businesses. That's because cloud typically has three major operations: computing, networking and storage. When you bring machine learning into the picture, the data that cloud stores and processes can be used in radically different ways, solving a range of business problems. What is machine learning as a service? Cloud platforms have always competed to be the first or the best to provide new services. This includes platform as a service (PaaS) solutions, infrastructure as a service (IaaS) solutions and software as a service (SaaS) solutions. In essense, cloud providers like AWS and Azure provide sets of software to different things so their customers don't have to. Machine learning as a service is simply another instance of the services offered by cloud providers. It could include a wide range of features, from data visualization to predictive analytics and natural language processing. It makes running machine learning models easy, effectively automating some of the work that might have typically done manually by a data engineering team. Here are the biggest cloud providers who offer machine learning as a service: Google Cloud Platform Amazon Web Services Microsoft Azure IBM Cloud Every platform provides a different suite of services and features. It will ultimately depend on what's most important to you which one you choose. Let's take a look now at the key differences between these cloud providers' machine learning as a service offerings. Comparing the leading MLaaS products Google Cloud AI Google Cloud Platform has always provided their own services to help businesses grow. They provide modern machine learning services with pre-trained models and a service to generate your own tailored models. Majority of Google applications like Photos (image search), the Google app (voice search), and Inbox (Smart Reply) have been built using the same services that they provide to their users. Pros: Cheaper in comparison to other Cloud providers Provides IaaS and PaaS Solutions Cons: Google Prediction API is going to be discontinued (May 1st, 2018) Lacks a visual interface You'll need to know TensorFlow Amazon Machine Learning Amazon Machine Learning provides services for building ML models and generating predictions which help users develop robust, scalable, and cost-effective smart applications. With the help of Amazon Machine Learning you are able to use powerful machine learning technology without having any prior experience in machine learning algorithms and techniques. Pros: Provides versatile automated solutions It's accessible - users don't need to be machine learning experts Cons: The more you use, the more expensive it is Azure Machine Learning Studio Microsoft Azure provides you with Machine Learning Studio - a simple browser-based, drag-and-drop environment which functions without any kind of coding. You are provided with fully-managed cloud services that enable you to easily build, deploy and share predictive analytics solutions. Here you are also provided with a platform (Gallery) to share and contribute to the community. Pros: Consists of most versatile toolset for MLaaS You can contribute to and reuse machine learning solutions from the community Cons: Comparatively expensive A lot of manual work is required Watson Machine Learning Similar to the above platforms, IBM Watson Machine Learning is a service that helps users to create, train, and deploy self-learning models to integrate predictive capabilities within their applications. This platform provides automated and collaborative workflows to grow intelligent business applications. Pros: Automated workflows Data science skills is not necessary Cons: Comparatively limited APIs and services Lacks streaming analytics Selecting the machine learning as a service solution that's right for you There are so many machine learning as a service solutions out there that it's easy to get confused. The crucial step to take before you make a decision to purchase anything is to plan your business requirements. Think carefully not only about what you want to achieve, but what you already do too. You want your MLaaS solution to easily integrate into the way you currently work. You also don't want it to replicate any work you're currently doing that you're pretty happy with. It gets repeated so much but it remains as true as it has ever been - make sure your software decisions are fully aligned with your business needs. It's easy to get seduced by the promise of innovative new tools, but without the right alignment they're not going to help you at all.

0
0
34133

article-image-what-can-artificial-intelligence-do-for-the-aviation-industry

Guest Contributor

14 May 2019

6 min read

What can Artificial Intelligence do for the Aviation industry

Guest Contributor

14 May 2019

6 min read

0
0
34111

article-image-postgresql-committer-stephen-frost-shares-vision-postgresql-12-beyond

Sugandha Lahoti

04 Dec 2019

8 min read

PostgreSQL committer Stephen Frost shares his vision for PostgreSQL version 12 and beyond

Sugandha Lahoti

04 Dec 2019

8 min read

PostgreSQL version 12 was released in October this year and has earned a strong reputation for being reliable, feature robust, and performant. During the PostgreSQL Conference for the Asia Pacific, PostgreSQL major contributor and committer, Stephen Frost talked about a number of new features available in PostgreSQL 12 including Pluggable Storage, Partitioning and performance improvements as well as SQL features. In this post we have tried to cover a short synopsis of his Keynote. The full talk is available on YouTube. Want to learn how to build your own PostgreSQL applications? PostgreSQL 12 has an array of interesting features such as advanced indexing, high availability, database configuration, database monitoring, to efficiently manage and maintain your database. If you are a database developer and want to leverage PostgreSQL 12, we recommend you to go through our latest book Mastering PostgreSQL 12 - Third Edition written by Hans-Jürgen Schönig. This book examines in detail the newly released features in PostgreSQL 12 to help you build efficient and fault-tolerant PostgreSQL applications. [dropcap]S[/dropcap]tephen Frost is the CTO of Crunchy Data. As a PostgreSQL major contributor he has implemented ‘roles support’ in version 8.1 to replace the existing user/group system, SQL column-level privileges in version 8.4, and Row Level Security in PostgreSQL 9.5. He has also spoken at numerous conferences, including pgConf.EU, pgConf.US, PostgresOpen, SCALE and others. Stephen Frost on PostgreSQL 12 features Pluggable Storage This release introduces the pluggable table storage interface, which allows developers to create their own methods for storing data. Postgres before version 12 had one storage engine - one primary heap. All indexes were secondary indexes which means that they referred directly into pointers on disk. Also, this heap structure was row-based. Every row has a big header associated with the row which may cause issues when storing very narrow tables (which have two or fewer columns included in it.) PostgreSQL 12 now has the ability to have multiple different storage formats underneath - the pluggable storage. This new feature is going to be the basis for columnar storage coming up probably in v13 or v14. It's also going to be the basis for Z heap - which is an alternative heap that is going to allow in-place updates and uses an undo log instead of using the redo log that PostgreSQL has. Version 12 is building on the infrastructure for pluggable storage, and the team does not have anything user-facing yet. It's not going to be until v13 and later that they will actually have new storage mechanisms that are built on top of the pluggable storage feature. Partitioning improvements Postgres is adding in a whole bunch of new features to make working with partitions in Postgres easier to deal with. Postgres 12 has major improvements in declarative partitioning capability which makes working with partitions more effective and easier to deal with. Partition selection has dramatically improved especially when selecting from a few partitions out of a large set. Postgres 12 has the ability to ATTACH/DETATCH CONCURRENTLY. This is the ability to, on the fly, attach a partition and detach a partition without having to take very heavy locks. You can add new partitions to your partitioning scheme without having to take any outage or downtime or have any impact on the ongoing operations of the system. This release also increases the number of partitions. The initial declarative partitioning patch made planning slow when you got over a few hundred partitions. This is now fixed with a much faster planning methodology for dealing with partitions. Postgres 12 allows Multi-INSERT during COPY statements into partitioned tables. COPY is a way in which you can bulk load data into Postgres. This feature makes it much faster to COPY into partitioned tables. There is also a new function pg_partition_tree to display partition info. Performance improvements and SQL features Parallel Query with SERIALIZABLE Parallel Query has been in Postgres since version 9.6 but it did not work with the serializable isolation level. Serializable is actually the highest level of isolation that you can have inside of Postgres. With Postgres 12, you have the ability to have a parallel query with serializable and have that highest level of isolation. This increases the number of places that parallel query can be used. This also allows application authors to worry less about concurrency because serializable in Postgres provides true serializability that exists in very few databases. Faster float handling Postgres 12 has a new library for converting floating point into text. This provides a significant speedup for many workloads where you're doing text-based transfer of data. Although it may result in slightly different (possibly more correct) output. Partial de-TOAST/decompress a value Historically to access any compressed TOAST value, you had to decompress the whole thing into memory. This was not very ideal in situations where you wanted access to, only the front of it. Partial de-TOAST allows decompressing of a section of the TOAST value. This also gives a great improvement in performance for cases like: PostGIS geometry/geography- data at the front can be used for filtering Pulling just the start of a text string COPY FROM with WHERE Postgres 12 now has a WHERE clause supported by a COPY FROM statement. This allows you to filter data/records while importing. Earlier it was done using the file_fdw, but it was tedious as it required creating a foreign table. Create or Replace Aggregate This features allows an aggregate to either be created, if it does not exist, or replaced it it does. It makes extension upgrade scripts much simpler. This feature was requested specifically by the Postgres community. Inline Common Table Expressions Not having Inline CTEs was seen as an optimization barrier. From version 12, Postgres, by default, inlines the CTEs if it can. It also supports the old behavior so in the event that you actually want CTE to be an optimization barrier, you can still do that. You just have to specify WITH MATERIALIZED when you go to write your CTE. SQL/JSON improvements There is also progress made towards supporting the SQL/JSON standard Added a number of json_path functions json_path_exists json_path_match json_path_query json_path_query_array json_path_query_first Added new operators for working with json: jsonb @? Jsonpath - wrapper for jsonpath_exists jsonb @? Jsonpath - wrapper for jsonpath_predicate Index support should also be added for these operators soon Recovery.conf moved into postgresql.conf Recovery.conf is no longer available in Postgresql 12 and all options are moved to postgresql.conf. This allows changing recovery parameters via ALTER SYSTEM. This feature increases flexibility meaning that it allows changing primary via ALTER SYSTEM/reload. However, this is a disruptive change. Every high-level environment will change but it reduces the fragility of high-level solutions moving forward. A new pg_promote function is added to allow promoting a replica from SQL. Control SSL protocol With Postgres 12, you can now control SSL protocols. Older SSL protocols are required to be disabled for security reasons. They were enforced previously with FIPS mode. They are now addressed in CIS benchmark/ STIG updates. Covering GIST indexes GIST indexes can now also use INCLUDE. These are useful for adding columns to allow index-only queries. It also allows including columns that are not part of the search key. Add CSV output mode to psql Previously you could get CSV but you had to do that by taking your query inside a copy statement. Now you can use the new pset format for CSV output from psql. It returns data in each row in CSV format instead of tabular format. Add option to sample queries This is a new log_statement_sample_rate parameter which allows you to set log_min duration to be very very low or zero. Logging all statements is very expensive as it slows down the whole system and you end up having a backlog of processes trying to write into the logging system. The new log_statement_sample_rate parameter includes only a sample of those queries in the output rather than logging every query. The log_min_duration_statement excludes very fast queries. It helps with analysis in environments with lots of fast queries. New HBA option called clientcert=verify-full This new HBA option allows you to do a two-factor authentication where one of the factors is a certificate and the other one might be a password or something else (PAM, LDAP, etc). It gives you the ability to say that every user has to have a client-side certificate and that the client-side certificate must be validated by the server on connection and have to provide a password. It works with non-cert authentication methods and requires client-side certificates to be used. In his talk, Stephen also answered commonly asked questions about Postgres, watch the full video to know more. You can read about other performance and maintenance enhancements in PostgreSQL 12 on the official blog. To learn advanced PostgreSQL 12 concepts with real-world examples and sample datasets, go through the book Mastering PostgreSQL 12 - Third Edition by Hans-Jürgen Schönig. Introducing PostgREST, a REST API for any PostgreSQL database written in Haskell Percona announces Percona Distribution for PostgreSQL to support open source databases Wasmer’s first Postgres extension to run WebAssembly is here!

0
0
34104

article-image-what-makes-functional-programming-a-viable-choice-for-artificial-intelligence-projects

Prasad Ramesh

14 Sep 2018

7 min read

What makes functional programming a viable choice for artificial intelligence projects?

Prasad Ramesh

14 Sep 2018

7 min read

0
0
34046

Tech Guides - Data

How Serverless computing is making AI development easier

Top 5 tools for reinforcement learning

5 JavaScript machine learning libraries you need to know

Two popular Data Analytics methodologies every data professional should know: TDSP & CRISP-DM

Best Machine Learning Datasets for beginners

How Rolls Royce is applying AI and robotics for smart engine maintenance

7 AI tools mobile developers need to know

How AI is going to transform the Data Center

NVIDIA leads the AI hardware race. But which of its GPUs should you use for deep learning?

8 Machine learning best practices [Tutorial]

Trending Topics

Python Data Stack

How machine learning as a service is transforming cloud

What can Artificial Intelligence do for the Aviation industry

PostgreSQL committer Stephen Frost shares his vision for PostgreSQL version 12 and beyond

What makes functional programming a viable choice for artificial intelligence projects?

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access