29 Aug 2017

6 min read

4 Ways You Can Use Machine Learning for Enterprise Security

Cyber threats continue to cost companies money and reputation. Yet security seems to be undervalued, or perhaps just misunderstood. A series of large-scale cyberattacks and ransomware outbreaks (first WannaCry and now Petya) continues to affect millions of users globally. It's time you reimagined how your organization stays ahead of the game when it comes to software security. Fortunately, machine learning can help support a more robust, reliable and efficient security initiative. Here are 4 ways machine learning can support your software security strategy.

Revamp your company's endpoint protection with machine learning

We have seen in the past how a single gap in endpoint protection can result in a serious data breach. In May this year, fast food giant Chipotle learned this the hard way when cybercriminals exploited the company's point of sale systems to steal credit card information. The Chipotle incident was a very real reminder for many retailers to patch critical endpoints on a regular basis. It is crucial to guard your company's endpoints, which are the virtual front doors to your organization's precious information. Your cybersecurity strategy must include a holistic endpoint protection strategy that secures against a variety of threats, both known and unknown.

Traditional endpoint security approaches are proving ineffective and are costing businesses millions through poor detection and wasted time. The changing landscape of the cybersecurity market brings its own set of unique challenges (Palo Alto Networks has highlighted some of these challenges in a whitepaper). Sophisticated machine learning techniques can help fight threats that are hard to defend against with traditional methods. One could achieve this by adopting any of three ML approaches: supervised machine learning, unsupervised machine learning or reinforcement learning.
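Of the three approaches, supervised learning is the easiest to picture. A model learns from files already labelled as malicious or benign, then classifies new ones. The sketch below is a minimal illustration in Python. It assumes k-nearest neighbours as the learning algorithm and uses three made-up features (byte entropy, number of suspicious API imports and a packed-binary flag) with hypothetical training data. Commercial endpoint products use their own, far richer models and features.

```python
import math
from collections import Counter

# Hypothetical labelled training data: (features, label).
# Illustrative features: [byte entropy (0-8), suspicious API imports, packed flag (0/1)].
TRAIN = [
    ([7.6, 12, 1], "malware"),
    ([7.2, 9, 1], "malware"),
    ([6.9, 15, 0], "malware"),
    ([4.8, 1, 0], "benign"),
    ([5.1, 0, 0], "benign"),
    ([5.5, 2, 0], "benign"),
]


def feature_bounds(rows):
    """Per-feature (min, max) taken from the training data."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalise(x, bounds):
    """Min-max scale each feature so no single feature dominates the distance."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(x, bounds)]


def knn_predict(x, train, k=3):
    """Label x by majority vote among its k nearest labelled neighbours."""
    bounds = feature_bounds([features for features, _ in train])
    query = normalise(x, bounds)
    neighbours = sorted(
        (math.dist(query, normalise(features, bounds)), label)
        for features, label in train
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


print(knn_predict([7.4, 11, 1], TRAIN))  # classify a new, unseen sample
```

The quality of such a classifier depends almost entirely on having enough relevant labelled data, which is the catch discussed next.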
Choosing the right machine learning approach requires a clear understanding of what you expect from the endpoint protection product. Check the speed, accuracy and efficiency of any machine learning based endpoint protection solution with the vendor, so that you can make an informed choice. We recommend a supervised machine learning approach for endpoint protection, as it is a proven way of detecting malware and delivers accurate results. The catch is that these algorithms need relevant labelled data in sufficient quantity. The training rounds also need to be fast and effective to keep malware detection efficient. Popular ML-based endpoint protection options on the market include Symantec Endpoint Protection 14, CrowdStrike and Trend Micro's XGen.

Use machine learning techniques to predict security threats based on historical data

Predictive analytics is no longer restricted to data science. By adopting predictive analytics, you can take a proactive approach to cybersecurity too. It makes it possible not only to identify infections and threats after they have caused damage, but also to raise an alarm about future incidents or attacks. Predictive analytics is a crucial part of the system's learning process: with sophisticated detection techniques, the system can monitor network activity and report data in real time. One effective technique that organizations are beginning to use combines advanced predictive analytics with a red team approach. This enables organizations to think like the enemy and model a broad range of threats. The process mines and captures large data sets, which are then processed. The real value lies in generating meaningful insights from the collected data and then letting the red team work on identifying potential threats.
The organization then uses these insights to evaluate its capabilities, prepare for future threats and mitigate potential risks.

Harness the power of behavior analytics to detect security intrusions

Behavior analytics is a fast-growing area of cybersecurity today. Traditional systems such as antivirus software are good at identifying attacks based on historical data and signature matching. Behavior analytics, on the other hand, detects anomalies by judging activity against what would be considered normal behavior. As a result, behavior analytics is proving very effective in enterprises at detecting intrusions that would otherwise evade firewalls or antivirus software. It complements these existing security measures rather than replacing them. Behavior analytics works well within private clouds and internal infrastructure, and can detect threats within internal networks.

One popular example is the Enterprise Immune System by Darktrace, which uses machine learning to detect abnormal behavior in the system. Through a visual console, it helps IT staff narrow down their search and watch for specific security events. What's really promising is that, because Darktrace uses machine learning, the system learns not just from events within internal systems but also from events happening globally.

Use machine learning to close down IoT vulnerabilities

If your company relies on the Internet of Things, manually managing the large amounts of data and logs generated by millions of IoT devices can be overwhelming. IoT devices are often connected directly to the network, which makes it fairly easy for attackers to take advantage of inadequately protected networks. It could therefore be next to impossible to build a secure IoT system if you set out to identify and fix vulnerabilities manually.
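Both behavior analytics and the IoT monitoring described next rest on the same idea: learn what normal activity looks like, then flag deviations from it. Here is a deliberately simple sketch in Python. It assumes a per-metric mean and standard deviation as the baseline and a z-score threshold of 3. The metric names and numbers are hypothetical, and products such as Darktrace's use their own, more sophisticated models.

```python
import statistics


def build_baseline(history):
    """Learn a per-metric (mean, standard deviation) from observations assumed to be normal."""
    return {
        metric: (statistics.mean(values), statistics.pstdev(values))
        for metric, values in history.items()
    }


def find_anomalies(observation, baseline, threshold=3.0):
    """Return the metrics whose value lies more than `threshold` std devs from the baseline mean."""
    flagged = []
    for metric, value in observation.items():
        mean, std = baseline[metric]
        if std == 0:
            deviates = value != mean  # no observed variation: any change is unusual
        else:
            deviates = abs(value - mean) / std > threshold
        if deviates:
            flagged.append(metric)
    return flagged


# Hypothetical hourly history for one device or user account.
history = {
    "bytes_out_kb": [120, 130, 125, 118, 127, 122],
    "logins_per_hour": [2, 3, 2, 2, 3, 2],
}
baseline = build_baseline(history)
print(find_anomalies({"bytes_out_kb": 900, "logins_per_hour": 3}, baseline))
```

A sudden burst of outbound traffic stands out against the learned baseline even though no signature describes it. Signature-based tools miss exactly this kind of event.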
Machine learning can help you analyze and make sense of the millions of log entries generated by IoT devices. Machine learning powered cybersecurity systems placed directly inside your network can learn about security events as they happen. They can then monitor both incoming and outgoing traffic from IoT devices connected to the network, and build profiles of appropriate and inappropriate behavior inside your IoT ecosystem. This way, the security system can react to even the slightest irregularities and detect anomalies that have not been seen before.

Currently, only a handful of software tools use machine learning or artificial intelligence for IoT security. However, major security vendors such as Symantec are already developing in this area. Surveys on IoT continue to highlight security as a major barrier to adoption, and we are hopeful that machine learning will come to the rescue.

Cyber crimes are evolving at breakneck speed, while businesses remain slow to adapt their IT security strategies. Machine learning can help businesses make the leap to proactively addressing cyber threats and attacks by:

Giving your company's endpoint protection an intelligent revamp
Investing in machine learning techniques that predict threats based on historical data
Harnessing the power of behavior analytics to detect intrusions
Using machine learning to close down IoT vulnerabilities

And that's just the beginning. Have you used machine learning in your organization to enhance cybersecurity? Share your best practices and tips for using machine learning in cybersecurity in the comments below!

Amarabha Banerjee

31 Jul 2018

3 min read

Are distributed networks and decentralized systems the same?

The emergence of blockchain has paved the way for non-centralized network architectures. The seeds of distributed network architecture were sown back in 1964 by Paul Baran in his work "On Distributed Communications". Since then, there have been many attempts to implement this architecture in network systems. The most recent implementation has been aided by blockchain technology, first put into practice in 2009 by the anonymous Satoshi Nakamoto.

Terminology:

Network: A collection of interlinked nodes that exchange information.
Node: The most basic part of the network; for example, a user or computer.
Link: The connection between two nodes.
Server: A node that has connections to a relatively large number of other nodes.

The mother of all network architectures is the centralized network. Here, the primary decision-making control rests with one node, and messages are carried on by sub-nodes. Distributed networks are, in a way, a conglomeration of small centralized networks: they consist of multiple nodes which are themselves miniature versions of centralized networks. Decentralized networks consist of individual nodes, each of which is capable of independent decision making. Hence there is no central control in a decentralized network.

(Figure source: Meshworld)

A common misconception is that distributed and decentralized systems are the same, with different nomenclature and slightly different functionality. In a way this is true, but not completely. A distributed system still has one central control algorithm that takes the final call on the process and protocol to be followed. As an example, let's consider distributed digital ledgers. Each of these ledgers is an independent network node. The ledgers are updated with individual transactions and other details, and this information is not passed on to the other nodes. This feature makes the system secure and decentralized: the other nodes are not aware of the information stored.
This is how a decentralized network behaves. The same system behaves a bit differently when the nodes communicate with each other (in the case of Ethereum, through the movement of "Ether" between nodes). Each node's own information stays secure, but information about the state of the nodes is eventually passed on to a central peer. That peer then decides which state to change to and the optimum process for changing it, based on the votes of the individual nodes. The nodes then change to the new state, preserving information about the previous state. This makes the system dynamic, secure and distributed. The nodes vote based on their individual states, but the final decision is taken by the central peer. This is a distributed system.

Hence decentralized systems can be described as a subset of distributed systems: more independent, and without any central controlling authority. This raises a familiar question. Are current blockchain-based apps purely decentralized, or are they just distributed systems with a central control? Could this be the reason why we have not yet reached the ultimate cryptocurrency-based alternative economy? Put differently, is an invisible central control hindering the evolution of blockchain-based systems into a purely decentralized system and economy? Only time and more dedicated research, along with better practical implementations of decentralized applications, will answer these questions.
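To make the distinction concrete, here is a toy Python model of the distributed process as this article describes it. Nodes vote on a proposed next state, and a single coordinating peer tallies the votes and applies the result. Every node keeps a record of its previous state. The class names, the numeric states and the "vote for the closest proposal" rule are all hypothetical simplifications. This is not how Ethereum's actual consensus works.

```python
from collections import Counter


class Node:
    """A network node holding some state, plus a record of its previous states."""

    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.history = []

    def vote(self, proposals):
        # Hypothetical rule: vote for the proposal closest to this node's own state
        # (ties go to the smaller proposal).
        return min(proposals, key=lambda p: (abs(p - self.state), p))


class CoordinatingPeer:
    """The single peer that takes the final decision in the distributed model."""

    def decide(self, nodes, proposals):
        tally = Counter(node.vote(proposals) for node in nodes)
        # Most votes wins; ties go to the smaller proposal.
        winner = max(tally, key=lambda p: (tally[p], -p))
        for node in nodes:
            node.history.append(node.state)  # preserve the previous state
            node.state = winner
        return winner


nodes = [Node(f"n{i}", s) for i, s in enumerate([1, 2, 2, 5, 9])]
print(CoordinatingPeer().decide(nodes, proposals=[2, 8]))
```

In a fully decentralized version of this toy, there would be no `CoordinatingPeer`. Each node would update its own state from its own decision.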

Janu Verma

08 Jun 2017

6 min read

What is transfer learning?

Introduction and motivation

In standard supervised machine learning, we need training data, i.e. a set of data points with known labels. We build a model that learns the distinguishing properties that separate data points with different labels. The trained model can then be used to predict labels for new data points. If we want to make predictions for another task (with different labels) in a different domain, we cannot use the previously trained model. We need to gather training data for the new task and train a separate model.

Transfer learning provides a framework for leveraging an existing model (trained on some data) in a related domain. We can transfer the knowledge gained by the previous model to the new domain (and data). For example, suppose we have built a model to detect pedestrians and vehicles in traffic images. If we now want a model that detects pedestrians, cycles and vehicles in the same data, we have to train a new three-class model, because the previous model was trained to make two-class predictions. But we have clearly learned something in the two-class situation, e.g. how to discern people walking from moving vehicles. In the transfer learning paradigm, we can apply what the two-label classifier learned to the three-label classifier we intend to construct. We can already see that transfer learning has very high potential. In his extremely popular NIPS 2016 tutorial, Andrew Ng, a leading expert in machine learning, said that transfer learning will be the next driver of ML commercial success.

Transfer learning in deep learning

Transfer learning is particularly popular in deep learning. Deep neural networks are very expensive to train, and they require huge amounts of data to achieve their full potential. In fact, much of deep learning's recent success can be attributed to the availability of large amounts of data and stronger computational resources.
Apart from a few large companies like Google, Facebook, IBM and Microsoft, few organizations can gather the data and computing machines needed to train strong deep learning models. In such a situation, transfer learning comes to the rescue. Many models pre-trained on large amounts of data have been made publicly available, along with their learned parameter values. You can use these pre-trained models and rely on transfer learning to build models for your specific case.

Examples

The most popular application of transfer learning is image classification using deep convolutional neural networks (ConvNets). A number of high-performing, state-of-the-art ConvNet image classifiers trained on ImageNet data (1.2 million images in 1,000 categories) are publicly available. Examples include AlexNet, VGG16, VGG19 and InceptionV3, which can take weeks to train. I have personally used transfer learning to build image classifiers on top of VGG19 and InceptionV3. Another popular kind of pre-trained model is distributed word embeddings for millions of words, e.g. word2vec, GloVe and FastText. These are trained on corpora such as Wikipedia and Google News. They provide vector representations for a huge number of words, which can then be used in a text classification model.

Strategies for transfer learning

Transfer learning can be used in one of the following four ways:

Directly use a pre-trained model: The pre-trained model can be used directly for a similar task. For example, you can use Google's InceptionV3 model to make predictions about the categories of images. These models have already been shown to have high accuracy.

Fixed features: The knowledge gained in one model can be used to build features for the data points, and these (fixed) features are then fed to new models.
For example, you can run the new images through a pre-trained ConvNet and use the output of any layer as a feature vector for each image. The features built this way can be used in a classifier for the desired situation. Similarly, you can use word vectors directly in a text classification model.

Fine-tuning the model: In this strategy, you use the pre-trained network as your model but allow the network to be fine-tuned. For example, for the image classifier, you can feed your images to the InceptionV3 model and use the pre-trained weights as the initialization instead of a random one. The model is then trained on the much smaller user-provided data. The advantage of this strategy is that the weights start close to a good solution, so they can converge with far less data and training. You can also keep a portion of the network fixed (usually the beginning layers) and fine-tune only the remaining layers.

Combining models: Instead of re-training the top few layers of a pre-trained model, you can replace them with a new classifier and train this combined network while keeping the pre-trained portion fixed.

Remarks

It is not a good idea to fine-tune the pre-trained model if the new data set is small and similar to the original data, as this is likely to result in overfitting. Instead, you can feed the data directly to the pre-trained model or train a simple classifier on fixed features extracted from it. If the new data set is large, it is a good idea to fine-tune the pre-trained model. If the data is similar to the original, you can fine-tune only the top few layers, and fine-tuning will increase confidence in the predictions. If the data is very different, you will have to fine-tune the whole network.

Conclusion

Transfer learning allows someone without a large amount of data or computational capacity to take advantage of the deep learning paradigm. Taking off-the-shelf pre-trained models and transferring them to novel domains is an exciting direction for both research and applications.
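The strategies above differ mainly in which weights are updated during training. The following plain-Python sketch makes that concrete using a toy two-layer model. It has a ReLU "base" layer whose weights stand in for a pre-trained network, and a new logistic-regression "head". The base weights here are made up for illustration and are not taken from VGG19 or InceptionV3. With `freeze_base=True` only the head learns, as in the fixed-features and combined-model strategies. With `freeze_base=False` the pre-trained weights serve as the initialization and are fine-tuned as well.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyNet:
    """Toy model: a 'base' ReLU feature layer plus a logistic 'head' (hypothetical names)."""

    def __init__(self, base_weights):
        self.base = [row[:] for row in base_weights]  # copy, so the pre-trained weights stay intact
        self.head = [0.0] * len(base_weights)         # new, untrained classifier
        self.bias = 0.0

    def features(self, x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.base]

    def predict_proba(self, x):
        h = self.features(x)
        return sigmoid(sum(w * hi for w, hi in zip(self.head, h)) + self.bias)

    def train(self, data, epochs=200, lr=0.1, freeze_base=True):
        """SGD on log loss; freeze_base=False also fine-tunes the base layer."""
        for _ in range(epochs):
            for x, y in data:
                h = self.features(x)
                err = self.predict_proba(x) - y  # gradient of log loss w.r.t. the logit
                if not freeze_base:
                    # Backpropagate into the base using the current head weights.
                    for j, row in enumerate(self.base):
                        if h[j] > 0:  # ReLU passes gradient only where it is active
                            g = err * self.head[j]
                            for k in range(len(row)):
                                row[k] -= lr * g * x[k]
                for j in range(len(self.head)):
                    self.head[j] -= lr * err * h[j]
                self.bias -= lr * err


# Stand-in "pre-trained" features: x0, x1, x0 - x1 and x1 - x0 (after ReLU).
PRETRAINED = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
# Small new task: label 1 when the first coordinate is larger.
DATA = [([2.0, 0.5], 1), ([0.5, 2.0], 0), ([1.5, 1.0], 1),
        ([1.0, 1.5], 0), ([3.0, 1.0], 1), ([1.0, 3.0], 0)]

frozen = TinyNet(PRETRAINED)
frozen.train(DATA, freeze_base=True)
print([round(frozen.predict_proba(x), 2) for x, _ in DATA])
```

In deep learning frameworks, the same choice is usually made per layer. In Keras, for instance, you set a layer's `trainable` attribute to `False` before compiling the model.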
About the Author

Janu Verma is a Researcher at the IBM T.J. Watson Research Center, New York. His research interests are in mathematics, machine learning, information visualization, computational biology and healthcare analytics. He has held research positions at Cornell University, Kansas State University, Tata Institute of Fundamental Research, Indian Institute of Science and Indian Statistical Institute. He has written papers for IEEE Vis, KDD, International Conference on HealthCare Informatics, Computer Graphics and Applications, Nature Genetics, IEEE Sensors Journals, etc. His current focus is on the development of visual analytics systems for prediction and understanding. He advises startups and companies on data science and machine learning in the Delhi-NCR area; email him to schedule a meeting. Check out his personal website at http://jverma.github.io/.

Sugandha Lahoti

04 Sep 2017

7 min read

TensorFire: Firing up Deep Neural Nets in your browsers

Machine learning is a powerful tool with applications in a wide variety of areas, including image and object recognition, healthcare, language translation and more. However, running ML tools often requires complicated back ends, complex architecture pipelines and strict communication protocols. To overcome these obstacles, TensorFire, an in-browser deep learning library, brings machine learning to web browsers by running neural nets at blazingly fast speeds using GPU acceleration. It's one more step towards democratizing machine learning using hardware and software that most people already have.

How did in-browser deep learning libraries come to be?

Deep neural networks, an advanced type of machine learning model, are probably one of the best approaches for predictive tasks. They are modular, can be tested efficiently and can be trained online. However, neural nets are usually trained with supervised learning (i.e. learning fixed mappings from input to output). As a result, they are useful only when large quantities of labelled training data and a sufficient computational budget are available. They also require installing a variety of software, packages and libraries. Running a neural net often means a poor user experience too, with a console window showing the execution of the net. This called for an environment that could make these models more accessible, transparent and easy to customize. Browsers were a perfect choice, as they are powerful, efficient and have interactive UI frameworks. In-browser neural nets can be coded in JavaScript without any complex back-end requirements. Once browsers came into play, in-browser deep learning libraries (such as ConvNetJS, CaffeJS and MXNetJS) grew in popularity. Many of these libraries work well, but they leave a lot to be desired in terms of speed and ease of access. TensorFire is the latest contestant in this race, aiming to solve the problem of latency.

What is TensorFire?
TensorFire is a JavaScript library that executes neural networks in web browsers without any setup or installation. Unlike other in-browser libraries, it uses the built-in GPUs of most modern devices to perform heavy computations much faster (almost 100x faster). Like TensorFlow, TensorFire is used to run ML and DL models swiftly. However, TensorFlow's GPU acceleration depends on Nvidia's CUDA, whereas TensorFire uses GPUs whether or not they support CUDA. This eliminates the need for any GPU-specific middleware. At its core, TensorFire is a JavaScript runtime and a DSL built on top of the WebGL shader language for accelerating neural networks. Since it runs in browsers, which almost everyone now uses, it brings machine and deep learning capabilities to the masses.

Why should you choose TensorFire?

TensorFire is well suited to running machine learning in the browser for four main reasons:

1. Speed
Pure JavaScript libraries such as ConvNetJS compute on the CPU. TensorFire instead uses WebGL shaders to run the computations needed to generate predictions from TensorFlow models in parallel. It takes advantage of the powerful GPUs (both AMD and Nvidia) built into modern devices to speed up the execution of neural networks. The WebGL shader language makes it easy to write fast vectorized routines that operate on four-dimensional tensors.

2. Ease of use
TensorFire avoids shuffling data between the GPU and CPU by keeping as much data as possible on the GPU at a time, which makes it faster and easier to deploy. It also uses low-precision quantized tensors. This means that even browsers that don't fully support WebGL API extensions (such as floating-point pixel types for textures) can run deep neural networks. Thanks to this low-precision approach, smaller models are easily deployed to the client, resulting in fast predictions.

3. Privacy
Instead of bringing the data to the model, TensorFire delivers the model directly to users, thus maintaining their privacy. The website trains a network on the server side and then distributes the weights to the client. This is a great fit for applications where the data is on the client side and the deployed model is small. Since most computations happen on the client side, TensorFire also significantly reduces latency and simplifies server-side code bases.

4. Portability
TensorFire eliminates the need to download, install or compile anything, as a trained model can be deployed directly into a web browser and serve predictions locally. There is no need to install native apps or use expensive compute farms, which means TensorFire-based apps can reach more users.

Is TensorFire really that good?

TensorFire has its limitations. Using the browser's GPU for acceleration is both its boon and its bane. The GPU also renders the computer's GUI, so intensive GPU usage may make the browser unresponsive. Another issue is that although TensorFire speeds up execution, it does not improve compile time. Also, the TensorFire library is restricted to inference and cannot train models; however, it can import models pre-trained with Keras or TensorFlow. TensorFire is suitable for applications where the data is on the client side and the deployed model is small. You can also use it when users don't want to send their data to a server. However, when both the trained model and the data already live in the cloud, TensorFire has no additional benefit to offer.

How is TensorFire being used in the real world?
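The low-precision quantized tensors mentioned under Ease of use trade a little accuracy for much smaller data. The sketch below shows generic 8-bit linear (min/max) quantization in Python. It illustrates the idea only. It is an assumption, not TensorFire's actual encoding scheme.

```python
def quantize(values, bits=8):
    """Map floats onto 2**bits evenly spaced integer codes between min and max."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # width of one quantization step
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float values from the integer codes."""
    return [lo + c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)
print(codes)
print(max(abs(a - b) for a, b in zip(weights, restored)))  # at most about scale / 2
```

Each code fits in a single byte. Byte-per-channel textures are supported by every WebGL implementation, which is why low-precision formats help in browsers that lack floating-point texture support.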
TensorFire's low-level APIs can also be used for general-purpose numerical computation. Examples include running PageRank to calculate relevance, or Gaussian elimination to invert matrices, quickly and efficiently. Fast neural networks in the browser also make image recognition easy to implement, and TensorFire can perform real-time, client-side image recognition. It can also run neural networks that apply the look and feel of one image to another while preserving the details of the original image; Deep Photo Style Transfer is an example. TensorFlow required minutes for this task, whereas TensorFire took only a few seconds.

TensorFire also paves the way for tools and applications that quickly parse and summarize long articles and perform sentiment analysis on their text. It can also run RNNs in the browser to generate text with a character-by-character recurrent model. With TensorFire, neural nets running in browsers can be used for gesture recognition, image classification, object detection and more. These tasks are often handled with the SqueezeNet architecture, a small convolutional neural net that is highly accurate with considerably fewer parameters. Neural networks in browsers can also be used for web-based games or for user modelling. User modelling captures aspects of user behavior, or the content of sites visited, to provide a customized user experience. As TensorFire is written in JavaScript, it is also available on the server side (via Node.js) and can be used for server-based applications as well.

Since TensorFire is relatively new, its applications are just beginning to catch fire. With a plethora of features and advantages under its belt, TensorFire is poised to become the default choice for running in-browser neural networks. Because TensorFlow natively supports only CUDA GPUs, TensorFire may even outperform TensorFlow on computers with non-Nvidia GPUs.
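As an example of the general-purpose numerical work mentioned above, here is PageRank computed by power iteration in plain Python. This is not TensorFire's API; it only shows the computation itself. Within each iteration, every node's new rank can be computed independently, which is the kind of data-parallel work that GPU shaders accelerate. The damping factor of 0.85 is the conventional choice and is assumed here.

```python
def pagerank(links, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank; links maps each node to the nodes it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}  # random-jump contribution
        for v in nodes:
            out = links.get(v, [])
            if out:
                share = damping * rank[v] / len(out)
                for t in out:
                    new[t] += share
            else:  # dangling node: spread its rank evenly over all nodes
                for t in nodes:
                    new[t] += damping * rank[v] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```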

Timothy Chen

23 Dec 2015

5 min read

Level Up Your Company's Big Data with Mesos

In my last post, I talked about how a resource management platform can make your big data workloads more efficient with fewer resources. In this post, I want to continue the discussion with a specific resource management platform: Mesos.

Introduction to Mesos

Mesos is an Apache top-level project that provides an abstraction over your datacenter resources. It also provides an API to program against these resources to launch and manage your workloads. Mesos can manage your CPU, memory, disk, ports and other custom resources that the user defines. Every application that talks to Mesos to use datacenter resources for running tasks is called a scheduler. A scheduler uses the scheduler API to receive resource offers. It can use an offer, decline it and wait for future ones, or hold on to it for a period of time to combine resources. Mesos ensures fairness among multiple schedulers so that no single scheduler can take over all the resources. So how do your big data frameworks specifically benefit from using Mesos in your datacenter?

Autopilot your big data frameworks

Mesos abstracts away resources and provides an API to program against your datacenter. The first benefit of running your big data frameworks on top of it is that each framework can manage itself with minimal human intervention. How does the Mesos scheduler API provide self-management to frameworks? First, we should understand a little more about what the scheduler API allows you to do. The Mesos scheduler API provides a set of callbacks that fire whenever events like the following occur: new resources available, task status changed, slave lost, executor lost, scheduler registered/disconnected, etc. By reacting to each event with their own specific logic, big data frameworks can deploy, handle failures, scale and more.
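To illustrate how a framework can self-manage by reacting to these events, here is a minimal Python simulation of a scheduler. It packs tasks into resource offers, declines offers it can't use, and requeues tasks when they fail or their agent is lost. Everything here is hypothetical, including the `Offer` type, the method names and the simplified task states. It is only loosely modeled on the callbacks listed above and is not the Mesos scheduler API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Offer:
    """A simplified resource offer from one agent (hypothetical type)."""
    agent: str
    cpus: float
    mem: float


class ToyScheduler:
    """Hypothetical framework scheduler reacting to offer and failure events."""

    def __init__(self, task_ids, cpus_per_task, mem_per_task):
        self.pending = deque(task_ids)
        self.cpus_per_task = cpus_per_task
        self.mem_per_task = mem_per_task
        self.running = {}  # task id -> agent it runs on

    def on_resource_offers(self, offers):
        """Pack pending tasks into offers; return (launches, declined agents)."""
        launches, declined = [], []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem
            used = False
            while self.pending and cpus >= self.cpus_per_task and mem >= self.mem_per_task:
                task = self.pending.popleft()
                cpus -= self.cpus_per_task
                mem -= self.mem_per_task
                self.running[task] = offer.agent
                launches.append((task, offer.agent))
                used = True
            if not used:
                declined.append(offer.agent)  # decline and wait for better offers
        return launches, declined

    def on_status_update(self, task_id, state):
        """Handle a task state change ("FINISHED", "FAILED" or "LOST" in this toy)."""
        self.running.pop(task_id, None)
        if state in ("FAILED", "LOST"):
            self.pending.append(task_id)  # relaunch on a later offer

    def on_agent_lost(self, agent):
        """Treat every task on a lost machine as lost, so it gets relaunched elsewhere."""
        for task_id in [t for t, a in self.running.items() if a == agent]:
            self.on_status_update(task_id, "LOST")


sched = ToyScheduler(["t1", "t2", "t3"], cpus_per_task=1.0, mem_per_task=512)
print(sched.on_resource_offers([Offer("agent-1", 2.0, 1024), Offer("agent-2", 0.5, 2048)]))
sched.on_agent_lost("agent-1")
print(sched.on_resource_offers([Offer("agent-3", 4.0, 4096)]))
```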
Take Spark as an example. When a new Spark job is launched, it starts a new scheduler that waits for resources from Mesos. When new resources become available, the scheduler automatically deploys Spark executors to those nodes and gives them Spark task information. The executors then communicate the results back to the scheduler. If a task terminates unexpectedly for some reason, the Spark scheduler receives a notification. It can then automatically relaunch the task on another node and attempt to resume the job. If a machine crashes, the Spark scheduler is also notified and can relaunch all the executors from that node on other available resources. Moreover, since the Spark scheduler can choose where to launch tasks, it can pick the nodes with the best locality for the data it is going to process. For a long-running Spark Streaming job, it can also spread Spark executors across different racks for higher availability. As you can see, programming against an API gives big data frameworks a lot of flexibility and self-management, and saves a lot of manual scripting and automation work.

Manage your resources among frameworks and users

Multiple big data frameworks may share the same cluster, and each framework may be shared by multiple users. A good policy for ensuring that important users and jobs get executed then becomes very important. Mesos allows you to specify roles, and multiple frameworks can belong to a role. Operators can then specify weights among these roles, and Mesos enforces a fair share that allocates resources according to those weights. For example, one might give 70% of resources to Spark and 30% to general tasks using weighted roles in Mesos. Mesos also allows reserving a fixed amount of resources per agent for a specific role.
This guarantees that your important workloads have enough resources to complete. More features that help multi-tenancy are coming to Mesos. One feature, called quota, ensures that a certain amount of resources is reserved across the whole cluster rather than per agent. Another feature, called dynamic reservation, allows frameworks and operators to reserve a certain amount of resources at runtime and unreserve them once they're no longer needed.

Optimize your resources among frameworks

Using Mesos also boosts your utilization, because tasks from different frameworks can share the same cluster instead of requiring separate clusters. Several features currently being worked on will boost utilization even further. The first is called oversubscription. It uses tasks' runtime statistics to estimate how much of their allocated resources the tasks are not actually using. It then offers these resources to other schedulers so that more of the cluster is actually utilized. A controller also monitors the original tasks. If they start being affected by the sharing, it kills the tasks running on the oversubscribed resources so that the original tasks are no longer affected. Another feature, called optimistic offers, allows multiple frameworks to compete for the same resources. This improves utilization by allowing faster scheduling, and gives the Mesos scheduler more input for deciding how best to schedule its resources in the future.

As you can see, Mesos allows your big data frameworks to be self-managed and more efficient. It also enables optimizations that are only possible by sharing the same resource management. If you're curious about how to get started, you can follow the guides on the Mesos website or the Mesosphere website, which provides even simpler tools for using your Mesos cluster.
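The 70% / 30% weighted-roles example can be sketched with weighted Dominant Resource Fairness, the fairness policy behind Mesos's default allocator. The rule is to repeatedly give the next task to the role whose dominant share, divided by its weight, is lowest. This Python sketch is a simplified, hypothetical model with fixed per-task demands, a single pool of resources, and no reservations or quotas. It is not Mesos's allocator code.

```python
def weighted_drf(total, demands, weights):
    """Allocate tasks to roles with weighted Dominant Resource Fairness.

    total:   resource name -> cluster capacity, e.g. {"cpus": 100.0, "mem": 400.0}
    demands: role -> resources needed per task (same keys as total)
    weights: role -> weight (a higher weight means a larger fair share)
    Returns role -> number of tasks launched.
    """
    free = dict(total)
    used = {role: {r: 0.0 for r in total} for role in demands}
    tasks = {role: 0 for role in demands}
    active = set(demands)

    def weighted_dominant_share(role):
        dominant = max(used[role][r] / total[r] for r in total)
        return dominant / weights[role]

    while active:
        # Serve the role furthest below its weighted fair share (ties: alphabetical).
        role = min(sorted(active), key=weighted_dominant_share)
        need = demands[role]
        if all(need[r] <= free[r] for r in total):
            for r in total:
                free[r] -= need[r]
                used[role][r] += need[r]
            tasks[role] += 1
        else:
            active.discard(role)  # free resources only shrink, so this role is done
    return tasks


print(weighted_drf(
    total={"cpus": 100.0, "mem": 400.0},
    demands={"spark": {"cpus": 1.0, "mem": 4.0}, "general": {"cpus": 1.0, "mem": 4.0}},
    weights={"spark": 0.7, "general": 0.3},
))
```

With identical per-task demands, the weights translate directly into a 70 / 30 split of the cluster's tasks. With different demands, each role is compared on whichever resource it uses most heavily.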
About the author

Timothy Chen is a distributed systems engineer and entrepreneur. He works at Mesosphere and can be found on GitHub @tnachen.

Amey Varangaonkar

22 Aug 2018

8 min read

5 artificial intelligence tools data scientists might not know

With Artificial Intelligence going mainstream, it is not at all surprising to see the number of tools and platforms for AI development go up as well. Open source libraries such as TensorFlow, Keras and PyTorch are very popular today. Beyond those, enterprise platforms such as Azure AI Platform, Google Cloud AI and Amazon SageMaker are commonly used to build scalable, production-grade AI applications. While you might already be familiar with these tools and frameworks, there are quite a few relatively unknown AI tools and services that can make your life as a data scientist much, much easier! In this article, we look at 5 such tools for AI development which you may or may not have heard of before.

Wit.ai

One of the most popular use-cases of Artificial Intelligence today is building bots that facilitate effective human-computer interaction. Wit.ai, a platform for building such conversational chatbots, finds applications across various platforms, including mobile apps, IoT and home automation. Used by over 150,000 developers across the world, this platform gives you the ability to build conversational UIs that support text categorization, classification, sentiment analysis and a whole host of other features.

Why you should try this machine learning tool out

There are a multitude of reasons why Wit.ai is so popular among developers for creating conversational chatbots. Some of the major reasons are: support for text as well as voice, which gives you more options and flexibility in the way you design your bots; support for multiple languages such as Python, Ruby and Node.js, which makes it easier to integrate your app with the website or platform of your choice; documentation that is very easy to follow; and lots of built-in entities that ease the development of your chatbots.

Intel OpenVINO Toolkit

Bringing together two of the most talked-about technologies today, namely
Artificial Intelligence and Edge Computing, we had to include Intel's OpenVINO Toolkit in this list. Short for Open Visual Inference and Neural Network Optimization, this toolkit brings comprehensive computer vision and deep learning capabilities to edge devices. It has proved to be an invaluable resource for industries looking to set up smart IoT systems for image recognition and processing on edge devices. The OpenVINO toolkit can be used with popular frameworks such as OpenCV, TensorFlow and Caffe. It can be configured to leverage the power of traditional CPUs as well as customized AI chips and FPGAs. Not just that, this toolkit also supports the Vision Processing Unit, a processor developed specifically for machine vision.

Why you should try this AI tool out

It allows you to develop smart computer vision applications for IoT-specific use-cases. It supports a large number of deep learning and image processing frameworks, and it can be used with traditional CPUs as well as customized chips for AI and computer vision. Its distributed capability allows you to develop scalable applications, which again is invaluable when deploying on edge devices. You can learn more about OpenVINO's features and capabilities in our detailed coverage of the toolkit.

Apache PredictionIO

This one is for machine learning engineers and data scientists looking to build large-scale machine learning solutions using their existing Big Data infrastructure. Apache PredictionIO is an open source, state-of-the-art machine learning server that can be easily integrated with popular Big Data tools such as Apache Hadoop, Apache Spark and Elasticsearch to deploy smart applications.

Figure: PredictionIO system architecture (Source: PredictionIO)

As can be seen from the architecture diagram, PredictionIO has modules that interact with the different components of the Big Data system, and it uses an App Server to communicate the results of the analysis to outside devices.
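As a rough illustration of how an outside application talks to a deployed PredictionIO engine over HTTP, the sketch below builds (but does not send) a JSON query using only the Python standard library. It assumes an engine deployed locally on PredictionIO's default engine port 8000 and a recommendation-style engine template that accepts `user` and `num` fields; the query fields depend on the engine template you deploy, and `build_query` is a hypothetical helper, not part of PredictionIO.

```python
import json
import urllib.request


def build_query(user: str, num: int,
                base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for a deployed engine (hypothetical helper).

    Assumes a recommendation-style template expecting {"user": ..., "num": ...};
    other engine templates expect different query fields.
    """
    body = json.dumps({"user": user, "num": num}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/queries.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query("1", 4)
# Against a running engine you could send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Keeping request construction separate from sending makes the query easy to inspect or test without a live engine.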
Why you should try this machine learning tool out

It lets you build production-ready models, which can also be deployed as web services. You can also leverage the machine learning capabilities of Apache Spark to build large-scale machine learning models. Pre-built performance evaluation measures are available to check the accuracy of your predictive models. Most importantly, this tool helps you simplify your Big Data infrastructure without adding too much complexity.

IBM Snap ML

A machine learning library that, according to IBM's benchmark, is 46 times faster than TensorFlow. If that's not a reason to start using IBM's Snap ML, what is? IBM has been taking giant strides in AI research in a bid to compete with the heavyweights in this space, mainly Google, Microsoft and Amazon. With Snap ML, it seems to have struck a goldmine. A library for training high-speed machine learning models using cutting-edge CPU/GPU technology, Snap ML allows for agile development of models while scaling to process massive datasets.

Why you should try this machine learning tool out

It is insanely fast: Snap ML was used to train a logistic regression classifier on a terabyte-scale dataset in just under 100 seconds. It supports GPU acceleration while avoiding large data transfer overheads; with the enhanced GPU technology available today, Snap ML is one of the best tools you can have at your disposal to train models quickly and efficiently. It also allows for distributed model training and works on sparse data structures. You should definitely check out our detailed coverage of Snap ML, where we go into the depth of its features and explain why this is a very special tool.

Crypto-ML

It is common knowledge that cryptocurrency, especially Bitcoin, can be traded more efficiently and profitably by leveraging the power of machine learning. Large financial institutions and trading firms have been using machine learning tools to great effect.
Individual traders, on the other hand, have relied on historical data and outdated techniques to forecast trends. All that has now changed, thanks to Crypto-ML. Crypto-ML is a cryptocurrency trading platform designed specifically for individuals who want to get the most out of their investments in the most reliable, error-free way. Using state-of-the-art deep learning techniques, Crypto-ML uses historical data to build models that predict future price movements. At the same time, it eliminates human errors and mistakes arising out of emotion.

Why you should try this machine learning tool out

No expertise in cryptocurrency trading is required to use this tool. Crypto-ML makes use only of historical data, and builds data models to predict future prices without any human intervention. Per the Crypto-ML website, the average gain on winning trades is close to 53%, whereas the average loss on losing trades is close to just 6%. If you are a data scientist or a machine learning developer with an interest in finance and cryptocurrency, this platform can also help you customize your own models for efficient trading. Here's where you can read about how Crypto-ML works in more detail.

Other notable mentions

Apart from the tools mentioned above, there are quite a few other tools that could not make it to the list but deserve a special mention. Some of them are: ABBYY's Real-time Recognition SDK for document recognition, language processing and data capture, which is worth checking out. Vertex.ai's PlaidML, an open source tool that allows you to build smart deep learning models across a variety of platforms; it leverages the power of Tile, a new machine learning language that facilitates tensor manipulation. Facebook's MUSE, a recently open sourced Python library for efficient word embeddings and other NLP tasks. This one's worth keeping an eye on for sure!
If you're interested in browser-based machine learning, MachineLabs recently open sourced the entire code base of its machine learning platform. NVIDIA's very own NVVL is an open source offering that provides GPU-accelerated video decoding for training deep learning models.

The vast ecosystem of tools and frameworks available for building smart, intelligent use-cases across various domains points to the fact that AI is finding practical applications with every passing day. It is no longer an overstatement to suggest that AI is slowly becoming indispensable to businesses. This is not the end of it by any means either: expect to see more such tools spring to life in the near future, with some having game-changing, revolutionary consequences. So which tools are you planning to use for your machine learning / AI tasks? Is there any tool we missed? Let us know!

Akram Hussain

30 Jun 2014

7 min read

The War on Data Science: Python versus R

Data science

The relatively new field of data science has taken the world of big data by storm. Data science extracts valuable meaning from large sets of complex and unstructured data. The focus is on concepts like data analysis and visualization. However, Machine Learning, a valuable concept from the field of artificial intelligence, has now been adopted by organizations and is becoming a core area for many data scientists to explore and implement. In order to fully appreciate and carry out these tasks, data scientists need powerful languages. R and Python currently dominate this field, but which is better, and why?

The power of R

R offers a broad, flexible approach to data science. As a programming language, R focuses on allowing users to write algorithms and computational statistics for data analysis. R can be very rewarding for those who are comfortable using it. One of the greatest benefits R brings is its ability to integrate with other languages like C++, Java and C, and tools such as SPSS, Stata, Matlab, and so on. R's rise to prominence as the most powerful language for data science was supported by its strong community and the over 5,600 packages available. However, R is very different from other languages; it's not as easily applicable to general programming (not to say it can't be done). R's strength, its focus on and ability to communicate with every data analysis platform, also limits its usefulness outside this category. Game development, web development, and so on are all achievable, but there's just no benefit to using R in these domains. As a language, R is difficult to adopt, with a steep learning curve, even for those who have experience using statistical tools like SPSS and SAS.

The violent Python

Python is a high-level, multi-paradigm programming language. Python has emerged as one of the more promising languages of recent times, thanks to its easy syntax and its interoperability with a wide variety of ecosystems.
More interestingly, Python has caught the attention of data scientists over the years, and thanks to its object-oriented features and very powerful libraries, it has become the go-to language for data science, with many arguing it has overtaken R. However, like R, Python has its flaws too. One of the drawbacks of using Python is its speed. Python is a slow language, and speed is one of the fundamentals of data science! Python is very good as a general programming language, but it's a bit like a jack of all trades and master of none. Unlike R, it doesn't focus purely on data analysis, but it has impressive libraries to carry out such tasks.

The great battle begins

While comparing the two languages, we will go over four fundamental areas of data science and discuss which language is better in each. The topics we will explore are data mining, data analysis, data visualization, and machine learning.

Data mining: One of the key components of data science is data mining. R seems to win this battle; in the 2013 Data Miners Survey, 70% of data miners (out of the 1,200 who participated in the survey) used R for data mining. However, it could be argued that you wouldn't really use Python to "mine" data, but rather use the language and its libraries for data analysis and the development of data models.

Data analysis: R and Python both boast impressive packages and libraries. Python's NumPy, Pandas, and SciPy libraries are very powerful for data analysis and scientific computing. R, on the other hand, is different in that it doesn't offer just a few packages; the whole language is built around analysis and computational statistics. An argument could be made for Python being faster than R for analysis, and it is cleaner for coding operations on sets of data. However, I noticed that Python excels at the programming side of analysis, whereas for statistical and mathematical programming R is a lot stronger thanks to its array-oriented syntax. The winner of this one is debatable; for mathematical analysis, R wins.
But for general analysis and for programming clean statistical code more closely related to machine learning, I would say Python wins.

Data visualization: the "cool" part of data science. The phrase "A picture paints a thousand words" has never been truer than in this field. R boasts its ggplot2 package, which allows you to write impressively concise code that produces stunning visualizations. However, Python has Matplotlib, an equally impressive 2D plotting library with which you can create anything from bar charts and pie charts to error charts and scatter plots. The overall consensus is that R's ggplot2 gives data models a more professional look and feel. Another one for R.

Machine learning: it knows the things you like before you do. Machine learning is one of the hottest things to hit the world of data science. Companies such as Netflix, Amazon, and Facebook have all adopted this concept. In this context, machine learning is about using algorithms and patterns in data to predict user likes and dislikes, so that recommendations can be generated from a user's behaviour. Python has a very impressive library, scikit-learn, to support machine learning. It covers everything from clustering and classification to building your very own recommendation systems. However, R has a whole ecosystem of packages specifically created to carry out machine learning tasks. Which is better for machine learning? I would say Python's strong libraries and OOP syntax might give it the edge here.

One to rule them all

On the surface, both languages seem equally matched on the majority of data science tasks. Where they really differ depends on an individual's needs and what they want to achieve. There is nothing stopping data scientists from using both languages. One of the benefits of R is its compatibility with other languages and tools: R's rich packages can be used within a Python program using RPy (R from Python).
An example of such a situation would be using the IPython environment to carry out data analysis tasks with NumPy and SciPy, and then using R's ggplot2 package to visually represent the data: the best of both worlds. An interesting theory that has been floating around for some time is to integrate R into Python as a data science library. Such an approach would give data scientists one awesome place that combines R's strong data analysis and statistical packages with all of Python's OOP benefits, but whether this will happen remains to be seen.

The dark horse

We have explored both Python and R and discussed their individual strengths and flaws in data science. As mentioned earlier, they are the two most popular and dominant languages in this field. However, a new emerging language called Julia might challenge both in the future. Julia is a high-performance language that is essentially trying to solve the problem of speed for large-scale scientific computation. Julia is expressive and dynamic, it aims for speeds close to C's, it can be used for general programming (although its focus is on scientific computing), and the language is easy and clean to use. Sounds too good to be true, right?

Amey Varangaonkar

04 Jun 2018

7 min read

5 ways Machine Learning is transforming digital marketing

The enterprise interest in Artificial Intelligence is surging. In an era of cut-throat competition where it's either do or die, businesses have realized the transformative value of AI in gaining an upper hand over their rivals. Given its direct contribution to business revenue, it comes as no surprise that marketing has become one of the major application areas of machine learning. Per Capgemini, 84% of marketing organizations are implementing Artificial Intelligence in some capacity in 2018, and 3 out of 4 organizations implementing AI techniques have managed to increase the sales of their products and services by 10% or more. In this article, we look at 5 innovative ways in which machine learning is being used to enhance digital marketing.

Efficient lead generation and customer acquisition

One of the major keys to driving business revenue is getting more customers on board who will buy your products or services repeatedly. Machine learning comes in handy for identifying potential leads and converting those leads into customers. With the help of pattern recognition techniques, it is possible to understand a particular lead's behavioral and purchase trends. Through predictive analytics, it is then possible to predict whether a particular lead will buy the product or not. That lead is then put into the marketing sales funnel for targeted marketing campaigns, which may ultimately result in a purchase. A cautionary note here: with the GDPR (General Data Protection Regulation) in place across the EU (European Union), there are restrictions on the manner in which AI algorithms can be used to make automated decisions based on consumer data. This makes it imperative for businesses to strictly follow the regulation and operate under its purview, or they could face heavy penalties.
As long as businesses respect privacy and follow basic human decency, such as asking for permission to use a person's data or informing them about how their data will be used, marketers can reap the benefits of data-driven marketing like never before. It all boils down to applying common sense while handling personal data, as one GDPR expert put it. But we all know how uncommon that sense is!

Customer churn prediction is now possible

'Customer churn rate' is a popular marketing term referring to the proportion of customers who stop using a particular service offered by a company over a given time period. Churn is determined from the customer's last interaction with the service or the website. It is crucial to track the churn rate, as it is a clear indicator of the progress (or lack of it) that a business is making. Predicting customer churn is difficult, especially for e-commerce businesses selling a product, but it is not impossible thanks to machine learning. By learning from historical data and users' past website usage patterns, these techniques can help a business identify the customers who are most likely to churn soon, and when that is expected to happen. Appropriate measures can then be taken to retain such customers, such as special offers and discounts or timely follow-up emails, without any human intervention. American entertainment giant Netflix makes excellent use of churn prediction to keep its churn rate at just 9%, lower than any other subscription streaming service out there today. Not just that, it also manages to market its services to drive more customer subscriptions.

Dynamic pricing made easy

In today's competitive world, products need to be priced optimally. It has become imperative that companies set extremely competitive and relevant prices for their products, or else customers might not buy them.
On top of this, there are fluctuations in the demand and supply of the product, which can affect its pricing strategy. With the use of machine learning algorithms, it is now possible to forecast price elasticity by considering various factors, such as the channel on which the product is sold. Other factors taken into consideration could be the sales period, the product's positioning strategy or customer demand. For example, eCommerce giants Amazon and eBay tweak their product prices on a daily basis. Their pricing algorithms take into account factors such as the product's popularity among customers, the maximum discount that can be offered, and how often the customer has purchased from the website. This strategy of dynamic pricing is now being adopted by almost all the big retail companies, even in their physical stores. There is specialized software available that leverages machine learning techniques to set dynamic prices for products. Competera is one such pricing platform, which transforms retail through ongoing, timely, and error-free pricing for category revenue growth and improvements in customer loyalty tiers. To know more about how dynamic pricing actually works, check out this Competitoor article.

Customer segmentation and radical personalization

Every individual is different and has unique preferences, likes and dislikes. With machine learning, marketers can segment users into different buyer groups based on a variety of factors, such as their product preferences, social media activities, Google search history and much more. For instance, there are machine learning techniques that can segment users based on whether they love to blog about food or love to travel, or even on which show they are most likely to watch on Netflix! The website can then recommend or market products to these customers accordingly. Affinio is one such platform used for segmenting customers based on their interests.
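A common way to implement the kind of segmentation described above is clustering. The sketch below is a minimal k-means (Lloyd's algorithm) in plain Python that groups customers by two hypothetical features, for example food-related posts and travel-related posts per month. It is a generic illustration of the technique, not how Affinio or any particular platform works, and the data and feature choices are made up.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Return a cluster label (0..k-1) for each point using Lloyd's k-means."""
    rng = random.Random(seed)
    # Initialise centroids at k distinct data points.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical customers: (food posts per month, travel posts per month).
customers = [(12.0, 1.0), (10.0, 0.0), (11.0, 2.0),
             (1.0, 9.0), (0.0, 11.0), (2.0, 10.0)]
labels = kmeans(customers, k=2)
print(labels)  # the three "food" customers share one label, the "travel" ones the other
```

In practice you would scale the features and choose `k` with care; libraries such as scikit-learn provide a production-quality `KMeans` implementation.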
Content and campaign personalization is another widely recognized use-case of machine learning for marketing. Machine learning algorithms are used to build recommendation systems that take into consideration the user's online behavior and website usage to analyze and recommend products that he or she is likely to buy. A prime example of this is Google's remarketing strategy, which tries to reconnect with customers who leave a website without buying anything by showing them relevant ads across different devices. The best part about recommendation systems is that they are able to recommend two completely different products to two customers with different usage patterns. Incorporating them within a website has turned out to be a valuable strategy for increasing customer loyalty and overall lifetime value.

Improving customer experience

Gone are the days when a customer who visited a website had to use the 'Contact Me' form in case of any query and wait for an executive to get back with the answer. These days, chatbots are integrated into almost every ecommerce website to answer ad-hoc customer queries, and even to suggest products that fit the customer's criteria. These chatbots also include live-chat features, which allow customers to interact with them and understand the product features before they buy. For example, IBM Watson has a really cool feature called the Tone Analyzer. It parses the feedback given by the customer and identifies its tone: whether it's angry, resentful, disappointed, or happy. It is then possible to take appropriate measures to ensure that a disgruntled customer is satisfied, or to appreciate the customer's positive feedback, as the case may be.
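The recommendation systems mentioned under personalization can be sketched in a few lines. The example below is a minimal item-based collaborative filter in plain Python: it scores each product a user has not bought by its cosine similarity (over the sets of users who bought each item) to the products the user already owns. The purchase data and the `recommend` helper are hypothetical, and production recommenders are far more elaborate.

```python
import math
from typing import Dict, List, Set

# Hypothetical purchase history: user -> set of product IDs.
purchases: Dict[str, Set[str]] = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "lantern"},
    "carol": {"novel", "reading_lamp"},
    "dave": {"novel", "reading_lamp", "bookmark"},
}


def item_users(data: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert user -> items into item -> users who bought it."""
    index: Dict[str, Set[str]] = {}
    for user, items in data.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user: str, data: Dict[str, Set[str]], top_n: int = 2) -> List[str]:
    """Rank unowned items by summed similarity to the user's owned items."""
    index = item_users(data)
    owned = data[user]
    scores = {
        candidate: sum(cosine(buyers, index[item]) for item in owned)
        for candidate, buyers in index.items()
        if candidate not in owned
    }
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked[:top_n] if scores[item] > 0]


print(recommend("bob", purchases))    # ['stove']
print(recommend("carol", purchases))  # ['bookmark']
```

Two users with different purchase patterns receive completely different suggestions, which is the property the article highlights.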
Marketing will only get better with machine learning

Highly accurate machine learning algorithms, better processing capabilities and cloud-based solutions are now making it possible for companies to get the most out of AI for their marketing needs. Many companies have already adopted machine learning to boost their marketing strategy, with major players such as Google and Facebook leading the way. It is safe to say that many more companies, especially small and medium-sized businesses, are expected to follow suit in the near future.

Savia Lobo

02 Jan 2018

4 min read

Machine Learning slings its web: Deeplearn.js is here!

Machine learning has been the talk of the town, with implementations in a large number of organizations to carry out prediction and classification tasks. Machine learning is at the cutting edge of identifying and processing data to generate meaningful insights based on predictive analytics. But leveraging machine learning requires huge computational resources. While many may think of it as rocket science, Google has simplified access to machine learning for everyone through Deeplearn.js, an initiative that allows ML to run entirely in a web browser. Deeplearn.js is an open source, WebGL-accelerated JavaScript library. This initiative from Google PAIR (People + AI Research, which studies and redesigns human interactions with ML) aims to make ML available to everyone, so that it is not restricted to specific groups of people such as developers or businesses implementing it.

Deeplearn.js + browser: A perfect match?

Browsers such as Chrome, Internet Explorer, Safari, etc. are an integral part of our lives, as they connect us with the world. Their accessibility is visible in PWAs (Progressive Web Apps), which can run in the browser without needing to be downloaded. In a similar way, machine learning can be carried out within the browser without the fuss of downloading or installing any computational resources. Wonder how? With Deeplearn.js! Written in JavaScript, Deeplearn.js is tailored specifically for running machine learning in web browsers. It offers an interactive client-side platform that helps users carry out rapid prototyping and visualizations. Machine learning involves rapid computations with huge CPU requirements, which makes it a poor fit for JavaScript because of the language's speed limits. Deeplearn.js works around this by implementing ML in JavaScript via the WebGL JavaScript API. WebGL, an API for rendering 2D and 3D graphics, also gives access to hardware accelerators such as GPUs, which Deeplearn.js uses to perform its computations much faster.
Basic mechanism: The structure of Deeplearn.js is a blend of TensorFlow and NumPy, which are Python-based packages for scientific computing. The NumPy-like API provides an immediate execution model, and the TensorFlow-like API provides a delayed execution model. TensorFlow is a fast and scalable framework widely used by researchers and developers; however, creating web applications in the browser with TensorFlow is difficult, as it lacks runtime support for the browser. Deeplearn.js allows TensorFlow model capabilities to be brought into the browser: using the tools within Deeplearn.js, weights from a TensorFlow model can be exported and loaded in the browser.

Opportunities for business: Traditional businesses shy away from using the latest ML tools, as computational resources are expensive and complicated. Also, due to the complexity of ML, there is a need to hire technical experts. Through Deeplearn.js, firms can now easily access advanced ML tools and resources. It can not only help them solve data-centric business problems, but also provide them with innovative strategies and improved advantages to stay ahead of their competitors.

Differentiating factor: Deeplearn.js is not the only in-browser ML library. There are other competing frameworks, such as ConvNetJS and Tensorfire, a more recent framework that is almost identical to Deeplearn.js. A unique feature that differentiates Deeplearn.js is its capability to perform faster inference along with full backpropagation.

Implementations with Deeplearn.js

Performance RNN generates music with expressive timing and dynamics. It has been successfully ported to the browser using the Deeplearn.js environment after being trained in TensorFlow. The training data used was the Yamaha e-Piano Competition dataset, which includes MIDI captures of ~1400 performances by skilled pianists. Teachable Machine is built using the Deeplearn.js library.
It allows users to teach a machine live via a camera, without any need to write code. A fast neural style transfer algorithm allows in-browser image style transfer: it transfers the style of one image onto the content of another. To explore other practical projects built with Deeplearn.js, you may visit the GitHub repository here. By bringing machine learning to the browser, Deeplearn.js has opened new opportunities and focus areas for businesses and non-developers. SMEs (subject matter experts) within a business can now gain deeper insights into how to achieve desired results with machine learning. The browser is home to many developments that are yet to be revealed. Deeplearn.js truly is a milestone in bringing the web and ML a step closer. However, since it is at an early stage, it will be exciting to see how it opens up ML to anyone on the planet.
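The basic mechanism described above contrasts a NumPy-style immediate (eager) execution model with a TensorFlow-style delayed (graph) execution model. The toy Python sketch below shows the difference using plain numbers; the `Node`, `const` and `run` names are hypothetical and are not part of Deeplearn.js, NumPy or TensorFlow.

```python
from typing import Callable, Tuple

# Eager (NumPy-like) style: every operation produces a concrete value at once.
a, b = 2.0, 3.0
eager_result = (a + b) * b
print(eager_result)  # 15.0


class Node:
    """Deferred (graph) style: a node records an operation and its inputs."""

    def __init__(self, op: Callable[..., float], inputs: Tuple["Node", ...] = ()):
        self.op = op
        self.inputs = inputs

    def __add__(self, other: "Node") -> "Node":
        return Node(lambda x, y: x + y, (self, other))

    def __mul__(self, other: "Node") -> "Node":
        return Node(lambda x, y: x * y, (self, other))


def const(value: float) -> Node:
    """Leaf node that yields a fixed value when the graph is run."""
    return Node(lambda: value)


def run(node: Node) -> float:
    """Evaluate a node by first evaluating its inputs (no caching, for brevity)."""
    return node.op(*(run(n) for n in node.inputs))


x, y = const(2.0), const(3.0)
graph = (x + y) * y           # builds a description; nothing is computed yet
deferred_result = run(graph)  # computation happens only here
print(deferred_result)  # 15.0
```

Both styles give the same answer; the deferred style lets a framework see the whole computation before running it, which is what makes whole-graph optimization and hardware placement possible, at the cost of less direct debugging.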

Julian Ursell

30 Jun 2014

4 min read

The Mysteries of Big Data and the Orient … DB

Mapping the world of big data must be a lot like demystifying the antiquated concept of the Orient: trying to decipher a mass of unknowns. With the ever-multiplying expanse of data, and the natural human desire to understand it as soon as possible and in real time, technology is continually evolving to allow us to make sense of it, make connections within it, turn it into actionable insight, and act upon it physically in the real world. It's a huge enterprise, and you have to imagine that in the masses of data collated years ago on legacy database systems, without the capacity for the technological insight and analysis we have now, there are relationships that remain undefined: the known unknowns, the unknown knowns, and the known knowns (that Rumsfeld guy was making sense, you see?). It's fascinating to think what we might learn from the data we have already collected. There is a burning need these days to break down the mysteries of big data, and developers out there are continually thinking of ways to interpret it, mapping data so that it is intuitive and understandable. The major way developers have reconceptualized data in order to make sense of it is as a network tightly connected by relationships. The obvious examples are Facebook and LinkedIn, which map out vast networks of people connected by various shared properties, such as education, location, interests, or profession. One way of mapping highly connected data is to structure it in the form of a graph, a design that has emerged in recent years as databases have evolved. The leading exponent of this model is Neo4j, which is far and away the leader in the field of graph databases and is used by a huge number of enterprises working with big data. Neo4j has cornered the market, and it's not hard to see why: it offers a powerful solution with heavy commercial support for enterprise deployments.
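To illustrate the graph model described above, here is a small, self-contained Python sketch that stores people and their relationships as an adjacency list and finds everyone within two hops of a person using breadth-first search: the kind of "friends of friends" question that graph databases such as Neo4j and OrientDB are built to answer efficiently at scale. The names and edges are hypothetical, and this is an in-memory toy, not either database's query API.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical undirected "knows" relationships between people.
edges: List[Tuple[str, str]] = [
    ("ana", "ben"), ("ben", "cleo"), ("cleo", "dev"), ("ana", "eli"),
]


def build_graph(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency list from (person, person) pairs."""
    graph: Dict[str, Set[str]] = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def within_hops(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search: map each reachable person to their hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[start]  # exclude the starting person
    return dist


graph = build_graph(edges)
print(within_hops(graph, "ana", 2))  # ben and eli at 1 hop, cleo at 2; dev is 3 hops away
```

In a graph database this traversal is a built-in query; in Neo4j, for instance, it is typically written as a variable-length Cypher pattern such as `(a)-[:KNOWS*1..2]-(b)`, where the `KNOWS` relationship type is again just an illustrative name.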
In truth there aren't many alternatives out there, but they do exist. OrientDB is a hybrid graph-document database that offers the flexibility of modeling data as either documents or graphs, while incorporating object-oriented programming as a way of encapsulating relationships. Again, it's a great example of developers imagining ways to accommodate a myriad of different data types and the relationships that connect them all. The real mystery of the Orient(DB), however, is the relatively low (visible) adoption of a database that offers both innovation and reputedly staggering levels of performance (it is claimed to store up to 150,000 records a second). The question isn't just why it hasn't managed to dent a market essentially owned by Neo4j, but why, on its own merits, more developers haven’t opted for it. The answer may in the end be related to commercial drivers: outside of Europe, OrientDB seems to have struggled to build the kind of traction that would push greater adoption. Or perhaps it relates to the considerable development and tuning the project needs before use in production; related to that, OrientDB may still have some way to go in terms of enterprise-grade support for production. It's hard to say for sure what the deciding factor is. In many ways it’s a simple reiteration of the difficulty startups and new technologies face in gaining adoption, and of how long that road typically is. Regardless, what both Neo4j and OrientDB are valuable for is adapting familiar and unfamiliar programming concepts to reimagine the way we represent, model, and interpret connections in data, mapping the information of the world.

Sugandha Lahoti

31 Oct 2018

2 min read

Digital wellbeing - Trick or Treat?

Digital wellbeing is coming into full view as Facebook, Instagram, Google's Android and Apple's iOS 12 all introduce digital wellbeing dashboards and features to their platforms. These tools help users understand their digital habits, control the demands technology places on their attention, and focus on what actually matters. Google introduced a set of features named ‘Digital Wellbeing’ with its Android 9 Pie OS. The new features include a Dashboard, to monitor how long you’ve been using your phone and specific apps; App timer, which lets users set a daily time limit on individual apps; Do Not Disturb, which silences notifications from texts and emails; and Wind Down, which turns your screen to grayscale as your bedtime approaches, making apps less tempting. Apple went a step further than Google when it comes to parental controls. While Google's usage dashboard and limits seem primarily designed for users to limit their own behavior, Apple's let parents remotely manage their kids' usage from their own devices. Facebook is not far behind either, with a new tool dubbed “Your Time on Facebook” that shows users how much time they spent in the Facebook app on each of the last seven days, as well as their average time spent per day. However, there is little research showing that these features work. Much of what we know is based not on peer-reviewed research but on anecdotal data. Another concern is that educational apps and videos meant for young children sometimes contain ads on topics irrelevant to the learning objective, which may be harmful to young minds. Public interest groups are putting growing pressure on the FTC and other government bodies to investigate these apps and hold developers accountable for their practices.
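The Dashboard and App timer features described above come down to per-app usage accounting against a daily budget. Below is a minimal sketch of that idea in Python, assuming usage is reported as (app, minutes) sessions for a single day; the `AppTimer` class and its methods are hypothetical and do not reflect how Android or iOS actually implement these features.

```python
from collections import Counter


class AppTimer:
    """Hypothetical per-app daily usage tracker with optional time limits."""

    def __init__(self):
        self.usage = Counter()  # app name -> minutes used today
        self.limits = {}        # app name -> daily limit in minutes

    def set_limit(self, app, minutes):
        self.limits[app] = minutes

    def record_session(self, app, minutes):
        self.usage[app] += minutes

    def remaining(self, app):
        # None means no limit has been set for this app.
        if app not in self.limits:
            return None
        return max(self.limits[app] - self.usage[app], 0)

    def is_blocked(self, app):
        # An app is paused once its daily limit has been used up.
        return self.remaining(app) == 0

    def dashboard(self):
        # Apps sorted by time spent today, most-used first.
        return self.usage.most_common()


timer = AppTimer()
timer.set_limit("social", 30)
timer.record_session("social", 20)
timer.record_session("email", 15)
timer.record_session("social", 15)
print(timer.dashboard())           # [('social', 35), ('email', 15)]
print(timer.is_blocked("social"))  # True
```

Resetting the counters at midnight, and anything like Wind Down's grayscale mode, is deliberately left out to keep the sketch small.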
Overall, Digital Wellbeing features sound like a real step forward by these tech giants in making phones less addictive. If done right, they could help users focus on what actually matters and may well prove to be a TREAT. But for now, we are reserving our judgement.

Richard Gall

11 Dec 2017

4 min read

5 things that will matter in data science in 2018

The world of data science is now starting to change quickly. This was arguably the year when discussions around AI and automation started to escalate, taking on more and more importance in the public sphere. But as interesting as all that is, there are nevertheless real people, like you, actually working with data not to rig elections or steal someone’s job but simply to make things better. Arguably, data science and analysis have never been under the spotlight to the extent they are today. Whereas a decade ago there was a whole lotta hope stored in the big data revolution, today there’s anxiety that we’re not doing enough with data, or that we don’t have the right data. That makes it a challenging but important time to be working in the world of data. With that in mind, here are our 5 things that will matter in data science in 2018…

1. The ethical considerations in machine learning and artificial intelligence
This is a huge topic, but it can’t be ignored. At its heart, it matters because it highlights that there’s human agency at the core of modern data science: algorithms are created and designed by the engineers behind them. Even more importantly, these ethical considerations will matter in 2018 because they will end up defining everyone’s relationship to data for decades to come. And yes, although legislative bodies may play a part in that, it’s also up to the people actually working with data to contribute to the discussion about what data does, who uses it and why. That might sound like a lot of responsibility, but it makes things pretty exciting, no?

2. Greater alignment between data projects and business goals
This has long been a challenge for just about every business and, indeed, anyone who works in data, whether architect, analyst or scientist. But as the data hype curve flattens out, with more organizations taking advantage of the opportunities it offers, budgets getting tighter and expectations higher than they have ever been, ensuring that data programs are delivering real value will be crucial in 2018. That means there will be more pressure on data pros to deliver; sharpening your commercial instincts will be essential, and could be your route to the next step in your career.

3. Automated machine learning
If budgets are getting tighter and management expectations are higher than ever, the emergence of automated machine learning will be a godsend in 2018. Automated machine learning isn’t a threat to anyone’s job; it’s simply a way of making the steps of algorithm selection and optimization much faster. If you’ve ever spent hours tweaking an algorithm only for it not to work as you wanted, then moved to a further iteration only to hit a similar problem, automated machine learning will automate away those iterations. That means you’ll be able to spend more time on value-adding activities that will never be automated away, and in turn this will make you a more valuable data scientist.

4. Taking advantage of cloud
Cloud has been a big trend for some years now. But as a word on its own it has always felt a bit abstract and amorphous. It’s once you see how it can be put into practice that you begin to see how transformative it might be. In the case of machine learning, cloud becomes a vital solution in the battle for resources: it makes machine learning at scale accessible to more people. The key tool here is Google’s Cloud Machine Learning Engine, which has been built to make building machine learning models as straightforward as possible. When you look at this alongside automated machine learning, it’s possible to suggest that the data science skill set might change somewhat throughout 2018…

5. Better self-service BI
2018 is the year when all employees will need to be empowered by data. The idea that a specific team handles everything relating to data will end; using data will be crucial to a range of different stakeholders. This doesn’t mean the end of the data scientist; as said earlier, no one is going to be losing their jobs. But it does mean that self-service BI tools are going to take on greater importance than ever before in 2018. Data scientists may have to start thinking more like data architects (especially if there’s no data architect in their organization), considering how to make their work accessible and meaningful for stakeholders all around their organization.
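Point 3 describes automated machine learning as speeding up algorithm selection and optimization. At its simplest, that means searching a space of candidate models or hyperparameters and keeping whichever scores best on held-out data. The sketch below is a toy illustration of that loop in plain Python, assuming a small made-up 1-D dataset: it picks k for a k-nearest-neighbours classifier using leave-one-out accuracy. Real AutoML systems search far larger spaces with much smarter strategies; the function names here are hypothetical.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of point x by majority vote of its k nearest neighbours."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    correct = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += knn_predict(rest, x, k) == label
    return correct / len(data)


def auto_select(data, candidate_ks):
    """Score every candidate k and return (best_k, score).

    max() keeps the first best score, so ties favour the earlier candidate.
    """
    scores = {k: loo_accuracy(data, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]


# Made-up (feature, label) pairs; the "b" at 20 is a deliberately noisy point.
data = [(10, "a"), (13, "a"), (17, "a"), (20, "b"), (22, "a"),
        (60, "b"), (64, "b"), (69, "b"), (71, "b")]
best_k, score = auto_select(data, [1, 3, 5])
print(best_k, round(score, 3))  # 3 0.889
```

Here k=1 is misled by the noisy point while k=3 and k=5 tie; the search settles on the simpler of the two without anyone tweaking k by hand.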

Sugandha Lahoti

13 Nov 2017

6 min read

Know Your Customer: Envisaging customer sentiments using Behavioral Analytics

“All the world’s a stage, and all the men and women merely players.” Shakespeare may have considered men and women mere players, but with so many users connected through smart devices and the online world, these men and women, your customers, become your most important assets. Knowing your customers and envisaging their sentiments using behavioral analytics has therefore become paramount.

Behavioral analytics: Tracking user events
Say you order a pizza through an app on your phone. After customizing the order and choosing the crust size, type and toppings, you land in the payment section. Suppose that, instead of paying, you abandon the order altogether. Immediately you get an SMS and an email alerting you that you are just a step away from buying your choice of pizza. So how does this happen? Behavioral analytics is running in the background. By tracking user navigation, it prompts the user to complete an order, or offers a suggestion. The rise of smart devices has enabled almost everything to transmit data. Most of this data is captured across sessions of user activity and is in raw form. By user activity we mean social media interactions, the amount of time spent on a site, the user's navigation path, click activity, responses to changes in the market, purchasing history and much more. Some form of understanding is therefore required to make sense of this raw and scrambled data and generate definite patterns. This is where behavioral analytics steps in. It goes through a user's entire e-commerce journey and focuses on understanding the what and how of their activities. Based on this, it predicts their future moves, which in turn helps businesses become more customer-centric.

Why behavioral analytics over traditional analytics
Previous analytical tools lacked a single architecture and a simple workflow. Although they assisted with tracking clicks and page loads, they required a separate data warehouse and visualization tools.
This created an unstructured workflow. Behavioral analytics goes a step beyond standard analytics by combining rule-based models with deep machine learning. Where the former tells you what users do, the latter reveals the how and why of their actions. Behavioral analytics therefore keeps track of where customers click, which pages are viewed, how many users continue down the process, and who leaves the website at which step, among other things. Unlike traditional analytics, behavioral analytics aggregates data from diverse sources (websites, mobile apps, CRM, email marketing campaigns, etc.) collected across various sessions. Cloud-based behavioral analytics platforms can intelligently integrate and unify all sources of digital communication into a complete picture, offering a seamless and structured view of the entire customer journey. Such platforms typically capture real-time data in raw format. They then automatically filter and aggregate this data into a structured dataset and provide visualization tools to observe it, all the while predicting trends. The data is aggregated in a way that allows the business to query it in an unlimited number of ways. This makes these platforms helpful for analyzing retention and churn trends, tracing abnormalities, performing multidimensional funnel analysis and much more. Let’s look at some specific use cases across industries where behavioral analytics is widely used.

Analyzing customer behavior in e-commerce
E-commerce platforms are at the top of the list of sectors that can benefit from mapping their digital customer journey. Analytic strategies can track whether a customer spends more time on product page X than on product page Y by displaying views and data pointers of customer activity in a structured format. This enables businesses to resolve issues that may hinder a page’s popularity, such as slow-loading pages or expensive products.
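The funnel analysis mentioned above, counting how many users continue down the process and where they leave, can be sketched in a few lines of Python. This is a minimal illustration, assuming events arrive as (user, step) pairs and using hypothetical step names based on the pizza-ordering example; it ignores event order and timing for simplicity and is not the API of any particular analytics platform.

```python
from collections import defaultdict

# Hypothetical checkout funnel for the pizza-ordering example.
FUNNEL = ["open_app", "customize", "payment", "order_placed"]


def steps_by_user(events):
    """Group (user, step) events into the set of steps each user reached."""
    grouped = defaultdict(set)
    for user, step in events:
        grouped[user].add(step)
    return grouped


def funnel_report(events, funnel=FUNNEL):
    """Return (step, users_reached) for each funnel step, in order.

    A user counts for a step only if they also reached every earlier step.
    """
    grouped = steps_by_user(events)
    report = []
    for i, step in enumerate(funnel):
        required = set(funnel[: i + 1])
        reached = sum(1 for steps in grouped.values() if required <= steps)
        report.append((step, reached))
    return report


def abandoned_at(events, step="payment", next_step="order_placed"):
    """Users who reached `step` but never `next_step`: candidates for a reminder."""
    grouped = steps_by_user(events)
    return sorted(u for u, s in grouped.items() if step in s and next_step not in s)


events = [
    ("u1", "open_app"), ("u1", "customize"), ("u1", "payment"), ("u1", "order_placed"),
    ("u2", "open_app"), ("u2", "customize"), ("u2", "payment"),
    ("u3", "open_app"), ("u3", "customize"),
    ("u4", "open_app"),
]
print(funnel_report(events))
# [('open_app', 4), ('customize', 3), ('payment', 2), ('order_placed', 1)]
print(abandoned_at(events))  # ['u2']
```

The `abandoned_at` helper corresponds to the abandoned-order reminder in the pizza example: u2 reached payment but never placed the order.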
By tracking user sessions, right from when users enter a platform to the point a sale is made, behavioral analytics predicts future customer behavior and business trends. Parameters considered include the number of customers viewing reviews and ratings before adding an item to their cart, which similar products a customer looks at, how often items are added to or removed from the cart, and so on. Behavioral analytics can also identify top-performing products and help build powerful recommendation engines by analyzing changes in customer behavior across different demographic groups or regions. This helps achieve customer-to-customer personalization. KISSmetrics is a powerful analytics tool that provides detailed reports on customer behavior for businesses to slice through and find meaningful insights. RetentionGrid provides color-coded visualizations, as well as multiple strategies tailor-made for customers based on customer segmentation and demographics.

How online gaming can benefit from behavioral analytics
Online gaming is a surging community with millions of daily active users. Marketers are always looking for ways to acquire and retain users. Monetization is another important focal point: getting more users not only to play but also to pay. Behavioral analytics tracks a user’s gaming sessions, including skill level, time spent at different stages, favorite features and activities within gameplay, and drop-off points. At an overall level, it tracks active users, game logs, demographic data and social interaction between players over various community channels. From this data, visualizations are generated that can drive market strategies, such as identifying features that work, attracting additional players, or keeping existing players engaged.
This helps increase player retention and assists game developers and marketers in implementing new versions based on players’ reactions. Behavioral analytics can also identify common characteristics of users. It helps in understanding what gets a user to play longer and in identifying the group of users most likely to pay. All of this helps gaming companies implement the right advertising and content placement for their users. Mr Green’s casino launched a Green Gaming tool that predicts a person’s playing behavior and, based on a gamer’s risk-taking behavior, generates personalized insights about their gaming. Nektan PLC has partnered with machine learning customer insights firm Newlette, whose models analyze player behavior based on individual playing styles. They help increase player engagement and reduce bonus costs by providing players with optimum offers and bonuses. The applications of behavioral analytics are not limited to e-commerce and gaming. The security and surveillance domain uses behavioral analytics to conduct risk assessments of organizational resources and to raise alerts about individual entities that pose a potential threat. It does so by sifting through large amounts of company data and identifying patterns that indicate irregularity or change. End-to-end monitoring of customers also helps app developers track customer adoption of new features. It can also report the exact point where customers drop off, helping developers avoid expensive technical issues. All these benefits highlight how tracking customers and understanding user behavior is an essential tool to drive a business forward. As Leo Burnett, the founder of a prominent advertising agency, said: “What helps people, helps business.”
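The security use case mentioned above, flagging patterns that indicate irregularity, is often approached with some form of anomaly detection. As a minimal sketch, the code below swaps in a simple z-score rule for illustration (it is not the method of any vendor named here): it flags days on which an account's activity deviates sharply from that account's own baseline. The 3-standard-deviation threshold, the five-day warm-up and the sample counts are all assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(counts, threshold=3.0, min_history=5):
    """Return indices of values more than `threshold` standard deviations from
    the baseline built from earlier, non-flagged values.

    The first `min_history` values only seed the baseline and are never flagged.
    """
    if min_history < 2:
        raise ValueError("stdev needs at least two baseline values")
    flagged = []
    baseline = list(counts[:min_history])
    for i in range(min_history, len(counts)):
        mu, sigma = mean(baseline), stdev(baseline)
        value = counts[i]
        if sigma == 0:
            # Flat baseline: any change at all counts as irregular.
            irregular = value != mu
        else:
            irregular = abs(value - mu) / sigma > threshold
        if irregular:
            flagged.append(i)
        else:
            baseline.append(value)  # only normal days update the baseline
    return flagged


# Hypothetical daily file-access counts for one employee account.
daily_access = [12, 15, 11, 14, 13, 12, 16, 95, 14]
print(flag_anomalies(daily_access))  # [7]
```

Keeping flagged days out of the baseline stops a single spike from inflating the standard deviation and masking the next irregular day.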

Amey Varangaonkar

17 Apr 2018

5 min read

IBM Think 2018: 6 key takeaways for developers

This year, IBM Think 2018 was hosted in Las Vegas from March 20 to 22. It was one of the most anticipated IBM events of 2018, with over 40,000 developers as well as technology and business leaders in attendance. Considered IBM’s flagship conference, Think 2018 combined previous conferences such as IBM InterConnect and World of Watson.

IBM Think 2018: Key Takeaways
- IBM Watson Studio announced: a platform where data professionals in different roles can come together and build end-to-end artificial intelligence workflows
- Integration of IBM Watson with Apple's Core ML, for incorporating custom machine learning models into iOS apps
- IBM Blockchain Platform announced, for blockchain developers to build enterprise-grade decentralized applications
- Deep Learning as a Service announced as part of Watson Studio, allowing you to train deep learning models more efficiently
- Fabric for Deep Learning open-sourced, so that you can train your models with open source deep learning frameworks and then integrate them with Watson Studio
- Neural Network Modeler announced for Watson Studio, a GUI tool to design neural networks efficiently, without a lot of manual coding
- IBM Watson Assistant announced, an AI-powered digital assistant, with domain-specific solutions for automotive and hospitality

Here are some of the announcements and key takeaways that have excited us, as well as developers all around the world!

IBM Watson Studio announced
One of the biggest announcements of the event was IBM Watson Studio, a premier tool that brings together data scientists, developers and data engineers to collaborate on, build and deploy end-to-end data workflows. From accessing your data source to deploying accurate, high-performance models, this platform does it all. It is just what enterprises need today to leverage artificial intelligence to accelerate research and get intuitive insights from their data.
IBM Watson Studio's Lead Product Manager, Armand Ruiz, gave a sneak peek into what we can expect from Watson Studio.

Collaboration with Apple Core ML
IBM took its relationship with Apple to another level by announcing a collaboration to develop smarter iOS applications. IBM Watson’s Visual Recognition service can be used to train custom Core ML machine learning models, which iOS apps can use directly. This announcement at IBM Think 2018 came as no surprise to us, considering IBM had already released new developer tools for enterprise development using the Swift language.

IBM Watson Assistant announced
IBM Think 2018 also saw the evolution of Watson Conversation into Watson Assistant, introducing new features and capabilities to deliver a more engaging and personalized customer experience. With this, IBM plans to take the concept of AI assistants for businesses to a new level. Two domain-specific solutions, currently in a beta program, are available on top of Watson Assistant: Watson Assistant for Automotive and Watson Assistant for Hospitality.

IBM Blockchain Platform
According to Juniper Research, more than half of the world’s big corporations are either considering adopting Blockchain technology or already in the process of adopting it. This presents a serious opportunity for a developer-centric platform that can be used to build custom decentralized networks. IBM, unsurprisingly, has identified this opportunity and come up with a Blockchain development platform of its own: the IBM Blockchain Platform. Recently launched as a beta, the platform offers a pay-as-you-use option for developers to build their own enterprise-grade Blockchain solutions without any hassle.

Deep Learning as a Service
Training a deep learning model is quite tricky, as it requires you to design the right kind of neural network and choose the right hyperparameters.
This is a significant pain point for data scientists and machine learning engineers. To tackle this problem, IBM announced the release of Deep Learning as a Service as part of Watson Studio. It includes the Neural Network Modeler (explained in detail below) to simplify the process of designing and training neural networks. Alternatively, using this service, you can leverage popular deep learning libraries and frameworks such as PyTorch, TensorFlow, Caffe and Keras to train your neural networks manually. In the process, IBM also open-sourced the core functionality of Deep Learning as a Service as a separate project, Fabric for Deep Learning. This allows models to be trained with different open source frameworks on Kubernetes containers, making use of GPU processing power. These models can then be integrated into Watson Studio.

Accelerating deep learning with the Neural Network Modeler
In a bid to reduce the complexity and manual work that go into designing and training neural networks, IBM introduced a beta release of the Neural Network Modeler within Watson Studio. This new feature lets you design standardized neural network models through an intuitive GUI, without going into a lot of technical detail. With this, IBM aims to accelerate the overall deep learning process, so that data scientists and machine learning developers can focus on the thinking rather than the operational side of things.

At Think 2018, we also saw the IBM Research team present their annual ‘5 in 5’ predictions. This session highlighted five key innovations that are currently in research and are expected to change our lives in the near future. With these announcements, it’s quite clear that IBM is well in sync with the two hottest trends in the tech space today: artificial intelligence and Blockchain.
IBM seems to be taking every possible step to ensure it is right up there as the preferred tool for data scientists and machine learning developers. We expect the aforementioned services to get better and see more mainstream adoption with time, as most of them are currently in beta. Not just that, there’s scope for more improvements and new functionality as IBM develops these platforms. What did you think of these announcements by IBM? Do let us know!

Erol Staveley

18 Jan 2016

7 min read

Data Science Is the New Alchemy

Every day I come into work and sit opposite Greg. Greg (in my humble opinion) is a complete badass. He turns information that we’ve had hanging around for years and years directly into actual currency. Single-handedly, he generates more direct revenue than any other individual in the business. When we were shuffling seating positions not too long ago (we now have room for that standing desk I’ve always wanted ❤), we were afraid to turn off his machine for fear of losing thousands upon thousands of dollars. I remember somebody saying “guys, we can’t unplug Skynet”. Nobody fully knows how it works. Nobody except Greg. We joked that by turning off his equipment, we’d ruin Greg's on-the-side Bitcoin mining gig that he was probably running off the back of the company network. We then all looked at one another in a brief moment of silence. We were all thinking the same thing: it wouldn’t surprise any of us if Greg was actually doing this. We wouldn’t know any better. To many, what Greg does is like modern-day alchemy. In reality, Greg is a data scientist, an increasingly crucial role that helps businesses deliver more meaningful, relevant interactions with their customers. I like to think of data scientists as new-age alchemists, who wield keyboards instead of perfectly choreographed vials and alembics. Content might have been king a few years back. Now, it’s data. Everybody wants more of it, and the people who can actually make sense of it all. By surveying 20,000 developers, we found out just how valuable these roles are to businesses of all shapes and sizes. Let’s take a look.

Every Kingdom Needs an Alchemist
Even within quite a technical business, Greg’s work lends a fresh perspective on what it is other developers want from our content.
Putting the value of direct revenue generation to one side, the insight we’ve derived from purchasing patterns and user behaviour is incredibly valuable. We’re constantly challenging our own assumptions and spending more time looking at what our customers are actually doing. We’re not alone in taking this increasingly data-driven approach. In general, the highest data science salaries are paid by large enterprises. This isn’t too surprising, considering that’s where the real troves of precious data reside. At such scale, the aggregation and management of data alone can warrant the recruitment of specialised teams. On average, though, SMEs are not too far behind when it comes to how much they’re willing to pay for top talent.
[Figure: Average salary by company size]
Apache Spark was a particularly important focus going forward for folks in the Enterprise segment. What’s clear is that data science isn’t just for big businesses any more. It’s for everybody. We can see that in the growth of data-related roles for SMEs. We’re paying more attention to data because it represents the actions of our customers, but also because we’ve just got more of it lying around all over the place. Irrespective of company size, the range of industries we captured (and classified) was colossal. Seems like everybody needs an alchemist these days.

They Double as Snake Charmers
When supply is low and demand is high in a particular job market, we almost always see people move to fill the gap. It’s a key driver of learning. After all, if you’re trying to move to a new role, you’re likely to be developing new skills. It’s no surprise that Python is the go-to choice for data science. It’s an approachable language with some great introductory resources on the market, like Python for Secret Agents. It also has a fantastic ecosystem of data science libraries and documentation that can help you get up and running quite quickly.
[Figure: Percentage of respondents who said they used a given technology]
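Usage percentages like those in the respondent chart, and the crossover between technologies discussed next, both come from simple counting over survey responses. The sketch below shows one way to compute them in plain Python; the responses are a tiny hypothetical sample, not data from the survey, and the function names are illustrative.

```python
from collections import Counter
from itertools import permutations


def usage_share(responses):
    """Percentage of respondents who use each technology."""
    counts = Counter(tech for techs in responses for tech in set(techs))
    n = len(responses)
    return {tech: 100 * c / n for tech, c in counts.items()}


def co_usage(responses):
    """For each ordered pair (a, b): percentage of a-users who also use b."""
    users = Counter()
    pairs = Counter()
    for techs in responses:
        techs = set(techs)
        users.update(techs)
        pairs.update(permutations(techs, 2))  # every ordered pair within one response
    return {(a, b): 100 * c / users[a] for (a, b), c in pairs.items()}


# Hypothetical responses: the set of technologies each respondent uses.
responses = [
    {"Python", "R", "SQL"},
    {"Python", "R"},
    {"Python", "SQL"},
    {"Tableau", "SQL"},
]
print(usage_share(responses)["Python"])                # 75.0
print(round(co_usage(responses)[("Python", "R")], 1))  # 66.7
```

Because the pairs are ordered, "Python users who also use R" and "R users who also use Python" come out as different percentages (66.7 and 100.0 in this sample), which is exactly the kind of asymmetry a crossover analysis looks for.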
When looking at roles in more detail, you see strong patterns between the technologies used. For example, those using Python were most likely to also be using R. When you dive deeper into the data, you start to notice a lot of crossover between certain segments. It was at this point that we also started to see relationships between certain technologies in specific segments. For example, the Financial sector was more likely to use R, and also paid (on average) higher salaries to those with a more diverse technical background.

Alchemists Have Many Forms
Back at a higher level, what was really interesting was the natural technology groupings that started to emerge between four very distinct ‘types’ of data alchemist. “What are they?”, I hear you ask.

The Visualizers
Those who bring data to life. They turn what would otherwise be a spreadsheet or a copy-and-paste pie chart into delightful infographics and informative dashboards. Welcome to the realm of D3.js and Tableau.

The Wranglers
The SME all-stars. They aggregate, clean and process data with Python whilst leveraging the functionality of libraries like pandas to their full potential. A jack of all trades, master of all.

The Builders
Those who use Hadoop and other open source tools to deploy and maintain large-scale data projects. They keep the world running by building robust, scalable data platforms.

The Architects
Those who harness the might of the enterprise toolchain. They co-ordinate large-scale Oracle and Microsoft deployments, the sheer scale of which would break the minds of mere mortals.

With 20,000 developers taking part overall, our most recent data science survey contains plenty of juicy information about real-world skills, salaries and trends.

In a Land of Data, the Alchemist is King
We used to have our reports delivered in Excel. Now we have them as notebooks on Jupyter.
If it really is a golden age for developers, data scientists must be having a hard time keeping their inboxes clear of recruitment spam. What’s really interesting going forward is that the volume of information we have to deal with is only going to increase. Once IoT really kicks off and wearables become more commonly accepted (the sooner the better if you’re Apple), businesses of all sizes will find data overload a key growing pain, regardless of industry. Plenty of web services and platforms are already popping up, promising to deliver ‘actionable insight’ to everybody who can spare the monthly fees. This is fine for standardised reporting and metrics like bounce rate and conversion, but not so helpful if you’re working with a product that’s unique to you. Greg’s work doesn’t just tell us how we can improve our SEO. It shows us how we can make our products better without having to worry about internal confirmation bias. It helps us better serve our customers. That’s why present-day alchemists like Greg are heroes.
