18 Sep 2018

7 min read

What is PyTorch and how does it work?

PyTorch is a Python-based scientific computing package that uses the power of graphics processing units (GPUs). It is also one of the preferred deep learning research platforms, built to provide maximum flexibility and speed. It is known for two high-level features: tensor computation with strong GPU acceleration, and deep neural networks built on a tape-based autograd system. Many existing Python libraries have the potential to change how deep learning and artificial intelligence are performed, and this is one of them. One of the key reasons behind PyTorch’s success is that it is completely Pythonic, which makes building neural network models straightforward. It is still a young player compared to its competitors; however, it is gaining momentum fast.

A brief history of PyTorch

Since its release in January 2016, researchers have increasingly adopted PyTorch. It has quickly become a go-to library because of the ease with which it lets you build extremely complex neural networks. It is giving tough competition to TensorFlow, especially in research work. However, it will still take some time before it is adopted by the masses, because of its “new” and “under construction” tags.

PyTorch’s creators envisioned the library as highly imperative, so that numerical computations can be run quickly. This approach fits perfectly with the Python programming style. It has allowed deep learning scientists, machine learning developers, and neural network debuggers to run and test parts of their code in real time, so they don’t have to wait for the entire program to execute to check whether it works. You can always use your favorite Python packages, such as NumPy, SciPy, and Cython, to extend PyTorch’s functionality when required.

Now you might ask: why PyTorch? What’s so special about using it to build deep learning models?
The answer is quite simple: PyTorch is a dynamic library (very flexible, and you can adapt it to your requirements as they change) that is currently being adopted by many researchers, students, and artificial intelligence developers. In a recent Kaggle competition, the PyTorch library was used by nearly all of the top 10 finishers.

Some of the key highlights of PyTorch include:

- Simple interface: It offers an easy-to-use API, so it is as simple to operate and run as Python itself.
- Pythonic in nature: Being Pythonic, the library integrates smoothly with the Python data science stack, so it can leverage the services and functionality offered by the Python environment.
- Computational graphs: PyTorch offers dynamic computational graphs, which you can change at runtime. This is highly useful when you have no idea how much memory will be required for creating a neural network model.

PyTorch Community

The PyTorch community is growing daily. In just a short year and a half, it has seen a great deal of development, which has led to citations in many research papers and groups. More and more people are bringing PyTorch into their artificial intelligence research labs to build quality-driven deep learning models.

Interestingly, PyTorch is still in early-release beta, but the brisk pace at which this deep learning framework is being adopted shows its real potential and power in the community. Even though it is in beta, there are 741 contributors on the official GitHub repository working on enhancing and improving the existing PyTorch functionality.

PyTorch isn’t limited to specific applications, thanks to its flexibility and modular design.
It has seen heavy use by leading tech giants such as Facebook, Twitter, NVIDIA, Uber, and more, in research domains such as NLP, machine translation, image recognition, neural networks, and other key areas.

Why use PyTorch in research?

Anyone working in deep learning and artificial intelligence has likely worked with TensorFlow, Google’s most popular open source library. However, PyTorch, the newer deep learning framework, solves major problems for research work. Arguably, PyTorch is TensorFlow’s biggest competitor to date, and it is currently a much-favored deep learning and artificial intelligence library in the research community.

Dynamic computational graphs

PyTorch avoids the static graphs used in frameworks such as TensorFlow, which allows developers and researchers to change how the network behaves on the fly. Early adopters prefer PyTorch because it is more intuitive to learn than TensorFlow.

Different back-end support

PyTorch uses different backends for CPU, GPU, and various functional features rather than a single backend. It uses the tensor backends TH for CPU and THC for GPU, and the neural network backends THNN and THCUNN for CPU and GPU respectively. Using separate backends makes it very easy to deploy PyTorch on constrained systems.

Imperative style

The PyTorch library is specially designed to be intuitive and easy to use. Each line of code is executed as soon as you run it, which lets you track in real time how your neural network models are built. Its imperative architecture and fast, lean approach have increased overall PyTorch adoption in the community.

Highly extensible

PyTorch is deeply integrated with C++ code, and it shares some of its C++ backend with the deep learning framework Torch. This allows users to program in C/C++ by using an extension API based on cFFI for Python, compiled for CPU or GPU operation.
This feature has extended PyTorch to new and experimental use cases, making it a preferable choice for research.

Python-Approach

PyTorch is a native Python package by design. Its functionality is built as Python classes, so all its code can integrate seamlessly with Python packages and modules. Like NumPy, it provides tensor computations, but with GPU acceleration, and it also offers a rich set of APIs for neural network applications. PyTorch provides a complete end-to-end research framework that comes with the most common building blocks for everyday deep learning research. It allows chaining of high-level neural network modules because it supports a Keras-like API in its torch.nn package.

PyTorch 1.0: The path from research to production

We have been discussing the strengths PyTorch offers and how they make it a go-to library for research work. However, one of its biggest downsides has been poor production support. This is expected to change soon.

PyTorch 1.0 is expected to be a major release that overcomes the challenges developers face in production. This new iteration of the framework will merge Python-based PyTorch with Caffe2, allowing machine learning developers and deep learning researchers to move from research to production in a hassle-free way, without having to deal with migration challenges. Version 1.0 will unify research and production capabilities in one framework, providing the flexibility and performance optimization required for both.

The new version promises to handle the tasks involved in running deep learning models efficiently at massive scale. Along with production support, PyTorch 1.0 will bring more usability and optimization improvements. With PyTorch 1.0, your existing code will continue to work as-is; there won’t be any changes to the existing API.
If you want to stay updated on the progress of the PyTorch library, you can visit the Pull Requests page. The beta release of this long-awaited version is expected later this year. Major vendors like Microsoft and Amazon are expected to provide complete support for the framework across their cloud products.

Summing up, PyTorch is a compelling player in the field of deep learning and artificial intelligence libraries, exploiting its unique niche as a research-first library. It addresses the challenges researchers face and provides the performance needed to get the job done. If you’re a mathematician, researcher, or student who wants to learn how deep learning is performed, PyTorch is an excellent choice as your first deep learning framework.

Read more

- Can a production-ready Pytorch 1.0 give TensorFlow a tough time?
- A new geometric deep learning extension library for Pytorch releases!
- Top 5 tools for reinforcement learning
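
To make the phrases "tape-based autograd" and "dynamic computational graphs" from the PyTorch article above more concrete, here is a minimal pure-Python sketch of the idea. It does not use PyTorch and is not how PyTorch is implemented; the `Var` class and the `backward` and `model` functions are hypothetical names invented for this illustration. Each operation is recorded as it executes, which is why ordinary Python control flow can change the shape of the graph from one run to the next.

```python
# Toy reverse-mode automatic differentiation.
# NOT PyTorch's implementation: `Var`, `backward` and `model` are
# hypothetical names used only for this illustration.

class Var:
    def __init__(self, value, parents=()):
        self.value = value        # result of the forward computation
        self.parents = parents    # tuples of (parent Var, local gradient)
        self.grad = 0.0           # filled in by backward()

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Propagate gradients from `output` back to every Var it depends on."""
    # Order the recorded graph so that each node is processed only after
    # every node that consumes it.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local_grad in node.parents:
            parent.grad += local_grad * node.grad


def model(x, w, repeat):
    # Plain Python control flow decides the graph's shape at run time.
    y = x
    for _ in range(repeat):
        y = y * w
    return y + x


x, w = Var(2.0), Var(3.0)
out = model(x, w, repeat=2)   # out = x * w * w + x
backward(out)
print(out.value, x.grad, w.grad)   # 20.0 10.0 12.0
```

Calling `model` with a different `repeat` value builds a different graph on the next run, with no separate compilation step; that is the property the article refers to as changing the network "on the fly".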

Packt Editorial Staff

10 Dec 2019

10 min read

5 key reinforcement learning principles explained by AI expert, Hadelin de Ponteves

Aaron Lazar

08 Aug 2018

5 min read

Do you write Python Code or Pythonic Code?

If you’re new to programming, and Python in particular, you might have heard the term Pythonic brought up at tech conferences, meetups, and even at your own office. You might also have wondered why the term exists, and whether people are just talking about writing Python code. Here we’re going to look at what the term Pythonic means and why you should be interested in learning to write not just Python code, but Pythonic code.

What does Pythonic mean?

When people talk about Pythonic code, they mean that the code uses Python idioms well, that it’s natural or displays fluency in the language. In other words, it follows the idioms most widely adopted by the Python community. If someone says you are writing un-Pythonic code, they might actually mean that you are attempting to write Java/C++ code in Python, disregarding the Python idioms and performing a rough transcription rather than an idiomatic translation from the other language. Okay, now that you have a theoretical idea of what Pythonic (and un-Pythonic) means, let’s have a look at some Pythonic code in practice.

Writing Pythonic Code

Before we get into some examples, you might be wondering if there’s a defined way of writing Pythonic code. Well, there is, and it’s called PEP 8. It’s the official style guide for Python.

Example #1

    x = [1, 2, 3, 4, 5, 6]
    result = []
    for idx in range(len(x)):
        result.append(x[idx] * 2)
    result

Output: [2, 4, 6, 8, 10, 12]

Consider the above code, where you’re trying to multiply the elements of “x” by 2. We created an empty list to store the results, and then appended the result of each computation to it. The result list now contains each of the elements multiplied by 2.

Now, if you were to write the same code in a Pythonic way, you might want to simply use a list comprehension.
Here’s how:

    x = [1, 2, 3, 4, 5, 6]
    [(element * 2) for element in x]

Output: [2, 4, 6, 8, 10, 12]

You might have noticed that we skipped the entire multi-line for loop!

Example #2

Let’s make the previous example a bit more complex, and add a condition that the elements should be multiplied by 2 only if they are even.

    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    result = []
    for idx in range(len(x)):
        if x[idx] % 2 == 0:
            result.append(x[idx] * 2)
        else:
            result.append(x[idx])
    result

Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20]

We’ve written an if-else statement to solve this problem, but there is a simpler, more Pythonic way of doing it.

    [(element * 2 if element % 2 == 0 else element) for element in x]

Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20]

Notice that, apart from saving multiple lines of code, we put the if-else in the same expression as the loop. Now, if you wanted to perform filtering, you could do this:

    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    [element * 2 for element in x if element % 2 == 0]

Output: [4, 8, 12, 16, 20]

What we’ve done here is put the if clause after the for clause, and voila! We’ve achieved filtering. If you’re using a nice IDE like Jupyter Notebooks or PyCharm, it will help you format your code as per the PEP 8 suggestions.

Why should you write Pythonic code?

Well, firstly, you save loads of time otherwise spent writing humongous piles of cowdung code, so you become a smarter and more productive programmer. Python is a pretty slow language, and when you bring over a way of doing things from another language like Java or C++, you’re likely to make things worse. With idiomatic, Pythonic code, you can often improve the speed of your programs. Moreover, idiomatic code is far easier for other developers working on the same code to understand. It helps a great deal when you’re trying to refactor someone else’s code.

Fearing Pythonic idioms

Well, I don’t mean the idioms themselves are scary.
Rather, quite a few developers and organisations have begun discriminating on the basis of whether someone can or cannot write Pythonic code. This is wrong because, at the end of the day, although PEP 8 exists, the term Pythonic means different things to different people. To some it might mean picking up a new style guide and improving the way you code. To others, it might mean being succinct and not repeating themselves. It’s time we stopped judging people on whether they can or can’t write Pythonic code; instead, we should appreciate it when someone presents readable, easily maintainable, and succinct code. If you find them writing a bit of clumsy code, you can choose to talk to them about improving their design considerations. And the world will be a better place!

If you’re interested in learning how to write more succinct and concise Python code, check out these resources:

- Learning Python Design Patterns - Second Edition
- Python Design Patterns [Video]
- Python Tips, Tricks and Techniques [Video]
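
As a quick sanity check on the examples in this article, the following sketch wraps the loop and comprehension versions in functions and confirms that they produce the same results. The function names are invented here for illustration. The `timeit` calls at the end are one way to compare speed on your own machine; no timing figures are claimed, since results vary with Python version and hardware.

```python
import timeit

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


def double_evens_loop(values):
    # Index-based loop, as in the first version of Example #2.
    result = []
    for idx in range(len(values)):
        if values[idx] % 2 == 0:
            result.append(values[idx] * 2)
        else:
            result.append(values[idx])
    return result


def double_evens_comprehension(values):
    # Conditional expression inside a list comprehension.
    return [v * 2 if v % 2 == 0 else v for v in values]


def doubled_evens_only(values):
    # Filtering: the `if` after the `for` drops the odd elements.
    return [v * 2 for v in values if v % 2 == 0]


# Both styles must agree before a speed comparison means anything.
assert double_evens_loop(x) == double_evens_comprehension(x)
print(double_evens_comprehension(x))  # [1, 4, 3, 8, 5, 12, 7, 16, 9, 20]
print(doubled_evens_only(x))          # [4, 8, 12, 16, 20]

# Rough timing comparison; numbers depend on machine and Python version.
loop_time = timeit.timeit(lambda: double_evens_loop(x), number=10_000)
comp_time = timeit.timeit(lambda: double_evens_comprehension(x), number=10_000)
print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```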

Natasha Mathur

14 Apr 2019

12 min read

Building a scalable PostgreSQL solution

The term scalability means the ability of a software system to grow as the business using it grows. PostgreSQL provides some features that help you to build a scalable solution but, strictly speaking, PostgreSQL itself is not scalable. It can effectively utilize the following resources of a single machine:

- It uses multiple CPU cores to execute a single query faster with the parallel query feature.
- When configured properly, it can use all available memory for caching.
- The size of the database is not limited; PostgreSQL can utilize multiple hard disks when multiple tablespaces are created; with partitioning, the hard disks can be accessed simultaneously, which makes data processing faster.

However, spreading a database solution across multiple machines can be quite problematic, because a standard PostgreSQL server can only run on a single machine. In this article, we will look at different scaling scenarios and their implementation in PostgreSQL.

The requirement for a system to be scalable means that a system that supports a business now should also be able to support the same business, with the same quality of service, as it grows.

This article is an excerpt taken from the book 'Learning PostgreSQL 11 - Third Edition' written by Andrey Volkov and Salahadin Juba. The book explores the concepts of relational databases and their core principles. You’ll get to grips with using data warehousing in analytical solutions and reports and scaling the database for high availability and performance.

Let's say a database can store 1 GB of data and effectively process 100 queries per second. What if, as the business develops, the amount of data being processed grows 100 times? Will the database be able to support 10,000 queries per second and process 100 GB of data? Maybe not now, and not in the same installation. However, a scalable solution should be ready to be expanded to handle the load as soon as it is needed.
In scenarios where better performance is required, it is quite common to set up more servers that handle the additional load, and to copy the same data to them from a master server. In scenarios where high availability is required, a typical solution is likewise to continuously copy the data to a standby server so that it can take over in case the master server crashes.

Scalable PostgreSQL solution

Replication can be used in many scaling scenarios. Its primary purpose is to create and maintain a backup database in case of system failure. This is especially true for physical replication. However, replication can also be used to improve the performance of a solution based on PostgreSQL. Sometimes, third-party tools can be used to implement complex scaling scenarios.

Scaling for heavy querying

Imagine there's a system that's supposed to handle a lot of read requests. For example, there could be an application that implements an HTTP API endpoint that supports auto-completion on a website. Each time a user enters a character in a web form, the system searches the database for objects whose name starts with the string the user has entered. The number of queries can be very large because of the large number of users, and also because several requests are processed for every user session. To handle large numbers of requests, the database should be able to utilize multiple CPU cores. If the number of simultaneous requests is really large, the number of cores required to process them can be greater than a single machine could have.

The same applies to a system that is supposed to handle multiple heavy queries at the same time. You don't need a lot of queries, but when the queries themselves are big, using as many CPUs as possible offers a performance benefit, especially when parallel query execution is used.
In such scenarios, where one database cannot handle the load, it's possible to set up multiple databases, set up replication from one master database to all of them, making each of them work as a hot standby, and then let the application query different databases for different requests. The application itself can be smart and query a different database each time, but that would require a special implementation of the data-access component of the application.

Another option is to use a tool called Pgpool-II, which can work as a load balancer in front of several PostgreSQL databases. The tool exposes a SQL interface, and applications can connect to it as if it were a real PostgreSQL server. Pgpool-II then redirects queries to the databases that are executing the fewest queries at that moment; in other words, it performs load balancing.

Yet another option is to scale the application together with the databases, so that one instance of the application connects to one instance of the database. In that case, the users of the application should connect to one of the many instances. This can be achieved with HTTP load balancing.

Data sharding

When the problem is not the number of concurrent queries but the size of the database and the speed of a single query, a different approach can be taken. The data can be split across several servers, which are queried in parallel, and the results of the queries are then consolidated outside those databases. This is called data sharding.

PostgreSQL provides a way to implement sharding based on table partitioning, where partitions are located on different servers and another one, the master server, uses them as foreign tables.
When performing a query on a parent table defined on the master server, PostgreSQL can recognize, depending on the WHERE clause and the definitions of the partitions, which partitions contain the requested data, and query only those partitions. Depending on the query, joins, grouping, and aggregation can sometimes be performed on the remote servers. PostgreSQL can query different partitions in parallel, which effectively utilizes the resources of several machines. With all this, it's possible to build a solution in which applications connect to a single database that physically executes their queries on different database servers, depending on the data being queried.

It's also possible to build sharding algorithms into the applications that use PostgreSQL. In short, applications would be expected to know what data is located in which database, write it only there, and read it only from there. This would add a lot of complexity to the applications.

Another option is to use one of the PostgreSQL-based sharding solutions available commercially or as open source. They have their own pros and cons, but a common problem is that they are based on previous releases of PostgreSQL and don't use the most recent features (sometimes providing their own features instead).

One of the most popular sharding solutions is Postgres-XL, which implements a shared-nothing architecture using multiple servers running PostgreSQL. The system has several components:

- Multiple data nodes: Store the data
- A single global transaction monitor (GTM): Manages the cluster and provides global transaction consistency
- Multiple coordinator nodes: Support user connections, build query-execution plans, and interact with the GTM and the data nodes

Postgres-XL implements the same API as PostgreSQL, so applications don't need to treat the server in any special way. It is ACID-compliant, meaning it supports transactions and integrity constraints.
The COPY command is also supported.

The main benefits of using Postgres-XL are as follows:

- It can scale to support more reading operations by adding more data nodes
- It can scale to support more writing operations by adding more coordinator nodes
- The current release of Postgres-XL (at the time of writing) is based on PostgreSQL 10, which is relatively new

The main downside of Postgres-XL is that it does not provide any high-availability features out of the box. When more servers are added to a cluster, the probability that one of them fails increases. That's why you should take care with backups or implement replication of the data nodes themselves.

Postgres-XL is open source, but commercial support is available.

Another solution worth mentioning is Greenplum. It's positioned as an implementation of a massively parallel processing database, specifically designed for data warehouses. It has the following components:

- Master node: Manages user connections, builds query execution plans, manages transactions
- Data nodes: Store the data and perform queries

Greenplum also implements the PostgreSQL API, and applications can connect to a Greenplum database without any changes. It supports transactions, but support for integrity constraints is limited. The COPY command is supported.

The main benefits of Greenplum are as follows:

- It can scale to support more reading operations by adding more data nodes.
- It supports column-oriented table organization, which can be useful for data-warehousing solutions.
- Data compression is supported.
- High-availability features are supported out of the box. It's possible (and recommended) to add a secondary master that takes over in case the primary master crashes. It's also possible to add mirrors to the data nodes to prevent data loss.

The drawbacks are as follows:

- It doesn't scale to support more writing operations. Everything goes through the single master node, and adding more data nodes does not make writing faster.
  However, it's possible to import data from files directly on the data nodes.
- It uses PostgreSQL 8.4 in its core. Greenplum has a lot of improvements and new features added to the base PostgreSQL code, but it's still based on a very old release; however, the system is being actively developed.
- Greenplum doesn't support foreign keys, and support for unique constraints is limited.

There are commercial and open source editions of Greenplum.

Scaling for a large number of connections

Yet another use case related to scalability is when the number of database connections is large. For example, when a single database is used in an environment with a lot of microservices and each has its own connection pool, it's possible that hundreds or even thousands of connections are opened in the database, even if the services don't perform many queries. Each connection consumes server resources, and just having to handle a great number of connections can already be a problem, without even performing any queries.

If applications don't use connection pooling, and instead open connections only when they need to query the database and close them afterwards, another problem can occur. Establishing a database connection takes time. It isn't much, but when the number of operations is large, the total overhead becomes significant.

There is a tool named PgBouncer that implements connection pooling. It can accept connections from many applications as if it were a PostgreSQL server and then open a limited number of connections towards the database. It reuses the same database connections for multiple application connections. Establishing a connection from an application to PgBouncer is much faster than connecting to a real database, because PgBouncer doesn't need to initialize a database backend process for the session.
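
The pooling idea can be sketched in a few lines of Python. This is a toy model and not how PgBouncer is implemented: `FakeConnection` and `TinyPool` are hypothetical names, and a real pool would hold actual database connections. It only shows how many client requests can share a small, fixed set of server connections.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for an expensive-to-open database connection (hypothetical)."""
    opened = 0  # counts how many connections were ever created

    def __init__(self):
        FakeConnection.opened += 1
        self.conn_id = FakeConnection.opened

    def execute(self, sql):
        return f"conn {self.conn_id} ran: {sql}"


class TinyPool:
    """Hands out a limited number of pre-opened connections and reuses them."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)     # return to the pool instead of closing


pool = TinyPool(size=2)
results = []
for client in range(5):              # five client requests, two connections
    with pool.connection() as conn:
        results.append(conn.execute(f"SELECT {client}"))

print(results)
print("connections opened:", FakeConnection.opened)  # connections opened: 2
```

Five requests are served, yet only two connections are ever opened; the rest of the time an idle connection is handed back out.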
PgBouncer can create multiple connection pools that work in one of three modes:

- Session mode: A connection to a PostgreSQL server is used for the lifetime of a client connection to PgBouncer. Such a setup could be used to speed up the connection process on the application side. This is the default mode.
- Transaction mode: A connection to PostgreSQL is used for a single transaction that a client performs. That could be used to reduce the number of connections on the PostgreSQL side when only a few transactions are performed simultaneously.
- Statement mode: A database connection is used for a single statement. Then it is returned to the pool, and a different connection is used for the next statement. This mode is similar to transaction mode, though more aggressive. Note that multi-statement transactions are not possible when statement mode is used.

Different pools can be set up to work in different modes.

It's possible to let PgBouncer connect to multiple PostgreSQL servers, thus working as a reverse proxy.

A typical way to use PgBouncer is as follows. PgBouncer establishes several connections to the database. When an application connects to PgBouncer and starts a transaction, PgBouncer assigns an existing database connection to that application, forwards all SQL commands to the database, and delivers the results back. When the transaction is finished, PgBouncer dissociates the connections but does not close them. If another application starts a transaction, the same database connection can be used. Such a setup requires configuring PgBouncer to work in transaction mode.

PostgreSQL provides several ways to implement replication that maintain a copy of the data from a database on another server or servers. This can be used as a backup or a standby solution that takes over in case the main server crashes.
Replication can also be used to improve the performance of a software system by making it possible to distribute the load across several database servers.

In this article, we discussed the problem of building scalable solutions based on PostgreSQL that utilize the resources of several servers. We looked at scaling for heavy querying, data sharding, and scaling for a large number of connections.

If you enjoyed reading this article and want to explore other topics, be sure to check out the book 'Learning PostgreSQL 11 - Third Edition'.

- Handling backup and recovery in PostgreSQL 10 [Tutorial]
- Understanding SQL Server recovery models to effectively backup and restore your database
- Saving backups on cloud services with ElasticSearch plugins
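
The application-level sharding described in this article, where the application knows which database holds which data, might be sketched as follows. Everything here is an assumption made for illustration: the shard names, the `shard_for`, `write` and `read` helpers, and the choice of hashing the key with CRC32. In-memory dictionaries stand in for the separate PostgreSQL servers so that the sketch runs without a database.

```python
import zlib

# Hypothetical shard names; each would be a separate PostgreSQL server.
SHARDS = ["shard_a", "shard_b", "shard_c"]

# Dictionaries stand in for the databases so the sketch runs without a server.
storage = {name: {} for name in SHARDS}


def shard_for(key):
    """Map a key to one shard deterministically, so reads find earlier writes."""
    checksum = zlib.crc32(key.encode("utf-8"))
    return SHARDS[checksum % len(SHARDS)]


def write(key, value):
    # Data for a key is written only to the shard that owns it.
    storage[shard_for(key)][key] = value


def read(key):
    # Reads go to the same shard, so no other server is touched.
    return storage[shard_for(key)].get(key)


for user in ["alice", "bob", "carol", "dave"]:
    write(user, {"name": user})

print(read("carol"))
print({name: sorted(rows) for name, rows in storage.items()})
```

One consequence of this simple modulo scheme is that changing the number of shards remaps most keys, which is part of the complexity the article warns about when sharding logic lives in the application.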

Richard Gall

11 Sep 2018

3 min read

What are generative adversarial networks (GANs) and how do they work? [Video]

Generative adversarial networks, or GANs, are a powerful type of neural network used for unsupervised machine learning. Made up of two models that compete with one another, GANs are able to capture and copy variations within a dataset. They’re great for image manipulation and generation, but they can also be deployed for tasks like understanding risk and recovery in healthcare and pharmacology.

GANs are actually pretty new - they were first introduced by Ian Goodfellow in 2014. Goodfellow developed them to tackle some of the issues with similar neural networks, including the Boltzmann machine and autoencoders. Both the Boltzmann machine and autoencoders use Markov chains, which have a pretty high computational cost. GANs avoid that cost, and this efficiency gives engineers significant gains - which you need if you’re working at the cutting edge of artificial intelligence.

How do Generative Adversarial Networks work?

Let's start with a simple analogy. You have a painting - say the Mona Lisa - and a master forger who wants to create a duplicate painting. The forger does this by learning how the original painter - Leonardo da Vinci - produced the painting. Meanwhile, you have an investigator trying to catch the forger and ‘second guess’ the rules the forger is learning.

To map this onto the architecture of a GAN, the forger is the generator network, which learns the distribution of classes, while the investigator is the discriminator network, which learns the boundaries between those classes - the formal ‘shape’ of the dataset.

Applications of GANs

Generative adversarial networks are used for a number of different applications. One of the best examples is a Google Brain project back in 2016, in which researchers used GANs to develop a method of encryption. This project used 3 neural networks - Alice, Bob, and Eve. Alice’s job was to send an encrypted message to Bob. Bob’s job was to decode that message, while Eve’s job was to intercept it.
To begin with, Alice’s messages were easily intercepted by Eve. However, thanks to Eve’s adversarial work, Alice began to develop its own encryption strategy - it took 15,000 runs for Alice to successfully encrypt a message that Bob could decipher but Eve couldn’t.

Elsewhere, GANs are also being used in fields such as drug research. The neural networks can be trained on existing drugs and suggest new synthetic chemical structures that improve on drugs that already exist.

Generative adversarial networks: the cutting edge of artificial intelligence

As we’ve seen, GANs offer some really exciting opportunities in artificial intelligence. There are two key advantages you need to remember: GANs solve the problem of generating data when you don’t have enough to begin with, and they require no human supervision. This is crucial when you think about the cutting edge of artificial intelligence, both in terms of the efficiency of running the models and in terms of the real-world data we want to use, which could be of poor quality or have privacy and confidentiality issues, as much healthcare data does.

Melisha Dsouza

25 Sep 2018

7 min read

How will AI impact job roles in Cybersecurity

"If you want a job for the next few years, work in technology. If you want a job for life, work in cybersecurity." - Aaron Levie, chief executive of cloud storage vendor Box

There are some dire, but somewhat conflicting, views on the availability of qualified cybersecurity professionals over the next four or five years. The Global Information Security Workforce Study from the Center for Cyber Safety and Education predicts a shortfall of 1.8 million cybersecurity workers by 2022. On the flip side, the Cybersecurity Jobs Report, created by the editors of Cybersecurity Ventures, highlights that there will be 3.5 million cybersecurity job openings by 2021. Cybercrime will more than triple the number of job openings over the next 5 years.

Living in the midst of a digital revolution driven by AI, we can safely say that AI will be the answer to the dilemma of “what will become of human jobs in cybersecurity?”. Tech enthusiasts believe that we will see a new generation of robots that can work alongside humans and complement, or maybe replace, them in ways not envisioned previously. AI will not only make jobs easier to accomplish, but also bring about new job roles for the masses. Let’s find out how.

Will AI destroy or create jobs in Cybersecurity?

AI-driven systems have started to replace humans in numerous industries. However, that doesn’t appear to be the case in cybersecurity. While automation can sometimes reduce operational errors and make it easier to scale tasks, using AI to spot cyberattacks isn’t completely practical, because such systems yield a large number of false positives. AI lacks contextual awareness, which can lead to attacks being wrongly identified or missed completely. As anyone who’s ever tried to automate something knows, automated machines aren’t great at dealing with exceptions that fall outside the parameters for which they have been programmed.
Eventually, human expertise is needed to analyze potential risks or breaches and make critical decisions. It’s also worth noting that relying completely on artificial intelligence to manage security only leads to more vulnerabilities - attacks could, for example, exploit the machine element in automation.

Automation can support cybersecurity professionals - but shouldn’t replace them

Supported by the right tools, humans can do more. They can focus on critical tasks where an automated machine or algorithm is inappropriate. In the context of cybersecurity, artificial intelligence can do much of the 'legwork' at scale in processing and analyzing data, to help inform human decision making. Ultimately, this isn’t a zero-sum game - humans and AI can work hand in hand to great effect.

AI2

Take, for instance, the project led by the experts at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Lab. AI2 (Artificial Intelligence + Analyst Intuition) is a system that combines the capabilities of AI with the intelligence of human analysts to create an adaptive cybersecurity solution that improves over time. The system uses the PatternEx machine learning platform and combs through data looking for meaningful, predefined patterns. For instance, a sudden spike in postback events on a webpage might indicate an attempt at staging a SQL injection attack. The top results are then presented to a human analyst, who separates out any false positives and flags legitimate threats. The information is then fed into a virtual analyst that uses human input to learn and improve the system’s detection rates. On future iterations, a more refined dataset is presented to the human analyst, who goes through the results and once again "teaches" the system to make better decisions. AI2 is a perfect example of how man and machine can complement each other’s strengths to create something even more effective.
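The feedback loop described for AI2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual AI2 or PatternEx implementation: the z-score detector, the threshold "virtual analyst", the sample postback counts, and the simulated analyst rule are all hypothetical stand-ins chosen to keep the example small.

```python
from statistics import mean, pstdev


def outlier_scores(counts):
    """Unsupervised step (assumed): absolute z-score of each event count."""
    mu = mean(counts)
    sigma = pstdev(counts) or 1.0  # avoid division by zero on constant data
    return [abs(c - mu) / sigma for c in counts]


def learn_threshold(labeled):
    """Supervised step (toy 'virtual analyst'): midpoint between the highest
    benign count and the lowest attack count the analyst has confirmed."""
    attacks = [c for c, is_attack in labeled if is_attack]
    benign = [c for c, is_attack in labeled if not is_attack]
    if not attacks or not benign:
        return None  # not enough analyst feedback yet
    return (max(benign) + min(attacks)) / 2


def run_feedback_loop(counts, analyst, rounds=3, k=2):
    """Each round, show the k most suspicious unlabeled events to the analyst
    and relearn the threshold from all verdicts collected so far."""
    scores = outlier_scores(counts)
    labeled = {}  # event index -> analyst verdict (True means attack)
    threshold = None
    for _ in range(rounds):
        unlabeled = [i for i in range(len(counts)) if i not in labeled]
        queue = sorted(unlabeled, key=lambda i: scores[i], reverse=True)[:k]
        for i in queue:
            labeled[i] = analyst(counts[i])
        threshold = learn_threshold([(counts[i], v) for i, v in labeled.items()])
    return threshold, labeled


def simulated_analyst(count):
    """Hypothetical human verdict: treat very large postback spikes as attacks."""
    return count >= 400


# Hypothetical postback events per minute on a web page.
postbacks = [12, 15, 11, 14, 13, 900, 16, 10, 450, 12, 60, 14]
threshold, labeled = run_feedback_loop(postbacks, simulated_analyst)
print(f"learned threshold: {threshold}, events reviewed: {len(labeled)}")
```

Note that the unsupervised scorer also surfaces unusually quiet minutes, which the analyst rejects as false positives; those rejections are exactly the feedback that lets the learned threshold tighten from one round to the next.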
It’s worth remembering that in any company that uses AI for cybersecurity, automated tools and techniques require significant algorithm training and data markup.

New cybersecurity job roles and the evolution of the job market

The bottom line of this discussion is that AI will not destroy cybersecurity jobs, but it will drastically change them. The primary focus of many cybersecurity jobs can be going through the hundreds of security tools available and determining which tools and techniques are most appropriate for the organization’s needs. Of course, as systems move to the cloud, these decisions will already be made because cloud providers will offer built-in security solutions. This means that the number of companies that need a full staff of cybersecurity experts will be drastically reduced. Instead, companies will need more individuals who understand issues like the potential business impact and risk of different projects and architectural decisions. This demands a very different set of skills and knowledge compared to the typical current cybersecurity role - it is less directly technical and will require more integration with other key business decision makers. AI can provide assistance, but it can’t offer easy answers.

Humans and AI working together

Companies concerned with cybersecurity legal compliance and effective real-world solutions should note that cybersecurity and information technology professionals are best suited for tasks such as risk analysis, policy formulation, and cyber attack response. Human intervention can help AI systems to learn and evolve. Take the example of Panda Security, a Spain-based antivirus company that had a number of people reverse-engineering malicious code and writing signatures. Today, to keep pace with overflowing amounts of data, the company would need hundreds of thousands of engineers to deal with malicious code.
Enter AI, and only a small team of engineers is required to look at more than 200,000 new malware samples per day.

Is AI going to steal cybersecurity engineers’ jobs?

So what about the former employees who used to perform this job? Have they been laid off? The answer is a straight no! But they will need to upgrade their skill set. In the world of cybersecurity, AI is going to create new jobs, as it throws up new problems to be analyzed and solved. It’s going to create what are being called "new collar" jobs - something that IBM’s hiring strategy has already taken into account. Once graduates enter the IBM workforce, AI enters the equation to help them get a fast start. Even junior analysts can investigate new malware infecting employees’ mobile phones. AI would quickly research the new malware impacting the phones, identify the characteristics reported by others, and provide a recommended course of action. This would relieve analysts of the manual work of going through reams of data and lines of code - in theory, it should make their job more interesting and more fun.

Artificial intelligence and the human workforce, then, aren’t in conflict when it comes to cybersecurity. Instead, they can complement each other to create new job opportunities that will test the skills of the upcoming generation, and lead experienced professionals in new and maybe more interesting directions. It will be interesting to see how the cybersecurity workforce makes use of AI in the future.

Intelligent Edge Analytics: 7 ways machine learning is driving edge computing adoption in 2018
15 millions jobs in Britain at stake with AI robots set to replace humans at workforce
5 ways artificial intelligence is upgrading software engineering

article-image-what-are-the-challenges-of-adopting-ai-powered-tools-in-sales-how-salesforce-can-help

Guest Contributor

24 Aug 2019

8 min read

What are the challenges of adopting AI-powered tools in Sales? How Salesforce can help

Artificial intelligence is a hot topic for many industries. When it comes to sales, the situation gets complicated. According to the latest Salesforce State of Sales report, just 21% of organizations use AI in sales today, while its adoption in sales is expected to grow 155% by 2020. Let’s explore what keeps sales teams from implementing AI and how to overcome these challenges to unlock new opportunities. Why do so few teams adopt AI in Sales There are a few reasons behind such a low rate of AI application in sales. First, some teams don’t feel they are prepared to integrate AI into their existing strategies. Second, AI technologies are often applied in a hectic way: many businesses have high expectations of AI and concentrate mostly on its benefits rather than contemplating possible difficulties upfront. Such an approach rarely results in positive business transformation. Here are some common challenges that businesses need to overcome to turn their sales AI projects into success stories. Businesses don’t know how to apply AI in their workflow Problem: Different industries call for different uses of AI. Still, companies tend to buy AI platforms to use them for the same few popular tasks, like predictions based on historical data or automatic data logging. In reality, the business type and direction should dictate what AI solution will best fit the needs of an organization. For example, in e-commerce, AI can serve dynamic product recommendations on the basis of the customer’s previous purchases or views. Teams relying on email marketing can use AI to serve personalized email content as well as optimize send times. Solution: Let a sales team participate in AI onboarding. Prior to setup, gain insight into your sales reps’ daily routine, needs, and pains. Then, get their feedback continuously during the actual AI implementation. Such a strategy will ensure the sales team benefits from a tailored, rather than a generic, AI system. 
AI requires data businesses don’t have Problem: AI is most efficient when fed with huge amounts of data. It’s true, a company with a few hundred leads per week will train AI for better predictions than the company with the same amount of leads per month. Frequently, companies assume they don’t have so much data or they cannot present it in a suitable format to train an AI algorithm. Solution: In reality, AI can be trained with incomplete and imperfect data. Instead of trying to integrate the whole set of data prior to implementing AI, it’s possible to use it with data subsets, like historical purchase data or promotional campaign analytics. Plus, AI can improve the quality of data by predicting missing elements or identifying possible errors. Businesses lack skills to manage AI platforms Problem: AI is a sophisticated algorithm that requires special skills to implement and use it. Thus, sales teams need to be augmented with specialized knowledge in data management, software optimization, and integration. Otherwise, AI tools can be used incorrectly and thus provide little value. Solution: There are two ways of solving this problem. First, it’s possible to create a new team of big data, machine learning, and analytics experts to run AI implementation and coordinate it with the sales team. This option is rather time-consuming. Second, it’s possible to buy an AI-driven platform, like Salesforce, for example, that includes both out-of-the-box features as well as plenty of customization opportunities. Instead of hiring new specialists to manage the platform, you can reach out to Salesforce consultants who will help you select the best-fit plan, configure, and implement it. If your requirements go beyond the features available by default, then it’s possible to add custom functionality. How AI can change the sales of tomorrow When you have a clear vision of the AI implementation challenges and understand how to overcome them, it’s time to make use of AI-provided benefits. 
A core benefit of any AI system is its ability to analyze large amounts of data across multiple platforms and then connect the dots, i.e. draw actionable conclusions. To illustrate these AI opportunities, let’s take Salesforce, one of the most popular solutions in this domain today, and see how its AI technology, Einstein, can enhance a sales workflow.

Time-saving and productivity boost

Administrative work eats up time that sales reps could spend selling. That’s why many administrative tasks should be automated. Salesforce Einstein can save time usually wasted on manual data entry by:
- Automating contact creation and updates
- Logging activities
- Generating lead status reports
- Syncing emails and calendars
- Scheduling meetings

Efficient lead management

When it comes to leads, sales reps tend to base their lead management strategies on gut feeling. In spite of its importance, intuition cannot be the only means of assessing leads. The approach should be more holistic. AI has unmatched abilities to analyze large amounts of information from different sources to help score and prioritize leads. In combination with sales reps’ intuition, such data can bring lead management to a new level. For example, Einstein AI can help with:
- Scoring leads based on historical data and performance metrics of the best customers
- Classifying opportunities in terms of their readiness to convert
- Tracking reengaged opportunities and nurturing them

Predictive forecasting

AI is well-known for its predictive capabilities that help sales teams make smarter decisions without running endless what-if scenarios. AI forecasting builds sales models using historical data. Such models anticipate possible outcomes of multiple scenarios common in sales reps’ work.
Salesforce Einstein, for example, can give the following predictions:
- Prospects most likely to convert
- Deals most likely to close
- Prospects or deals to target
- New leads
- Opportunities to upsell or cross-sell

The same algorithm can be used for forecasting sales team performance during a specified period of time and taking proactive steps based on those predictions. What’s more, sales intelligence is shifting from predictive to prescriptive, where prescriptive AI does not merely recommend but prescribes exact actions to be taken by sales reps to achieve a particular outcome.

Watching out for pitfalls of AI in sales

While AI promises to fulfil sales reps’ advanced requests, there are still some fears and doubts around it. First of all, as a rising technology, AI still carries ethical issues related to its safe and legitimate use in the workplace, such as the integrity of autonomous AI-driven decisions and the legitimate origin of data fed to algorithms. While a full-fledged legal framework is yet to be worked out, governments have already stepped in. For example, the High-Level Expert Group on AI of the European Commission came up with the Ethics Guidelines for Trustworthy Artificial Intelligence, covering every aspect from human oversight and technical robustness to data privacy and non-discrimination. In particular, non-discrimination relates to potential bias, such as algorithmic bias that comes from human bias when sourcing data, and bias that arises when correlation is mistaken for causation. Thus, AI-driven analysis should be incorporated in decision-making cautiously, as just one of many sources of insight. AI won’t replace a human mind: the data still needs to be processed critically.

When it comes to sales, another common concern is that AI will take sales reps’ jobs. Yes, some tasks that are deemed monotonous and time-consuming are indeed taken over by AI automation. However, it is actually a blessing, as AI does not replace jobs but augments them.
This way, sales reps can have more time on their hands to complete more creative and critical tasks. It's true, however, that employers would need people who know how to work with AI technologies. It means either ongoing training or new hires, which can be rather costly. The stakes are high, though. To keep up with the fast-changing world, one has to bargain their way to success, finding one’s way around current limitations and challenges. In a nutshell AI is key to boosting sales team performance. However, successful AI integration into sales and marketing strategies requires teams to overcome challenges posed by sophisticated AI technologies. Such popular AI-driven platforms like Salesforce help sales reps get hold of the AI potential as well as enjoy vast opportunities for saving time and increasing productivity. Author Bio Valerie Nechay is MarTech and CX Observer at Iflexion, a Denver-based custom software development provider. Using her writing powers, she's translating complex technologies into fascinating topics and shares them with the world. Now her focus is on Salesforce implementation how-tos, challenges, insights, and shortcuts, as well as broader applications of enterprise tech for business development. IBM halt sales of Watson AI tool for drug discovery amid tepid growth: STAT report. Salesforce Einstein team open sources TransmogrifAI, their automated machine learning library How to create sales analysis app in Qlik Sense using DAR method [Tutorial]

article-image-how-everyone-at-netflix-uses-jupyter-notebooks-from-data-scientists-machine-learning-engineers-to-data-analysts

Bhagyashree R

18 Aug 2018

4 min read

How everyone at Netflix uses Jupyter notebooks from data scientists, machine learning engineers, to data analysts

Netflix uses a variety of tools to do data analysis. One of the big ways that data scientists and engineers at Netflix interact with their data is through Jupyter notebooks. In addition to providing execution environments to users, Netflix invests in various parts of the Jupyter ecosystem and tooling. They are “reimagining what a notebook can be, who can use it, and what they can do with it.”

Netflix aims to provide personalized content to their 130 million viewers. To do this, more than 1 trillion events are written into a streaming ingestion pipeline every day. To support this, they’ve built an industry-leading data platform which is flexible, powerful, and complex. The platform has many diverse users, such as analytics engineers, data engineers, and data scientists, who require different sets of tools and languages. To help the platform scale, they wanted to minimize the number of tools, and the solution to this was an open-source tool: Jupyter notebooks.

Why are Jupyter notebooks so compelling for Netflix?

These are the notebook features that benefit Netflix’s data scientists and engineers:
- Standard messaging API: The Jupyter protocol provides a standard messaging API for communicating with the kernels that act as computational engines. It separates where the content is written from where the content is executed. This makes it language agnostic.
- Editable file format: It provides an editable file format that stores the code and results together.
- Web-based UI: It is web-based, which helps with interactively writing and running code as well as visualizing outputs.

How does Netflix use Jupyter notebooks?

The following are some of the use cases they use Jupyter notebooks for:

Data access: Notebooks were first introduced for workflows, and their adoption grew among the data scientists. Seeing this, Netflix decided to leverage their versatility and architecture for general data access.
Notebooks provide a user-friendly interface for interactively running code, exploring the outputs, and visualizing data, all from a single cloud-based development environment.

Notebook templates: They introduced parameterized notebooks, which allow the use of parameters in the code and take values as input at runtime. These templates help:
- Data scientists run an experiment with different coefficients and summarize the results
- Data engineers execute data quality audits
- Data analysts share prepared queries and visualizations
- Software engineers email the results of a troubleshooting script

Scheduling notebooks: Next, they are using notebooks to create a unifying layer for scheduling workflows. Notebooks are used for interactive work and allow a smooth move to scheduling that work to run on a recurring basis. Many users create an entire workflow in a notebook and just copy/paste it into separate files for scheduling when they’re ready to deploy it.

Notebook infrastructure: The three fundamental components of the infrastructure are storage, compute, and interface. (Source: Netflix Tech Blog)

Storage: The Netflix Data Platform uses Amazon S3 and EFS for cloud storage, which notebooks treat as virtual filesystems. Each user has a home directory on EFS containing a personal workspace for notebooks. This workspace stores any notebook created or uploaded by a user. When a user launches a notebook interactively, all the reading and writing happens in the workspace.

Compute: All the jobs on the data platform run in containers, including queries, pipelines, and notebooks. A container with reasonable default resources is provisioned when a user launches a notebook. Users can request more resources if they find that the provided resources are not enough. A unified execution environment with a prepared container image is provided, which has common libraries and an array of default kernels preinstalled.
The orchestration and environments are managed with Titus, their container management platform.

Interface: They are using nteract, a React-based frontend for Jupyter notebooks, which emphasizes simplicity and composability as core design principles. They’re also introducing native support for parameterization, which makes it easier to schedule notebooks and create reusable templates.

Netflix is planning to make investments in both the frontend and backend to improve the overall notebook experience. This year they are also sponsoring JupyterCon. To read more about how Jupyter is offering value to Netflix, read Netflix’s original post on Medium.

10 reasons why data scientists love Jupyter notebooks
What’s new in Jupyter Notebook 5.3.0
Netflix open sources Zuul 2 cloud gateway

article-image-python-tensorflow-excel-and-more-data-professionals-reveal-their-top-tools

Amey Varangaonkar

06 Jun 2018

4 min read

Python, Tensorflow, Excel and more - Data professionals reveal their top tools

Data professionals are constantly on the lookout for the best tools to simplify their data science tasks - be it data acquisition, machine learning, or visualizing the results of the analysis. With so much on their plate already, having robust, efficient tools in the arsenal helps them a lot in reducing the procedural complexities. Not just that, the time taken to do these tasks is considerably reduced as well. But what tools do data professionals rely on to make their lives easier? Thanks to the Skill-up 2018 survey that we recently conducted, we have some interesting observations to share with you! Read the Skill Up report in full. Sign up to our weekly newsletter and download the PDF for free. Key Takeaways Python is the most widely used programming language by data professionals Python finds a wide adoption across all spectrums of data science - including data analysis, machine learning, deep learning and data visualization Excel continues to be favored by the data professionals because of its effectiveness and simplicity R is slowly falling behind Python in the race to Data Science supremacy Now, let’s look at these observations, in more depth. Python continues its ascension as the top dog Python’s rise in popularity as well as adoption over the last 3 years has been quite staggering, to say the least. Python’s ease of use, powerful analytical and machine learning capabilities as well as its applications outside of data science make it quite a popular language in the tech community. It thus comes as no surprise that it stood out from the others and was the undisputed choice of language for the data pros. R, on the other hand, seems to be finding it difficult to play catch-up to Python, with less than half the number of votes - despite being the tool of choice for many statisticians and researchers. Is the paradigm shift well and truly on? Is Python edging R out for good? 
Source: Packt Skill-Up Survey 2018 It is interesting to see SQL as the number 2, but considering the number of people working with databases these days it doesn’t come as a surprise. Also, JavaScript is preferred more than Java, indicating the rising need for web-based dashboards for effective Business Intelligence. Data professionals still love Excel, but Python libraries are taking over Microsoft Excel has traditionally been a highly popular tool for data analysis, especially when dealing with data with hundreds and thousands of records. Excel’s perfect setting for data manipulation and charting continues to be the reason why people still use it for basic-level data analysis, as indicated by our survey. Almost 53% of the respondents prefer having Excel in their analysis toolkit for their day to day tasks. Top libraries, tools and frameworks used by data professionals (Source: Packt Skill-Up Survey 2018) The survey also indicated Python’s rising dominance in the data science domain, with 8 out of the 10 most-used tools for data analysis being Python-based. Python’s offerings for data wrangling, scientific computing, machine learning and deep learning make its libraries the obvious choice for data professionals. Here’s a quick look at 15 useful Python libraries to make the above-mentioned data science tasks easier. Tensorflow and PyTorch are in demand AI’s popularity is soaring with every passing day as it finds applications across all types of industries and business domains. In our survey, we found machine learning and deep learning to be two of the most valuable skills to have for any data scientist, as can be seen from the word cloud below: Word cloud for the most valued skills by data professionals (Source: Packt Skill-Up Survey) Python’s two popular deep learning frameworks - Tensorflow and PyTorch have thus gained a lot of attention and adoption in the recent times. 
Along with Keras - another Python library - these two libraries are the frameworks most used by data scientists and ML developers for building efficient machine learning and deep learning models.

Which language/libraries do you use for your everyday data science tasks? Do you agree with your peers’ choice of tools? Feel free to let us know!

Read more
Data cleaning is the worst part of data analysis, say data scientists
30 common data science terms explained
Top 10 deep learning frameworks

article-image-is-youtubes-ai-algorithm-evil

Amarabha Banerjee

30 Sep 2018

6 min read

Is YouTube's AI Algorithm evil?

YouTube has been at the center of content creation, content distribution, and advertising activities for some time now. The impact of YouTube can be gauged from its 1.8 billion users worldwide. While the YouTube video hosting concept has been a great success story for content creators, the video viewing and recommendation model has been in the middle of a brewing controversy lately.

The Controversy

Logan Paul was already a top-rated YouTube star when he stumbled across a hanging dead body in a Japanese forest that is known as a suicide spot. After the initial shock and awe, Logan Paul seemed quite amused and commented, “Dude, his hands are purple,” then turned to his friends and giggled. “You ever stand next to a dead guy?” This was a shocking moment for YouTubers all across the globe. Disapproving reactions poured in, and the video was taken down 24 hours later by YouTube. In those 24 hours, the video managed to garner 6 million views. Even after the furious backlash, users complained that they were still seeing recommendations of Logan Paul’s videos. That brought the emphasis back to the recommendation system that YouTube uses.

YouTube Video Recommendation

Back in 2005, when YouTube first started out, it had a uniform homepage for all users. This meant that every YouTube user would see the same homepage, and the creators who were featured there would get a huge boost in their viewership. Their selection was based on their subscriber count, views, and user engagement metrics, e.g. likes, comments, and shares. This inspired other users to become creators and start contributing content to become a part of the YouTube family. In 2006, YouTube was bought by Google. Its policies and homepage started evolving gradually. As ads started showing on YouTube videos, the scenario changed quite quickly.
Also, with the rapid rise in the number of users, Google had thought it to be a good idea to curate the homepage as per each user’s watch history, subscriptions, and likes. This was a good move in principle since it helped the users to see what they wanted to see. As a part of their next level innovation, a machine learning model was created to suggest or recommend videos to users. The goal of this deep neural network based recommendation engine was to increase watch time of every video so that users stay longer on the platform. What did it change and How When Youtube’s machine learning algorithm shows a few videos in your feed as “Recommended for you”, it predicts what you want to see from your watch history and watch history of similar users. If you interact with any of these videos and watch it for a certain amount of time, the recommendation engine considers it as a success and starts curating a list based on your interactions with its suggested videos. The more data it gathers about your choices and watch history, the more confident it becomes of its own video decisions. The major goal of Youtube’s recommendation engine is to attract your attention and get you hooked to the platform to get more watch time. More watch time means more revenue and more scope for targeted ads. What this changes, is the fundamental concept of choice and the exercising of user discretion. The moment the YouTube Algorithm considers watch time as the most important metric to recommend videos to you, less importance goes into the organic interactions on YouTube, which includes liking, commenting and subscribing to videos and channels. Users get to see video recommendations based on the YouTube Algorithm’s user understanding and its goal of maximizing watch time, with less importance given to user choices. Distorted Reality and YouTube This attention maximizing model is the fundamental working mechanism of mostly all social media networks. 
But YouTube has not been accused of distorting reality and spreading fake news as much as Facebook has been in mainstream media. Times are changing, however, and so are the viewpoints on YouTube’s influence on the global population and its ability to manipulate important public opinion. Guillaume Chaslot, a 36-year-old French computer programmer with a Ph.D. in artificial intelligence, was one of the engineers on the core team that developed and perfected the YouTube algorithm. In his own words, “YouTube is something that looks like reality, but it is distorted to make you spend more time online. The recommendation algorithm is not optimizing for what is truthful, or balanced, or healthy for democracy.” Chaslot explains that the algorithm never stays the same. It is constantly changing the weight it gives to different signals: the viewing patterns of a user, for example, or the length of time a video is watched before someone clicks away. Chaslot was fired by Google in 2013 over performance issues. His claim was that he wanted to change the approach of the YouTube algorithm to make it more aligned with democratic values instead of being devoted to just increasing watch time.

Where are we headed

I am not qualified or righteous enough to answer the direct question: is YouTube good or bad? YouTube creates opportunities for millions of creators worldwide to showcase their talent and present it to a global audience without worrying about country or boundaries. This in itself is a huge power for an internet application. But the crucial question here is whether YouTube is using this power just to keep users glued to the screen. Do they really care if you are seeing divisive content or prejudiced flat-earther conspiracies as recommended videos?
The algorithm can be tweaked to include parameters that remove unintended bias, such as whether a video is propagating fake news or influencing voters’ minds in an unlawful way. But that is nearly impossible, as machines lack morality, empathy, or even common sense. Incorporating humane values such as honesty and morality into an AI system is like creating an AI that is more human than machine. This is why machine-augmented human intelligence will play a more and more crucial role in the near future. The possibilities are endless, be they good or bad. Whether we progress or regress might not be in our hands anymore. But what might be in our hands is to come together and put effective checkpoints in place to identify and course-correct scenarios where algorithms run wild.

Sex robots, artificial intelligence, and ethics: How desire shapes and is shaped by algorithms
Like newspapers, Google algorithms are protected by the First amendment
California replaces cash bail with algorithms

article-image-dl-frameworks-tensorflow-vs-cntk

Aaron Lazar

30 Oct 2017

6 min read

The Deep Learning Framework Showdown: TensorFlow vs CNTK

The question several deep learning engineers may ask themselves is: which is better, TensorFlow or CNTK? Well, we're going to answer that question for you, taking you through a closely fought match between the two most exciting frameworks.

So, here we are, ladies and gentlemen, it's fight night and it's a full house. In the Red corner, weighing in at two hundred and seventy pounds of Python and topping out at over ten thousand frames per second; managed by the American tech giant, Google; we have the mighty, the beefy, TensorFlow! In the Blue corner, weighing in at two hundred and thirty pounds of C++ muscle, we have one of the top toolkits that can comfortably scale beyond a single machine. Managed by none other than Microsoft, it's fast, it's furious, it's CNTK, aka the Microsoft Cognitive Toolkit!

And we're into Round One… TensorFlow and CNTK are looking quite menacingly at each other and are raring to take each other down. TensorFlow seems pleased that its compile times are considerably faster than Theano's. Although, it looks like happiness came a tad bit soon. CNTK, light and bouncy on its feet, comes straight out of nowhere with a whopping seventy thousand frames/second uppercut, knocking TensorFlow to the floor. TensorFlow looks like it's in no mood to give up anytime soon. It makes itself so simple to use and understand that even students can pick it up and start training their own models. This isn't the case with CNTK, as it begs to shed its complexity. On the other hand, CNTK seems to be thrashing TensorFlow in terms of 3D convolution, where CNTK can clearly recognize images from streaming content. TensorFlow also tries its best to run LSTM RNNs, but in vain.

The crowd keeps cheering on… Wait a minute... are they calling out for TensorFlow? Yes they are! There's hardly any cheering for CNTK. This is embarrassing! Looks like its community support can't match up to TensorFlow's.
And ladies and gentlemen, that does make a difference - we can see TensorFlow improving on several fronts and gradually getting back in the game! TensorFlow huffs and puffs as it tries to prove that it's not just about deep learning and that it has tools in its pocket that can support other algorithms such as reinforcement learning. It conveniently whips out TensorBoard, and drops CNTK to the floor with its beautiful visualizations. TensorFlow now has the upper hand: it is trying hard to pin CNTK to the floor and uses its R support to try to finish it off. But CNTK tactfully breaks loose and leaves TensorFlow on the floor - still not ready to be used in production. And there goes the bell for Round One!

Both fighters look exhausted but you can see a faint twinkle in TensorFlow's eye, primarily because it survived Round One. Google seems to be working hard to prep it for Round Two and is making several improvements in terms of speed and flexibility and, above all, making it ready for production. Meanwhile, Microsoft boosts CNTK's spirits with a shot of Python APIs in its blood. As it moves towards version 2.0, CNTK is seeing a lot of improvements with which Microsoft has ensured that it's not left behind, like a backend for Keras, which puts it on par with TensorFlow. Moreover, there seem to be quite a few experimental features that it looks ready to enter the ring with, like the Java API for example.

It's the final round and boy, are these two into a serious stare-down! The referee waves them in and off they go. CNTK needs to get back at TensorFlow. Comfortably supporting multiple GPUs and CPUs out of the box, across both Windows and Linux, it has an advantage over TensorFlow. Is it going to use that trump card? Yes it is! A thousand GPUs and a hundred machines in, and CNTK is raining blows on TensorFlow. TensorFlow clearly drops the ball when it comes to multiple machines, where it rather complicates things.
It's high time that TensorFlow turned the tables. Lo and behold! It shows off its mobile deep learning capabilities with TensorFlow Lite, clearly flipping CNTK flat on its back. This is revolutionary and a tremendous breakthrough for TensorFlow! CNTK, however, is clearly the people's choice when it comes to language compatibility. With support for C++, Python, C#/.NET and now Java, it's clearly winning in this area. Round Two is coming to an end, ladies and gentlemen, and it's a neck and neck battle out there. We're not sure the judges are going to be able to choose a clear winner, from the looks of it. And... there goes the bell!

While the scores are being tallied, we go over to the teams and some spectators for some gossip on the what's what of deep learning. Did you know having multiple machine support is a huge advantage? It increases speed and efficiency by almost 10 times! That's something! We also got to know that TensorFlow is training hard and is picking up positives from its rival, CNTK. There are also rumors about a new kid called MXNet, that has APIs in R, Python and even in Julia! This makes it one helluva framework in terms of flexibility and speed. In fact, AWS is already implementing it while Apple also is rumored to be using it. Clearly, something to watch out for.

And finally, the judges have made their decision. Ladies and gentlemen, after two rounds of sheer entertainment, we have the results...

| Criterion | TensorFlow | CNTK |
| --- | --- | --- |
| Processing speed | 0 | 1 |
| Learning curve | 1 | 0 |
| Production readiness | 0 | 1 |
| Community support | 1 | 0 |
| CPU, GPU computation support | 0 | 1 |
| Mobile deep learning | 1 | 0 |
| Multiple language compatibility | 0 | 1 |

It's a unanimous decision and just as we thought, CNTK is the heavyweight champion! CNTK clearly beat TensorFlow in terms of performance, because of its flexibility, speed and ability to be used in production!
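For anyone who wants to check the judges' arithmetic, here is a minimal Python sketch that tallies the scorecard. The dictionary simply restates the seven criteria and scores from the results above; the names `SCORECARD` and `tally` are our own, made up for illustration.

```python
# Scorecard as printed in the results: criterion -> (TensorFlow, CNTK).
SCORECARD = {
    "Processing speed": (0, 1),
    "Learning curve": (1, 0),
    "Production readiness": (0, 1),
    "Community support": (1, 0),
    "CPU, GPU computation support": (0, 1),
    "Mobile deep learning": (1, 0),
    "Multiple language compatibility": (0, 1),
}


def tally(scorecard):
    """Sum each fighter's points across all criteria."""
    tensorflow_points = sum(tf for tf, _ in scorecard.values())
    cntk_points = sum(cntk for _, cntk in scorecard.values())
    return tensorflow_points, cntk_points


tensorflow_total, cntk_total = tally(SCORECARD)
print(f"TensorFlow {tensorflow_total} - CNTK {cntk_total}")
```

The totals come to 3 for TensorFlow and 4 for CNTK, so the win rests on a single criterion.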
As a Deep Learning engineer, if you want to use one of these frameworks in your tasks, you should check out their features thoroughly, try them out on a test dataset and then apply them to your actual data. After all, it's the choices we make that define a win or a loss - simplicity over resource utilisation, or speed over platform, we must choose our tools wisely. For more information on the kind of tests that both the tools have been put through, read the research paper presented by Shaohuai Shi, Qiang Wang, Pengfei Xu and Xiaowen Chu from the Department of Computer Science, Hong Kong Baptist University, and these benchmarks.

Aaron Lazar

19 Apr 2018

4 min read

What is AIOps and why is it going to be important?

Woah, woah, woah! Wait a minute! First there was that game SpecOps that I usually sucked at, then there came ITOps and DevOps that took the world by storm, now there’s another something-Ops?? Well, believe it or not, there is, and they’re calling it AIOps.

What does AIOps stand for?

AIOps basically means Artificial Intelligence for IT Operations. It means IT operations are enhanced by using analytics and machine learning to analyze the data that’s collected from various IT operations tools and devices. This helps in spotting and reacting to issues in real time. Coined by Gartner, the term has grown in popularity over the past year. Gartner believes that AIOps will be a major transformation for ITOps professionals, mainly because traditional IT operations cannot cope with the modern digital transformation.

Why is AIOps important?

With the massive and rapid shift towards cloud adoption, automation and continuous improvement, AIOps is here to take care of the new entrants into the digital ecosystem - machine agents, artificial intelligence, IoT devices, etc. These new entrants are impossible for humans to service and maintain, and with billions of devices connected together, the only way forward is to employ algorithms that tackle known problems. Some of the solutions it provides are maintaining high availability and monitoring performance, event correlation and analysis, automation and IT service management.

How does AIOps work?

As depicted in Gartner’s diagram, there are two primary components to AIOps: Big Data and Machine Learning. Data is gathered from the enterprise. You then implement a comprehensive analytics and machine learning strategy alongside the combined IT data (monitoring data + job logs + tickets + incident logs). The processed data yields continuous insights, continuous improvements and fixes.
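To make the "spotting issues" part a little more concrete, here is a minimal Python sketch of flagging unusual values in monitoring data. It is only an illustration under stated assumptions: a simple z-score rule stands in for a trained machine learning model, and the function name `find_anomalies`, the threshold and the sample response times are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(samples, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    A z-score rule is a stand-in here; a real AIOps platform would
    typically use a trained model rather than this simple statistic.
    """
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # all samples identical, nothing stands out
    return [
        (i, value)
        for i, value in enumerate(samples)
        if abs(value - mu) / sigma > threshold
    ]


# Hypothetical response times in milliseconds from a monitoring tool.
response_ms = [120, 118, 125, 122, 119, 121, 640, 123, 117, 120]
print(find_anomalies(response_ms))
```

With this sample data only the 640 ms reading is flagged. In practice the threshold would need tuning, and the flagged events would then be correlated with job logs, tickets and incident logs.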
AIOps bridges three different IT disciplines to accomplish its goals: service management, performance management, and automation. To put it simply, it is a strategic focus. It argues for a new approach in a world where big data and machine learning have changed everything.

How to move from ITOps to AIOps

Machine Learning

Most of AIOps will involve supervised learning, and professionals will need a good understanding of the underlying algorithms. Now don’t get me wrong, they don’t need to be full-blown data scientists to build the system; they just need sufficient knowledge to be able to train the system to pick up anomalies. Auditing these systems to ensure they’re performing the tasks as per the initial vision is necessary, and this will go hand in hand with scripting them.

Understanding modern application technologies

With the rise of Agile software development and other modern methodologies, AIOps professionals are expected to know all about microservices, APIs, CI/CD, containers, etc. With the giant leaps that cloud development is taking, they are also expected to gain visibility into cloud deployments, with an emphasis on cost and performance.

Security

Security is critical. For example, it’s important for personnel to understand how to respond to a denial of service attack or maybe a ransomware attack, like the ones we’ve seen in the recent past. Training machines to detect and predict such events is pertinent to AIOps.

The key tools in AIOps

There is a wide variety of AIOps platforms available in the market that bring AI and intelligence to IT operations. One of the most noteworthy is Splunk, which has recently incorporated AI for intelligence-driven operations. Another is the Moogsoft AIOps platform, which is quite similar to Splunk. BMC has also entered the fray, launching TrueSight 11, its AIOps platform that promises to address use cases to improve performance and capacity management, the service desk, and application development disciplines.
Gartner has a handy list of top platforms. If you’re planning the transition from ITOps, do check out the list. Companies like Frankfurt Cargo Services and Revtrak have already added the AI to their Ops.

So, are you going to make the transition?

According to Gartner, 40% of large enterprises will have made the transition to AIOps by 2022. If you’re one of them, I recommend you do it for the right reasons, but don’t do it overnight. The transition needs to be gradual and well planned. The first thing you need to do is get your enterprise data together. If you don’t have sufficient data that’s worthy of analysis, AIOps isn’t going to help you much.

Read more: Bridging the gap between data science and DevOps with DataOps.

Sugandha Lahoti

20 Aug 2018

7 min read

5 examples of Artificial Intelligence in Web apps

Modern day web app development is increasingly focused on building a customer-facing front-end presence with the use of Artificial Intelligence. Web apps use Artificial Intelligence not just for intelligent automation, but also for building recommendation engines, website implementation, and image recognition, among other application areas. In this post, we look at five key areas, illustrated by real-world examples, where web apps are employing Artificial Intelligence to automate some part of their system.

Recommendation Engines of Amazon and Netflix

Curating content based on the user’s context is one of the most widely used AI features in web apps. Amazon, for instance, uses item-based collaborative filtering for product recommendations. Amazon’s recommendation system uses a combination of goods-based recommendation (users are recommended items similar to what they liked in the past) and buddy-based recommendation (users are recommended things which their Facebook friends like). Amazon has been using AI for multiple tasks, not just for its recommendation system. Its AI management strategy is called The Flywheel, where one part of Amazon acts as a catalyst for AI and machine learning growth in other areas.

Read more: Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics

Another popular example is Netflix, which revamped its recommendation algorithm based on visual impressions. One of its research projects indicated that the artwork was not only the biggest influencer of a viewer's decision to watch content, but that it also drew over 82% of their focus while browsing Netflix. This led Netflix to develop a new image recommendation algorithm which works in real time to project the image it thinks the user will respond to. It uses implicit data (user behavior) and explicit data (user activity) and feeds this data to machine learning algorithms to figure out the relevant content for each user.
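Item-based collaborative filtering, the technique mentioned above for Amazon, can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not Amazon's actual system: the ratings, the user and item names, and the helper functions `item_vectors`, `cosine` and `recommend` are all hypothetical. The idea is to score each item a user has not seen by how similar it is to the items that user already rated.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ana": {"kettle": 5, "teapot": 4, "mug": 5},
    "ben": {"kettle": 4, "teapot": 5},
    "cy": {"kettle": 1, "novel": 5, "mug": 2},
    "dee": {"novel": 4, "bookmark": 5},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, score in items.items():
            vectors.setdefault(item, {})[user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=2):
    """Rank unseen items by similarity to the items the user already rated."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in vectors.items():
        if candidate in seen:
            continue
        # Weight each similarity by how much the user liked the rated item.
        scores[candidate] = sum(
            cosine(vector, vectors[item]) * rating for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ben", RATINGS))
```

Here "ben" is recommended the mug first, because the users who rated the kettle and the teapot highly also rated the mug highly. Production systems precompute item similarities offline and work with far larger, sparser data.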
For each title, users get the image with the highest rank based on their profile. At the same time, Netflix continues collecting data from its 100 million other subscribers to improve its engine’s performance.

Read more: What software stack does Netflix use?

Google and Microsoft using Image recognition

Image recognition can serve multiple uses for web apps including object and pattern recognition, locating duplicates (exact or partial), image search by fragments, and more. Two such unique applications of image recognition are Google’s Quick Draw and Microsoft’s Captionbot.ai.

Quick Draw is Google’s AI-powered web app game, where users have to draw an everyday object that a neural network tries to recognize. Players are given 20 seconds to draw a random item, and Google’s neural network tries to match it with 50 million other hand-drawn sketches by other players to identify the correct one. Quick Draw aims to generate the world’s largest doodling data set, which is shared publicly to help further machine learning research. The data preserves user privacy by collecting only anonymous metadata, including timestamp, country code, whether or not the drawing was recognized, and which word the drawing corresponded to. This dataset was used in SketchRNN, a neural network that can draw words and interpolate between drawings.

Another image recognition web app is Microsoft’s Captionbot.ai. The system can automatically generate a caption for an uploaded photograph. Users can rate how accurately it has detected what was on display. The algorithm learns from the rating to make the captions more accurate. It uses three separate services to process the images. The Computer Vision API identifies the components of the photo, then mixes them with data from the Bing Image API, and runs any faces it spots through the Emotion API. The Emotion API analyses facial expressions to detect anger, contempt, disgust, fear, and other traits. Based on the results from these APIs, the caption is generated.
Google Docs powered by Natural Language Processing

Modern web apps can also be fueled with cognitive capabilities to make them stand apart from other apps. Instances of this include transforming human speech to text or conversing with people in natural language. One such example of a web app which includes natural language processing is Google Docs. Google Docs and Slides have an Explore feature to show text, images, and other features relevant to the document that a user is working on at any given point. Docs can also use natural language to search through data and reports, and automatically generate formulas in Sheets. Google Docs recently incorporated an AI grammar checker, announced at Google Cloud Next. It uses a machine translation algorithm to recognize errors and suggest corrections as users type. Google Docs can also be integrated with the Natural Language API to recognize the sentiment of selected text in a Google Doc and highlight it based on that sentiment.

Web-based artificial intelligence Chatbots

Web-based chatbots are just like app-based chatbots, except that they interact with users in the website browser. They use AI techniques such as natural language understanding and pattern recognition to store and distinguish between the context of the information provided and elicit a suitable response for future replies. One example of web-based chatbots is live chat bots, where the conversation with a visitor on a website is automated using a chatbot. Many live chat software companies are already experimenting with chatbots. Examples include the Operator bot used by Intercom, a company building a customer messaging platform, and Driftbot by Drift, which gives your website a personal assistant.

Read More: Top 4 chatbot development frameworks for developers

Another example is AI-based chatbots that help create full websites.
Right Click is a startup that introduced an A.I.-powered chatbot which uses Artificial Intelligence in a conversational interface to create websites. It asks general questions during the conversation like “What industry you belong to?” and “Why do you want to make a website?” and creates customized templates as per the given answers. Similarly, Wix’s Artificial Intelligence Design bot can tailor websites by learning about each person’s or business’ own needs.

Web-based code helpers using AI

Intelligent coding assistants are gaining popularity with their ability to understand the code and provide the right suggestions at the right time. They can analyze code on the web and give fast and smart completions. Codota for Chrome is a smart web-based IDE which can build predictive models of code and suggest code completions and related content based on the current context present in the code. It combines program analysis, natural language processing, and machine learning to learn from the code. Users can look for Codota’s icon on every code snippet in their browsers - on GitHub, StackOverflow and others.

Another example is Deep Cognition’s Deep Learning Studio – Cloud. It is not exactly an IDE, but it features an AI-powered drag & drop interface to help design deep learning models with ease. It features assisted modeling, for automated tensor size calculations and real-time validation. It also has an AutoML feature to automatically build a neural network.

Even though AI is a great choice to enhance your web apps, an important facet to keep in mind is ensuring fairness, accuracy, and transparency of your web apps. For instance, web apps powered by natural language should not discriminate against people based on caste, color, or creed or hurt user sentiments. Similarly, those using neural networks for recognizing images should ensure the filtering of obscene images.
Creating such types of artificial intelligence systems would require a hybrid of designers, programmers, ML engineers, and researchers. This collective group will have a good grasp of user experience, will be comfortable thinking in abstracts and algorithms, and will be equally well versed with the social impacts of artificial intelligence.

Read More:

20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017
Uber introduces Fusion.js, a plugin-based web development framework for high-performance apps.
Electron Fiddle: A ‘code playground’ for experimenting with cross-platform native apps.
Warp: Rust’s new web framework for implementing WAI (Web Application Interface)

Amey Varangaonkar

01 Dec 2017

7 min read

5 Ways Artificial Intelligence is Transforming the Gaming Industry

Imagine yourself playing a strategy game, like Age of Empires perhaps. You are in a world that looks real and you are pitted against the computer, and your mission is to protect your empire and defeat the computer at the same time. What if you could create an army of soldiers who could explore the map and attack the enemies on their own, based on just a simple command you give them? And what if your soldiers could have real, unscripted conversations with you as their commander-in-chief to seek instructions? And what if the game’s scenes changed spontaneously based on your decisions and interactions with the game elements, like a movie? Sounds too good to be true? It’s not far-fetched at all - thanks to the rise of Artificial Intelligence!

The gaming industry today is a market worth over a hundred billion dollars. The Global Games Market Report says that about 2.2 billion gamers across the world are expected to generate an incredible $108.9 billion in game revenue by the end of 2017. As such, gaming industry giants are seeking newer and more innovative ways to attract more customers and expand their brands. While terms like Virtual Reality, Augmented Reality and Mixed Reality come to mind immediately as the future of games, the rise of Artificial Intelligence is an equally important stepping stone in making games smarter and more interactive, and as close to reality as possible. In this article, we look at the 5 ways AI is revolutionizing the gaming industry, in a big way!

Making games smarter

While scripting is still commonly used to control NPCs (non-player characters) in many games today, many heuristic algorithms and game AIs are also being incorporated for controlling these NPCs. Not just that, the characters also learn from the actions taken by the player and modify their behaviour accordingly. This concept can be seen implemented in Nintendogs, a real-time pet simulation video game by Nintendo.
The ultimate aim of the game creators in the future will be to design robust systems within games that understand speech, noise and other sounds within the game and tweak the game scenario accordingly. This will also require modern AI techniques such as pattern recognition and reinforcement learning, where the characters within the games will self-learn from their own actions and evolve accordingly. The game industry has identified this and some developers have started implementing these ideas - games like F.E.A.R and The Sims are a testament to this. Although the adoption of popular AI techniques in gaming is still quite limited, their possible applications in the near future have the entire gaming industry buzzing.

Making games more realistic

This is one area where the game industry has grown by leaps and bounds over the last 10 years. There have been incredible advancements in 3D visualization techniques, physics-based simulations and, more recently, the inclusion of Virtual Reality and Augmented Reality in games. These tools have empowered game developers to create interactive, visually appealing games which one could never have imagined a decade ago. Meanwhile, gamers have evolved too. They don’t just want good graphics anymore; they want games to resemble reality. This is a massive challenge for game developers, and AI is playing a huge role in addressing this need. Imagine a game which can interpret and respond to your in-game actions, anticipate your next move and act accordingly. Not the usual scripts where an action X will give a response Y, but an AI program that chooses the best possible alternative to your action in real-time, making the game more realistic and enjoyable for you.

Improving the overall gaming experience

Let’s take a real-world example here. If you’ve played EA Sports’ FIFA 17, you may be well-versed with their Ultimate Team mode.
For the uninitiated, it’s more of a fantasy draft, where you can pick one of the five player choices given to you for each position in your team, and the AI automatically determines the team chemistry based on your choices. The team chemistry here is important, because the higher the team chemistry, the better the chances of your team playing well. The in-game AI also makes the playing experience better by making it more interactive. Suppose you’re losing a match against an opponent - the AI reacts by boosting your team’s morale through increased fan chants, which in turn affects player performances positively. Gamers these days pay a lot of attention to detail - this not only includes the visual appearance and the high-end graphics, but also how immersive and interactive the game is, in all possible ways. Through real-time customization of scenarios, AI has the capability to play a crucial role in taking the gaming experience to the next level.

Transforming developer skills

Game developers have always been innovators in adopting cutting edge technology to hone their technical skills and creativity. Reinforcement Learning, a subset of Machine Learning and the algorithm behind AlphaGo, the popular AI computer program that beat the world’s best human Go player, is a case in point. Even for traditional game developers, the rising adoption of AI in games will mean a change in the way games are developed. In an interview with Gamasutra, AiGameDev.com’s Alex Champandard says something interesting: “Game design that hinges on more advanced AI techniques is slowly but surely becoming more commonplace. Developers are more willing to let go and embrace more complex systems.” It’s safe to say that the notion of Game AI is changing drastically. Concepts such as smarter function-based movements, pathfinding, genetic algorithms and rule-based AI such as fuzzy logic are being increasingly incorporated in games, although not at a very large scale.
There are currently some implementation challenges in bringing academic AI techniques into games, but with time these AI algorithms and techniques are expected to blend more seamlessly with traditional game development skills. As such, in addition to knowledge of traditional game development tools and techniques, game developers will now also have to skill up on these AI techniques to make smarter, more realistic and more interactive games.

Making smarter mobile games

The rise of the mobile game industry today is evident from the fact that close to 50% of the game revenue in 2017 will come from mobile games - be it on smartphones or tablets. The increasingly high processing power of these devices has allowed developers to create more interactive and immersive mobile games. However, it is important to note that the processing power of mobile devices is yet to catch up with that of their desktop counterparts, not to mention gaming consoles, which are beyond comparison at this stage. To tackle this issue, mobile game developers are experimenting with different machine learning and AI algorithms to impart ‘smartness’ to mobile games, while still adhering to the processing power limits. Compare today’s mobile games to the ones from 5 years back, and you’ll notice a tremendous shift in terms of the visual appearance of the games, and how interactive they have become. New machine learning and deep learning frameworks and libraries are being developed to cater specifically to the mobile platform. Google’s TensorFlow Lite and Facebook’s Caffe2 are instances of such development. Soon, these tools will come to developers’ rescue to build smarter and more interactive mobile games.

In Conclusion

Gone are the days when games were just about entertainment and passing time. The gaming industry is now one of the most profitable industries of today. As it continues to grow, the demands of the gaming community and the games themselves keep evolving.
The need for realism in games is higher than ever, and AI has an important role to play in making games more interactive, immersive and intelligent. With the rate at which new AI techniques and algorithms are developing, it’s an exciting time for game developers to showcase their full potential. Are you ready to start building AI for your own games? Here are some books to help you get started:

Practical Game AI Programming
Learning game AI programming with Lua

Amey Varangaonkar

22 May 2018

6 min read

Introducing Dask: The library that makes scalable analytics in Python easier

Python’s rise as the language of choice in Data Science is unprecedented, but not really unexpected. Apart from being a general-purpose language which can be used for a variety of tasks - from scripting to networking - Python offers a rich suite of libraries for general data science tasks such as scientific computing, data visualization, and more. However, one big challenge faced by data scientists is that these packages are not designed for scale. This is crucial in today’s Big Data era where tons of data needs to be processed and analyzed on the go. A platform which supports the existing Python ecosystem and allows it to scale across multiple machines and clusters without affecting performance was conspicuously missing. Enter Dask.

What is Dask?

Dask is a flexible parallel computing library written in Python for analytics, designed mainly to offer scalability and enhanced power to the existing packages and libraries. It allows users to integrate their existing Python-based projects written in popular libraries such as NumPy, SciPy, pandas, and more.

[Figure: Dask architecture (image courtesy: Slideshare)]

The 2 key components of Dask that interact with the Python libraries are:

Dynamic task schedulers - which take care of the intensive computational workloads
‘Big Data’ Dask collections - consisting of dataframes, parallel arrays and interfaces that allow the computations to run on distributed environments

Why use Dask?

Given there are already quite a few distributed platforms for large-scale data processing such as Apache Spark, Apache Storm, Flink and so on, why and when should one go for Dask? What are the advantages offered by this Python library?
Let us take a look at the 4 major reasons to prefer Dask for distributed, scalable analytics in Python:

Easy to get started: If you are an existing Python user, you must have already worked with popular Python packages such as NumPy, SciPy, matplotlib, scikit-learn, pandas, and more. Dask offers a similar, intuitive interface and since it is a part of the bigger Python ecosystem, getting started with Dask is very easy. It uses the existing Python APIs to switch between the popular packages and their Dask-equivalents, so you don’t have to spend a lot of time in porting the code. For absolute beginners, using Dask for scalable analytics would be an easier and logical option to pursue, once they have grasped the fundamentals of Python and the associated libraries.

Scales up and down quite easily: You can run your project on Dask on a single machine, or on a cluster with thousands of cores without essentially affecting the speed and performance of your code. Dask uses the multi-core CPUs within a single system optimally to process hundreds of terabytes of data without the need for additional hardware. Similarly, for moderate to large datasets spanning 100+ gigabytes which often don’t fit into a single storage device, the computing power of the clusters can be coupled with Dask for effective analytics.

Supports complex applications: Many companies tend to tackle complex computations by introducing custom code that runs on popular Big Data tools such as Hadoop MapReduce and Apache Spark. However, with the help of the dynamic task scheduling feature of Dask, it is now possible to run and process complex applications without introducing any additional code. Dask is solely responsible for the smooth handling of various tasks such as network communication, load balancing and diagnostics, among others.

Clear, responsive, real-time feedback: One of the most important features of Dask is its user-friendliness.
Dask provides a real-time dashboard that highlights the key metrics of the processing task undertaken by the user - such as the current progress of your project, memory consumption and more. It also offers an in-built IPython kernel that allows the user to investigate the ongoing computation with just a terminal.

How Dask compares with Apache Spark

Apache Spark is one of the most popular and widely used Big Data tools for distributed data processing and analytics. Dask and Apache Spark have many features in common, prompting us and many other developers to ask the question - which tool is better? While Spark has been around for quite some time and has gained many standard, stable features over years of development, Dask is quite new and is still being improved as a tool. We summarize the important differences between Dask and Apache Spark in the table below:

| Criteria | Apache Spark | Dask |
| --- | --- | --- |
| Primary language | Scala | Python |
| Scale | Supports a single node to thousands of nodes in the cluster | Supports a single node to thousands of nodes in the cluster |
| Ecosystem | All-in-one self-sufficient ecosystem | Integration with popular libraries within the Python ecosystem |
| Flexibility | Low | High |
| Stream processing | Built-in module called Spark Streaming present | Real-time interface which is pretty low-level, requires more work than Apache Spark |
| Graph processing | Possible with GraphX module | Not possible |
| Machine learning | Uses the Spark MLlib module | Integrates with scikit-learn and XGBoost |
| Popularity | Very high, commonly used tool in the Big Data ecosystem | Fairly new tool but has already found its place in the pandas, scikit-learn and Jupyter stack |

You can read a detailed comparison of Apache Spark and Dask on the official Dask documentation page.

What we can expect from Dask

As we saw from the comparison above, it is fairly easy to port to Dask an existing Python project that uses several high-profile Python libraries such as NumPy, scikit-learn and more.
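To give a feel for the dynamic task scheduling idea discussed earlier, here is a minimal pure-Python sketch. Dask documents its task graphs as plain dictionaries that map keys to values or to tuples of a function and its arguments; the `toy_get` function below is a hypothetical, single-threaded stand-in that walks such a dictionary. It is not Dask's scheduler, and the graph contents are made up for illustration.

```python
from operator import add, mul


def toy_get(graph, key, cache=None):
    """Evaluate `key` in a dict-based task graph, resolving dependencies first.

    Hypothetical illustration only: a real scheduler would also run
    independent tasks in parallel and manage memory and failures.
    """
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]  # reuse results that were already computed
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # String arguments that name other keys are dependencies;
        # everything else is passed through as a literal value.
        resolved = [
            toy_get(graph, arg, cache) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        result = func(*resolved)
    else:
        result = task  # plain data, nothing to compute
    cache[key] = result
    return result


# A tiny graph: product = (a + b) * 10
graph = {
    "a": 2,
    "b": 3,
    "sum": (add, "a", "b"),
    "product": (mul, "sum", 10),
}
print(toy_get(graph, "product"))
```

Asking for "product" first computes "sum" from "a" and "b", then multiplies it by 10. Because the work is described as data before it runs, a scheduler is free to decide where and in what order to execute the tasks.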
Python developers and data scientists will appreciate the high flexibility and complex computational capabilities offered by Dask. The limited stream processing and graph processing features are big areas for improvement, but we can expect some developments in this domain in the near future. Even though Dask is still relatively new, it looks very promising due to its close affinity with the Python ecosystem. With Python’s clout rising, many people would prefer a Python-based data processing tool which works at scale, without having to switch to an external Big Data framework. Dask may well be the superhero to come to the developers’ rescue in such cases. You can learn more about the latest developments in Dask on their official GitHub page.

Read more:

Is Apache Spark today’s Hadoop?
Apache Spark 2.3 now has native Kubernetes support!
Should you move to Python 3? 7 Python experts’ opinions

