
Tech Guides - Data


Best practices for deploying self-service BI with Qlik Sense

Amey Varangaonkar
31 May 2018
7 min read
As part of a successful deployment of Qlik Sense, it is important that IT recognizes that self-service Business Intelligence has its own dynamics and adoption rules. The various use cases and subsequent user groups need to be assessed and captured. Governance should always be present, but power users should never get the feeling that they are restricted. Once they are won over, gaining traction and adoption among the other user types is very easy. In this article, we will look at the most important points to keep in mind while deploying self-service with Qlik Sense. The following excerpt is taken from the book Mastering Qlik Sense, authored by Martin Mahler and Juan Ignacio Vitantonio. The book demonstrates techniques to design useful and highly profitable Business Intelligence solutions using Qlik Sense. Here is the list of points to keep in mind:

Qlik Sense is not QlikView

Not even nearly. The biggest challenge and fallacy is that the organization was sold, by Qlik or someone else, just the next version of the tool. It did not help at all that Qlik itself worked for years on Qlik Sense under the initial product name Qlik.Next. Whatever you are being told, and however it is being sold to you, Qlik Sense is at best the cousin of QlikView. Same family, but no blood relation. Thinking otherwise sets the wrong expectation: the business gives the wrong message to stakeholders and does not make IT aware that self-service BI cannot be deployed in the same fashion as guided analytics (QlikView, in this case). Disappointment is imminent when stakeholders realize Qlik Sense cannot replicate their QlikView dashboards.

Simply installing Qlik Sense does not create a self-service BI environment

Installing Qlik Sense and giving users access to the tool is a start, but there is more to it than that. The infrastructure requires design and planning, data quality processing, data collection, and determining who intends to use the platform to consume what type of data. If data is not available and accessible to the user, data analytics serves no purpose. Make sure a data warehouse or similar is in place and the business has a use case for self-service data analytics. A good indicator is when the business or project works with a lot of data, and there are business users with lots of Excel spreadsheets lying around, analyzing it in different ways. That is your best candidate for Qlik Sense.

IT should monitor the Qlik Sense environment rather than control it

IT needs to unlearn old habits in order to learn new ones, and the same applies when it comes to deploying self-service. Create a framework with guidelines and principles, and monitor that users are following it, rather than limiting them in their capabilities. This framework needs the input of the users as well, and it needs to be elastic. Not many IT professionals agree with giving away so much power to the user in the development process, believing this leads to chaos and anarchy. While the risk is there, this fear needs to be overcome. Users love data analytics, and they are keen to get the help of IT to create the most valuable dashboard possible and ensure it is well received by a wide audience.

Identifying key users and user groups is crucial

For a strong adoption of the tool, IT needs to prepare the environment, identify the key power users in the organization, and win them over to using the technology. It is important that they are intensively supported, especially in the beginning, and that they are allowed to drive how the technology should be used rather than having principles imposed on them. Governance should always be present, but power users should never get the feeling they are restricted by it. Once they are won over, gaining traction and adoption among the other user types is very easy.

Qlik Sense sells well - do a lot of demos

Data analytics, compelling visualizations, and the interactivity of Qlik Sense are something almost everyone is interested in. The business wants to see its own data aggregated and distilled in a cool and glossy dashboard. Utilize the momentum and do as many demos as you can to win advocates for the technology and promote a consciousness of becoming a data-driven culture in the organization. Even the simplest Qlik Sense dashboards amaze people and boost their creativity for use cases where data analytics in their area could apply and create value.

Promote collaboration

Sharing is caring. This applies not only to insights, which are naturally shared with the excitement of having found out something new and valuable, but also to how the new insight has been derived. People keep their approach and methodology to themselves, but this is counterproductive. It is important that applications, visualizations, and dashboards created with Qlik Sense are shared and demonstrated to other Qlik Sense users as frequently as possible. This not only promotes a data-driven culture but also encourages collaboration between users and teams across business functions that would not have happened otherwise. They could be sharing knowledge, tips, and tricks, or even realizing they look at the same slices of data and could create additional value by connecting them together.

Market the success of Qlik Sense within the organization

If Qlik Sense has been successful in a project, tell others about it. Create a success story and propose doing demos of the dashboard and its analytics. IT has historically been very bad at promoting its work, which is counterproductive. Data analytics creates value, and there is nothing embarrassing about boasting about its success; as Muhammad Ali suggested, it's not bragging if it's true.

Introduce guidelines on design and terminology

Avoid the pitfall of having multiple different-looking dashboards by promoting a consistent branding look across all Qlik Sense dashboards and applications, including terminology and best practices. Ensure the document is easily accessible to all users. Also, create predesigned templates with some sample sheets so that users can duplicate them, modify and extend them to their liking, and apply the same design.

Protect less experienced users from complexities

Don't overwhelm users if they have never developed anything in their lives. Approach less technically savvy users in a different way by providing them with sample data and sample templates, including a library of predefined visualizations, dimensions, and measures (so-called Master Items). Be aware that what is intuitive to Qlik professionals or power users is not necessarily intuitive to other users - be patient and appreciative of their feedback, and try to understand how a typical business user might think.

If you found the excerpt useful, make sure you check out the book Mastering Qlik Sense to learn more techniques for efficient Business Intelligence using Qlik Sense.

Read more:
How Qlik Sense is driving self-service Business Intelligence
Overview of a Qlik Sense® Application’s Life Cycle
What we learned from Qlik Qonnections 2018


Top 4 Business Intelligence Tools

Ed Bowkett
04 Dec 2014
4 min read
With the boom of data analytics, Business Intelligence has taken something of a front stage in recent years, and as a result, a number of Business Intelligence (BI) tools have appeared. These tools allow a business to obtain a reliable set of data faster and more easily, and to set business objectives. This article covers the more prominent tools, listing the advantages and disadvantages of each.

Pentaho

Pentaho was founded in 2004 and offers, among others, a suite of open source BI applications under the name Pentaho Business Analytics. It has two editions, enterprise and community. It allows easy access to data and even easier ways of visualizing that data from a variety of sources, including Excel and Hadoop, and it covers almost every platform, ranging from mobile (Android and iPhone) through to Windows and even web-based. With the pros come the cons, however, which include the Pentaho Metadata Editor, which is difficult to understand, and documentation that offers few solutions for this key component. Also, compared to the other tools mentioned below, the advanced analytics in Pentaho need improving. That said, given that it is open source, there is continual improvement.

Tableau

Founded in 2003, Tableau also offers a range of suites, focusing on three products: Desktop, Server, and Public. Some benefits of using Tableau over other products include its ease of use and a fairly simple UI with drag-and-drop tools, which allows pretty much everyone to use it. Creating a highly interactive dashboard drawing on various data sources is simple and quick. To sum up, Tableau is fast. Incredibly fast! There are relatively few cons when it comes to Tableau, but some automated features you would usually expect in other suites aren't offered for most of the processes and uses here.

Jaspersoft

As well as being another open source suite, Jaspersoft ships with a number of data visualization, data integration, and reporting tools. Added to the small licensing cost, Jaspersoft is justifiably one of the leaders in this area. It can be used with a variety of databases, including Cassandra, CouchDB, MongoDB, Neo4j, and Riak. Other benefits include ease of installation, and the functionality of the tools in Jaspersoft is better than most competitors on the market. However, the documentation has been said to be lacking in helping customers dive deeper into Jaspersoft, and if you customize it, customer service can no longer assist you if it breaks. Given the functionality and the ability to extend it, though, these cons seem minor.

Qlikview

Qlikview is one of the oldest Business Intelligence software tools on the market, having been around since 1993. It has multiple features and, as a result, many pros and cons, including ones I have mentioned for the previous suites. Some advantages of Qlikview are that it takes a very small amount of time to implement and it's incredibly quick - quicker than Tableau in this regard! It also has a 64-bit in-memory engine, which is among the best in the market. Qlikview also has good data mining tools, good features (having been in the market for a long time), and a visualization function. These aspects make it much easier to deal with than others on the market, and the learning curve is relatively small. Some cons in relation to Qlikview include that while Qlikview is easy to use, Tableau is seen as the better suite for analyzing data in depth. Qlikview also has difficulties integrating map data, which other BI tools are better at doing.

This list is not definitive! It lays out some tools that companies and individuals can use to help them analyze data and prepare business performance KPIs. There are other tools used by businesses, including Microsoft BI tools, Cognos, MicroStrategy, and Oracle Hyperion. I've chosen to explore some BI tools that are quick to use out of the box and are incredibly popular and expanding in usage.


Why use JVM (Java Virtual Machine) for deep learning

Guest Contributor
10 Nov 2019
5 min read
Deep learning is one of the revolutionary breakthroughs of the decade for enterprise application development. Today, the majority of organizations and enterprises have to transform their applications to exploit the capabilities of deep learning. In this article, we will discuss how to leverage the capabilities of the JVM (Java Virtual Machine) to build deep learning applications.

Enterprises prefer the JVM

The major JVM languages used in enterprises are Java, Scala, Groovy, and Kotlin. Java is the most widely used programming language in the world, and nearly all major enterprises use it in some way or the other. Enterprises use JVM-based languages such as Java to build complex applications because JVM features are optimal for production applications. JVM applications are also significantly faster and require far fewer resources to run compared to counterparts such as Python. Java can perform more computational operations per second than Python, and there are interesting performance benchmarks comparing the two.

The JVM optimizes performance benchmarks

Production applications represent a business and are very sensitive to performance degradation, latency, and other disruptions. Application performance is estimated from latency/throughput measures, and memory overload and high resource usage can influence those measures. Applications that demand more resources or memory require good hardware and further optimization in the application itself. The JVM helps in optimizing performance benchmarks and tuning the application to the hardware's fullest capabilities, and it can also help keep the application's memory footprint in check.

We have discussed JVM features so far, but there is important context on why there is such demand for JVM-based deep learning in production, which we discuss next. Python is undoubtedly the leading programming language used in deep learning applications. For that very reason, the majority of enterprise developers, i.e., Java developers, are forced to switch to a technology stack they are less familiar with. On top of that, they need to address compatibility issues and production deployment while integrating neural network models.

DeepLearning4J, a deep learning library for the JVM

Java developers working on enterprise applications want to exploit build and deployment tools like Maven or Gradle for hassle-free deployments, so there is demand for a JVM-based deep learning library that simplifies the whole process. Although multiple deep learning libraries serve this purpose, DL4J (Deeplearning4J) is one of the top choices. DL4J is a deep learning library for the JVM and is among the most popular repositories on GitHub. DL4J, developed by the Skymind team, is the first open-source deep learning library that is commercially supported. What makes it so special is that it is backed by ND4J (N-Dimensional Arrays for Java) and JavaCPP. ND4J is a scientific computing library developed by the Skymind team; it acts as the backend dependency for all neural network computations in DL4J and is much faster in computations than NumPy. JavaCPP acts as a bridge between Java and native C++ libraries, and ND4J internally depends on JavaCPP to run native C++ code. DL4J also has a dedicated ETL component called DataVec, which helps transform data into a format that a neural network can understand. Data analysis can be done with DataVec much as with Pandas, a popular Python data analysis library.
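To give a feel for what this looks like in practice, here is a minimal, hypothetical DL4J sketch that configures and initializes a small feed-forward network. The layer sizes and hyperparameters are illustrative assumptions, not values from the article:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class Dl4jSketch {
    public static void main(String[] args) {
        // A small classifier: 784 inputs (e.g. 28x28 images), one hidden layer, 10 output classes
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Adam(0.001))   // illustrative learning rate
            .list()
            .layer(new DenseLayer.Builder()
                .nIn(784).nOut(100)
                .activation(Activation.RELU).build())
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(100).nOut(10)
                .activation(Activation.SOFTMAX).build())
            .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        // model.fit(trainIterator);  // trainIterator would be a DataSetIterator fed by DataVec
        System.out.println(model.summary());
    }
}
```

All of the numeric work behind fit() runs on ND4J arrays, which is where the native C++ speed-up described above comes from.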
DL4J also uses the Arbiter component for hyperparameter optimization. Arbiter finds the best configuration to obtain good model scores by performing random/grid search over the hyperparameter values defined in a search space.

Why choose DL4J for your deep learning applications?

DL4J is a good choice for developing distributed deep learning applications. It can leverage the capabilities of Apache Spark and Hadoop to develop high-performing distributed deep learning applications, and its performance is equivalent to Caffe when multi-GPU hardware is used. We can use DL4J to develop multi-layer perceptrons, convolutional neural networks, recurrent neural networks, and autoencoders, and there are a number of hyperparameters that can be adjusted to further optimize neural network training. The Skymind team has done a good job of explaining the important basics of DL4J on their website, and they also have a Gitter channel for discussing or reporting bugs straight to the developers. If you are keen on exploring reinforcement learning further, there is a dedicated library called RL4J (Reinforcement Learning for Java), also developed by Skymind; it can already play the game Doom! DL4J combines all the above-mentioned components (DataVec, ND4J, Arbiter, and RL4J) into a single deep learning workflow, forming a powerful software suite. Most importantly, DL4J enables the productionization of deep learning applications for the business.

If you are interested in learning how to develop real-time applications on DL4J, check out my new book, Java Deep Learning Cookbook. In this book, I show you how to install and configure Deeplearning4j to implement deep learning models. You can also explore recipes for training and fine-tuning your neural network models in Java. By the end of this book, you'll have a clear understanding of how you can use Deeplearning4j to build robust deep learning applications in Java.

Author Bio

Rahul Raj has more than 7 years of IT industry experience in software development, business analysis, client communication, and consulting for medium/large-scale projects. He has extensive experience in development activities comprising requirement analysis, design, coding, implementation, code review, testing, user training, and enhancements. He has written a number of articles about neural networks in Java and has been featured by DL4J and the official Java community channel. You can follow Rahul on Twitter, LinkedIn, and GitHub.

Read more:
Top 6 Java Machine Learning/Deep Learning frameworks you can't miss
6 most commonly used Java Machine learning libraries
Deeplearning4J 1.0.0-beta4 released with full multi-datatype support, new attention layers, and more!


Say hello to Streaming Analytics

Amey Varangaonkar
05 Oct 2017
5 min read
In this data-driven age, businesses want fast, accurate insights from their huge data repositories in the shortest time span - and in real time where possible. These insights are essential: they help businesses understand relevant trends, improve their existing processes, enhance customer satisfaction, improve their bottom line, and, most importantly, build and sustain their competitive advantage in the market. Doing all of this is quite an ask - one that is becoming increasingly difficult to achieve using just traditional data processing systems, where analytics is limited to the back-end. There is now a burning need for a new kind of system in which larger, more complex data can be processed and analyzed on the go.

Enter: Streaming Analytics

Streaming Analytics, also referred to as real-time event processing, is the processing and analysis of large streams of data in real time. These streams are basically events that occur as a result of some action, such as a transaction, a system failure, or a trigger that changes the state of a system at any point in time. Even something as minor or granular as a click can constitute an event, depending on the context. Consider this scenario: you are the CTO of an organization that deals with sensor data from wearables. Your organization has to deal with terabytes of data coming in on a daily basis from thousands of sensors, and one of your biggest challenges as CTO is to implement a system that processes and analyzes the data from these sensors as it enters the system. Here is where streaming analytics can help you, by giving you the ability to derive insights from your data on the go. According to IBM, a streaming system demonstrates the following qualities:

- It can handle large volumes of data
- It can handle a variety of data, structured or unstructured, analyze it efficiently, and identify relevant patterns
- It can process every event as it occurs, unlike traditional analytics systems that rely on batch processing
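As a rough illustration of that last quality, here is a minimal, hypothetical Apache Flink job (Flink is one of the streaming frameworks mentioned at the end of this article) that reacts to each sensor reading the moment it arrives. The socket source, input format, and threshold are assumptions made for the sketch:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SensorAlertJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Assumed source: one comma-separated reading per line, e.g. "sensor-42,103.7"
        DataStream<String> readings = env.socketTextStream("localhost", 9999);

        readings
            // Flag abnormal readings as each event arrives - no batch window needed
            .filter(line -> Double.parseDouble(line.split(",")[1]) > 100.0)
            .print();

        env.execute("sensor-alert-job");
    }
}
```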
Why is Streaming Analytics important?

The volume of data that companies have to deal with today is almost unimaginable. Add to that the varied nature of this data, and the urgency with which value needs to be extracted from it, and it all makes for a pretty tricky proposition. In such scenarios, choosing a solution that integrates seamlessly with different data sources, is fine-tuned for performance, is fast and reliable, and, most importantly, is flexible to changes in technology, is critical. Streaming analytics offers all these features, thereby empowering organizations to gain a significant edge over their competition. Another significant argument in favour of streaming analytics is the speed at which one can derive insights from the data: data in a real-time streaming system is processed and analyzed before it registers in a database. This is in stark contrast to analytics on traditional systems, where information is gathered and stored first, and the analytics is performed afterwards. Streaming analytics thus supports much faster decision-making than traditional data analytics systems.

Is Streaming Analytics right for my business?

Not all organizations need streaming analytics - especially those that deal with static data or data that hardly changes over long intervals of time, or those that do not require real-time insights for decision-making. For instance, consider the HR unit of a call centre. It is sufficient and efficient to use a traditional analytics solution to analyze thousands of past employee records, rather than run them through a streaming analytics system. On the other hand, the same call centre can find real value in applying streaming analytics to something like a real-time customer log monitoring system, in which customer interactions and context-sensitive information are processed on the go. This can help the organization find opportunities to provide unique customer experiences and improve its customer satisfaction score, alongside a whole host of other benefits. Streaming analytics is slowly finding adoption in a variety of domains where companies are looking for that crucial competitive advantage - sensor data analytics, mobile analytics, and business activity monitoring being some of them. With the rise of the Internet of Things, data from IoT devices is also increasing exponentially, and streaming analytics is the way to go there as well. In short, streaming analytics is ideal for businesses dealing with time-critical missions and those working with continuous streams of incoming data, where decision-making has to be instantaneous. Companies that obsess over real-time monitoring of their business will also find streaming analytics useful - just integrate your dashboards with your streaming analytics platform!

What next?

It is safe to say that, with time, the amount of information businesses manage is going to rise exponentially, and so will the variety of that information. As a result, it will get increasingly difficult to process volumes of unstructured data and gain insights from them using traditional analytics systems alone. Adopting streaming analytics into the business workflow will therefore become a necessity for many businesses. Apache Flink, Spark Streaming, Microsoft's Azure Stream Analytics, SQLstream Blaze, Oracle Stream Analytics, and SAS Event Processing are all good places to begin your journey through the fleeting world of streaming analytics. You can browse through this list of learning resources from Packt to know more:

Learning Apache Flink
Learning Real Time processing with Spark Streaming
Real Time Streaming using Apache Spark Streaming (video)
Real Time Analytics with SAP Hana
Real-Time Big Data Analytics


Why is Hadoop dying?

Aaron Lazar
23 Apr 2018
5 min read
Hadoop has been the definitive big data platform for some time; the name has practically been synonymous with the field. But while its ascent followed the trajectory of what was referred to as the 'big data revolution', Hadoop now seems to be in danger. The question is everywhere - is Hadoop dying out? And if it is, why? Is it because big data is no longer the buzzword it once was, or are there simply other ways of working with big data that have become more useful?

Hadoop was essential to the growth of big data

When Hadoop was open sourced in 2007, it opened the door to big data. It brought compute to data, as against bringing data to compute, and organisations had the opportunity to scale their data without having to worry too much about the cost. It obviously had initial hiccups with security, the complexity of querying, and querying speeds, but most of that was taken care of in the long run. Still, although querying speeds remained quite a pain, that wasn't the real reason behind Hadoop's slow decline.

As cloud grew, Hadoop started falling

One of the main reasons behind Hadoop's decline in popularity was the growth of cloud. The cloud vendor market was pretty crowded, and each vendor provided its own big data processing services. These services all basically did what Hadoop was doing, but they did it in an even more efficient and hassle-free way. Customers didn't have to think about administration, security, or maintenance in the way they had to with Hadoop.

One person's big data is another person's small data

This is clearly a fact. Several organisations that adopted big data technologies without really gauging the amount of data they actually needed to process have suffered. Imagine sitting on 10TB Hadoop clusters when you don't have that much data. The two biggest organisations that built products on Hadoop, Hortonworks and Cloudera, saw a decline in revenue in 2015, owing to their heavy reliance on Hadoop; customers weren't pleased with Hadoop's limitations.

Apache Hadoop v Apache Spark

Hadoop processing is way behind in terms of processing speed. In 2014, Spark took the world by storm, and its growth curve quickly outpaced Hadoop's. Spark was a general-purpose, easy-to-use platform built after studying the pitfalls of Hadoop. Spark was not bound to just HDFS (the Hadoop Distributed File System), which meant that it could leverage storage systems like Cassandra and MongoDB as well. Spark 2.3 was also able to run on Kubernetes - a big leap for containerized big data processing in the cloud. Spark also brings along GraphX, which allows developers to view data in the form of graphs. Some of the major areas where Spark wins are iterative algorithms in machine learning, interactive data mining and data processing, stream processing, sensor data processing, and so on.
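Part of Spark's ease-of-use claim comes from how little code a typical job needs. As a hypothetical illustration (the file name and schema are assumptions), a batch aggregation over JSON event records in Spark's Java API looks roughly like this:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EventCounts {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("event-counts")
            .master("local[*]")  // assumption: local run; on a cluster this comes from spark-submit
            .getOrCreate();

        // Assumed input: newline-delimited JSON records with a "status" field
        Dataset<Row> events = spark.read().json("events.json");

        // Count events per status and print the result
        events.groupBy("status").count().show();

        spark.stop();
    }
}
```

The same program can read from HDFS, Cassandra, or cloud storage by changing only the input path or data source - exactly the storage flexibility discussed above.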
Machine Learning in Hadoop is not straightforward

Unlike with MLlib in Spark, machine learning is not possible in Hadoop unless tied to a third-party library. Mahout used to be quite popular for doing ML on Hadoop, but its adoption has gone down in the past few years. Tools like RHadoop, a collection of three R packages, have grown for ML, but they are still nowhere comparable to the power of modern MLaaS offerings from cloud providers. All the more reason to move away from Hadoop, right? Maybe.

Hadoop is not only Hadoop

The general misconception is that Hadoop is quickly going to be extinct. On the contrary, the Hadoop family consists of YARN, HDFS, MapReduce, Hive, HBase, Spark, Kudu, Impala, and some 20 other products. While folks may be moving away from Hadoop as their choice for big data processing, they will still be using Hadoop in some form or the other. As for Cloudera and Hortonworks, though the market has seen a downward trend, they're in no way letting go of Hadoop anytime soon, although they have shifted part of their processing operations to Spark.

Is Hadoop dying? Perhaps not...

In the long run, it's not completely accurate to say that Hadoop is dying. December last year brought with it Hadoop 3.0, which is supposed to be a much-improved version of the framework. Some of its most noteworthy features are an improved shell script, a more powerful YARN, improved fault tolerance with erasure coding, and many more. Although that hasn't caused any major spike in adoption, there are still users who will adopt Hadoop based on their use case, or simply use an alternative like Spark along with another framework from the Hadoop family. So, Hadoop's not going away anytime soon.

Read more:
Pandas is an effective tool to explore and analyze data - Interview insights


Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence

Bhagyashree R
07 Sep 2018
13 min read
There has been a huge shift in the way that businesses build technology in recent years, driven by a move towards cloud and microservices. Public cloud services like AWS, Microsoft Azure, and Google Cloud Platform are transforming the way companies of all sizes understand and use software. Not only do public cloud services reduce the resourcing costs associated with on-site server resources, they also make it easier to leverage cutting-edge technological innovations like machine learning and artificial intelligence. Cloud is giving rise to what's known as 'Machine Learning as a Service' - a trend that could prove transformative for organizations of all types and sizes. According to a report published on Research and Markets, Machine Learning as a Service is set to grow at a compound annual growth rate (CAGR) of 49% between 2017 and 2023. The main drivers of this growth include the increased application of advanced analytics in manufacturing, the high volume of structured and unstructured data, and the integration of machine learning with big data. Of course, with machine learning a relatively new area for many businesses, demand for MLaaS is ultimately self-fulfilling - if it's there and people can see the benefits it can bring, demand is only going to continue. But it's important not to get fazed by the hype: plenty of money will be spent on cloud-based machine learning products that won't help anyone but the tech giants who run the public clouds. With that in mind, let's dive deeper into Machine Learning as a Service and what the biggest cloud vendors offer.

What does Machine Learning as a Service (MLaaS) mean?

Machine Learning as a Service (MLaaS) is an array of services that provides machine learning tools to users. Businesses and developers can incorporate a machine learning model into their application without having to work on its implementation. These services range from data visualization, facial recognition, natural language processing, and chatbots to predictive analytics and deep learning, among others. Typically, for a given machine learning task, a user has to perform various steps: data preprocessing, feature identification, implementing the machine learning model, and training the model. MLaaS services simplify this process by exposing only a subset of the steps to the user while automatically managing the rest. Some services even provide a 1-click mode, where the user does not have to perform any of the steps mentioned earlier.

What type of businesses can benefit from Machine Learning as a Service?

Large companies

Large companies can afford to hire expert machine learning engineers and data scientists, but they still have to build and manage their own custom machine learning models - a time-intensive and complicated process. By leveraging MLaaS services, these companies can use pre-trained machine learning models via APIs that perform specific tasks, and save time.

Small and mid-sized businesses

Big companies can invest in their own machine learning solutions because they have the resources. For small and mid-sized businesses (SMBs), however, this simply isn't the case. Fortunately, MLaaS changes all that and makes machine learning accessible to organizations with resource limitations. By using MLaaS, businesses can leverage machine learning without a huge investment in infrastructure or talent.
Whether it's for smarter and more intelligent customer-facing apps or improved operational intelligence and automation, this could bring huge gains for a reasonable amount of spending.

What types of roles will benefit from MLaaS?

Machine learning can contribute to any kind of app development, provided you have data to train your app. However, adding AI features to your app is not easy. As a developer, you have to worry about a lot of factors beyond the regular app development checklist in order to make your app intelligent. Some of them are:

- Data preprocessing
- Model training
- Model evaluation
- Predictions
- Expertise in data science

The development tools provided by MLaaS can simplify these tasks, allowing you to easily embed machine learning in your applications. Developers can build quickly and efficiently with MLaaS offerings, because they have access to pre-built algorithms and models that would otherwise take extensive resources to build. MLaaS can also support data scientists and analysts. While most data scientists should have the necessary skills to build and train machine learning models from scratch, it can nevertheless be a time-consuming task. MLaaS can, as already mentioned, simplify the machine learning engineering process, which means data scientists can focus on optimizations that require more thought and expertise.

Top Machine Learning as a Service (MLaaS) providers

Amazon Web Services (AWS), Azure, and Google all have MLaaS products in their cloud offerings. Let's take a look at them.

Google Cloud AI at a glance

Google's Cloud AI provides modern machine learning services. It consists of pre-trained models and a service to generate your own tailored models. The services provided are fast, scalable, and easy to use. The following are the services that Google provides at an unprecedented scale and speed to your applications:

Cloud AutoML Beta

This is a suite of machine learning products with the help of which developers with limited machine learning expertise can train high-quality models specific to their business needs. It provides a simple GUI to train, evaluate, improve, and deploy models based on your own data.

Read also: AmoebaNets: Google's new evolutionary AutoML

Google Cloud Machine Learning (ML) Engine

Google Cloud Machine Learning Engine is a service that offers training and prediction services to enable developers and data scientists to build superior machine learning models and deploy them in production. You don't have to worry about infrastructure and can instead focus on model development and deployment. It offers two types of predictions: online prediction deploys ML models with serverless, fully managed hosting that responds in real time with high availability, while batch prediction is cost-effective and provides unparalleled throughput for asynchronous applications.

Read also: Google announces Cloud TPUs on the Cloud Machine Learning Engine (ML Engine)

Google BigQuery

BigQuery is a cloud data warehouse for data analytics. It uses SQL and provides Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) drivers to make integration fast and easy. It provides benefits like auto-scaling and high-performance streaming to load data. You can create amazing reports and dashboards using your favorite BI tool, like Tableau, MicroStrategy, or Looker.

Read also: Getting started with Google Data Studio: An intuitive tool for visualizing BigQuery Data
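To show how little code a managed warehouse query takes, here is a hypothetical sketch using the google-cloud-bigquery Java client against one of Google's public sample datasets; the query is illustrative, and credentials are assumed to come from the environment:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class BigQuerySketch {
    public static void main(String[] args) throws InterruptedException {
        // Uses application-default credentials; assumes a GCP project is configured
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        QueryJobConfiguration query = QueryJobConfiguration.newBuilder(
            "SELECT name, SUM(number) AS total "
          + "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
          + "GROUP BY name ORDER BY total DESC LIMIT 5").build();

        TableResult result = bigquery.query(query);
        for (FieldValueList row : result.iterateAll()) {
            System.out.printf("%s: %s%n",
                row.get("name").getStringValue(),
                row.get("total").getStringValue());
        }
    }
}
```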
Dialogflow Enterprise Edition

Dialogflow is an end-to-end, build-once deploy-everywhere development suite for creating conversational interfaces for websites, mobile applications, popular messaging platforms, and IoT devices. Dialogflow Enterprise Edition users have access to Google Cloud Support and a service level agreement (SLA) for production deployments.

Read also: Google launches the Enterprise edition of Dialogflow, its chatbot API

Cloud Speech-to-Text

Google Cloud Speech-to-Text allows you to convert speech to text by applying neural network models. 120 languages are supported by the API, which will help you extend your user base. It can process both real-time streaming and prerecorded audio.

Read also: Google announces the largest overhaul of their Cloud Speech-to-Text

Microsoft Azure AI at a glance

The Azure platform consists of various AI tools and services that can help you build smart applications. It provides Cognitive Services and Conversational AI with Bot tools, which facilitate building custom models with Azure Machine Learning for any scenario. You can run AI workloads anywhere at scale using its enterprise-grade AI infrastructure. The following are the services provided by Azure AI to help you achieve maximum productivity and reliability:

Pre-built services

You need not be an expert in data science to make your systems more intelligent and engaging. The pre-built services come with high-quality RESTful intelligent APIs for the following:

- Vision: Make your apps identify and analyze content within images and videos. Provides capabilities such as image classification, optical character recognition in images, face detection, person identification, and emotion identification.
- Speech: Integrate speech processing capabilities into your app or services, such as text-to-speech, speech-to-text, speaker recognition, and speech translation.
- Language: Your application or service will understand the meaning of unstructured text or the intent behind a speaker's utterances. It comes with capabilities such as text sentiment analysis, key phrase extraction, and automated and customizable text translation.
- Knowledge: Create knowledge-rich resources that can be integrated into apps and services. It provides features such as Q&A extraction from unstructured text, knowledge base creation from collections of Q&As, and semantic matching for knowledge bases.
- Search: Using the Search API, you can find exactly what you are looking for across billions of web pages. It provides features like ad-free, safe, location-aware web search, Bing visual search, custom search engine creation, and many more.

Custom services

Azure Machine Learning is a fully managed cloud service which helps you to easily prepare data, and build and train your own models:

- You can rapidly prototype on your desktop, then scale up on VMs or scale out using Spark clusters.
- You can manage model performance, identify the best model, and promote it using data-driven insight.
- Deploy and manage your models everywhere. Using Docker containers, you can deploy models into production faster in the cloud, on-premises, or at the edge.
- Promote your best-performing models into production and retrain them whenever necessary.
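As a hypothetical example of what calling one of these pre-built language services looks like, here is a sketch using the Azure Text Analytics client library for Java (azure-ai-textanalytics); the endpoint and key are placeholders you would get from your own Azure resource:

```java
import com.azure.ai.textanalytics.TextAnalyticsClient;
import com.azure.ai.textanalytics.TextAnalyticsClientBuilder;
import com.azure.ai.textanalytics.models.DocumentSentiment;
import com.azure.core.credential.AzureKeyCredential;

public class SentimentSketch {
    public static void main(String[] args) {
        // Placeholder endpoint and key from an Azure Cognitive Services resource
        TextAnalyticsClient client = new TextAnalyticsClientBuilder()
            .credential(new AzureKeyCredential("<your-key>"))
            .endpoint("https://<your-resource>.cognitiveservices.azure.com/")
            .buildClient();

        DocumentSentiment sentiment =
            client.analyzeSentiment("The new dashboard is fantastic and easy to use!");
        System.out.println("Overall sentiment: " + sentiment.getSentiment());
    }
}
```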
Read also: Microsoft supercharges its Azure AI platform with new features

AWS machine learning services at a glance

The machine learning services provided by AWS help developers easily add intelligence to any application with pre-trained services. For training and inference, AWS offers a broad array of compute options, with powerful GPU-based instances, compute- and memory-optimized instances, and even FPGAs. You also get to choose from a set of services for data analysis, including data warehousing, business intelligence, batch processing, stream processing, and data workflow orchestration. The following are the services provided by AWS:

AWS machine learning applications

- Amazon Comprehend: This is a natural language processing (NLP) service that identifies relationships and finds insights in text using machine learning. It recognizes the language of the text, understands how positive or negative it is, and extracts key phrases, places, people, brands, or events. It then analyzes text using tokenization and parts of speech, and automatically organizes a collection of text files by topic.
- Amazon Lex: This service provides the same deep learning technologies used by Amazon Alexa to developers, helping them easily build sophisticated, natural language, conversational bots. It comes with advanced deep learning functionalities like automatic speech recognition (ASR) and natural language understanding (NLU) to facilitate a more lifelike conversational interaction with users.
- Amazon Polly: This text-to-speech service produces speech that sounds like a human voice using advanced deep learning technologies. It provides dozens of lifelike voices across a variety of languages, so you can simply select the ideal voice and build speech-enabled applications that work in many different countries.
- Amazon Rekognition: This service can identify objects, people, text, scenes, and activities, as well as any inappropriate content, in an image or a video. It also provides highly accurate facial analysis and facial recognition on images and video.

Read also: AWS makes Amazon Rekognition, its image recognition AI, available for Asia-Pacific developers
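For a feel of how thin the integration layer is, here is a hypothetical sketch calling Amazon Comprehend's sentiment detection through the AWS SDK for Java v2; the input text is made up, and credentials and region are assumed to be resolved from the environment:

```java
import software.amazon.awssdk.services.comprehend.ComprehendClient;
import software.amazon.awssdk.services.comprehend.model.DetectSentimentRequest;
import software.amazon.awssdk.services.comprehend.model.DetectSentimentResponse;

public class ComprehendSketch {
    public static void main(String[] args) {
        // Credentials and region are picked up from the default provider chain (assumption)
        try (ComprehendClient comprehend = ComprehendClient.create()) {
            DetectSentimentRequest request = DetectSentimentRequest.builder()
                .text("The support team resolved my issue quickly. Great service!")
                .languageCode("en")
                .build();

            DetectSentimentResponse response = comprehend.detectSentiment(request);
            System.out.println("Sentiment: " + response.sentimentAsString());
            System.out.println("Scores: " + response.sentimentScore());
        }
    }
}
```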
AWS machine learning platforms

- Amazon SageMaker: This is a platform that removes the complexities from the machine learning process, from building a model through to deploying it. It is a fully managed platform that helps developers and data scientists quickly and easily build, train, and deploy machine learning models at any scale.
- AWS DeepLens: This is a fully programmable video camera, which comes with tutorials, code, and pre-trained models designed to expand deep learning skills. It provides sample projects giving you practical, hands-on experience in deep learning in less than 10 minutes. Models trained in Amazon SageMaker can be sent to AWS DeepLens with just a few clicks from the AWS Management Console.
- Amazon ML: This is a service that provides visualization tools and wizards that guide you in creating a machine learning model without having to learn complex ML algorithms and technology. Using simple APIs, it makes it easy to obtain predictions for your application. It is highly scalable, can generate billions of predictions daily, and serves those predictions in real time and at high throughput.

Read also: Amazon SageMaker makes machine learning on the cloud easy

Deep Learning on AWS

- AWS Deep Learning AMIs: These provide the infrastructure and tools to accelerate deep learning in the cloud, at any scale. To train sophisticated, custom AI models, or to experiment with new algorithms, you can quickly launch Amazon EC2 instances which come pre-installed with popular deep learning frameworks such as Apache MXNet and Gluon, TensorFlow, Microsoft Cognitive Toolkit, Caffe, Caffe2, Theano, Torch, PyTorch, Chainer, and Keras.
- Apache MXNet on AWS: This is a fast and scalable training and inference framework with an easy-to-use, concise API for machine learning. It allows developers of all skill levels to get started with deep learning in the cloud, on edge devices, and in mobile apps using Gluon. You can build linear regressions, convolutional networks, and recurrent LSTMs for object detection, speech recognition, recommendation, and personalization in just a few lines of Gluon code.
- TensorFlow on AWS: You can quickly and easily get started with deep learning in the cloud using TensorFlow. AWS provides a fully managed TensorFlow experience with Amazon SageMaker. You can also use the AWS Deep Learning AMIs to build custom environments and workflows with TensorFlow and other popular frameworks such as Apache MXNet and Gluon, Caffe, Caffe2, Chainer, Torch, Keras, and Microsoft Cognitive Toolkit.

Conclusion

Machine learning and artificial intelligence can be expensive - skills and resources can cost a lot. For that reason, MLaaS is going to be a hugely influential development within cloud. Yes, the range of services on offer from AWS, Azure, and GCP is impressive, but it's really the ease and convenience that are most remarkable. With these services it's easy to set up and run machine learning algorithms that enhance business processes and operations, customer interactions, and overall business strategy. You don't need a PhD, and you don't need to code algorithms from scratch. The MLaaS market will likely continue to grow as more companies realise the potential machine learning has for their business - however, whether anyone can deliver a better set of services than the established cloud providers remains to be seen.

Read more:
Predictive Analytics with AWS: A quick look at Amazon ML
Microsoft supercharges its Azure AI platform with new features
AmoebaNets: Google's new evolutionary AutoML

2018 is the year of graph databases. Here's why.

Amey Varangaonkar
04 May 2018
5 min read
With the explosion of data, businesses are looking to innovate as they connect their operations to a whole host of different technologies. The need for consistency across all data elements is now stronger than ever, and that's where graph databases come in handy. Because they allow for a high level of flexibility when it comes to representing your data, while also handling complex interactions between different elements, graph databases are considered by many to be the next big trend in databases. In this article, we dive deep into the current graph database scene and list the 3 top reasons why graph databases will continue to soar in popularity in 2018.

What are graph databases, anyway?

Simply put, graph databases are databases that follow the graph model. What is a graph model, then? In mathematical terms, a graph is simply a collection of nodes, with different nodes connected by edges. Each node contains some information about the graph, while edges denote the connections between the nodes. How are graph databases different from relational databases, you might ask? Well, the key difference between the two is the fact that graph data models allow for more flexible and fine-grained relationships between data objects than relational models do. There are some further differences between the graph data model and the relational data model, which you should read through for more information. Often, you will see that graph databases are without a schema. This allows for a very flexible data model, much like the document or key/value store database models. A unique feature of graph databases, however, is that they also support relationships between data objects, like a relational database does. This is useful because it allows for a more flexible and faster database, which can be invaluable to a project that demands a quick response time.

(Image courtesy: DB-Engines)

The rise in popularity of graph database models over the last 5 years has been stunning, but not exactly surprising. If we were to drill down into the 3 key factors that have propelled the popularity of graph databases to a whole new level, what would they be? Let's find out.

Major players entering the graph database market

About a decade ago, the graph database family included just Neo4j and a couple of other less popular graph databases. More recently, however, all the major players in the industry, such as Oracle (Oracle Spatial and Graph), Microsoft (Graph Engine), SAP (SAP HANA as a graph store), and IBM (Compose for JanusGraph), have come up with graph offerings of their own. The most recent entrant to the graph database market is Amazon, with Amazon Neptune announced just last year. According to Andy Jassy, CEO of Amazon Web Services, graph databases are becoming a part of the growing trend of multi-model databases. Per Jassy, these databases are finding increased adoption on the cloud as they support a myriad of useful data processing methods, and the traditional over-reliance on relational databases is slowly breaking down.

Rise of the Cypher Query Language

With graph databases slowly getting mainstream recognition and adoption, the major companies have identified the need for a standard query language for all graph databases. Similar to SQL, Cypher has emerged as a standard and is a widely adopted way to write efficient and easy-to-understand graph queries. As of today, the Cypher Query Language is used in popular graph databases such as Neo4j, SAP HANA, RedisGraph, and so on. The openCypher project, which develops and maintains Cypher, has also released Cypher for popular Big Data frameworks like Apache Spark. Cypher's popularity has risen tremendously over the last few years. The primary reason for this is that, like SQL, Cypher is declarative: users state what they want from their graph data without having to spell out exactly how to retrieve it.
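To make the declarative style concrete, here is a hypothetical sketch that runs a Cypher query from Java using the official Neo4j driver; the connection details, graph schema (Person nodes, FRIENDS_WITH relationships), and data are assumptions made for illustration:

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class CypherSketch {
    public static void main(String[] args) {
        // Placeholder connection details for a local Neo4j instance
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                                                  AuthTokens.basic("neo4j", "<password>"));
             Session session = driver.session()) {

            // Declarative Cypher: state the pattern you want, not how to traverse it
            Result result = session.run(
                "MATCH (p:Person {name: $name})-[:FRIENDS_WITH]->(friend) " +
                "RETURN friend.name AS friendName",
                Values.parameters("name", "Alice"));

            while (result.hasNext()) {
                System.out.println(result.next().get("friendName").asString());
            }
        }
    }
}
```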
Finding critical real-world applications

Graph databases were in the news as early as 2016, when the Panama Papers leak was revealed with the help of Neo4j and Linkurious, a data visualization tool. In more recent times, graph databases have found increased application in online recommendation engines, as well as in tasks such as fraud detection and managing social media. Facebook's search app uses graph technology to map social relationships. Graph databases are also finding applications in virtual assistants to drive conversations - eBay's virtual shopping assistant is an example. Even NASA uses a knowledge graph architecture to find critical data.

What next for graph databases?

With the growing adoption of graph databases, we expect graph-based platforms to soon become foundational elements of many corporate tech stacks. The next focus area for these databases will be practical implementations such as graph analytics and building graph-based applications. The rising number of graph databases will also mean more competition, and that is a good thing: competition will bring more innovation and enable the incorporation of more cutting-edge features. With a healthy and steadily growing community of developers, data scientists, and even business analysts, this evolution may be on the cards sooner than we might expect.

Read more:
Amazon Neptune: A graph database service for your applications
When, why and how to use Graph analytics for your big data


Healthcare Analytics: Logistic Regression to Reduce Patient Readmissions

Guest Contributor
20 Dec 2017
8 min read
We bring you another guest post, by Benjamin Rogojan, on using logistic regression to help the healthcare sector reduce patient readmissions. Ben's previous post, on ensemble methods to optimize machine learning models, is also available for a quick read here.

ER visits are not cheap for any party involved, whether that be the patient or the insurance company. However, this does not stop some patients from being regular repeat visitors. These recurring visits are due to a lack of intervention for problems such as substance abuse, chronic diseases, and mental illness. This increases costs for everybody in the healthcare system and reduces quality of care by playing a role in the overflowing of Emergency Departments (EDs). Research teams at UW and other universities are partnering with companies like KenSci to figure out how to approach the problem of reducing readmission rates. The ability to predict the likelihood of a patient's readmission will allow for targeted intervention, which in turn will help reduce the frequency of readmissions, making the population healthier and hopefully reducing the estimated 41.3 billion USD in healthcare costs for the entire system. How do they plan to do it? With big data and statistics, of course.

A plethora of algorithms is available for data scientists to approach this problem. Many possible variables could affect readmission and medical costs, and there are many different ways researchers might pose their questions. However, the researchers at UW and many other institutions have been heavily focused on reducing the readmission rate simply by trying to calculate whether a person will or will not be readmitted. In particular, this team of researchers was curious about chronic ailments. Patients with chronic ailments are likely to have random flare-ups that require immediate attention, so being able to predict whether a patient will have an ER visit can lead to managing the cause more effectively. One approach taken by the data science team at UW, as well as by the Department of Family and Community Medicine at the University of Toronto, was to utilize logistic regression to predict whether or not a patient would be readmitted. Patient readmission can be broken down into a binary output: either the patient is readmitted or not. As such, logistic regression has, in my experience, been a useful model for this problem.

Logistic Regression to predict patient readmissions

Why do data scientists like to use logistic regression? Where is it used? And how does it compare to other algorithms? Logistic regression is a statistical method that statisticians and data scientists use to classify people, products, entities, and so on. It is used for analyzing data that produces a binary classification based on one or many independent variables. This means it produces two clear classifications (Yes or No, 1 or 0, etc.). With the example above, the binary classification is: is the patient readmitted or not? Other examples could be whether to give a customer a loan or not, whether a medical claim is fraudulent or not, or whether a patient has diabetes or not. Despite its name, logistic regression does not produce the same kind of output as linear regression (per se). There are some similarities; for instance, a linear component persists, as you will notice in the equation below, which contains something very similar to a linear equation. But the final output is based on the log odds.
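The equations in the original post were images and did not survive extraction; a standard statement of the model the text describes, writing p for the probability (the symbol the text calls pi) and β0 + x · β for the linear part, is:

$$
p = \frac{e^{\beta_0 + x \cdot \beta}}{1 + e^{\beta_0 + x \cdot \beta}},
\qquad
\log\left(\frac{p}{1 - p}\right) = \beta_0 + x \cdot \beta
$$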
Linear regression and multivariate regression both take one to many independent variables and produce some form of continuous function. Linear regression could be used to predict the price of a house, a person's age, or the cost of a product an e-commerce site should display to each customer. The output is not limited to a few discrete classifications, whereas logistic regression produces discrete classifications. For instance, an algorithm using logistic regression could be used to classify whether a certain stock would trade above or below $50 a share, while linear regression would be used to predict whether a share would be worth $50.01, $50.02, and so on.

Logistic regression is a calculation that uses the odds of a certain classification. In the equation above, the symbol you might know as pi represents the probability, from which the odds are derived. To reduce the error rate, we should predict Y = 1 when p ≥ 0.5 and Y = 0 when p < 0.5. This creates a linear classifier: a boundary such that when the coefficients β0 + x · β correspond to a probability p < 0.5, we predict Y = 0. By generating coefficients that help predict the logit transformation, the method allows us to classify for the characteristic of interest. Now, that is a lot of complex math mumbo jumbo. Let's try to break it down into simpler terms.

Probability vs. Odds

Let's start with probability. Say a patient has a probability of 0.6 of being readmitted. Then the probability that the patient won't be readmitted is .4. Now, we want to take this and convert it into odds, which is what the formula above is doing. You would take .6/.4 and get odds of 1.5. That means the odds of the patient being readmitted are 1.5 to 1. If instead the probability were .5 for both being readmitted and not being readmitted, then the odds would be 1:1. The next step in the logistic regression model is to take the odds and get the log odds. You do this by putting the 1.5 into the log portion of the equation: with the base-10 logarithm you get .18 (rounded), while with the natural logarithm, which the standard logit uses, you get .41. In logistic regression, we don't actually know p; that is what we are trying to find and model using various coefficients and input variables. Each input provides a value that changes how much more likely an event will or will not occur, and all of these coefficients are used to calculate the log odds. The model can take multiple variables, like age, sex, height, etc., and specify how much of an effect each variable has on the odds that an event will occur.

Once the initial model is developed, then comes the work of deciding its value. How does a business go from creating an algorithm inside a computer to translating it into action? Some of us like to say the "computers" are the easy part; personally, I find the hard part to be the "people". After all, at the end of the day, it comes down to business value. Will an algorithm save money or not? That means it has to be applied in real life, which could take the form of a new initiative, strategy, product recommendation, and so on. You need to find the outliers that are worth going after! For instance, going back to the patient readmission example: the algorithm points out patients with high probabilities of being readmitted. However, if the readmission costs are low, those patients will probably be ignored, sadly. That is how businesses (including hospitals) look at problems. Logistic regression is a great tool for binary classification; it is unlike many other algorithms that estimate continuous variables or estimate distributions.
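The probability-to-odds arithmetic above is easy to check in a few lines; here is a minimal sketch using the numbers from the worked example:

```java
public class OddsSketch {
    public static void main(String[] args) {
        double p = 0.6;                       // probability of readmission
        double odds = p / (1 - p);            // 0.6 / 0.4 = 1.5, i.e. odds of 1.5 to 1
        double logOdds = Math.log(odds);      // natural log: ~0.405, what the logit model uses
        double log10Odds = Math.log10(odds);  // base-10 log: ~0.176, the ".18 (rounded)" above

        System.out.printf("odds = %.2f, ln(odds) = %.3f, log10(odds) = %.3f%n",
                          odds, logOdds, log10Odds);
    }
}
```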
This statistical method can be utilized to classify whether a person is likely to get cancer because of environmental variables such as proximity to a highway, smoking habits, and so on. It has been used effectively in the medical, financial, and insurance industries for a while. Knowing when to use which algorithm takes time. However, the more problems a data scientist faces, the faster they will recognize whether to use logistic regression or decision trees. Using logistic regression provides the opportunity for healthcare institutions to accurately target at-risk individuals who should receive a more tailored behavioral health plan to help improve their daily health habits. This in turn opens the opportunity for better health for patients and lower costs for hospitals. [box type="shadow" align="" class="" width=""] About the Author Benjamin Rogojan Ben has spent his career focused on healthcare data. He has focused on developing algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policies to help reduce the overall cost of healthcare. He has also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. Ben privately consults on data science and engineering problems, both solo as well as with a company called Acheron Analytics. He has experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.[/box]


Four self-service business intelligence user types in Qlik Sense

Amey Varangaonkar
29 May 2018
7 min read
With the introduction of self-service to BI, there is segmentation at various levels and breaths on how self-service is conducted and to what extent. There are, quite frankly, different user types that differ from each other in level of interest, technical expertise, and the way in which they consume data. While each user will almost be unique in the way they use self-service, the user base can be divided into four different groups. In this article, we take a look at the four types of users in self-service business intelligence model. The following excerpt is taken from the book Mastering Qlik Sense, authored by Martin Mahler and Juan Ignacio Vitantonio. This book presents expert techniques to design and deploy enterprise-grade Business Intelligence solutions for your business, by leveraging the power of Qlik Sense. Power Users or Data Champions Power users are the most tech-savvy business users, who show a great interest in self-service BI. They produce and build dashboards themselves and know how to load data and process it to create a logical data model. They tend to be self-learning and carry a hybrid set of skills, usually a mixture of business knowledge and some advanced technical skills. This user group is often frustrated with existing reporting or BI solutions and finds IT inadequate in delivering the same. As a result, especially in the past, they take away data dumps from IT solutions and create their own dashboards in Excel, using advanced skills such as VBA, Visual Basic for Applications. They generally like to participate in the development process but have been unable to do so due to governance rules and a strict old-school separation of IT from the business. Self-service BI is addressing this group in particular, and identifying those users is key in reaching adoption within an organization. Within an established self-service environment, power users generally participate in committees revolving around the technical environments and represent the business interest. They also develop the bulk of the first versions of the apps, which, as part of a naturally evolving process, are then handed over to more experienced IT for them to be polished and optimized. Power users advocate the self-service BI technology and often not only demo the insights and information they achieved to extract from their data, but also the efficiency and timeliness of doing so. At the same time, they also serve as the first point of contact for other users and consumers when it comes to questions about their apps and dashboards. Sometimes they also participate in a technical advisory capacity on whether other projects are feasible to be implemented using the same technology. Within a self-service BI environment, it is safe to say that those power users are the pillars of a successful adoption. Business Users or Data Visualizers Users are frequent users of data analytics, with the main goal to extract value from the data they are presented with. They represent the group of the user base which is interested in conducting data analysis and data discovery to better understand their business in order to make better-informed decisions. Presentation and ease of use of the application are key to this type of user group and they are less interested in building new analytics themselves. That being said, some form of creating new charts and loading data is sometimes still of interest to them, albeit on a very basic level. Timeliness, the relevance of data, and the user experience are most relevant to them. 
They are the ones who are slicing and dicing the data and drilling down into dimensions, and who are keen to click around in the app to obtain valuable information. Usually, a group of users belong to the same department and have a power user overseeing them with regard to questions but also in receiving feedback on how the dashboard can be improved even more. Their interaction with IT is mostly limited to requesting access and resolving unexpected technical errors. Consumers or Data Readers Consumers usually form the largest user group of a self-service BI analytics solution. They are the end recipients of the insights and data analytics that have been produced and, normally, are only interested in distilled information which is presented to them in a digested form. They are usually the kind of users who are happy with a report, either digital or in printed form, which summarizes highlights and lowlights in a few pages, requiring no interaction at all. Also, they are most sensitive to the timeliness and availability of their reports. While usually the largest audience, at the same time this user group leverages the self-service capabilities of a BI tool the least. This poses a licensing challenge, as those users don’t take full advantage of the functionality on offer, but are costing the full amount in order to access the reports. It is therefore not uncommon to assign this type of user group a bucket of login access passes or not give them access to the self-service BI platform at all and give them the information they need in (digitally) printed format or within presentations, prepared by users. IT or Data Overseers IT represents the technical user group within this context, who sit in the background and develop and manage the framework within which the self-service BI solution operates. They are the backbone of the deployment and ensure the environment is set up correctly to cater for the various use cases required by the above-described user groups. At the same time, they ensure a security policy is in place and maintained and they introduce a governance framework for deployment, data quality, and best practices. They are in effect responsible for overseeing the power users and helping them with technical questions, but at the same time ensuring terms and definition as well as the look and feel is consistent and maintained across all apps. With self-service BI, IT plays a lesser role in actually developing the dashboards but assumes a more mentoring position, where training, consultation, and advisory in best practices are conducted. While working closely with power users, IT also provides technical support to users and liaises with the IT infrastructure to ensure the server infrastructure is fit for purpose and up and running to serve the users. This also includes upgrading the platform where required and enriching it with additional functionality if and when available. Bringing them together The previous four groups can be distinguished within a typical enterprise environment; however, this is not to say hybrid or fewer user groups are not viable models for self-service BI. It is an evolutionary process in how an organization adapts self-service data analytics with a lot of dependencies on available skills, competing established solutions, culture, and appetite on new technologies. It usually begins with IT being the first users in a newly deployed self-service environment, not only setting up the infrastructure but also developing the first apps for a couple of consumers. 
Power users then follow up; generally, they are the business sponsors themselves who are often big fans of data analytics, modifying the app to their liking and promoting it to their users. The user base emerges with the success of the solution, where analytics are integrated into their business as the usual process. The last group, the consumers, is mostly the last type of user group that is established, which more often than not doesn’t have actual access to the platform itself, but rather receives printouts, email summaries with screenshots, or PowerPoint presentations. Due to licensing cost and the size of the consumer audience, it is not always easy to give them access to the self-service platform; hence, most of the time, an automated and streamlined PDF printing process is the most elegant solution to cater to this type of user group. At the same time, the size of the deployment also determines the number of various user groups. In small enterprise environments, it will be mostly power users and IT who will be using self-service. This greatly simplifies the approach as well as the setup considerations. If you found the above excerpt useful, make sure you check out the book Mastering Qlik Sense to learn helpful tips and tricks to perform effective Business Intelligence using Qlik Sense. Read more: How Qlik Sense is driving self-service Business Intelligence What we learned from Qlik Qonnections 2018 How self-service analytics is changing modern-day businesses


30 common data science terms explained

Aarthi Kumaraswamy
16 May 2018
27 min read
Let’s begin at the beginning. What do terms like statistical population, statistical comparison, statistical inference mean? What good is munging, coding, booting, regularization etc. On a scale of 1 to 30 (1 being the lowest and 30, the highest), rate yourself as a data scientist. No matter what you have scored yourself, we hope to have improved that score at least by a little, by the end of this post. Let’s start with a basic question: What is data science? [box type="shadow" align="" class="" width=""]The following is an excerpt from the book, Statistics for Data Science written by James D. Miller and published by Packt Publishing.[/box] The idea of how data science is defined is a matter of opinion. I personally like the explanation that data science is a progression or, even better, an evolution of thought or steps, as shown in the following figure: Although a progression or evolution implies a sequential journey, in practice, this is an extremely fluid process; each of the phases may inspire the data scientist to reverse and repeat one or more of the phases until they are satisfied. In other words, all or some phases of the process may be repeated until the data scientist determines that the desired outcome is reached. Depending on your sources and individual beliefs, you may say the following: Statistics is data science, and data science is statistics. Based upon personal experience, research, and various industry experts' advice, someone delving into the art of data science should take every opportunity to understand and gain experience as well as proficiency with the following list of common data science terms: Statistical population Probability False positives Statistical inference Regression Fitting Categorical data Classification Clustering Statistical comparison Coding Distributions Data mining Decision trees Machine learning Munging and wrangling Visualization D3 Regularization Assessment Cross-validation Neural networks Boosting Lift Mode Outlier Predictive modeling Big data Confidence interval Writing Statistical population You can perhaps think of a statistical population as a recordset (or a set of records). This set or group of records will be of similar items or events that are of interest to the data scientist for some experiment. For a data developer, a population of data may be a recordset of all sales transactions for a month, and the interest might be reporting to the senior management of an organization which products are the fastest sellers and at which time of the year. For a data scientist, a population may be a recordset of all emergency room admissions during a month, and the area of interest might be to determine the statistical demographics for emergency room use. [box type="note" align="" class="" width=""]Typically, the terms statistical population and statistical model are or can be used interchangeably. Once again, data scientists continue to evolve with their alignment on their use of common terms. [/box] Another key point concerning statistical populations is that the recordset may be a group of (actually) existing objects or a hypothetical group of objects. Using the preceding example, you might draw a comparison of actual objects as those actual sales transactions recorded for the month while the hypothetical objects as sales transactions are expected, forecast, or presumed (based upon observations or experienced assumptions or other logic) to occur during a month. 
Finally, through the use of statistical inference, the data scientist can select a portion or subset of the recordset (or population) with the intention that it will represent the total population for a particular area of interest. This subset is known as a statistical sample. If a sample of a population is chosen accurately, characteristics of the entire population (that the sample is drawn from) can be estimated from the corresponding characteristics of the sample. Probability Probability is concerned with the laws governing random events.                                           -www.britannica.com When thinking of probability, you think of possible upcoming events and the likelihood of them actually occurring. This compares to a statistical thought process that involves analyzing the frequency of past events in an attempt to explain or make sense of the observations. In addition, the data scientist will associate various individual events, studying the relationship of these events. How these different events relate to each other governs the methods and rules that will need to be followed when we're studying their probabilities. [box type="note" align="" class="" width=""]A probability distribution is a table that is used to show the probabilities of various outcomes in a sample population or recordset. [/box] False positives The idea of false positives is a very important statistical (data science) concept. A false positive is a mistake or an errored result. That is, it is a scenario where the results of a process or experiment indicate a fulfilled or true condition when, in fact, the condition is not true (not fulfilled). This situation is also referred to by some data scientists as a false alarm and is most easily understood by considering the idea of a recordset or statistical population (which we discussed earlier in this section) that is determined not only by the accuracy of the processing but by the characteristics of the sampled population. In other words, the data scientist has made errors during the statistical process, or the recordset is a population that does not have an appropriate sample (or characteristics) for what is being investigated. Statistical inference What developer at some point in his or her career, had to create a sample or test data? For example, I've often created a simple script to generate a random number (based upon the number of possible options or choices) and then used that number as the selected option (in my test recordset). This might work well for data development, but with statistics and data science, this is not sufficient. To create sample data (or a sample population), the data scientist will use a process called statistical inference, which is the process of deducing options of an underlying distribution through analysis of the data you have or are trying to generate for. The process is sometimes called inferential statistical analysis and includes testing various hypotheses and deriving estimates. When the data scientist determines that a recordset (or population) should be larger than it actually is, it is assumed that the recordset is a sample from a larger population, and the data scientist will then utilize statistical inference to make up the difference. [box type="note" align="" class="" width=""]The data or recordset in use is referred to by the data scientist as the observed data. 
Inferential statistics can be contrasted with descriptive statistics, which is only concerned with the properties of the observed data and does not assume that the recordset came from a larger population. [/box] Regression Regression is a process or method (selected by the data scientist as the best fit technique for the experiment at hand) used for determining the relationships among variables. If you're a programmer, you have a certain understanding of what a variable is, but in statistics, we use the term differently. Variables are determined to be either dependent or independent. An independent variable (also known as a predictor) is the one that is manipulated by the data scientist in an effort to determine its relationship with a dependent variable. A dependent variable is a variable that the data scientist is measuring. [box type="note" align="" class="" width=""]It is not uncommon to have more than one independent variable in a data science progression or experiment. [/box] More precisely, regression is the process that helps the data scientist comprehend how the typical value of the dependent variable (or criterion variable) changes when any one or more of the independent variables is varied while the other independent variables are held fixed. Fitting Fitting is the process of measuring how well a statistical model or process describes a data scientist's observations pertaining to a recordset or experiment. These measures will attempt to point out the discrepancy between observed values and probable values. The probable values of a model or process are known as a distribution or a probability distribution. Therefore, a probability distribution fitting (or distribution fitting) is when the data scientist fits a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. The object of a data scientist performing a distribution fitting is to predict the probability or to forecast the frequency of, the occurrence of the phenomenon at a certain interval. [box type="note" align="" class="" width=""]One of the most common uses of fitting is to test whether two samples are drawn from identical distributions.[/box] There are numerous probability distributions a data scientist can select from. Some will fit better to the observed frequency of the data than others will. The distribution giving a close fit is supposed to lead to good predictions; therefore, the data scientist needs to select a distribution that suits the data well. Categorical data Earlier, we explained how variables in your data can be either independent or dependent. Another type of variable definition is a categorical variable. This type of variable is one that can take on one of a limited, and typically fixed, number of possible values, thus assigning each individual to a particular category. Often, the collected data's meaning is unclear. Categorical data is a method that a data scientist can use to put meaning to the data. For example, if a numeric variable is collected (let's say the values found are 4, 10, and 12), the meaning of the variable becomes clear if the values are categorized. Let's suppose that based upon an analysis of how the data was collected, we can group (or categorize) the data by indicating that this data describes university students, and there is the following number of players: 4 tennis players 10 soccer players 12 football players Now, because we grouped the data into categories, the meaning becomes clear. 
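As a small illustration, the university-players example above could be coded along the following lines; the pandas usage here is a generic sketch, and the column names are made up for the example:

```python
import pandas as pd

# Raw observations: one row per student, with the sport they play.
students = pd.DataFrame({
    "student_id": range(1, 27),
    "sport": ["tennis"] * 4 + ["soccer"] * 10 + ["football"] * 12,
})

# Declaring the column as categorical makes the grouping explicit.
students["sport"] = students["sport"].astype("category")

# The distribution of the categorical variable:
# 12 football players, 10 soccer players, 4 tennis players.
print(students["sport"].value_counts())
```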
Some other examples of categorized data might be individual pet preferences (grouped by the type of pet), or vehicle ownership (grouped by the style of a car owned), and so on. So, categorical data, as the name suggests, is data grouped into some sort of category or multiple categories. Some data scientists refer to categories as sub-populations of data. [box type="note" align="" class="" width=""]Categorical data can also be data that is collected as a yes or no answer. For example, hospital admittance data may indicate that patients either smoke or do not smoke. [/box] Classification Statistical classification of data is the process of identifying which category (discussed in the previous section) a data point, observation, or variable should be grouped into. The data science process that carries out a classification process is known as a classifier. Read this post: Classification using Convolutional Neural Networks [box type="note" align="" class="" width=""]Determining whether a book is fiction or non-fiction is a simple example classification. An analysis of data about restaurants might lead to the classification of them among several genres. [/box] Clustering Clustering is the process of dividing up the data occurrences into groups or homogeneous subsets of the dataset, not a predetermined set of groups as in classification (described in the preceding section) but groups identified by the execution of the data science process based upon similarities that it found among the occurrences. Objects in the same group (a group is also referred to as a cluster) are found to be more analogous (in some sense or another) to each other than to those objects found in other groups (or found in other clusters). The process of clustering is found to be very common in exploratory data mining and is also a common technique for statistical data analysis. Statistical comparison Simply put, when you hear the term statistical comparison, one is usually referring to the act of a data scientist performing a process of analysis to view the similarities or variances of two or more groups or populations (or recordsets). As a data developer, one might be familiar with various utilities such as FC Compare, UltraCompare, or WinDiff, which aim to provide the developer with a line-by-line comparison of the contents of two or more (even binary) files. In statistics (data science), this process of comparing is a statistical technique to compare populations or recordsets. In this method, a data scientist will conduct what is called an Analysis of Variance (ANOVA), compare categorical variables (within the recordsets), and so on. [box type="note" align="" class="" width=""]ANOVA is an assortment of statistical methods that are used to analyze the differences among group means and their associated procedures (such as variations among and between groups, populations, or recordsets). This method eventually evolved into the Six Sigma dataset comparisons. [/box] Coding Coding or statistical coding is again a process that a data scientist will use to prepare data for analysis. In this process, both quantitative data values (such as income or years of education) and qualitative data (such as race or gender) are categorized or coded in a consistent way. 
Coding is performed by a data scientist for various reasons such as follows: More effective for running statistical models Computers understand the variables Accountability--so the data scientist can run models blind, or without knowing what variables stand for, to reduce programming/author bias [box type="shadow" align="" class="" width=""]You can imagine the process of coding as the means to transform data into a form required for a system or application. [/box] Distributions The distribution of a statistical recordset (or of a population) is a visualization showing all the possible values (or sometimes referred to as intervals) of the data and how often they occur. When a distribution of categorical data (which we defined earlier in this chapter) is created by a data scientist, it attempts to show the number or percentage of individuals in each group or category. Linking an earlier defined term with this one, a probability distribution, stated in simple terms, can be thought of as a visualization showing the probability of occurrence of different possible outcomes in an experiment. Data mining With data mining, one is usually more absorbed in the data relationships (or the potential relationships between points of data, sometimes referred to as variables) and cognitive analysis. To further define this term, we can say that data mining is sometimes more simply referred to as knowledge discovery or even just discovery, based upon processing through or analyzing data from new or different viewpoints and summarizing it into valuable insights that can be used to increase revenue, cuts costs, or both. Using software dedicated to data mining is just one of several analytical approaches to data mining. Although there are tools dedicated to this purpose (such as IBM Cognos BI and Planning Analytics, Tableau, SAS, and so on.), data mining is all about the analysis process finding correlations or patterns among dozens of fields in the data and that can be effectively accomplished using tools such as MS Excel or any number of open source technologies. [box type="note" align="" class="" width=""]A common technique to data mining is through the creation of custom scripts using tools such as R or Python. In this way, the data scientist has the ability to customize the logic and processing to their exact project needs. [/box] Decision trees A statistical decision tree uses a diagram that looks like a tree. This structure attempts to represent optional decision paths and a predicted outcome for each path selected. A data scientist will use a decision tree to support, track, and model decision making and their possible consequences, including chance event outcomes, resource costs, and utility. It is a common way to display the logic of a data science process. Machine learning Machine learning is one of the most intriguing and exciting areas of data science. It conjures all forms of images around artificial intelligence which includes Neural Networks, Support Vector Machines (SVMs), and so on. Fundamentally, we can describe the term machine learning as a method of training a computer to make or improve predictions or behaviors based on data or, specifically, relationships within that data. Continuing, machine learning is a process by which predictions are made based upon recognized patterns identified within data, and additionally, it is the ability to continuously learn from the data's patterns, therefore continuingly making better predictions. 
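To make the decision tree and machine learning ideas above a little more concrete, here is a minimal sketch that trains a decision tree classifier on scikit-learn's bundled iris dataset and prints the learned decision paths; the dataset and the depth limit are illustrative choices only:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# A bundled example dataset stands in for the recordset being modeled.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Each split in the tree is a decision point; the leaves are the predicted outcomes.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("Held-out accuracy:", round(tree.score(X_test, y_test), 2))
# Show the optional decision paths the model has learned, in tree form.
print(export_text(tree, feature_names=iris.feature_names))
```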
It is not uncommon for someone to mistake the process of machine learning for data mining, but data mining focuses more on exploratory data analysis and is known as unsupervised learning. Machine learning can be used to learn and establish baseline behavioral profiles for various entities and then to find meaningful anomalies. Here is the exciting part: the process of machine learning (using data relationships to make predictions) is known as predictive analytics. Predictive analytics allow the data scientists to produce reliable, repeatable decisions and results and uncover hidden insights through learning from historical relationships and trends in the data. Munging and wrangling The terms munging and wrangling are buzzwords or jargon meant to describe one's efforts to affect the format of data, recordset, or file in some way in an effort to prepare the data for continued or otherwise processing and/or evaluations. With data development, you are most likely familiar with the idea of Extract, Transform, and Load (ETL). In somewhat the same way, a data developer may mung or wrangle data during the transformation steps within an ETL process. Common munging and wrangling may include removing punctuation or HTML tags, data parsing, filtering, all sorts of transforming, mapping, and tying together systems and interfaces that were not specifically designed to interoperate. Munging can also describe the processing or filtering of raw data into another form, allowing for more convenient consumption of the data elsewhere. Munging and wrangling might be performed multiple times within a data science process and/or at different steps in the evolving process. Sometimes, data scientists use munging to include various data visualization, data aggregation, training a statistical model, as well as much other potential work. To this point, munging and wrangling may follow a flow beginning with extracting the data in a raw form, performing the munging using various logic, and lastly, placing the resulting content into a structure for use. Although there are many valid options for munging and wrangling data, preprocessing and manipulation, a tool that is popular with many data scientists today is a product named Trifecta, which claims that it is the number one (data) wrangling solution in many industries. [box type="note" align="" class="" width=""]Trifecta can be downloaded for your personal evaluation from https://www.trifacta.com/. Check it out! [/box] Visualization The main point (although there are other goals and objectives) when leveraging a data visualization technique is to make something complex appear simple. You can think of visualization as any technique for creating a graphic (or similar) to communicate a message. Other motives for using data visualization include the following: To explain the data or put the data in context (which is to highlight demographic statistics) To solve a specific problem (for example, identifying problem areas within a particular business model) To explore the data to reach a better understanding or add clarity (such as what periods of time do this data span?) 
To highlight or illustrate otherwise invisible data (such as isolating outliers residing in the data) To predict, such as potential sales volumes (perhaps based upon seasonality sales statistics) And others Statistical visualization is used in almost every step in the data science process, within the obvious steps such as exploring and visualizing, analyzing and learning, but can also be leveraged during collecting, processing, and the end game of using the identified insights. D3 D3 or D3.js, is essentially an open source JavaScript library designed with the intention of visualizing data using today's web standards. D3 helps put life into your data, utilizing Scalable Vector Graphics (SVG), Canvas, and standard HTML. D3 combines powerful visualization and interaction techniques with a data-driven approach to DOM manipulation, providing data scientists with the full capabilities of modern browsers and the freedom to design the right visual interface that best depicts the objective or assumption. In contrast to many other libraries, D3.js allows inordinate control over the visualization of data. D3 is embedded within an HTML webpage and uses pre-built JavaScript functions to select elements, create SVG objects, style them, or add transitions, dynamic effects, and so on. Regularization Regularization is one possible approach that a data scientist may use for improving the results generated from a statistical model or data science process, such as when addressing a case of overfitting in statistics and data science. [box type="note" align="" class="" width=""]We defined fitting earlier (fitting describes how well a statistical model or process describes a data scientist's observations). Overfitting is a scenario where a statistical model or process seems to fit too well or appears to be too close to the actual data.[/box] Overfitting usually occurs with an overly simple model. This means that you may have only two variables and are drawing conclusions based on the two. For example, using our previously mentioned example of daffodil sales, one might generate a model with temperature as an independent variable and sales as a dependent one. You may see the model fail since it is not as simple as concluding that warmer temperatures will always generate more sales. In this example, there is a tendency to add more data to the process or model in hopes of achieving a better result. The idea sounds reasonable. For example, you have information such as average rainfall, pollen count, fertilizer sales, and so on; could these data points be added as explanatory variables? [box type="note" align="" class="" width=""]An explanatory variable is a type of independent variable with a subtle difference. When a variable is independent, it is not affected at all by any other variables. When a variable isn't independent for certain, it's an explanatory variable. [/box] Continuing to add more and more data to your model will have an effect but will probably cause overfitting, resulting in poor predictions since it will closely resemble the data, which is mostly just background noise. To overcome this situation, a data scientist can use regularization, introducing a tuning parameter (additional factors such as a data points mean value or a minimum or maximum limitation, which gives you the ability to change the complexity or smoothness of your model) into the data science process to solve an ill-posed problem or to prevent overfitting. 
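One common, concrete form of this tuning-parameter idea is ridge (L2) regularization, where a single parameter, alpha, controls how strongly large coefficients are penalized. The sketch below is only an illustration of that mechanism, using made-up daffodil-sales-style data rather than anything from the text:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Illustrative data: temperature plus several mostly-noise variables
# (say rainfall, pollen count, fertilizer sales) versus daffodil sales.
X = rng.normal(size=(50, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=2.0, size=50)  # only temperature matters

plain = LinearRegression().fit(X, y)
regularized = Ridge(alpha=10.0).fit(X, y)  # alpha is the tuning parameter

# Ridge shrinks the coefficients on the noise variables toward zero,
# which helps keep the model from fitting the background noise.
print("Ordinary least squares:", plain.coef_.round(2))
print("Ridge (alpha=10):      ", regularized.coef_.round(2))
```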
Assessment When a data scientist evaluates a model or data science process for performance, this is referred to as assessment. Performance can be defined in several ways, including the model's growth of learning or its ability to improve (to obtain a better score) with additional experience (for example, more rounds of training with additional samples of data), or the accuracy of its results. One popular method of assessing a model's or process's performance is called bootstrap sampling. This method examines performance on certain subsets of data, repeatedly generating results that can be used to calculate an estimate of accuracy (performance). The bootstrap sampling method takes a random sample of data and splits it into three files--a training file, a testing file, and a validation file. The model or process logic is developed based on the data in the training file and then evaluated (or tested) using the testing file. This tune-and-test process is repeated until the data scientist is comfortable with the results of the tests. At that point, the model or process is tested again, this time using the validation file, and the results should provide a true indication of how it will perform. [box type="note" align="" class="" width=""]You can imagine using the bootstrap sampling method to develop program logic by analyzing test data to determine logic flows and then running (or testing) your logic against the test data file. Once you are satisfied that your logic handles all of the conditions and exceptions found in your testing data, you can run a final test on a new, never-before-seen data file for a final validation test. [/box] Cross-validation Cross-validation is a method for assessing a data science process's performance. Mainly used with predictive modeling to estimate how accurately a model might perform in practice, cross-validation is often used to check how a model will potentially generalize, in other words, how the model can apply what it infers from samples to an entire population (or recordset). With cross-validation, you identify a (known) dataset as your validation dataset on which training is run, along with a dataset of unknown data (or first-seen data) against which the model will be tested (this is known as your testing dataset). The objective is to ensure that problems such as overfitting (allowing non-inclusive information to influence results) are controlled and also to provide an insight into how the model will generalize to a real problem or a real data file. The cross-validation process consists of separating data into samples of similar subsets, performing the analysis on one subset (called the training set) and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple iterations (also called folds or rounds) of cross-validation are performed using different partitions, and the validation results are averaged over the rounds. Typically, a data scientist will use a model's stability to determine the actual number of rounds of cross-validation that should be performed. Neural networks Neural networks are also called artificial neural networks (ANNs), and the objective is to solve problems in the same way that the human brain would. A Google search will turn up the following explanation of an ANN, as stated in Neural Network Primer: Part I, by Maureen Caudill, AI Expert, Feb.
1989: [box type="note" align="" class="" width=""]A computing system made up of several simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs. [/box] To oversimplify the idea of neural networks, recall the concept of software encapsulation, and consider a computer program with an input layer, a processing layer, and an output layer. With this thought in mind, understand that neural networks are also organized in a network of these layers, usually with more than a single processing layer. Patterns are presented to the network by way of the input layer, which then communicates to one (or more) of the processing layers (where the actual processing is done). The processing layers then link to an output layer where the result is presented. Most neural networks will also contain some form of learning rule that modifies the weights of the connections (in other words, the network learns which processing nodes perform better and gives them a heavier weight) per the input patterns that it is presented with. In this way (in a sense), neural networks learn by example as a child learns to recognize a cat from being exposed to examples of cats. Boosting In a manner of speaking, boosting is a process generally accepted in data science for improving the accuracy of a weak learning data science process. [box type="note" align="" class="" width=""]Data science processes defined as weak learners are those that produce results that are only slightly better than if you would randomly guess the outcome. Weak learners are basically thresholds or a 1-level decision tree. [/box] Specifically, boosting is aimed at reducing bias and variance in supervised learning. What do we mean by bias and variance? Before going on further about boosting, let's take note of what we mean by bias and variance. Data scientists describe bias as a level of favoritism that is present in the data collection process, resulting in uneven, disingenuous results and can occur in a variety of different ways. A sampling method is called biased if it systematically favors some outcomes over others. A variance may be defined (by a data scientist) simply as the distance from a variable mean (or how far from the average a result is). The boosting method can be described as a data scientist repeatedly running through a data science process (that has been identified as a weak learning process), with each iteration running on different and random examples of data sampled from the original population recordset. All the results (or classifiers or residue) produced by each run are then combined into a single merged result (that is a gradient). This concept of using a random subset of the original recordset for each iteration originates from bootstrap sampling in bagging and has a similar variance-reducing effect on the combined model. In addition, some data scientists consider boosting a means to convert weak learners into strong ones; in fact, to some, the process of boosting simply means turning a weak learner into a strong learner. Lift In data science, the term lift compares the frequency of an observed pattern within a recordset or population with how frequently you might expect to see that same pattern occur within the data by chance or randomly. If the lift is very low, then typically, a data scientist will expect that there is a very good probability that the pattern identified is occurring just by chance. 
The larger the lift, the more likely it is that the pattern is real. Mode In statistics and data science, when a data scientist uses the term mode, he or she refers to the value that occurs most often within a sample of data. Mode is not calculated but is determined manually or through processing of the data. Outlier Outliers can be defined as follows: A data point that is way out of keeping with the others That piece of data that doesn't fit Either a very high value or a very low value Unusual observations within the data An observation point that is distant from all others Predictive modeling The development of statistical models and/or data science processes to predict future events is called predictive modeling. Big Data Again, we have some variation of the definition of big data. A large assemblage of data, data sets that are so large or complex that traditional data processing applications are inadequate, and data about every aspect of our lives have all been used to define or refer to big data. In 2001, then Gartner analyst Doug Laney introduced the 3V's concept. The 3V's, as per Laney, are volume, variety, and velocity. The V's make up the dimensionality of big data: volume (or the measurable amount of data), variety (meaning the number of types of data), and velocity (referring to the speed of processing or dealing with that data). Confidence interval The confidence interval is a range of values that a data scientist will specify around an estimate to indicate their margin of error, combined with a probability that a value will fall in that range. In other words, confidence intervals are good estimates of the unknown population parameter. Writing Although visualizations grab much more of the limelight when it comes to presenting the output or results of a data science process or predictive model, writing skills are still not only an important part of how a data scientist communicates but still considered an essential skill for all data scientists to be successful. Did we miss any of your favorite terms? Now that you are at the end of this post, we ask you again: On a scale of 1 to 30 (1 being the lowest and 30, the highest), how do you rate yourself as a data scientist? Why You Need to Know Statistics To Be a Good Data Scientist [interview] How data scientists test hypotheses and probability 6 Key Areas to focus on while transitioning to a Data Scientist role Soft skills every data scientist should teach their child

Why Oracle is losing the Database Race

Aaron Lazar
06 Apr 2018
3 min read
When you think of databases, the first thing that comes to mind is Oracle or IBM. Oracle has been ruling the database world for decades now, and it has been able to acquire tonnes of applications that use its databases. However, that’s changing now, and if you didn’t know already, you might be surprised to know that Oracle is losing the database race. Oracle = Goliath Oracle was and still is ranked number one among databases, owing to its legacy in the database ballpark. Source - DB Engines The main reason why Oracle has managed to hold its position is because of lock-in, a CIO’s worst nightmare. Migrating data that’s accumulated over the years is not a walk in the park and usually has top management flinching every time it’s mentioned. Another reason is because Oracle is known to be aggressive when it comes to maintaining and enforcing licensing terms. You won’t be surprised to find Oracle ‘agents’ at the doorstep of your organisation, slapping you with a big fine for non-compliance! Oracle != Goliath for everyone You might wonder whether even the biggies are in the same position, locked-in with Oracle. Well, the Amazons and Salesforces of the world have quietly moved away from lock-in hell and have their applications now running on open-source projects. In fact, Salesforce plans to be completely free of Oracle databases by 2023 and has even codenamed this project “Sayonara”. I wonder what inspired the name! Enter the “Davids” of Databases While Oracle’s databases have been declining, alternatives like SQL Server and PostgreSQL have been steadily growing. SQL Server has been doing it in leaps and bounds, with a growth rate of over 30%. Amazon and Microsoft’s cloud based databases have seen close to 10x growth. While one might think that all Cloud solutions would have dominated the database world, databases like Google Cloud SQL and IBM Cognos have been suffering very slow to no growth as the question of lock-in arises again, only this time with a cloud vendor. MongoDB has been another shining star in the database race. Several large organisations like HSBC, Adobe, Ebay, Forbes and MTV have adopted MongoDB as their database solution. Newer organisations have been resorting to adopt these databases instead to looking to Oracle. However, it’s not really eating into Oracle’s existing market, at least not yet. Is 18c Oracle’s silver bullet? Oracle bragged a lot about 18c, last year, positioning it as a database that needs little to no human interference thanks to its ground-breaking machine learning; one that operates at less than 30 minutes of downtime a year and many more features. Does this make Microsoft and Amazon break into a sweat? Hell no! Although Oracle has strategically positioned 18c as a database that lowers operational cost by cutting down on the human element, it still is quite expensive when compared to its competitors - they haven’t dropped their price one bit. Moreover, it can’t really automate “everything” and there’s always a need for a human administrator - not really convincing enough. Quite naturally customers will be drawn towards competition. In the end, the way I look at it, Oracle already had a head start and is now inches from the elusive finish line, probably sniggering away at all the customers that it has on a leash. All while cloud databases are slowly catching up and will soon be leaving Oracle in a heap of dirt. Reminds me of that fable mum used to read to me...what’s it called...The hare and the tortoise.


5 ways artificial intelligence is upgrading software engineering

Melisha Dsouza
02 Sep 2018
8 min read
47% of digitally mature organizations, or those that have advanced digital practices, said they have a defined AI strategy (Source: Adobe). It is estimated that  AI-enabled tools alone will generate $2.9 trillion in business value by 2021.  80% of enterprises are smartly investing in AI. The stats speak for themselves. AI clearly follows the motto “go big or go home”. This explosive growth of AI in different sectors of technology is also beginning to show its colors in software development. Shawn Drost, co-founder and lead instructor of coding boot camp ‘Hack Reactor’ says that AI still has a long way to go and is only impacting the workflow of a small portion of software engineers on a minority of projects right now. AI promises to change how organizations will conduct business and to make applications smarter. It is only logical then that software development, i.e., the way we build apps, will be impacted by AI as well. Forrester Research recently surveyed 25 application development and delivery (AD&D) teams, and respondents said AI will improve planning, development and especially testing. We can expect better software created under traditional environments. 5 areas of Software Engineering AI will transform The 5 major spheres of software development-  Software design, Software testing, GUI testing, strategic decision making, and automated code generation- are all areas where AI can help. A majority of interest in applying AI to software development is already seen in automated testing and bug detection tools. Next in line are the software design precepts, decision-making strategies, and finally automating software deployment pipelines. Let's take an in-depth look into the areas of high and medium interest of software engineering impacted by AI according to the Forrester Research report.     Source: Forbes.com #1 Software design In software engineering, planning a project and designing it from scratch need designers to apply their specialized learning and experience to come up with alternative solutions before settling on a definite solution. A designer begins with a vision of the solution, and after that retracts and forwards investigating plan changes until they reach the desired solution. Settling on the correct plan choices for each stage is a tedious and mistake-prone action for designers. Along this line, a few AI developments have demonstrated the advantages of enhancing traditional methods with intelligent specialists. The catch here is that the operator behaves like an individual partner to the client. This associate should have the capacity to offer opportune direction on the most proficient method to do design projects. For instance, take the example of AIDA- The Artificial Intelligence Design Assistant, deployed by Bookmark (a website building platform). Using AI, AIDA understands a users needs and desires and uses this knowledge to create an appropriate website for the user. It makes selections from millions of combinations to create a website style, focus, image and more that are customized for the user. In about 2 minutes, AIDA designs the first version of the website, and from that point it becomes a drag and drop operation. You can get a detailed overview of this tool on designshack. #2 Software testing Applications interact with each other through countless  APIs. They leverage legacy systems and grow in complexity everyday. Increase in complexity also leads to its fair share of challenges that can be overcome by machine-based intelligence. 
AI tools can be used to create test information, explore information authenticity, advancement and examination of the scope and also for test management. Artificial intelligence, trained right, can ensure the testing performed is error free. Testers freed from repetitive manual tests thus have more time to create new automated software tests with sophisticated features. Also, if software tests are repeated every time source code is modified, repeating those tests can be not only time-consuming but extremely costly. AI comes to the rescue once again by automating the testing for you! With AI automated testing, one can increase the overall scope of tests leading to an overall improvement of software quality. Take, for instance, the Functionize tool. It enables users to test fast and release faster with AI enabled cloud testing. The users just have to type a test plan in English and it will be automatically get converted into a functional test case. The tool allows one to elastically scale functional, load, and performance tests across every browser and device in the cloud. It also includes Self-healing tests that update autonomously in real-time. SapFix is another AI Hybrid tool deployed by Facebook which can automatically generate fixes for specific bugs identified by 'Sapienz'. It then proposes these fixes to engineers for approval and deployment to production.   #3 GUI testing Graphical User Interfaces (GUI) have become important in interacting with today's software. They are increasingly being used in critical systems and testing them is necessary to avert failures. With very few tools and techniques available to aid in the testing process, testing GUIs is difficult. Currently used GUI testing methods are ad hoc. They require the test designer to perform humongous tasks like manually developing test cases, identifying the conditions to check during test execution, determining when to check these conditions, and finally evaluate whether the GUI software is adequately tested. Phew! Now that is a lot of work. Also, not forgetting that if the GUI is modified after being tested, the test designer must change the test suite and perform re-testing. As a result, GUI testing today is resource intensive and it is difficult to determine if the testing is adequate. Applitools is a GUI tester tool empowered by AI. The Applitools Eyes SDK automatically tests whether visual code is functioning properly or not. Applitools enables users to test their visual code just as thoroughly as their functional UI code to ensure that the visual look of the application is as you expect it to be. Users can test how their application looks in multiple screen layouts to ensure that they all fit the design. It allows users to keep track of both the web page behaviour, as well as the look of the webpage. Users can test everything they develop from the functional behavior of their application to its visual look. #4 Using Artificial Intelligence in Strategic Decision-Making Normally, developers have to go through a long process to decide what features to include in a product. However, machine learning AI solution trained on business factors and past development projects can analyze the performance of existing applications and help both teams of engineers and business stakeholders like project managers to find solutions to maximize impact and cut risk. Normally, the transformation of business requirements into technology specifications requires a significant timeline for planning. 
Machine learning can help software development companies to speed up the process, deliver the product in lesser time, and increase revenue within a short span. AI canvas is a well known tool for Strategic Decision making.The canvas helps identify the key questions and feasibility challenges associated with building and deploying machine learning models in the enterprise. The AI Canvas is a simple tool that helps enterprises organize what they need to know into seven categories, namely- Prediction, Judgement, Action, Outcome, Input, Training and feedback. Clarifying these seven factors for each critical decision throughout the organization will help in identifying opportunities for AIs to either reduce costs or enhance performance.   #5 Automatic Code generation/Intelligent Programming Assistants Coding a huge project from scratch is often labour intensive and time consuming. An Intelligent AI programming assistant will reduce the workload by a great extent. To combat the issues of time and money constraints, researchers have tried to build systems that can write code before, but the problem is that these methods aren’t that good with ambiguity. Hence, a lot of details are needed about what the target program aims at doing, and writing down these details can be as much work as just writing the code. With AI, the story can be flipped. ”‘Bayou’- an A.I. based application is an Intelligent programming assistant. It began as an initiative aimed at extracting knowledge from online source code repositories like GitHub. Users can try it out at askbayou.com. Bayou follows a method called neural sketch learning. It trains an artificial neural network to recognize high-level patterns in hundreds of thousands of Java programs. It does this by creating a “sketch” for each program it reads and then associates this sketch with the “intent” that lies behind the program. This DARPA initiative aims at making programming easier and less error prone. Sounds intriguing? Now that you know how this tool works, why not try it for yourself on i-programmer.info. Summing it all up Software engineering has seen massive transformation over the past few years. AI and software intelligence tools aim to make software development easier and more reliable. According to a Forrester Research report on AI's impact on software development, automated testing and bug detection tools use AI the most to improve software development. It will be interesting to see the future developments in software engineering empowered with AI. I’m expecting faster, more efficient, more effective, and less costly software development cycles while engineers and other development personnel focus on bettering their skills to make advanced use of AI in their processes. Implementing Software Engineering Best Practices and Techniques with Apache Maven Intelligent Edge Analytics: 7 ways machine learning is driving edge computing adoption in 2018 15 millions jobs in Britain at stake with AI robots set to replace humans at workforce

How Artificial Intelligence and Machine Learning can turbocharge a Game Developer's career

Guest Contributor
06 Sep 2018
7 min read
Gaming - whether board games or games set in the virtual realm - has been a massively popular form of entertainment since time immemorial. In the pursuit of creating more sophisticated, thrilling, and intelligent games, game developers have delved into ML and AI technologies to fuel innovation in the gaming sphere. The gaming domain is the ideal experimentation bed for evolving technologies because not only does it put up complex and challenging problems for ML and AI to solve, it also serves as a ground for creativity - a meeting ground for machine learning and the art of interaction.

Machine Learning and Artificial Intelligence in Gaming

The reliance on AI for gaming is not a recent development. In fact, it dates back to 1949, when the famous cryptographer and mathematician Claude Shannon made his musings public about how a supercomputer could be made to master Chess. Then again, in 1952, a graduate student in the UK developed an AI that could play tic-tac-toe with ultimate perfection.

However, it isn't just ML and AI that are progressing through experimentation on games. Game development, too, has benefited a great deal from these pioneering technologies. AI and ML have helped enhance the gaming experience on many grounds, such as game design, the interactive quotient, and the inner functionalities of games. These AI use cases focus on two primary things: one is to impart enhanced realism to the virtual gaming environment, and the second is to create a more naturalistic interface between the gaming environment and the players.

As of now, the focus of game developers, data scientists, and ML researchers lies in two specific categories of the gaming domain - games of perfect information and games of imperfect information. In games of perfect information, a player is aware of all the aspects of the game throughout the playing session, whereas, in games of imperfect information, players are oblivious to specific aspects of the game. When it comes to games of perfect information such as Chess and Go, AI has shown various instances of overpowering human intelligence. Back in 1997, IBM's Deep Blue successfully defeated world Chess champion Garry Kasparov in a six-game match. In 2016, Google's AlphaGo emerged as the victor in a Go match, scoring 4-1 after defeating South Korean Go champion Lee Sedol. One of the most advanced chess AIs developed yet, Stockfish, uses a combination of advanced heuristics and brute force to compute numeric values for each and every move in a specific position in Chess. It also effectively eliminates bad moves using the alpha-beta pruning search algorithm, sketched below.
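Alpha-beta pruning is a refinement of classic minimax search: it skips branches that provably cannot influence the final decision. The snippet below is not Stockfish's implementation, just a minimal, generic sketch of minimax with alpha-beta pruning over a toy game tree, where inner lists are positions and plain numbers are heuristic evaluations of leaf positions.

```python
import math

def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over a nested-list game tree."""
    if not isinstance(node, list):
        return node  # leaf: its heuristic evaluation
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off: the minimizing player avoids this branch
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cut-off: the maximizing player avoids this branch
        return value

# A tiny hand-made tree; a real engine would generate children from board positions.
tree = [[3, 5], [6, [9, 1]], [1, 2]]
print(alphabeta(tree, -math.inf, math.inf, maximizing=True))  # prints 6
```

Engines such as Stockfish combine this pruning with far more sophisticated move ordering, evaluation functions, and search extensions, but the cut-off logic is the same idea.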
While the progress and contribution of AI and ML to the field of games of perfect information is laudable, researchers are now intrigued by games of imperfect information. Games of imperfect information offer much more challenging situations that are essentially difficult for machines to learn and master. Thus, the next evolution in the world of gaming will be to create spontaneous gaming environments using AI technology, in which developers build only the gaming environment and its mechanics instead of creating a game with pre-programmed/scripted plots. In such a scenario, the AI will have to confront and solve spontaneous challenges, with personalized scenarios generated on the spot. Games like StarCraft and StarCraft II have stirred up massive interest among game researchers and developers. In these games, the players are only partially aware of the gaming aspects, and the game is largely determined not just by the AI moves and the previous state of the game, but also by the moves of other players. Since in these games you have little knowledge about your rival's moves, you have to take decisions on the go, and your moves have to be spontaneous.

The recent win of OpenAI Five over amateur human players in Dota 2 is a good case in point. OpenAI Five is a team of five neural networks that leverages an advanced version of Proximal Policy Optimization and uses a separate LSTM to learn identifiable strategies. The progress of OpenAI Five shows that even without human data, reinforcement learning can facilitate long-term planning, thus allowing us to make further progress in games of imperfect information.

Career in Game Development with ML and AI

As ML and AI continue to penetrate the gaming industry, they are creating a huge demand for talented and skilled game developers who are well-versed in these technologies. Today, game development is at a place where it's no longer necessary to build games using time-consuming manual techniques. ML and AI have made the task of game developers easier: by leveraging these technologies, they can design and build innovative gaming environments and test them automatically.

The integration of AI and ML in the gaming domain is giving birth to new job positions like Gameplay Software Engineer (AI), Gameplay Programmer (AI), and Game Security Data Scientist, to name a few. The salaries of traditional game developers stand in stark contrast with those of developers who have AI/ML skills. While the average salary of game developers is usually around $44,000, it can scale up to and over $120,000 if one possesses AI/ML skills.

Gameplay Engineer

Average salary: $73,000 - $116,000

Gameplay engineers are usually part of the core game dev team and are entrusted with the responsibility of enhancing the existing gameplay systems to enrich the player experience. Companies today demand gameplay engineers who are proficient in C/C++ and well-versed with AI/ML technologies.

Gameplay Programmer

Average salary: $98,000 - $149,000

Gameplay programmers work in close collaboration with the production and design teams to develop cutting-edge features in existing and upcoming gameplay systems. Programming skills are a must, and knowledge of AI/ML technologies is an added bonus.

Game Security Data Scientist

Average salary: $73,000 - $106,000

The role of a game security data scientist is to combine both security and data science approaches to detect anomalies and fraudulent behavior in games. This calls for a high degree of expertise in AI, ML, and other statistical methods.

With impressive salaries and exciting job opportunities cropping up fast in the game development sphere, the industry is attracting some major talent. Game developers and software developers around the world are choosing the field due to the promise of rapid career growth. If you wish to bag better and more challenging roles in the domain of game development, you should definitely try to upskill your talent and knowledge base by mastering the fields of ML and AI.

Packt Publishing is the leading UK provider of Technology eBooks, Coding eBooks, Videos and Blogs, helping IT professionals to put software to work. It offers several books and videos on game development with AI and machine learning. It's never too late to learn new disciplines and expand your knowledge base.
There are numerous online platforms that offer great artificial intelligence courses. The perk of learning from a registered online platform is that you can learn and grow at your own pace and at your convenience. So, enroll yourself in one and spice up your career in game development!

About the author: Abhinav Rai is a Data Analyst at UpGrad, an online education platform providing industry-oriented programs in collaboration with world-class institutes such as MICA, IIIT Bangalore, and BITS, and with various industry leaders including MakeMyTrip, Ola, and Flipkart.

Best game engines for AI game development
Implementing Unity game engine and assets for 2D game development [Tutorial]
How to use arrays, lists, and dictionaries in Unity for 3D game development

Quantum expert Robert Sutor explains the basics of Quantum Computing

Packt Editorial Staff
12 Dec 2019
9 min read
What if we could do chemistry inside a computer instead of in a test tube or beaker in the laboratory? What if running a new experiment was as simple as running an app and having it completed in a few seconds? For this to really work, we would want it to happen with complete fidelity. The atoms and molecules as modeled in the computer should behave exactly like they do in the test tube. The chemical reactions that happen in the physical world would have precise computational analogs. We would need a completely accurate simulation.

If we could do this at scale, we might be able to compute the molecules we want and need. These might be new materials for shampoos or even alloys for cars and airplanes. Perhaps we could more efficiently discover medicines that are customized to your exact physiology. Maybe we could get better insight into how proteins fold, thereby understanding their function, and possibly creating custom enzymes to positively change our body chemistry. Is this plausible? We have massive supercomputers that can run all kinds of simulations. Can we model molecules in the above ways today?

This article is an excerpt from the book Dancing with Qubits written by Robert Sutor. Robert helps you understand how quantum computing works and delves into the math behind it with this quantum computing textbook.

Can supercomputers model chemical simulations?

Let's start with C8H10N4O2 – 1,3,7-Trimethylxanthine. This is a very fancy name for a molecule that millions of people around the world enjoy every day: caffeine. An 8-ounce cup of coffee contains approximately 95 mg of caffeine, and this translates to roughly 2.95 × 10^20 molecules. Written out, this is 295,000,000,000,000,000,000 molecules. A 12-ounce can of a popular cola drink has 32 mg of caffeine, the diet version has 42 mg, and energy drinks often have about 77 mg.
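If you want to sanity-check that molecule count, it is a one-line calculation with Avogadro's number. The molar mass of caffeine (roughly 194.19 g/mol) is supplied here for the check; it is not part of the original excerpt.

```python
# Back-of-the-envelope check of the caffeine molecule count quoted above.
AVOGADRO = 6.022e23            # molecules per mole
CAFFEINE_MOLAR_MASS = 194.19   # grams per mole for C8H10N4O2

grams_per_cup = 0.095          # 95 mg of caffeine in an 8-ounce coffee
molecules = grams_per_cup / CAFFEINE_MOLAR_MASS * AVOGADRO
print(f"{molecules:.2e}")      # ~2.95e+20, matching the figure above
```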
These numbers are large because we are counting physical objects in our universe, which we know is very big. Scientists estimate, for example, that there are between 10^49 and 10^50 atoms in our planet alone. To put these values in context, one thousand = 10^3, one million = 10^6, one billion = 10^9, and so on. A gigabyte of storage is one billion bytes, and a terabyte is 10^12 bytes.

Getting back to the question I posed at the beginning of this section, can we model caffeine exactly on a computer? We don't have to model the huge number of caffeine molecules in a cup of coffee, but can we fully represent a single molecule at a single instant? Caffeine is a small molecule and contains protons, neutrons, and electrons. In particular, if we just look at the energy configuration that determines the structure of the molecule and the bonds that hold it all together, the amount of information to describe this is staggering. More precisely, the number of bits, the 0s and 1s, needed is approximately 10^48: 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000. And this is just one molecule! Yet somehow nature manages to deal quite effectively with all this information. It handles the single caffeine molecule, all those in your coffee, tea, or soft drink, and every other molecule that makes up you and the world around you. How does it do this? We don't know! Of course, there are theories, and these live at the intersection of physics and philosophy. However, we do not need to understand it fully to try to harness its capabilities.

We have no hope of providing enough traditional storage to hold this much information. Our dream of exact representation appears to be dashed. This is what Richard Feynman meant in his quote: "Nature isn't classical." However, 160 qubits (quantum bits) could hold 2^160 ≈ 1.46 × 10^48 bits while the qubits were involved in a computation. To be clear, I'm not saying how we would get all the data into those qubits, and I'm also not saying how many more we would need to do something interesting with the information. It does give us hope, however. In the classical case, we will never fully represent the caffeine molecule. In the future, with enough very high-quality qubits in a powerful quantum computing system, we may be able to perform chemistry on a computer.

How quantum computing is different from classical computing

I can write a little app on a classical computer that can simulate a coin flip. This might be for my phone or laptop. Instead of heads or tails, let's use 1 and 0. The routine, which I call R, starts with one of those values and randomly returns one or the other. That is, 50% of the time it returns 1 and 50% of the time it returns 0. We have no knowledge whatsoever of how R does what it does. When you see "R," think "random." This is called a "fair flip." It is not weighted to slightly prefer one result over the other. Whether we can produce a truly random result on a classical computer is another question. Let's assume our app is fair.

If I apply R to 1, half the time I expect 1 and the other half 0. The same is true if I apply R to 0. I'll call these applications R(1) and R(0), respectively. If I look at the result of R(1) or R(0), there is no way to tell if I started with 1 or 0. This is just like a secret coin flip, where I can't tell whether I began with heads or tails just by looking at how the coin has landed. By "secret coin flip," I mean that someone else has flipped it and I can see the result, but I have no knowledge of the mechanics of the flip itself or the starting state of the coin. If R(1) and R(0) are randomly 1 and 0, what happens when I apply R twice? I write this as R(R(1)) and R(R(0)). It's the same answer: a random result with an equal split. The same thing happens no matter how many times we apply R. The result is random, and we can't reverse things to learn the initial value.

Now for the quantum version. Instead of R, I use H. It too returns 0 or 1 with equal chance, but it has two interesting properties:

- It is reversible. Though it produces a random 1 or 0 starting from either of them, we can always go back and see the value with which we began.
- It is its own reverse (or inverse) operation. Applying it two times in a row is the same as having done nothing at all.

There is a catch, though. You are not allowed to look at the result of what H does if you want to reverse its effect. If you apply H to 0 or 1, peek at the result, and apply H again to that, it is the same as if you had used R. If you observe what is going on in the quantum case at the wrong time, you are right back at strictly classical behavior. To summarize using the coin language: if you flip a quantum coin and then don't look at it, flipping it again will yield the heads or tails with which you started. If you do look, you get classical randomness.
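The H described above behaves like the Hadamard operation on a single qubit, and its two properties are easy to verify with a little linear algebra. The following numpy sketch is an illustration under that assumption (it is not from the book's text): the values 0 and 1 become the basis vectors, R is an ordinary random choice, and H is a 2×2 matrix that is its own inverse.

```python
import numpy as np

rng = np.random.default_rng()

def R(_bit):
    """Classical fair flip: the result tells you nothing about the input."""
    return rng.integers(0, 2)

# The quantum 'coin flip': a 2x2 matrix acting on state vectors.
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)

ket0 = np.array([1.0, 0.0])   # the state "0"

once = H @ ket0
print(once)                   # [0.707 0.707]: an equal superposition
print(np.abs(once) ** 2)      # [0.5 0.5]: measuring gives 0 or 1 with equal chance

twice = H @ once              # apply H again *without* measuring in between
print(np.round(twice, 10))    # [1. 0.]: right back at the state "0"
print(np.allclose(H @ H, np.eye(2)))  # True: H is its own inverse
```

Measuring after the first H (looking at the coin) collapses the state to a plain 0 or 1, which is why peeking destroys the reversibility.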
A second area where quantum is different is in how we can work with simultaneous values. Your phone or laptop uses bytes as individual units of memory or storage. That's where we get phrases like "megabyte," which means one million bytes of information. A byte is further broken down into eight bits, which we've seen before. Each bit can be a 0 or 1. Doing the math, each byte can represent 2^8 = 256 different numbers composed of eight 0s or 1s, but it can only hold one value at a time. Eight qubits can represent all 256 values at the same time. This is through superposition, but also through entanglement, the way we can tightly tie together the behavior of two or more qubits. This is what gives us the (literally) exponential growth in the amount of working memory.

How quantum computing can help artificial intelligence

Artificial intelligence and one of its subsets, machine learning, are extremely broad collections of data-driven techniques and models. They are used to help find patterns in information, learn from the information, and automatically perform more "intelligently." They also give humans help and insight that might have been difficult to get otherwise.

Here is a way to start thinking about how quantum computing might be applicable to large, complicated, computation-intensive systems of processes such as those found in AI and elsewhere. These three cases are in some sense the "small, medium, and large" ways quantum computing might complement classical techniques:

- There is a single mathematical computation somewhere in the middle of a software component that might be sped up via a quantum algorithm.
- There is a well-described component of a classical process that could be replaced with a quantum version.
- There is a way to avoid the use of some classical components entirely in the traditional method because of quantum, or the entire classical algorithm can be replaced by a much faster or more effective quantum alternative.

As I write this, quantum computers are not "big data" machines. This means you cannot take millions of records of information and provide them as input to a quantum calculation. Instead, quantum may be able to help where the number of inputs is modest but the computations "blow up" as you start examining relationships or dependencies in the data. In the future, however, quantum computers may be able to input, output, and process much more data. Even if it is just theoretical now, it makes sense to ask if there are quantum algorithms that can be useful in AI someday.

To summarize, we explored how quantum computing works and where it might eventually help artificial intelligence. Get the quantum computing book Dancing with Qubits by Robert Sutor today, where he explores the inner workings of quantum computing. The book entails some sophisticated mathematical exposition and is therefore best suited for those with a healthy interest in mathematics, physics, engineering, and computer science.

Intel introduces cryogenic control chip, ‘Horse Ridge’ for commercially viable quantum computing
Microsoft announces Azure Quantum, an open cloud ecosystem to learn and build scalable quantum solutions
Amazon re:Invent 2019 Day One: AWS launches Braket, its new quantum service and releases

Why Neo4j is the most popular graph database

Amey Varangaonkar
02 Aug 2018
7 min read
Neo4j is an open source, distributed data store used to model graph problems. It departs from the traditional nomenclature of database technologies: entities are stored in schema-less, entity-like structures called nodes, which are connected to other nodes via relationships or edges. In this article, we are going to discuss the different features and use cases of Neo4j. This article is an excerpt taken from the book 'Seven NoSQL Databases in a Week' written by Aaron Ploetz et al.

Neo4j's best features

Aside from its support of the property graph model, Neo4j has several other features that make it a desirable data store. Here, we will examine some of those features and discuss how they can be utilized in a successful Neo4j cluster.

Clustering

Enterprise Neo4j offers horizontal scaling through two types of clustering. The first is the typical high-availability clustering, in which several slave servers process data overseen by an elected master. In the event that one of the instances should fail, a new master is chosen. The second type of clustering is known as causal clustering. This option provides additional features, such as disposable read replicas and built-in load balancing, that help abstract the distributed nature of the clustered database from the developer. It also supports causal consistency, which aims to support Atomicity, Consistency, Isolation, and Durability (ACID)-compliant consistency in use cases where eventual consistency becomes problematic. Essentially, causal consistency is delivered with a distributed transaction algorithm that ensures that a user will be able to immediately read their own write, regardless of which instance handles the request.

Neo4j Browser

Neo4j ships with Neo4j Browser, a web-based application that can be used for database management, operations, and the execution of Cypher queries. In addition to monitoring the instance on which it runs, Neo4j Browser also comes with a few built-in learning tools designed to help new users acclimate themselves to Neo4j and graph databases. Neo4j Browser is a huge step up from the command-line tools that dominate the NoSQL landscape.

Cache sharding

In most clustered Neo4j configurations, a single instance contains a complete copy of the data. At the moment, true sharding is not available, but Neo4j does have a feature known as cache sharding. This feature involves directing queries to instances that only have certain parts of the cache preloaded, so that read requests for extremely large data sets can be adequately served.

Help for beginners

One of the things that Neo4j does better than most NoSQL data stores is the amount of documentation and tutorials that it has made available for new users. The Neo4j website provides a few links to get started with in-person or online training, as well as meetups and conferences to become acclimated to the community. The Neo4j documentation is very well done and kept up to date, complete with well-written manuals on development, operations, and data modeling. The blogs and videos by the Neo4j, Inc. engineers are also quite helpful in getting beginners started on the right path. Additionally, when first connecting to your instance/cluster with Neo4j Browser, the first thing that is shown is a list of links directed at beginners. These links direct the user to information about the Neo4j product, graph modeling and use cases, and interactive examples. In fact, executing the play movies command brings up a tutorial that loads a database of movies. This database consists of various nodes and edges that are designed to illustrate the relationships between actors and their roles in various films.
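That same movies graph is a convenient playground for your first Cypher queries outside the browser. The sketch below uses the official Neo4j Python driver; the connection URI, credentials, and movie title are assumptions you would replace with values for your own instance.

```python
# Query the tutorial movies graph: which people acted in a given film?
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: $title})
RETURN p.name AS actor
ORDER BY actor
"""

with driver.session() as session:
    for record in session.run(CYPHER, title="The Matrix"):
        print(record["actor"])

driver.close()
```

The traversal-centric MATCH pattern is what sets Cypher apart from SQL here: the relationship (p)-[:ACTED_IN]->(m) is expressed directly instead of through join tables.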
Neo4j's versatility demonstrated in its wide use cases

Because of Neo4j's focus on node/edge traversal, it is a good fit for use cases requiring the analysis and examination of relationships. The property graph model helps to define those relationships in meaningful ways, enabling the user to make informed decisions. Bearing that in mind, there are several use cases for Neo4j (and other graph databases) that seem to fit naturally.

Social networks

Social networks seem to be a natural fit for graph databases. Individuals have friends, attend events, check in to geographical locations, create posts, and send messages. All of these different aspects can be tracked and managed with a graph database such as Neo4j. Who can see a certain person's posts? Friends? Friends of friends? Who will be attending a certain event? How is a person connected to others attending the same event? In small numbers, these problems could be solved with a number of data stores. But what about an event with several thousand people attending, where each person has a network of 500 friends? Neo4j can help to solve a multitude of problems in this domain, and appropriately scale to meet increasing levels of operational complexity.

Matchmaking

Like social networks, Neo4j is also a good fit for solving problems presented by matchmaking or dating sites. In this way, a person's interests, goals, and other properties can be traversed and matched to profiles that share certain levels of similarity. Additionally, the underlying model can also be applied to prevent certain matches or block specific contacts, which can be useful for this type of application.

Network management

Working with an enterprise-grade network can be quite complicated. Devices are typically broken up into different domains, sometimes have physical and logical layers, and tend to share a delicate relationship of dependencies with each other. In addition, networks might be very dynamic because of hardware failure/replacement and organizational and personnel changes. The property graph model can be applied to adequately work with the complexity of such networks. In a use case study with Enterprise Management Associates (EMA), the property graph was reported as an excellent format for capturing and modeling the interdependencies that can help to diagnose failures. For instance, if a particular device needs to be shut down for maintenance, you would need to be aware of other devices and domains that are dependent on it, in a multitude of directions. Neo4j allows you to capture that easily and naturally without having to define a whole mess of linear relationships between each device. The path of relationships can then be easily traversed at query time to provide the necessary results.

Analytics

Many scalable data store technologies are not particularly suitable for business analysis or online analytical processing (OLAP) uses. When working with large amounts of data, coalescing the desired data can be tricky with relational database management systems (RDBMS). Some enterprises will even duplicate their RDBMS into a separate system for OLAP so as not to interfere with their online transaction processing (OLTP) workloads. Neo4j can scale to present meaningful data about relationships between different enterprise-marketing entities, which is crucial for businesses.
Recommendation engines

Many brick-and-mortar and online retailers collect data about their customers' shopping habits. However, many of them fail to properly utilize this data to their advantage. Graph databases such as Neo4j can help assemble the bigger picture of customer habits for searching and purchasing, and even take trends in geographic areas into consideration. For example, purchasing data may contain patterns indicating that certain customers tend to buy certain beverages on Friday evenings. Based on the relationships of other customers to products in that area, the engine could also suggest things such as cups, mugs, or glassware. Is the customer also a male in his thirties from a sports-obsessed area? Perhaps suggesting a mug supporting the local football team may spark an additional sale. An engine backed by Neo4j may be able to help a retailer uncover these small troves of insight.

To summarize, we saw that Neo4j is widely used across enterprises and businesses, primarily due to its speed, efficiency, and accuracy. Check out the book Seven NoSQL Databases in a Week to learn more about Neo4j and other popularly used NoSQL databases such as Redis, HBase, and MongoDB.

Read more

Top 5 programming languages for crunching Big Data effectively
Top 5 NoSQL Databases
Is Apache Spark today’s Hadoop?