Before getting started with SAP Lumira, you need to learn about data discovery. This term may not be new to you, but we need to clarify what it means in the context of SAP Lumira. It will also be interesting and useful to learn some of the theory.
In this chapter you will learn:
What data discovery is, and how it complements a traditional data warehouse (DWH) and business intelligence (BI)
Data discovery terms
Common organizational architecture and the role of data discovery in the organization
An introduction to one of the most powerful and flexible data discovery tools, SAP Lumira
An introduction to Unicorn Fashion, an e-commerce retail company
We are living in the century of information technology. We are surrounded by electronic devices that generate lots of data. For example, you can surf the Internet, visit a couple of news portals, order new Nike Air Max shoes from a web store, write a couple of messages to your friend, and chat on Facebook. Every one of these actions produces data. Multiply those actions by the number of people who have access to the Internet or just use a cell phone, and we get really BIG DATA. Of course, you have a question: how big is it? Today, it starts at terabytes or even petabytes. Volume is not the only issue; we also struggle with the variety of data. As a result, it is not enough to analyze only structured data. We should also dive deep into unstructured data, such as the data generated by various machines.
Nowadays, we need a new core competence, dealing with big data, because these vast data volumes won't just be stored; they need to be analyzed and mined for information that management can use to make the right business decisions. This helps make the business more competitive and efficient.
Unfortunately, in modern organizations many manual steps are still needed to get data and answer your business questions. You need help from your IT department, or you have to wait until new data is available in your enterprise data warehouse. In addition, you are often working with an inflexible BI tool that can only refresh a report or export it to Excel. You definitely need a new approach, one that gives you a competitive advantage, dramatically reduces errors, and accelerates business decisions.
So, we can highlight some of the key points for this kind of analytics:
Integrating data from heterogeneous systems
Giving more access to data
Using sophisticated analytics
Reducing manual coding
Simplifying processes
Reducing time to prepare data
Focusing on self-service
Leveraging powerful computing resources
We could continue this list with many other bullet points.
If you are a fan of traditional BI tools (later in this chapter, we will compare BI and data discovery tools), you may think that all of this is almost impossible. Yes, you are right: with traditional tools, it is impossible. That's why we need to change the rules of the game. As the business world changes, you must change as well.
Maybe you have guessed what this means, but if not, I can help you. In this book, I will focus on a new approach to data analytics that is more flexible and powerful. It is called data discovery. Of course, we need the right tool to overcome all the challenges of the modern world. That's why we have chosen SAP Lumira, one of the most powerful data discovery tools on the market today. But before diving deep into this amazing tool, let's consider some of the challenges of data discovery that lie in our path, as well as its advantages.
Let's imagine that you have several terabytes of data. Unfortunately, it is raw, unstructured data. To get business insight from this data, you have to spend a lot of time preparing and cleaning it. In addition, you are restricted by the capabilities of your machine. That's why a good data discovery tool usually combines software and hardware. As a result, it gives you more power for exploratory data analysis.
Let's imagine that this entire big data store is in Hadoop or some other NoSQL data store. You have to be at least a good programmer to do analytics on this data. Here we find another benefit of a good data discovery tool: it gives a powerful tool to business users, who are less technical and may not even know SQL.
Tip
Apache Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resilience of these clusters comes from the software's ability to detect and handle failures at the application layer.
A NoSQL data store is a next generation database, mostly addressing some of the following points: non-relational, distributed, open-source, and horizontally scalable.
You may be confused about the difference between data discovery and business intelligence technologies; they seem very close to each other, and it may even seem that BI tools can do everything that data discovery can do. So why do we need a separate data discovery tool, such as SAP Lumira?
In order to better understand the difference between the two technologies, you can look at the table below:
| | Enterprise BI | Data discovery |
|---|---|---|
| Key users | All users | Advanced analysts |
| Approach | Vertically-oriented (top to bottom), semantic layers, requests to existing repositories | Vertically-oriented (bottom-up), mashup, putting data in the selected repository |
| Interface | Reports, dashboards | Visualization |
| Users | Reporting | Analysis |
| Implementation | By IT consultants | By business users |
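To make the "mashup" idea from the data discovery column more concrete, here is a minimal sketch in plain Python. It is only an illustration of the concept, not how SAP Lumira works internally. The two data sources, their field names, and the `mash_up` function are all hypothetical: one source is an order export in CSV form and the other is a set of customer records in JSON form. The sketch joins them on the customer ID and totals the order amounts per country.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical source 1: an order export in CSV form (for example, from a web store).
ORDERS_CSV = """order_id,customer_id,amount
1001,c1,120.00
1002,c2,75.50
1003,c1,30.00
1004,c3,210.25
"""

# Hypothetical source 2: customer records in JSON form (for example, from a CRM system).
CUSTOMERS_JSON = """[
  {"id": "c1", "country": "DE"},
  {"id": "c2", "country": "US"},
  {"id": "c3", "country": "DE"}
]"""


def mash_up(orders_csv, customers_json):
    """Join orders to customers by ID and total the order amounts per country."""
    # Build a lookup table: customer ID -> country.
    country_by_customer = {c["id"]: c["country"] for c in json.loads(customers_json)}
    revenue_by_country = defaultdict(float)
    for row in csv.DictReader(io.StringIO(orders_csv)):
        # Orders from customers missing in the second source go to "unknown".
        country = country_by_customer.get(row["customer_id"], "unknown")
        revenue_by_country[country] += float(row["amount"])
    return dict(revenue_by_country)


if __name__ == "__main__":
    for country, revenue in sorted(mash_up(ORDERS_CSV, CUSTOMERS_JSON).items()):
        print(f"{country}: {revenue:.2f}")
```

A data discovery tool lets a business user perform this kind of combination interactively, without writing the join logic by hand.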
Let's consider the pros and cons of data discovery:
Pros:
Rapidly analyze data with a short shelf life
Ideal for small teams
Best for tactical analysis
Great for answering one-off questions quickly
Cons:
As a result, it is clear that BI and data discovery each handle their own tasks and complement each other.