Data Lake for Enterprises


Product type Book
Published in May 2017
Publisher Packt
ISBN-13 9781787281349
Pages 596 pages
Edition 1st Edition
Authors (3):
Vivek Mishra
Tomcy John
Pankaj Misra


Enterprise’s current state


As explained briefly in the previous sections, the current state of enterprise data in an organization can be summarized in bullet points as follows:

  • Conventional DW (Data Warehouse)/BI (Business Intelligence):
    • Refined/cleansed data is transferred from production business applications using ETL (Extract, Transform, Load).
    • Data older than a certain period has typically already been moved to storage that is hard to retrieve from, such as magnetic tape.
    • Some of its notable deficiencies are as follows:
      • Only a subset of production data exists in the DW, in a cleansed format; adding any new data element to the DW takes additional effort
      • Only a subset of that data remains in the DW; the rest gets transferred to permanent storage
      • Analysis is usually slow, and the DW is optimized only for queries that are, to an extent, predefined
  • Siloed Big Data:
    • Some departments have taken the right step by building big data solutions. But departments generally don't collaborate with each other, so this big data becomes siloed and doesn't deliver the value of true big data for the enterprise.
    • Some of its deficiencies are as follows:
      • Because of its siloed nature, the analyst is again constrained and cannot mix and match data between departments.
      • A good amount of money is spent to build and maintain/manage these silos, which is usually not sustainable over time.
  • Myriad of non-connected applications:
    • There are a good number of applications, both on premises and in the cloud.
    • Apart from churning out structured data, these applications also produce unstructured data.
    • Some of the deficiencies are as follows:
      • The applications don't talk to each other
      • Even when they do, data scientists are not able to use the data effectively to transform the enterprise in a meaningful way
      • The same technology is replicated in each business application to handle many common aspects

We wouldn't say that creating or investing in a data lake is a silver bullet that solves all the aforementioned deficiencies. But it is definitely a step in the right direction, and every enterprise should at least spend some time discussing whether a data lake is indeed required. If the answer is yes, don't deliberate over it too much; take the next step toward implementation.

A data lake is an enterprise initiative: it has to be built with the consent of all the stakeholders and with buy-in from the top executives. It can help find ways to improve the processes by which the enterprise does business. It can also help higher management know more about their business and can increase the success rate of the decision-making process.
