Master Big Data Ingestion and Analytics with Flume, Sqoop, Hive and Spark [Video]
Hadoop Introduction
Sqoop Import
- Sqoop Introduction
- Managing Target Directories
- Working with Different File Formats
- Working with Different Compressions
- Conditional Imports
- Split-by and Boundary Queries
- Field Delimiters
- Incremental Appends
- Sqoop Hive Import
- Sqoop List Tables/Database
- Sqoop Import Practice1
- Sqoop Import Practice2
- Sqoop Import Practice3
Sqoop Export
Apache Flume
Apache Hive
Spark Introduction
Spark Transformations & Actions
- Map/FlatMap Transformation
- Filter/Intersection
- Union/Distinct Transformation
- GroupByKey / Group people based on birthday months
- ReduceByKey / Total number of students in each subject
- SortByKey / Sort students based on their roll number
- MapPartitions / MapPartitionsWithIndex
- Change number of Partitions
- Join / Join email address based on customer name
- Spark Actions
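
To make the key-based transformations listed above more concrete, here is a minimal pure-Python sketch of what GroupByKey, ReduceByKey, SortByKey, and Join compute on a list of (key, value) pairs. It is not Spark code and is not taken from the course: the helper names (`group_by_key`, `reduce_by_key`, `sort_by_key`, `join`) and the sample data are hypothetical, chosen only to mirror the section titles. Real Spark RDDs evaluate lazily, run across partitions, and do not guarantee the ordering within a group that this single-machine version happens to produce.

```python
from collections import defaultdict
from functools import reduce


def group_by_key(pairs):
    """Collect all values that share a key (the idea behind groupByKey)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_by_key(func, pairs):
    """Fold the values of each key with func (the idea behind reduceByKey)."""
    return {key: reduce(func, values)
            for key, values in group_by_key(pairs).items()}


def sort_by_key(pairs):
    """Order the pairs by key (the idea behind sortByKey)."""
    return sorted(pairs, key=lambda kv: kv[0])


def join(left, right):
    """Inner join on key: emits (key, (left_value, right_value))."""
    right_groups = group_by_key(right)
    return [(key, (left_value, right_value))
            for key, left_value in left
            for right_value in right_groups.get(key, [])]


# Hypothetical sample data, invented for illustration only.
enrolments = [("maths", 1), ("physics", 1), ("maths", 1)]
students = [(3, "Carol"), (1, "Alice"), (2, "Bob")]
emails = [("Alice", "alice@example.com"), ("Bob", "bob@example.com")]
cities = [("Alice", "Pune"), ("Carol", "Delhi")]

# Total number of students in each subject.
students_per_subject = reduce_by_key(lambda a, b: a + b, enrolments)

# Students sorted by roll number.
students_by_rollno = sort_by_key(students)

# Email address joined to city by customer name (unmatched names are dropped).
email_and_city = join(emails, cities)
```

In this sketch, reducing by key is built on top of grouping for clarity. Spark's own ReduceByKey is generally preferred over GroupByKey for aggregations because it can combine values within each partition before shuffling.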
Spark RDD Practice
Spark DataFrames & Spark SQL
About this video
In this course, you will start by learning about the Hadoop Distributed File System (HDFS) and the most common Hadoop commands required to work with it. Next, you’ll be introduced to Sqoop Import: you will learn the lifecycle of a Sqoop command and how to use the import command to migrate data from MySQL to HDFS and from MySQL to Hive.
You will then get up to speed with Sqoop Export for migrating data from HDFS back to a relational database, and with Apache Flume for ingesting data. As you progress, you will delve into Apache Hive, covering external and managed tables and working with different file formats, including Parquet and Avro. After an introduction to Spark and its RDD transformations and actions, the concluding sections focus on Spark DataFrames and Spark SQL.
By the end of this course, you will have gained comprehensive insights into big data ingestion and analytics with Flume, Sqoop, Hive, and Spark.
All code and supporting files are available at https://github.com/PacktPublishing/Master-Big-Data-Ingestion-and-Analytics-with-Flume-Sqoop-Hive-and-Spark
- Publication date: July 2019
- Publisher: Packt
- Duration: 5 hours 38 minutes
- ISBN: 9781839212734