Mastering Apache Spark
Apache Spark
- Apache Spark
- Overview
- Cluster design
- Cluster management
- Performance
- Cloud
- Summary
Apache Spark MLlib
- Apache Spark MLlib
- The environment configuration
- Classification with Naïve Bayes
- Clustering with K-Means
- ANN – Artificial Neural Networks
- Summary
Apache Spark Streaming
- Apache Spark Streaming
- Overview
- Errors and recovery
- Streaming sources
- Summary
Apache Spark SQL
- Apache Spark SQL
- The SQL context
- Importing and saving data
- DataFrames
- Using SQL
- User-defined functions
- Using Hive
- Summary
Apache Spark GraphX
- Apache Spark GraphX
- Overview
- GraphX coding
- Mazerunner for Neo4j
- Summary
Graph-based Storage
- Graph-based Storage
- Titan
- TinkerPop
- Installing Titan
- Titan with HBase
- Titan with Cassandra
- Accessing Titan with Spark
- Summary
Extending Spark with H2O
- Extending Spark with H2O
- Overview
- The processing environment
- Installing H2O
- The build environment
- Architecture
- Sourcing the data
- Data quality
- Performance tuning
- Deep learning
- H2O Flow
- Summary
Spark Databricks
- Spark Databricks
- Overview
- Installing Databricks
- AWS billing
- Databricks menus
- Account management
- Cluster management
- Notebooks and folders
- Jobs and libraries
- Development environments
- Databricks tables
- The DbUtils package
- Summary
Databricks Visualization
- Databricks Visualization
- Data visualization
- REST interface
- Moving data
- Further reading
- Summary