This book is about Python for geospatial development, so this section shows how to use Python for HDFS operations and Hive queries. Several Python libraries wrap Hadoop and its databases, but none appears to have become the standard choice, and some, such as Snakebite, do not appear ready to run on Python 3. In this section, you will learn how to use two libraries, PyHive and PyWebHDFS. You will also learn how to use the Python subprocess module to execute HDFS and Hive commands.
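As a rough preview of the subprocess approach mentioned above, the following minimal sketch builds the argument lists for an HDFS copy and a Hive query and runs a command with the standard library. The helper names (hdfs_put_command, hive_query_command, run_command) and the file paths are hypothetical and chosen only for illustration; the sketch assumes the hdfs and hive command-line tools are on the PATH of a machine with Hadoop and Hive installed.

```python
import subprocess
import sys


def hdfs_put_command(local_path, hdfs_path):
    """Build the argument list for copying a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


def hive_query_command(query):
    """Build the argument list for running one HiveQL statement with the Hive CLI."""
    return ["hive", "-e", query]


def run_command(args):
    """Run a command and return (exit code, stdout, stderr) as text."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Demonstrate run_command with the Python interpreter itself,
    # so the sketch also runs on a machine without Hadoop.
    code, out, err = run_command([sys.executable, "-c", "print('ok')"])
    print(code, out.strip())

    # On a machine with Hadoop and Hive available (an assumption), you could run:
    # run_command(hdfs_put_command("sample.csv", "/user/hypothetical/sample.csv"))
    # run_command(hive_query_command("SHOW TABLES"))
```

Passing the command as a list rather than a single shell string avoids quoting problems when a path or query contains spaces.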
To get PyHive, you can use conda and the following command:
conda install -c blaze pyhive
You may also need to install the sasl library:
conda install -c blaze sasl
The previous libraries will give you the ability to run Hive queries from Python. You will also want to be able to move files to HDFS. To do so, you can install pywebhdfs:
conda install -c conda-forge pywebhdfs
The preceding command will install the library, and as always, you can...