READING HTML WEB PAGES IN PANDAS
Listing B.22 displays the contents of the HTML Web page abc.html, and Listing B.23 displays the contents of read_html_page.py, which illustrates how to read the contents of an HTML Web page in Pandas. Note that this code works only with Web pages that contain at least one HTML <table> element.
Listing B.22: abc.html
<html>
<head>
</head>
<body>
<table>
<tr>
<td>hello from abc.html!</td>
</tr>
</table>
</body>
</html>
Listing B.23: read_html_page.py
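import pandas as pd

file_name = "abc.html"
with open(file_name, "r") as f:
    # read_html() accepts an open file and returns a list of data frames,
    # one for each <table> element in the page
    dfs = pd.read_html(f)

print("Contents of HTML Table(s) in the HTML Web Page:")
print(dfs)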
Listing B.23 starts with an import statement and then initializes the variable file_name to abc.html, the file displayed in Listing B.22. The next code snippet initializes the variable dfs as a data frame with...
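The following minimal sketch shows the shape of the value that pd.read_html() returns: a Python list with one data frame for each <table> element. The inline HTML string, the column names name and count, and the variable names tables and first are hypothetical and used only for illustration. The sketch also assumes that an HTML parser that Pandas can use (such as lxml) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML content (not abc.html), used only for illustration
html = """
<table>
  <tr><th>name</th><th>count</th></tr>
  <tr><td>abc</td><td>1</td></tr>
  <tr><td>xyz</td><td>2</td></tr>
</table>
"""

# Wrap the string in StringIO so that read_html() receives a file-like object;
# header=0 tells Pandas to use the first row as the column names
tables = pd.read_html(StringIO(html), header=0)

# read_html() returns a list, so select a table by its position
first = tables[0]

print("Number of tables found:", len(tables))
print(first)
```

Because the result is a list, a page with several tables yields several data frames, and tables[0] selects the first one in document order.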