Scraping a table from the web.
Let's continue our web scraping journey with another practical task: scraping an HTML table from the web and saving it to an Excel file.
Updated: 2024-12-01 by Andrey BRATUS, Senior Data Analyst.
This time we will use BeautifulSoup, another useful Python library for scraping tasks.
Let's start by importing requests, bs4 and pandas. You will most probably need to install them first in your environment, along with lxml, which serves as the parser below.
import requests
from bs4 import BeautifulSoup
import pandas as pd
Then we create a URL variable and a page object by requesting that URL.
url = 'https://weird-jokes.fun/test'
page = requests.get(url)
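The request above assumes the server answers quickly and successfully. A minimal sketch of a more defensive fetch follows; the helper name fetch_html and the 10-second timeout are assumptions made for illustration, not part of the original example.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text (hypothetical helper, not from the tutorial)."""
    # timeout stops the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

With this helper, fetch_html(url) could be passed to BeautifulSoup in place of page.text.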
Now we parse the page's HTML with the lxml parser and keep the result in a soup object.
soup = BeautifulSoup(page.text, 'lxml')
soup
Now we obtain the first <table> HTML tag on the page.
table1 = soup.find('table')
table1
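To try these two steps without a network connection, here is a small sketch that parses an inline HTML string. The sample markup and the names sample_html, sample_soup and sample_table are made up for illustration, and the sketch uses Python's built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for a downloaded page
sample_html = """
<html><body>
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
  <tr><td>Bob</td><td>25</td></tr>
</table>
</body></html>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag, or None when the page has no table
sample_table = sample_soup.find("table")
if sample_table is None:
    raise ValueError("no <table> tag found in the document")

# Count the rows (header row included) to confirm the table was located
row_count = len(sample_table.find_all("tr"))
print(row_count)
```

Checking for None before going further gives a clearer error than the AttributeError that would otherwise appear on the next find_all call.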
Then we collect every column title from the <th> HTML tags.
headers = []
for i in table1.find_all('th'):
    title = i.text
    headers.append(title)
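Header cells often carry stray whitespace or line breaks. As a sketch of a more compact variant, the loop can be written as a list comprehension using get_text(strip=True). The inline markup and the names header_html, header_table and clean_headers are invented for illustration only.

```python
from bs4 import BeautifulSoup

# Invented markup with messy whitespace inside the header cells
header_html = "<table><tr><th> Name\n</th><th>  Age </th></tr></table>"
header_table = BeautifulSoup(header_html, "html.parser").find("table")

# get_text(strip=True) removes leading and trailing whitespace from each title
clean_headers = [th.get_text(strip=True) for th in header_table.find_all("th")]
print(clean_headers)
```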
Now we are ready to create a dataframe with the obtained headers.
mydata = pd.DataFrame(columns = headers)
It's time to fill in the data using a for loop. We skip the first row because it holds the headers.
for j in table1.find_all('tr')[1:]:
    row_data = j.find_all('td')
    row = [i.text for i in row_data]
    length = len(mydata)
    mydata.loc[length] = row
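Growing a dataframe one row at a time works for small tables but gets slow for large ones, and it fails when a row has a different number of cells than the header. A sketch of an alternative, which is a swapped-in approach rather than the method used above, collects the rows in a plain list and builds the dataframe once. The sample markup and all variable names here are assumptions for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Invented sample table for illustration
rows_html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
  <tr><td>Bob</td><td>25</td></tr>
</table>
"""
rows_table = BeautifulSoup(rows_html, "html.parser").find("table")

column_names = [th.get_text(strip=True) for th in rows_table.find_all("th")]

# Collect every data row as a plain list of cell texts
records = []
for tr in rows_table.find_all("tr")[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Skip rows whose cell count does not match the header (e.g. spanning cells)
    if len(cells) == len(column_names):
        records.append(cells)

# Build the DataFrame in a single step instead of growing it row by row
table_df = pd.DataFrame(records, columns=column_names)
print(table_df.shape)
```

Note that every scraped value is still a string at this point; numeric columns would need an explicit conversion before any calculation.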
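And now it's the end of our magic: saving the table as an Excel file. Recent pandas versions can no longer write the legacy .xls format, so we save as .xlsx, which requires the openpyxl package.
mydata.to_excel("python-pro.xlsx", index=False)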
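To confirm that the file was written as expected, it can be read back with pandas. The sketch below uses a small stand-in dataframe and a temporary directory instead of the scraped data. The names demo_df, out_path and loaded_df are hypothetical, and the openpyxl package is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Small stand-in DataFrame; in the tutorial this would be mydata
demo_df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": ["30", "25"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "demo.xlsx")
    # index=False keeps the DataFrame index out of the spreadsheet
    demo_df.to_excel(out_path, index=False)
    # dtype=str reads every cell back as text, matching the scraped strings
    loaded_df = pd.read_excel(out_path, dtype=str)

print(loaded_df.shape)
```

Reading with dtype=str keeps values such as "30" as text, so the loaded cells can be compared directly with what was scraped.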