An easy way to scrape web tables with BeautifulSoup.

Scraping a table from the web.


Let's continue our web scraping journey with another practical task: scraping an HTML table from the web and saving it to an Excel file.



This time we will use BeautifulSoup, another useful Python library for scraping tasks.



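Let's start by importing requests, bs4 (BeautifulSoup) and pandas. You will most likely need to install them in your environment first, for example with pip install requests beautifulsoup4 pandas lxml openpyxl. The lxml package is the parser used below, and openpyxl lets pandas write Excel files.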



import requests
from bs4 import BeautifulSoup
import pandas as pd

Next, we define the URL and download the page with requests.get().



url = 'https://weird-jokes.fun/test'
page = requests.get(url)


Now we parse the page's HTML with BeautifulSoup, using the lxml parser.



soup = BeautifulSoup(page.text, 'lxml')
soup  # in a notebook, this displays the parsed HTML
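To see what the parsing step does without hitting a live site, here is a minimal sketch that parses a small inline HTML string instead of page.text. The HTML content is made up for illustration, and the sketch swaps in Python's built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for page.text
html = """
<html>
  <head><title>Test page</title></head>
  <body>
    <table>
      <tr><th>Name</th><th>Age</th></tr>
      <tr><td>Alice</td><td>30</td></tr>
    </table>
  </body>
</html>
"""

# "html.parser" ships with Python, so no extra install is needed;
# pass "lxml" instead if it is installed and you want a faster parser.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.text)           # Test page
print(len(soup.find_all("tr")))  # 2
```

Once parsed, the soup object can be searched with find() and find_all(), exactly as in the next steps.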

Now we find the first <table> element on the page.



table1 = soup.find('table')
table1
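Note that soup.find('table') returns only the first table on the page. If a page has several tables, a sketch like the following shows how you might target a specific one by attribute or with a CSS selector. The id "prices" and class "data" are hypothetical; inspect the real page to find the right values.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables: a layout table first, the data table second
html = """
<table id="nav"><tr><td>Home</td></tr></table>
<table id="prices" class="data">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Tea</td><td>2.50</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("table")                    # first <table> in the document
by_id = soup.find("table", id="prices")       # match on the id attribute
by_class = soup.find("table", class_="data")  # class_ because "class" is a Python keyword
by_css = soup.select_one("table#prices")      # same lookup with a CSS selector

print(first["id"])        # nav
print(by_id is by_class)  # True
```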

Then we collect the column titles from the <th> (table header) tags.



headers = []
for i in table1.find_all('th'):
    title = i.text.strip()  # drop surrounding whitespace and newlines
    headers.append(title)
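The header loop can also be written as a single list comprehension. This sketch uses get_text(strip=True), which trims the whitespace and newlines that often surround header text in real HTML; the sample table is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical table whose header cells contain stray whitespace
html = """
<table>
  <tr><th> Name </th><th>
      Age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

# One header title per <th>, with leading/trailing whitespace removed
headers = [th.get_text(strip=True) for th in table.find_all("th")]
print(headers)  # ['Name', 'Age']
```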


Now we are ready to create a DataFrame with the obtained headers.



mydata = pd.DataFrame(columns = headers)

It's time to fill in the data using a for loop. The first row holds the headers, so we skip it with [1:].



for j in table1.find_all('tr')[1:]:
    row_data = j.find_all('td')
    row = [i.text.strip() for i in row_data]
    length = len(mydata)
    mydata.loc[length] = row  # append the row at the next integer index
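Growing a DataFrame with .loc one row at a time works, but it raises a ValueError when a row has a different number of cells than there are headers, and it gets slow on large tables. Here is a sketch of one alternative, assuming rows with a mismatched cell count (like the hypothetical summary row below) can simply be skipped: collect the rows in a plain list, then build the DataFrame once. Scraped cells are always strings, so the sketch also converts the numeric column explicitly.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical table with a trailing summary row that spans both columns
html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>30</td></tr>
  <tr><td>Bob</td><td>25</td></tr>
  <tr><td colspan="2">Total: 2 people</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")
headers = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) != len(headers):   # skip irregular rows instead of failing
        continue
    rows.append(cells)

# Build the DataFrame in one call instead of row by row
mydata = pd.DataFrame(rows, columns=headers)
# Cell text is a string; convert columns that should be numbers
mydata["Age"] = pd.to_numeric(mydata["Age"])
print(mydata)
```

Whether skipping irregular rows is right depends on the table; for some pages you may prefer to log them or pad them instead.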

And now for the end of our magic: saving the table as an Excel file.



mydata.to_excel("python-pro.xlsx", index=False)  # .xlsx uses openpyxl; pandas 2.0 dropped writing legacy .xls files
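To confirm the file was written correctly, you can read it back. This sketch writes a small hypothetical DataFrame to a temporary .xlsx file and loads it again with pd.read_excel; it assumes openpyxl is installed, since pandas uses it for both writing and reading .xlsx files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the scraped table
mydata = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [30, 25]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "python-pro.xlsx")
    # index=False leaves out the 0..n-1 row labels, as in the step above
    mydata.to_excel(path, index=False)
    # Read the file back to check that the table survived the round trip
    check = pd.read_excel(path)

print(check)
```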



