Python is well suited to extracting information from the web. In this article, I describe how to extract information from a web page using the BeautifulSoup4 library with Python. I assume you have a basic to intermediate knowledge of Python.
Step 1: First, install the requests and bs4 libraries using pip from the Windows terminal. Linux and Mac users can run the same pip commands from their own terminal. It’s easy!
pip install requests
pip install bs4
Step 2: Import the requests library and the BeautifulSoup class.
import requests
from bs4 import BeautifulSoup
Step 3: Create a function called CrawlWords(url) that holds all the scraping logic. Here, url is the parameter that receives the link to the HTML page we want to scrape.
Step 4: Request the HTML page and store its content as text in a variable.
resource = requests.get(url).text
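The one-line request above assumes the download always succeeds. A slightly more defensive variant might look like the sketch below. The helper name fetch_html and the 10-second timeout are assumptions made for illustration; they are not part of the original tutorial.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # A timeout keeps the script from hanging on an unresponsive server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.exceptions.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page.
    response.raise_for_status()
    return response.text
```

Inside CrawlWords(url), the line resource = requests.get(url).text could then be swapped for resource = fetch_html(url) if this extra checking is wanted.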
Step 5: Prepare the text for analysis by passing it to the BeautifulSoup() constructor.
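Steps 2 to 5 can be put together roughly as follows. This is a minimal sketch, not a definitive version: the choice of Python's built-in "html.parser", the decision to return the parsed object, and the sample_html string are all assumptions added for illustration.

```python
import requests
from bs4 import BeautifulSoup


def CrawlWords(url):
    # Step 4: download the page and keep its HTML as text.
    resource = requests.get(url).text
    # Step 5: parse the text; "html.parser" is an assumed parser choice.
    soup = BeautifulSoup(resource, "html.parser")
    # Returning the parsed tree is an assumption made for this sketch.
    return soup


# Offline illustration of the parsing step on a hypothetical HTML snippet,
# so the example can be tried without a network connection.
sample_html = "<html><body><h1>Hello</h1><p>Web scraping with Python</p></body></html>"
sample_soup = BeautifulSoup(sample_html, "html.parser")

heading = sample_soup.find("h1").get_text()
paragraphs = [p.get_text() for p in sample_soup.find_all("p")]

print(heading)
print(paragraphs)
```

Once the text has been parsed, methods such as find() and find_all() can be used to pick out individual tags, as shown with the h1 and p tags above.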