Web Scraping Using Python

Prince Patel
2 min read · Jul 30, 2021

Web scraping is a form of data scraping used to extract data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.

To extract data using web scraping with Python, follow these basic steps:

  1. Find the URL that you want to scrape.
  2. Inspect the page.
  3. Find the data you want to extract.
  4. Write the code.
  5. Run the code and extract the data.
  6. Store the data in the required format.
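The steps above can be sketched as three small functions. This is only a rough outline, not the article's exact code: the function names (fetch_html, extract_text, save_as_csv), the 10-second timeout, and the default column name are assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd


def fetch_html(url):
    """Steps 1 and 5: download the page and return its HTML as text."""
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def extract_text(html, css_selector):
    """Steps 2 to 4: return the text of every element matching the selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(css_selector)]


def save_as_csv(values, path, column="Text"):
    """Step 6: store the extracted values in a one-column CSV file."""
    pd.DataFrame({column: values}).to_csv(path, index=False, encoding="utf-8")
```

With these in place, a run could look like save_as_csv(extract_text(fetch_html(url), selector), 'output.csv'), where url and selector come from inspecting the page you picked in steps 1 to 3.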

Python Libraries Used for Web Scraping

  1. Requests allows you to send HTTP/1.1 requests with ease. You do not need to manually add query strings to your URLs or form-encode your POST data.
  2. BeautifulSoup is used for web scraping purposes to pull the data out of HTML and XML files. It creates a parse tree from page source code that can be used to extract data in a hierarchical and more readable manner.
  3. Pandas is mainly used for data analysis. It can import data from various sources such as comma-separated values, JSON, SQL, and Microsoft Excel. It also supports data manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling.
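To illustrate the point about query strings, here is a small sketch that builds a request without sending it, so it runs offline. The URL and parameter values are made up for the example.

```python
import requests

# Build (but do not send) a GET request; the URL and parameters are made up.
request = requests.Request(
    "GET",
    "https://example.org/search",
    params={"q": "web scraping", "page": 2},
)
prepared = request.prepare()

# Requests has URL-encoded the parameters and appended them to the URL.
print(prepared.url)
```

The same params argument can be passed to requests.get, which is the more common way to use it when you actually want to fetch the page.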

Import the required Python libraries

import requests
from bs4 import BeautifulSoup
import pandas as pd

Create an empty list to store the scraped data

Text = []

Now enter the URL of the page you want to scrape. The Requests library is used to make HTTP requests to the server.

# Download the page and parse its HTML
res = requests.get('https://en.wikipedia.org/wiki/Apple_Inc.')
soup = BeautifulSoup(res.text, 'html.parser')

# Select every element with the class "mw-headline" (note the leading dot)
for i in soup.select('.mw-headline'):
    print(i.text)
    Text.append(i.text)
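A selector such as '.mw-headline' depends on the markup of the live page, which can change over time, so it may be worth checking whether it matched anything. The sketch below is one possible way to do that; the function name extract_headings and the fallback to plain h2 and h3 tags are assumptions, and the inline HTML samples stand in for a real page so the example runs without a network connection.

```python
from bs4 import BeautifulSoup


def extract_headings(html):
    """Return section heading texts, trying the article's selector first."""
    soup = BeautifulSoup(html, "html.parser")
    # Selector used in the article; the leading dot makes it match a class.
    matches = soup.select(".mw-headline")
    if not matches:
        # Assumed fallback: take the heading tags themselves.
        matches = soup.select("h2, h3")
    return [tag.get_text(strip=True) for tag in matches]


# Small inline samples, invented for this example.
with_class = '<h2><span class="mw-headline">History</span></h2>'
without_class = "<h2>History</h2><h3>Founding</h3>"

print(extract_headings(with_class))
print(extract_headings(without_class))
```

If both selectors return an empty list on a real page, inspecting the page again (step 2) is usually the quickest way to find out what changed.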

After extracting the data, you will usually want to store it in a file. The format depends on your requirements. Here, we store the extracted data in CSV format.

df = pd.DataFrame({'Text': Text})
df.to_csv('Apple.csv', index=False, encoding='utf-8')
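To see what index=False does, the following sketch writes a small DataFrame to CSV text in memory, once without and once with the index, and reads both versions back. The two sample values are placeholders, not real scraped output.

```python
import io

import pandas as pd

df = pd.DataFrame({"Text": ["History", "Products"]})  # placeholder values

# With no path argument, to_csv returns the CSV as a string.
without_index = df.to_csv(index=False)
with_index = df.to_csv(index=True)

# Reading each version back shows the extra unnamed column the index adds.
columns_without_index = list(pd.read_csv(io.StringIO(without_index)).columns)
columns_with_index = list(pd.read_csv(io.StringIO(with_index)).columns)

print(columns_without_index)
print(columns_with_index)
```

Leaving the index out keeps the file down to the single Text column, which is usually what you want when the row numbers carry no meaning.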

Here is a snapshot of the CSV file generated by running the above code.

Apple.csv

Hence, we have learned how to scrape data from the internet and format it for further analysis. You can see the whole code here.
