Web Scraping with Python

Why is Web Scraping Used?

Web scraping is used to collect large information from websites. But why does someone have to collect such large data from websites? To know about this, let’s look at the applications of web scraping:

What is Web Scraping?

Web scraping is an automated method used to extract large amounts of data from websites. The data on the websites are unstructured. Web scraping helps collect these unstructured data and store it in a structured form. There are different ways to scrape websites such as online Services, APIs or writing your own code. In this article, we’ll see how to implement web scraping with python.

Why is Python Good for Web Scraping?

Here is the list of features of Python which makes it more suitable for web scraping.

Libraries used for Web Scraping

Web Scraping Example : Scraping Flipkart Website

Step 1: Find the URL that you want to scrape

For this example, we are going scrape Flipkart website to extract the Price, Name, and Rating of Laptops. The URL for this page is https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniqBStoreParam1=val1&wid=11.productCard.PMU_V2.

Step 2: Inspecting the Page

The data is usually nested in tags. So, we inspect the page to see, under which tag the data we want to scrape is nested. To inspect the page, just right click on the element and click on “Inspect”.

When you click on the “Inspect” tab, you will see a “Browser Inspector Box” open.

Step 3: Find the data you want to extract

Let’s extract the Price, Name, and Rating which is in the “div” tag respectively.

Step 4: Write the code

from selenium import webdriver
from BeautifulSoup import BeautifulSoup
import pandas as pd

To configure webdriver to use Chrome browser, we have to set the path to chromedriver

driver = webdriver.Chrome(“/usr/lib/chromium-browser/chromedriver”)

Refer the below code to open the URL:

products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
driver.get(“<a href=”https://www.flipkart.com/laptops/">https://www.flipkart.com/laptops/</a>~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&amp;amp;amp;amp;amp;amp;amp;amp;amp;uniq")

Github Link:
https://github.com/SunnyAgrawal1208/Data-Science

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store