Jinsuh Lee - Purdue Krannert

Analytics Practice : Web Scraping - Dynamic content

Many web pages these days are equipped with dynamic content, that is, content that scripts load or generate after the page has opened. Dynamic content makes webpages look cleaner and enriches the user experience.

However, this makes web scraping a bit more difficult, since the dynamic content does not appear in the HTML source of the page. A web scraper therefore has to simulate a human user working in a browser in order to get the content of dynamic webpages correctly. Fortunately, there is a Python package that can do the job: selenium. If you are using Python, you can easily install selenium with the pip installer. Install pip python.
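As a quick check that the installation worked, the short sketch below asks Python whether the selenium package can be found. The function name `selenium_installed` is only an illustrative choice. The install hint it prints is the usual pip invocation for the package.

```python
import importlib.util


def selenium_installed() -> bool:
    """Return True if the selenium package can be found in this environment."""
    return importlib.util.find_spec("selenium") is not None


if __name__ == "__main__":
    if selenium_installed():
        print("selenium is available")
    else:
        # Usual pip invocation for the selenium package.
        print("selenium is missing; try: python -m pip install selenium")
```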

Once you have the selenium package, you also need a driver for the browser it will control. The driver is needed because selenium works by operating a real browser, the way a human user would. Here is where you can install the Chrome driver. Install chrome driver. You may use drivers for other web browsers, such as Firefox or IE.
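A minimal sketch of starting Chrome from Python might look like the following. It assumes a reasonably recent selenium release and that the Chrome driver can be found on your system. The helper names `chrome_arguments` and `open_chrome`, and the window size, are illustrative choices rather than part of selenium.

```python
from typing import List


def chrome_arguments(headless: bool = True) -> List[str]:
    """Build the command-line switches passed to Chrome (illustrative helper)."""
    args = ["--window-size=1280,800"]
    if headless:
        # Headless switch accepted by recent Chrome releases (no visible window).
        args.append("--headless=new")
    return args


def open_chrome(headless: bool = True):
    """Start Chrome through selenium and return the driver object."""
    # Imported here so this module still loads when selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    # Assumes the Chrome driver can be located, for example on PATH.
    return webdriver.Chrome(options=options)
```

When you are finished with a driver returned by `open_chrome()`, call its `quit()` method so the browser process does not linger.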

Now, let's see whether you can observe the content of the example below. The first part is conventional HTML, followed by dynamic content delivered as XML and JSON. You can easily view the HTML source and scrape the conventional HTML, but you need to simulate human interaction to scrape the dynamic content.

Example of dynamic content

HTML

Hello html.
This is a non dynamic html content.

XML

Click to get XML

JSON

Click to get JSON
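To see why the page source alone is not enough, the sketch below parses a small piece of markup with Python's standard html.parser module. The markup is a hypothetical stand-in for the example above, not the real page source: the ids `xml-result` and `json-result` and the empty containers are assumptions about how such a page might be built.

```python
from html.parser import HTMLParser

# Hypothetical markup resembling the example: static text plus two empty
# containers that a script would fill only after a button is clicked.
PAGE_SOURCE = """
<h3>HTML</h3>
<p>Hello html.</p>
<p>This is a non dynamic html content.</p>
<h3>XML</h3>
<button id="xml-button">Click to get XML</button>
<div id="xml-result"></div>
<h3>JSON</h3>
<button id="json-button">Click to get JSON</button>
<div id="json-result"></div>
"""


class TextCollector(HTMLParser):
    """Collect the text nodes of a page, skipping whitespace-only ones."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def visible_text(source: str) -> list:
    """Return the text found in the raw source, in document order."""
    parser = TextCollector()
    parser.feed(source)
    parser.close()
    return parser.texts


print(visible_text(PAGE_SOURCE))
```

The static sentences and the button labels show up in the result, but nothing comes from the two result containers, because no script has run and no button has been clicked.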

You may download the following Python selenium code to scrape the example above.
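As a rough sketch of what such code can look like (not necessarily the downloadable script itself), the function below opens a page, clicks a button, waits until the result appears, and returns its text. The URL and the element ids are placeholders that you would replace after inspecting the real page. The sketch also assumes the page writes the XML or JSON into the result element as plain text.

```python
import json
import xml.etree.ElementTree as ET

# Placeholder values (assumptions): replace with the real URL and element ids.
EXAMPLE_URL = "http://localhost:8000/dynamic.html"


def fetch_after_click(driver, url, button_id, result_id, timeout=10):
    """Click a button and return the text that then appears in the result element."""
    # Imported here so this module still loads when selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    driver.find_element(By.ID, button_id).click()
    # Poll until the result element contains some non-blank text.
    WebDriverWait(driver, timeout).until(
        lambda d: d.find_element(By.ID, result_id).text.strip() != ""
    )
    return driver.find_element(By.ID, result_id).text


def parse_json_text(text):
    """Turn JSON text taken from the page into Python objects."""
    return json.loads(text)


def parse_xml_text(text):
    """Turn flat XML text into a dict mapping child tag names to their text."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}
```

With a driver created by `webdriver.Chrome()`, a call such as `fetch_after_click(driver, EXAMPLE_URL, "json-button", "json-result")` would return the text to hand to `parse_json_text`. Waiting explicitly matters here because the content arrives some time after the click, so reading the element immediately could return an empty string.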

The following are some nice webpages you may refer to for more ideas about dynamic web scraping.

Web Scraping PRO

Web Scraper testing ground