Getting data into R by scraping the web using the rvest package
In this section, we will focus on web scraping and how to implement it using the rvest package.
Web scraping is the process of extracting data from web pages, where it is typically embedded in unstructured HTML, and converting it into a structured format. Structured data can be easily accessed and analyzed. We will use R to scrape data on the most popular feature films from the IMDb website.
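To make the idea of turning unstructured HTML into structured data concrete, here is a minimal sketch. It assumes rvest is already installed (installation is covered in the steps that follow) and uses a small, made-up HTML fragment rather than a real page; the class names film, title, and year and the film names are hypothetical, chosen only for illustration.

```r
library(rvest)

# A tiny, made-up HTML fragment standing in for a real web page
# (hypothetical markup; real sites will use different tags and classes)
html_string <- paste0(
  "<html><body>",
  "<div class='film'><span class='title'>Film A</span><span class='year'>2016</span></div>",
  "<div class='film'><span class='title'>Film B</span><span class='year'>2015</span></div>",
  "</body></html>"
)

# Parse the string into an HTML document
page <- read_html(html_string)

# Select nodes with CSS selectors and extract their text
titles <- html_text(html_nodes(page, ".film .title"))
years <- as.integer(html_text(html_nodes(page, ".film .year")))

# Combine the extracted vectors into a structured data frame
films <- data.frame(title = titles, year = years, stringsAsFactors = FALSE)
print(films)
```

The same pattern (parse, select, extract, combine) applies to a live page, with the URL passed to read_html() in place of the string. In recent rvest releases html_elements() is the preferred name for html_nodes(); the older name is used here on the assumption that it works across more versions.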
The following steps are implemented to get data into R using the rvest package:
Install the rvest package. It is not part of the base R distribution, so it must be installed before first use:
> install.packages('rvest')
package 'rvest' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\Radhika\AppData\Local\Temp\RtmpMvNUA5\downloaded_packages
Load the installed package into the current R session:
> library(rvest)
Let's start scraping the IMDb page that lists the most popular feature films for a given year: