Let’s talk about web scraping with VBA in Excel. It’s something I’ve been digging into lately, and it’s well worth learning.
You already know Excel is a champ at data management and analysis, but the real magic happens when you pair it with VBA, turning Excel into a web scraping beast.
Unlocking the Power of Web Queries
You might be surprised to hear that Excel actually has a built-in feature called web queries.
It’s a simple way to grab data directly from websites especially those tables you see on web pages.
Think of it as having a built-in browser within Excel.
Let’s say you want to pull data from a website like Books to Scrape.
Here’s how you’d do it:
- Go to the “Data” tab in Excel.
- Click on “Get External Data” and choose “From Web.”
- Paste the URL of the Books to Scrape page in the window and hit “Go.”
- Excel will analyze the webpage and show you all the tables it can find. Select the one you want and click “OK.”
Boom! The data is loaded into your Excel spreadsheet.
Simple, right? But here’s the catch: web queries are mainly for grabbing structured tables.
For other HTML elements like lists, paragraphs, and so on, you’ll need something more powerful.
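If you'd rather not click through the dialog every time, the same kind of web query can be created from code once you have the VBA editor open (setup is covered in the sections that follow). This is only a rough sketch: the URL is a hypothetical placeholder for any page that contains at least one HTML table, and the macro name is made up for illustration.

```vba
Sub ImportTablesWithWebQuery()
    ' Hypothetical placeholder: swap in a page that really contains an HTML <table>
    Const PAGE_URL As String = "https://www.example.com/page-with-table"
    Dim qt As QueryTable

    ' The "URL;" prefix marks the connection string as a web query
    Set qt = ActiveSheet.QueryTables.Add( _
        Connection:="URL;" & PAGE_URL, _
        Destination:=ActiveSheet.Range("A1"))

    With qt
        .WebSelectionType = xlAllTables        ' import every table Excel finds
        .WebFormatting = xlWebFormattingNone   ' plain values, no HTML formatting
        .Refresh BackgroundQuery:=False        ' wait until the data has arrived
    End With
End Sub
```

Like the dialog version, this only helps with table data, so the limitation described above still applies.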
The Web Scraping Powerhouse: VBA
This is where VBA comes in.
It’s like the secret weapon in Excel’s arsenal.
Think of VBA as a mini programming language embedded within Excel.
It allows you to automate tasks, customize Excel, and even talk to the outside world, including the internet.
Getting Started with VBA
You’ll need a desktop version of Excel, such as the one included with Microsoft 365, to work with VBA. Excel for the web does not run VBA.
Here’s how you set up your Excel environment:
- Open Excel and create a new spreadsheet.
- Right-click the ribbon at the top and select “Customize the Ribbon.”
- Tick the box next to “Developer” and click “OK.”
- The “Developer” tab will appear. Click on it and then click on “Visual Basic” (or use the shortcut Alt + F11).
- Click “Insert” and then “Module” to create a new module.
You’ll see a blank area where you can write your VBA code.
VBA Fundamentals
If you’re new to VBA, you might want to check out some online tutorials or Microsoft’s official learning page.
But here’s the gist:
- Procedures: These are like mini programs that carry out specific tasks. There are two types: sub-procedures and functions.
- Sub-Procedures: These are simple sets of instructions enclosed between Sub and End Sub statements that execute a task but don’t return any data.
- Functions: Reusable sets of code, enclosed between Function and End Function statements, that you can call again and again and that can return data.
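To make the difference concrete, here is a tiny sketch of both procedure types. The names GreetUser, AddTax, and DemoProcedures are invented purely for illustration.

```vba
' A Sub performs an action and returns nothing.
Sub GreetUser()
    MsgBox "Hello from VBA!"
End Sub

' A Function hands back a value by assigning it to its own name.
Function AddTax(ByVal price As Double, ByVal rate As Double) As Double
    AddTax = price * (1 + rate)
End Function

' Calls both: the Sub just runs, the Function's result gets printed.
Sub DemoProcedures()
    GreetUser
    Debug.Print AddTax(100, 0.2)
End Sub
```

Running DemoProcedures shows the message box first and then writes the function's result to the Immediate Window.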
A Simple VBA Example
Let’s create a script that opens Internet Explorer, visits a website, and then prints its HTML content to the Immediate Window (a debug window in VBA):
Sub PrintHTML()
    Dim IE As Object, URL As String, HTML As String

    ' Set the website URL
    URL = "https://www.example.com"

    ' Create a new instance of Internet Explorer
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True ' Make it visible (optional)
    IE.Navigate URL

    ' Wait for the page to load (readyState 4 means the load is complete)
    Do While IE.Busy = True Or IE.readyState <> 4
        DoEvents
    Loop

    ' Get the HTML content
    HTML = IE.Document.body.innerHTML

    ' Print to Immediate Window
    Debug.Print HTML

    ' Close Internet Explorer
    IE.Quit
    Set IE = Nothing
End Sub
To run this script, click the green arrow icon above the code window or press F5. You’ll see the HTML code dumped into the Immediate Window (press Ctrl + G if it isn’t visible).
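One thing to watch with the wait loop in that script: if the page never finishes loading, the loop never ends. A possible way around it is a small helper with a timeout. This is a sketch rather than a polished solution; WaitForPage and its timeoutSeconds parameter are names I made up, and it assumes the wait doesn't run across midnight, when VBA's Timer resets to zero.

```vba
' Returns True if the page finished loading within timeoutSeconds, False otherwise.
Function WaitForPage(ByVal IE As Object, _
                     Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    startTime = Timer   ' seconds elapsed since midnight

    Do While IE.Busy Or IE.readyState <> 4   ' 4 = load complete
        DoEvents
        ' Assumption: the wait does not span midnight (Timer resets to 0 there)
        If Timer - startTime > timeoutSeconds Then
            WaitForPage = False
            Exit Function
        End If
    Loop

    WaitForPage = True
End Function
```

You could then swap the Do While loop for something like If Not WaitForPage(IE, 20) Then followed by your own cleanup code.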
Targeted Scraping with VBA
Now let’s take things a step further.
We can make VBA target specific elements on a webpage and extract exactly the data we need.
Let’s go back to Books to Scrape and grab all the book titles from the first page.
- Inspect the page’s HTML. You’ll find the book titles within the <article> elements that have the class product_pod. The titles themselves are within <h3> tags, specifically as the title attribute of the <a> tag inside.
- Here’s the modified VBA code:
Sub ScrapeBookTitles()
    ' Declaring doc As HTMLDocument requires a reference to
    ' "Microsoft HTML Object Library" (Tools > References in the VBA editor)
    Dim IE As Object, URL As String, doc As HTMLDocument
    Dim titles As Object, title As Object
    Dim i As Long

    ' Set the URL
    URL = "https://books.toscrape.com/"

    ' Create an Internet Explorer instance
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate URL

    ' Wait for the page to load
    Do While IE.Busy = True Or IE.readyState <> 4
        DoEvents
    Loop

    ' Get the HTML document
    Set doc = IE.Document

    ' Find all elements with the class 'product_pod'
    Set titles = doc.querySelectorAll(".product_pod")

    ' Loop through each element
    For i = 0 To titles.Length - 1
        ' Get the link that carries the book title
        Set title = titles.Item(i).querySelector("h3 a")
        ' Write the title to column A of the spreadsheet
        Sheet1.Cells(i + 1, 1).Value = title.title
    Next i

    ' Clean up
    IE.Quit
    Set IE = Nothing
End Sub
This script will open Books to Scrape, find all the book titles on the first page, and neatly output them into your Excel spreadsheet.
You can customize this code to extract any other data you need from the page.
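As an example of that kind of customization, here is a sketch that grabs each book's price alongside its title. It assumes the price lives in an element with the class price_color inside each product_pod, so check that in your browser's inspector first. ScrapeTitlesAndPrices is just an illustrative name.

```vba
Sub ScrapeTitlesAndPrices()
    ' Needs a reference to "Microsoft HTML Object Library" (Tools > References)
    Dim IE As Object, doc As HTMLDocument
    Dim books As Object, book As Object
    Dim i As Long

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = False
    IE.Navigate "https://books.toscrape.com/"

    ' Wait for the page to load
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop

    Set doc = IE.Document
    Set books = doc.querySelectorAll(".product_pod")

    ' Header row
    Sheet1.Cells(1, 1).Value = "Title"
    Sheet1.Cells(1, 2).Value = "Price"

    For i = 0 To books.Length - 1
        Set book = books.Item(i)
        ' Title comes from the title attribute of the link inside <h3>
        Sheet1.Cells(i + 2, 1).Value = book.querySelector("h3 a").getAttribute("title")
        ' Assumption: the price sits in an element with class "price_color"
        Sheet1.Cells(i + 2, 2).Value = book.querySelector(".price_color").innerText
    Next i

    IE.Quit
    Set IE = Nothing
End Sub
```

The same pattern should extend to other fields: find the selector in the inspector, then add one more column per field.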
Proxy Power
As you get into serious web scraping, using proxies becomes essential.
Proxies act as intermediaries between you and the website you’re scraping, masking your IP address and helping you avoid rate limiting and bans.
Think of it like this: websites sometimes get suspicious if the same IP address keeps hitting them with requests.
Proxies spread your requests across different locations, making your scraping look more natural.
To set up proxies in Windows, you’ll need to configure your proxy settings:
- Go to “Settings” > “Network & Internet” > “Proxy.”
- Choose “Manual proxy setup” and then “Edit.”
- Enter the proxy server address and port number.
Now all your web requests will go through the proxy server.
You can also use proxy services like SmartProxy that offer dedicated residential, mobile, or datacenter proxies.
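If you'd rather not change the system-wide setting, one alternative is to route a single request through a proxy from VBA itself. The sketch below uses the WinHTTP request object instead of Internet Explorer, which is a different technique from the scripts above, and the proxy address is a hypothetical placeholder you'd replace with the details from your provider.

```vba
Sub FetchThroughProxy()
    Const HTTPREQUEST_PROXYSETTING_PROXY As Long = 2
    ' Hypothetical proxy endpoint: replace with your provider's host and port
    Const PROXY_SERVER As String = "proxy.example.com:8080"
    Dim http As Object

    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

    ' Send this request via the named proxy instead of connecting directly
    http.SetProxy HTTPREQUEST_PROXYSETTING_PROXY, PROXY_SERVER

    http.Open "GET", "https://books.toscrape.com/", False
    http.Send

    Debug.Print http.Status                     ' HTTP status code
    Debug.Print Left$(http.ResponseText, 500)   ' first 500 characters of the HTML
End Sub
```

Proxies that require a username and password need extra setup that isn't shown here.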
Expanding Your Horizons
Web scraping with VBA in Excel is a powerful combination, allowing you to automate data gathering and feed it straight into your Excel analysis workflows.
Here are some additional ideas to explore:
- Scheduled Scraping: Use VBA’s Application.OnTime method to run your scraping scripts on a regular basis, updating your Excel data. (The Timer function only returns the number of seconds since midnight, so it suits measuring elapsed time rather than scheduling.)
- Data Manipulation: Combine your scraped data with Excel’s powerful formulas and functions to analyze and visualize your findings.
- API Integration: Use VBA to communicate with web APIs, allowing you to access even more data.
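Here is a rough sketch of two of those ideas. For scheduling, it uses Application.OnTime, which is the usual way to queue a macro for a later time in Excel; the workbook has to stay open for it to fire. For APIs, it sends a plain GET request with the MSXML2 HTTP object. The names ScheduleNextScrape, RefreshScrape, and FetchApiText are made up, and the one-hour interval is an arbitrary choice.

```vba
' Queues RefreshScrape to run one hour from now.
Sub ScheduleNextScrape()
    Application.OnTime EarliestTime:=Now + TimeValue("01:00:00"), _
                       Procedure:="RefreshScrape"
End Sub

' Runs the scraper, then queues the next run.
Sub RefreshScrape()
    ScrapeBookTitles      ' the scraping macro from earlier in this article
    ScheduleNextScrape
End Sub

' Returns the response body of a GET request, or "" if the status is not 200.
Function FetchApiText(ByVal url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")

    http.Open "GET", url, False                       ' False = synchronous call
    http.setRequestHeader "Accept", "application/json"
    http.send

    If http.Status = 200 Then FetchApiText = http.responseText
End Function
```

Run ScheduleNextScrape once to start the cycle. FetchApiText returns raw text, so any JSON it brings back still needs to be parsed before it's useful in a sheet.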
Remember to scrape responsibly and ethically.
Respect websites’ terms of service and never use scraping for malicious purposes.
By mastering VBA web scraping, you can unlock a wealth of information, transforming Excel from a data management tool into a data-gathering powerhouse.