Very often when building web scrapers (and lots of other scripts), you’ll run into one of these situations:
- You want to send the program’s results to someone else
- You’re running the script on a remote server and you want automatic, real-time reports on results (e.g. updated price information from an online retailer, or a notice that a competing company has changed its job openings site)
One easy and effective solution is to have your web scraping scripts automatically email their results to you (or anyone else that’s interested).
It turns out this is extremely easy to do in Python. All you need is a Gmail account and you can piggyback on Google’s Simple Mail Transfer Protocol (SMTP) servers. I’ve found this technique really useful, especially for a recent project I created to send myself and my family monthly financial updates from a program that does some customized calculations on our Mint account data.
The first step is importing the built-in Python packages that will do most of the work for us:
import smtplib
from email.mime.text import MIMEText
smtplib is the built-in Python SMTP protocol client that allows us to connect to our email account and send mail via SMTP.
The MIMEText class is used to define the contents of the email. MIME (Multipurpose Internet Mail Extensions) is a standard for formatting files to be sent over the internet so they can be viewed in a browser or email application. It’s been around for ages, and it basically allows you to send things other than ASCII text over email, such as audio, video, and images. The example below sends an email that contains HTML.
Here is example code to build your MIME email:
sender = 'firstname.lastname@example.org'
receivers = ['email@example.com', 'firstname.lastname@example.org']
body_of_email = 'String of html to display in the email'
msg = MIMEText(body_of_email, 'html')
msg['Subject'] = 'Subject line goes here'
msg['From'] = sender
msg['To'] = ','.join(receivers)
The MIMEText object takes in the email message as a string and also specifies that the message has an html “subtype”. See this site for a useful list of MIME media types and the corresponding subtypes. Check out the Python email.mime docs for the classes that send other types of MIME messages (e.g. MIMEAudio, MIMEImage).
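As a rough illustration of how scraped results might end up in that HTML body, here is a minimal sketch. The function name `build_report_email`, the two-column (item, price) row layout, and the example addresses are all assumptions made for this example, not part of the original script.

```python
from email.mime.text import MIMEText
from html import escape


def build_report_email(rows, sender, receivers, subject):
    """Build an HTML email whose body is a small table of scraped rows.

    `rows` is assumed to be an iterable of (item, price) pairs.
    """
    # Escape scraped values so a stray '<' or '&' cannot break the markup.
    table_rows = ''.join(
        '<tr><td>{}</td><td>{}</td></tr>'.format(escape(str(item)), escape(str(price)))
        for item, price in rows
    )
    body = '<table><tr><th>Item</th><th>Price</th></tr>' + table_rows + '</table>'

    msg = MIMEText(body, 'html')
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ', '.join(receivers)  # header value is one comma-separated string
    return msg


# Hypothetical data standing in for real scraper output.
msg = build_report_email(
    rows=[('Widget', '9.99'), ('Gadget <XL>', '24.50')],
    sender='scraper@example.com',
    receivers=['alice@example.com', 'bob@example.com'],
    subject='Daily price report',
)
print(msg.get_content_type())
print(msg['To'])
```

Escaping the values is a design choice worth keeping: scraped text is untrusted input, and unescaped markup in it could garble the email.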
Next, we connect to the Gmail SMTP server with host 'smtp.gmail.com' and port 465, log in with your Gmail account credentials, and send the message off:
s = smtplib.SMTP_SSL(host='smtp.gmail.com', port=465)
s.login(user='your_username', password='your_password')
s.sendmail(sender, receivers, msg.as_string())
s.quit()
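If you would rather not hard-code your password in the script, one possible variation is sketched below. It reads the credentials from environment variables and uses a `with` block so the connection is closed even if sending fails. The function name `send_message`, the `smtp_factory` parameter, and the environment variable names `SCRAPER_GMAIL_USER` and `SCRAPER_GMAIL_PASSWORD` are assumptions invented for this example.

```python
import os
import smtplib


def send_message(msg, sender, receivers, smtp_factory=smtplib.SMTP_SSL):
    """Log in to Gmail's SMTP server and send an already-built message.

    `smtp_factory` is a hypothetical hook that lets you swap in a fake
    server class for testing; by default the real SMTP_SSL client is used.
    """
    # Hypothetical variable names; set them in your shell or scheduler config.
    user = os.environ['SCRAPER_GMAIL_USER']
    password = os.environ['SCRAPER_GMAIL_PASSWORD']

    # Leaving the with-block ends the SMTP session and closes the connection,
    # even when login or sendmail raises an exception.
    with smtp_factory(host='smtp.gmail.com', port=465) as server:
        server.login(user=user, password=password)
        # `receivers` must be a Python list here, not a comma-joined string.
        server.sendmail(sender, receivers, msg.as_string())
```

With the two environment variables set, calling `send_message(msg, sender, receivers)` should behave like the four-line version above.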
Heads up: notice that the list of email recipients needs to be expressed as a string in the assignment to msg['To'] (with each email separated by a comma), and expressed as a Python list when specified in s.sendmail(sender, receivers, msg.as_string()). (For quite a while, I was banging my head against the wall trying to figure out why the message was only sending to the first recipient or not sending at all, and this was the source of the error. I finally came across this StackExchange post which solved the problem.)
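To make the string-versus-list distinction concrete, here is a small sketch with made-up example addresses. It also shows one way to avoid keeping the two forms in sync by hand: parse the header back into a list with `email.utils.getaddresses` from the standard library.

```python
from email.mime.text import MIMEText
from email.utils import getaddresses

receivers = ['alice@example.com', 'bob@example.com']  # hypothetical addresses

msg = MIMEText('<p>Report</p>', 'html')
msg['To'] = ', '.join(receivers)  # header: a single comma-separated string

# Recover a Python list from the header; this is the form sendmail() expects.
envelope_recipients = [addr for _name, addr in getaddresses([msg['To']])]
print(envelope_recipients)  # ['alice@example.com', 'bob@example.com']
```

Passing `envelope_recipients` (or the original `receivers` list) to `s.sendmail` delivers to every address, whereas passing the joined string is the mistake described above.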
As a last step, you need to change your Gmail account settings to allow access to “less secure apps” so your Python script can access your account and send emails from it (see instructions here). A scraper running on your computer or another machine is considered “less secure” because your application is considered a third party and it is sending your credentials directly to Gmail to gain access. Instead, third party applications should be using an authorization mechanism like OAuth to gain access to aspects of your account (see discussion here).
Of course, you don’t have to worry about your own application accessing your account since you know it isn’t acting maliciously. However, if other untrusted applications can do this, they may store your login credentials without telling you or do other nasty things. So, allowing access from less secure apps makes your Gmail account a little less secure.
If you’re not comfortable turning on access to less secure apps on your personal Gmail account, one option is to create a second Gmail account solely for the purpose of sending emails from your applications. That way, if that account is compromised for some reason due to less secure app access being turned on, the attacker would only be able to see sent mail from the scraper.