scrapers

22 July 2014


example of content theft from sugarpuffish blog
The problem with blogging is no one gives you a handbook when you create your blog. Even after three years I am still learning the ropes. I wanted to share a problem I discovered and had not seen discussed in the community. I figured if I don't know then there is a good chance others are in the same boat.

Every now and again, I do a quick Google search for my blog and glance over how my pages are ranking. Last week I noticed two URLs that were cause for concern (pictured above). My posts are appearing on another site, which in turn will draw traffic away from my blog. This makes for one very grumpy Sarah! I spent over an hour researching how I could get the site reported and in doing so uncovered the term "scraped content". People who "scrape" content use automated software that takes blog content from your RSS feed and posts it to their site as if it were a new post.

There is not much you can do to prevent scraping but there are steps which could limit the damage it causes. These are tips I have collected from reading other articles.

Contact
I found this option tricky because the website that has taken my content provides no contact information. You can use Whois Lookup to find the domain registrar or web host and get in contact with them instead.
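
For anyone comfortable with a little code, here is a rough Python sketch of what a Whois lookup does behind the scenes. It is only an illustration under assumptions: the function names are my own, the default server (the registry Whois server for .com and .net) will not suit every domain, and the labels in a Whois reply vary between registrars, so the filter below simply looks for lines mentioning "registrar" or "abuse".

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send a plain WHOIS query and return the raw text reply.

    The default server is an assumption that only suits .com/.net domains.
    """
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def contact_lines(whois_text, keywords=("registrar", "abuse")):
    """Pick out 'Label: value' lines whose label mentions one of the keywords."""
    found = []
    for line in whois_text.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip() and any(k in label.lower() for k in keywords):
            found.append((label.strip(), value.strip()))
    return found
```

Calling `contact_lines(whois_query("example.com"))` should give you a short list of registrar details to follow up, although an ordinary Whois website will show you the same information without any code.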

Report to Google
Firstly, submit a DMCA (Digital Millennium Copyright Act) notice to Google. By submitting this form you are letting Google know about the breach of copyright and asking for the copied content to be removed. A DMCA notice is the official route, but I also uncovered a form allowing you to report the scraper site. I understand this will not get content removed, but it allows Google to gather information, and we can only hope that it aids prevention of scraping in the future.

RSS Feed
There are two options available when it comes to your RSS feed: either shorten the post or add a footer (you can't do both). In your Dashboard (pictured below), change "Allow Blog Feed" from FULL to SHORT so that only the first 400 characters of your blog post are visible, making it less attractive to a content thief. It shouldn't put off regular readers because they can still click through to the post and read it in full. The alternative is to add a footer which includes a link to your blog and/or social media. Use this code: <hr /> <a href="http://www.myblog.com">My Blog Name</a>


You can see from my screen shot I added a short copyright message. Remember your message will be visible to genuine readers of your RSS Feed so keep it polite. The idea behind adding the RSS footer is it will be seen on the scraper site so readers are aware content has been stolen.
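
If you would rather generate the footer than type it by hand, the two options above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the 400 character limit is simply the figure quoted above, not something I have checked against Blogger itself.

```python
from html import escape

SHORT_FEED_LIMIT = 400  # the "first 400 characters" figure mentioned above


def build_feed_footer(blog_url, blog_name, notice=""):
    """Return footer HTML: a rule, an optional polite notice, and a link home."""
    link = '<a href="{}">{}</a>'.format(escape(blog_url, quote=True), escape(blog_name))
    parts = ["<hr />"]
    if notice:
        parts.append(escape(notice))  # escape so stray < or & cannot break the feed
    parts.append(link)
    return " ".join(parts)


def preview_short_feed(post_text, limit=SHORT_FEED_LIMIT):
    """Show roughly what a SHORT feed would expose: the first `limit` characters."""
    return post_text[:limit]


print(build_feed_footer("http://www.myblog.com", "My Blog Name"))
```

The printed line matches the footer code given above, and passing a `notice` adds a short message in front of the link.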

Submit URL to Google
I stumbled across an article about submitting your post URL to Google as soon as you publish. This may help your post get indexed before the copy on the scraper website. It's all to do with the way Googlebot crawls the internet.

Set up Google Authorship
I still haven't got my head around this one because it involves Google+ and linking it to your blog. There are plenty of tutorials out there. I understand that Authorship doesn't prevent theft but it helps build credibility and allows Google to identify your content as original.

Google Alerts
A number of sites suggest using Google Alerts to find copied content. The suggestion is to set up an alert using keywords or your blog title or a paragraph from your post. You then decide what types of website to search and how often you want a report. For those that publish to their blog regularly, I envisage that your inbox would soon be overflowing so you may not like this method in the long run.
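
Once an alert (or a search) turns up a suspicious page, you may want a quick way to judge how much of your post it actually contains. Below is a minimal Python sketch of one way to do that, by comparing runs of five consecutive words. It is not how Google Alerts works, and the function names and the run length of five are my own assumptions.

```python
import re


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, size=5):
    """Return the set of every run of `size` consecutive words in the text."""
    w = words(text)
    return {tuple(w[i:i + size]) for i in range(len(w) - size + 1)}


def overlap(original, suspect, size=5):
    """Fraction of the original's word runs that also appear in the suspect text."""
    mine = shingles(original, size)
    if not mine:  # original is shorter than one run, so nothing to compare
        return 0.0
    return len(mine & shingles(suspect, size)) / len(mine)
```

A result close to 1.0 means nearly every phrase of your paragraph appears on the other page, while a result near 0.0 suggests the page only shares a few common words with you.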

Watermark Images
It is debatable whether this is worth the effort. I'm not a fan of watermarking because photographs can easily be cropped and marks removed. However, I imagine it could help when it comes to scraping, since the content is stolen automatically and not edited.

I hope this blog post has been helpful. We may not be able to stamp out scraping, but you can at least try to turn it to your advantage. If you have a blog on WordPress, it looks like you have some plugin options. I suggest doing your own research, as I'm only familiar with Blogger. Have you experienced content theft? How did you deal with it?

Sarah x
