Web Scraping: How Bots Bypass Privacy Rules
Nowadays, it seems people are pretty concerned about their online data privacy. And honestly, they should be, especially after the 2017 Equifax data breach and the Facebook and Cambridge Analytica scandal that hit headlines in spring 2018.
Luckily for the consumer, new laws like GDPR have made it harder for companies to collect and keep data, with or without people’s consent, and other pieces of legislation have been proposed to do the same. In some cases, platforms like Instagram and even Apple’s App Store have drastically reduced the amount of data third-party developers can pull from the platforms’ APIs.
But for businesses whose services rely on publicly available data, not having access to that wealth of information has definitely hurt their bottom lines. To stay afloat and get around new privacy measures, some companies have turned to a tried-and-true method of data aggregation: web scraping.
Web Scraping & Social Media Mining
Web scraping is pretty straightforward: scraper bots browse websites and copy whatever information they’re programmed to gather. The bots then compile their findings into a database, web page, or other document for future use. While some scraper bots are certainly used for malicious practices, many more perform less intrusive, at times benign tasks, such as indexing site content or scanning product prices.
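To make the idea concrete, here is a minimal sketch of the benign price-scanning case, using only Python’s standard library. The page markup, the class names "name" and "price", and the helper names PriceScraper and scrape_prices are all assumptions made up for illustration; a real bot would download pages over HTTP and would need to handle whatever markup the target site actually uses.

```python
from html.parser import HTMLParser

# Hypothetical page markup, invented for this example.
# A real bot would fetch this text from a website.
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">$9.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">$24.50</span></li>
</ul>
"""


class PriceScraper(HTMLParser):
    """Copies the text of <span class="name"> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.records = []   # the compiled findings, one dict per product
        self._field = None  # which field the next text chunk belongs to
        self._current = {}  # the product record being assembled

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field is None:
            return  # text outside the spans we care about
        self._current[self._field] = data.strip()
        self._field = None
        # Once both fields are present, store the record and start a new one.
        if "name" in self._current and "price" in self._current:
            self.records.append(self._current)
            self._current = {}


def scrape_prices(html):
    """Return a list of {"name": ..., "price": ...} dicts found in the HTML."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for record in scrape_prices(SAMPLE_PAGE):
        print(record["name"], record["price"])
```

The bot only copies what the page already shows to any visitor; the compiling step is just the list of records, which could as easily be written to a database or spreadsheet.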
When it comes to the walled gardens of social media, things get a little tricky. Since they can’t pull data directly from the platforms’ APIs, some third-party companies send out scraper bots to trawl through social media feeds and profiles for any publicly available data, such as likes, comments, and followers. Although the scraped data isn’t very detailed, it’s still useful for advertising and marketing purposes.
Most importantly, third-party companies don’t need users’ permission to collect this data, since technically, they’re not liable for what people willingly post on social media. And according to a 2017 court ruling, scraping public profiles can be legal. A federal judge in California granted a preliminary injunction in favor of data analytics company hiQ Labs, saying the company was allowed to keep aggregating publicly available profile data from LinkedIn, despite the professional social network’s objections.
While many other social media companies appear to share LinkedIn’s sentiments, few are trying to stop bot behavior altogether. For platforms like Twitter, which actively encourages the use of automated accounts, cracking down on bot behavior seems counterproductive to the site’s core functions. Facebook and Instagram also inadvertently rely on bots to make themselves more attractive to advertisers and marketers. More bots mean higher engagement rates, meaning more data to leverage for ad purposes.
Protecting Your Data
With third-party bots crawling through feeds and social media sites not making much effort to stop it, consumers have every right to take charge of their online data. If you’re concerned about having your data scraped, consider these three tips:
Set Your Social Media Profiles to Private. The web scraping bots used by third-party companies only read what’s publicly available to them. If your accounts are set to private, the bots probably won’t be able to see your content or scrape your data.
Delete or Block Users You Don’t Know. Many scraper bots manifest as fake users that follow people en masse. If you have people on your friends list who seem suspicious, or whose identity you can’t verify, consider removing them.
Screen All New Connection Requests. If you already have your account set to private, people need to send you a request to connect, which you can either accept or deny. If the person trying to add you doesn’t seem legitimate, don’t let them in.
Keeping your data secure on social media really is a matter of common sense. If you don’t want your data scraped, then don’t post anything you wouldn’t want shared with outside parties. As long as data scraping remains legal, expect these kinds of activities to continue indefinitely.
This post originally appeared on Anura.