We are currently reading more and more about data scraping incidents at Facebook, LinkedIn and Clubhouse. But what exactly does this data collection entail? We will tell you.
What is meant by (data) scraping?
Data scraping refers to a technique in which a computer program extracts – scrapes together – data from the readable output of another program.
Scraping involves quickly collecting and storing various types of data from websites, platforms, or social networks, mostly so that the data can be analyzed later.
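To make the definition more concrete, here is a minimal sketch in Python of a program that extracts data from the readable output of another program. The HTML snippet, the class names `name` and `price`, and the `ProductScraper` helper are assumptions made up for illustration. A real scraper would first fetch the page over the network and would have to cope with far messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page output; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Kettle</span> <span class="price">24.99</span></li>
  <li><span class="name">Toaster</span> <span class="price">39.50</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects (name, price) pairs from spans marked with class attributes."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished (name, price) tuples
        self._field = None   # which field the current <span> holds
        self._name = None    # product name waiting for its price

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price":
            self.products.append((self._name, float(data.strip())))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_products(html):
    """Parse the given HTML and return the extracted product list."""
    parser = ProductScraper()
    parser.feed(html)
    return parser.products


products = scrape_products(SAMPLE_HTML)
print(products)
```

The sketch only uses Python's standard library. The same principle applies to any publicly readable page: whatever a browser can display, a program can read and store in structured form.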
Where do we encounter scraping in everyday life?
Scraping is something we encounter very often in our everyday life. Search engines or price comparison sites use scraping to collect and display product feeds, images, prices and other related product details. In the process, information is gathered from many internet sites.
Scraping is also commonly used in a professional context. Scrapers (those who use scraping tools) collect data about various companies, for example marketing information, user behavior, product ratings or product prices. This lets them learn about competitors or gives their own company a competitive advantage.
Data scraping is also used to obtain personal information about employees or customers, such as contact details and addresses that are then sold on to other companies. In some cases, cyber criminals can also gain access to this information.
Scraping – legitimate use or abuse?
Typically, scraping involves the collection of publicly available data. What matters, however, is how the data is used. The price comparison sites mentioned above are one example of legitimate use.
However, the data can also be misused, for example by simply copying a company's professionally written texts and using them on other websites, or by sending phishing e-mails to e-mail addresses collected by scraping. Similarly, cyber criminals can use data scraping to copy web pages in great detail, such as a login page for online banking, and use the copies for phishing attempts.
Focus on social media channels
It was recently reported that the personal data of several hundred million users has been published and is in circulation. The data was “scraped”, for example, from the career network LinkedIn and the audio-based social network app Clubhouse.
Shortly before, Facebook also announced that it had been affected by data scraping. Several hundred million profiles are affected.
However, the affected platforms do not consider themselves responsible, since these scraping incidents are not security incidents caused by hackers. The only personal data scraped was data that third parties can access anyway through the apps or public APIs, or that users themselves have published in their profiles. This includes names, profile names, and photo URLs, among other things.
Data found on the darknet
Scraping can have negative consequences for the individuals affected – even if these are not “classic hacker attacks” in which cyber criminals gain unauthorized access to systems, servers or networks.
In the last few days and weeks, data from several million profiles has been offered for sale in well-known hacker forums. In some cases, the data was even made available free of charge. According to reports, sensitive data such as passwords or credit card information was not affected. Nevertheless, the published data could be combined with information obtained elsewhere, which may give attackers enough to work with for fraud attempts and attacks (e.g., phishing or brute-force attacks).
We advise increased vigilance
If you have a profile (or several) on the social networks mentioned, your data may be affected. We therefore advise you to be more vigilant in the coming weeks.
Be especially cautious of suspicious messages and emails that could be attempts to use the collected data for fraud, phishing or social engineering attacks.
How can you protect yourself?
In general, it is advisable to be mindful of what you publish on the Internet, and especially on social networks. Share only content that you would be comfortable with the general public seeing. In addition, be aware that your data is shared with third parties quite officially, and that it is also collected and passed on by individuals or by scraping tools.