What is screen scraping and how does it relate to APIs?
Screen scraping is a common challenge for businesses with a significant online presence, such as financial service providers and e-commerce companies. It goes by many different names, such as web data extraction, web scraping, and web harvesting. While screen scraping used to be seen primarily as a security challenge for front-end web applications, the changing nature of business applications is pushing the problem of scraping into the API security domain.
For example, business-to-consumer (B2C) architectures have evolved over time from monolithic web applications to new API-based front-end frameworks that can serve the needs of both web and mobile applications. Meanwhile, the increasing use of business-to-business (B2B) APIs by industry ecosystem partners creates even more potential scraping scenarios.
B2B APIs have different API consumers than B2C APIs, expanding the universe of potential data scraping scenarios. Some forms of scraping may be legitimate, but more commonly it is used to abuse APIs. Examples include:
- Aggregating information, such as product descriptions and product reviews, for use in non-sanctioned ways
- Gathering pricing information from e-commerce websites to inform competitive pricing strategies and offers, especially those with ever-changing pricing models such as travel, hotels, and car rentals to name a few
- Accessing frequently changing information for competitive reasons, such as interest rates from financial sites or betting odds from gambling sites
In addition to enabling undesirable data leakage, API scraping can put a heavy load on application infrastructure. And unfortunately, mitigating it is not as easy as setting rate caps or quotas. Many experienced players are adept at conducting “low and slow” scraping activities that fall below existing limits and quotas. This makes scraping harder to stop without disrupting legitimate API usage.
Additionally, the fact that API scraping is likely to work within these existing rate cap and quota parameters means most organizations have no visibility that it is actually taking place.
How do most companies protect themselves from API scraping?
Most organizations rely on rate limits and quotas to limit the ability to perform web scraping. While this isn’t a magic bullet for the reasons outlined above, it’s still an important first step. At the very least, it puts an upper limit on the amount of scraping that can occur.
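To make the limitation concrete, the following minimal sketch (a hypothetical fixed-window rate limiter; all names are illustrative, not any particular product's API) shows why a rate limit stops a burst scraper but never even notices a “low and slow” one:

```python
from collections import defaultdict

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client per `window` seconds."""

    def __init__(self, limit: int, window: int):
        self.limit = limit
        self.window = window
        # (client_id, window index) -> request count
        self.counts = defaultdict(int)

    def allow(self, client_id: str, now: float) -> bool:
        key = (client_id, int(now // self.window))
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = FixedWindowRateLimiter(limit=100, window=60)

# A burst scraper firing 101 requests in one minute: request 101 is rejected.
burst = [limiter.allow("burst-bot", now=0) for _ in range(101)]

# A "low and slow" scraper issuing one request every 2 seconds (30/minute)
# stays far below the limit and is never blocked -- yet over a full day it
# would still quietly harvest tens of thousands of records.
slow = [limiter.allow("slow-bot", now=t) for t in range(0, 600, 2)]
```

Here `burst[-1]` is `False` while every entry in `slow` is `True`: the quota caps the worst-case volume, exactly as described above, but gives no visibility into patient scrapers operating inside it.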
Another important best practice is to ensure that clients connecting to APIs are valid. For example, if APIs are generally accessible via mobile devices, steps should be taken to ensure that the mobile client accessing the API has not been hacked, the integrity of the mobile device has not been compromised through jailbreaking, etc.
Some organizations may also use special bot mitigation tools to protect their web applications from automated scraping. These solutions add value for B2C API traffic. However, because they require specific browser or mobile application instrumentation, they are completely ineffective against B2B API scraping, which generally originates from programmatic clients rather than browsers or mobile apps. Similarly, compromised Internet of Things (IoT) or Internet of Everything (IoE) devices can be used to create “swarms” that do not originate from standard web or mobile application clients.
So in summary, even if you have rate limits and quotas in place, two main risks still remain:
- You remain open to low and slow scraping on B2C APIs.
- Authenticated B2B API traffic is completely unmonitored.
And these risks are more than theoretical. Earlier this year, a threat actor was able to exploit a Twitter API vulnerability to scrape account details from an estimated 5.4 million users.
How does Neosec’s approach close these critical protection gaps?
The key advance Neosec brings to API security is the extension of API monitoring and analysis to authenticated traffic. B2B APIs present a much larger attack surface – and a potential path to greater business value.
Behavioral analytics at the authenticated user level are key to monitoring B2B APIs. This is the only way to determine if a seemingly legitimate, authenticated API consumer using no known attack patterns is scraping your APIs. This requires context, which can only be obtained by analyzing the same user’s API requests over a long period of time – even if they’ve changed access tokens more than 100 times.
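As a rough illustration of the idea (a hypothetical sketch, not Neosec's actual implementation; `token_owner`, `flag_scrapers`, and the threshold are all invented for this example), behavioral profiling first resolves rotating access tokens back to a stable user identity, then looks at that user's cumulative footprint over time:

```python
from collections import defaultdict

def flag_scrapers(events, token_owner, max_unique_resources=1000):
    """Flag authenticated users whose long-horizon behavior looks like scraping.

    events:       iterable of (access_token, resource_id) API calls
    token_owner:  maps each (possibly rotated) token back to the stable
                  user/partner that minted it, e.g. from OAuth issuance logs
    Returns the set of users whose cumulative count of *distinct* resources
    accessed exceeds the threshold -- no matter how many tokens they rotated
    through or how slowly they issued requests.
    """
    seen = defaultdict(set)
    for token, resource in events:
        user = token_owner[token]        # collapse token rotation to one identity
        seen[user].add(resource)
    return {user for user, resources in seen.items()
            if len(resources) > max_unique_resources}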
Below is a summary of how Neosec’s approach can extend your API protection capabilities beyond traditional bot mitigation techniques.
Comparison of bot mitigation and API data scraping
| Top 10 OWASP API | Bot mitigation | Neosec |
|---|---|---|
| What | UI-based APIs (B2C only) | Any API (B2C, B2B) |
| Where | In the browser | Via the API |
| How | Detects browser or mobile app and human user signals; assumes every human is good | Behavioral profiling of users and IPs |
| Impact on user experience | High | Low |
| Persistence | Easier to bypass | Robust |
| Strengths | Blocks high-volume automated scraping on websites | Detects a wide range of abuse by malicious insiders and attackers posing as legitimate users |
| Common scraping use case | Scraping prices on the website (for example: airlines, PlayStation 5) | Scraping of any API resource by any authenticated user, from resellers, partners, and suppliers to customers |
*** This is a syndicated blog post from the Security Bloggers Network, written by the Neosec team. Read the original post at: https://www.neosec.com/blog/how-do-you-protect-an-api-from-scraping