F5 Networks’ BIG-IP Application Security Manager (ASM) includes a very neat feature called Web Scraping Protection that I want to cover briefly. I would like to highlight what the feature is and what it does when it is actively doing its job.
I was prompted to write this after noticing that there is very little documentation on the web about the F5 BIG-IP’s Web Scraping Protection mechanism, and almost none about what it actually does to the underlying page code presented to your end users.
Web scraping is a software technique for extracting information from websites. The people running a web scraper typically save what is scraped and use it for their own purposes. Sometimes this is just for archiving, as with Archive.org’s “Wayback Machine“. Several companies even sell what many consider to be legitimate commercial web scraping software; one such company, Mozenda, lists Microsoft, IBM and Citi among its clients.
But then there are the “Others,” as I like to call them. These range from hackers with bad intentions to companies simply seeking a competitive advantage. One example I can think of involved a few websites that make their living by offering vacation deals. These industry leaders published airfares for many popular destinations on their websites, and their competitors used a computer program to scrape the pricing off those pages. The competitors would then take this pricing, subtract a few dollars, load it into another program and update the prices on their own websites, making their vacation deals just a little cheaper than those of the sites they had scraped!
ASM’s Web Scraping Protection combats this by attempting to determine whether a web client is a human or a headless computer program. To do this, it injects a piece of JavaScript code into the HTML pages returned to the client. I will not provide the full source code for the JavaScript, but I hope to provide enough for those searching through Google to find this page.
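To make the idea of script injection concrete, here is a minimal TypeScript sketch of how a proxy sitting in front of a web server could insert a script tag into an HTML response. This is not F5’s code: the function name `injectDetectionScript` and the script path `/hypothetical/detect.js` are made up for illustration, and a production proxy would also have to deal with things like compressed bodies, non-HTML content types and the `Content-Length` header.

```typescript
// Hypothetical sketch: how a reverse proxy *could* inject a script into an
// HTML response. This is NOT F5's implementation; all names are illustrative.

// Hypothetical path for the injected detection script.
const DETECTION_SCRIPT_URL = "/hypothetical/detect.js";

function injectDetectionScript(
  html: string,
  scriptUrl: string = DETECTION_SCRIPT_URL,
): string {
  const tag = `<script src="${scriptUrl}"></script>`;

  // Match an opening <head> tag (with or without attributes), but not <header>.
  const headOpen = /<head(\s[^>]*)?>/i;
  const match = headOpen.exec(html);

  if (match) {
    // Place the script right after <head> so it appears near the top of the
    // page and runs before most of the document has loaded.
    const insertAt = match.index + match[0].length;
    return html.slice(0, insertAt) + tag + html.slice(insertAt);
  }

  // No <head> element found: fall back to prepending the tag.
  return tag + html;
}

// Usage example
const page = "<html><head><title>Deals</title></head><body>...</body></html>";
console.log(injectDetectionScript(page));
```

Placing the tag immediately after `<head>` mirrors what the next paragraph describes: the injected script shows up very close to the top of the page source.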
When you view a web page that is protected by ASM with web scraping anomaly detection active, you will see the following elements. To see them, open Firefox, browse to the website in question, then right-click and select “View Page Source”. You should see a JavaScript insert, very close to the top of the page, that contains some of the following elements:
You can see from these events that the script is looking for keyboard, mouse and other activity to determine whether the content is being viewed by a human or by something in the “Other” category. Once it has made a determination, the web application security policy will act according to whatever settings you have configured in the policy.
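As a rough illustration of the kind of reasoning involved, the TypeScript sketch below classifies a browsing session from recorded interaction events. F5 does not publish its detection logic, so the event types, the `MIN_EVENTS` threshold and the uniform-timing rule are all assumptions chosen to show the idea, not ASM’s actual algorithm.

```typescript
// Hypothetical sketch only: ASM's real detection logic is not public.
// Event names, thresholds and rules here are illustrative assumptions.

type InteractionType = "mousemove" | "keydown" | "click" | "scroll" | "touchstart";

interface InteractionEvent {
  type: InteractionType;
  timestampMs: number;
}

type Verdict = "likely-human" | "likely-automated";

// Arbitrary illustrative threshold, not an F5 default.
const MIN_EVENTS = 3;

function classifySession(events: InteractionEvent[]): Verdict {
  // A headless scraper usually produces little or no keyboard/mouse activity.
  if (events.length < MIN_EVENTS) {
    return "likely-automated";
  }

  // Order events by time and measure the gaps between consecutive events.
  const sorted = [...events].sort((a, b) => a.timestampMs - b.timestampMs);
  const gaps: number[] = [];
  for (let i = 1; i < sorted.length; i++) {
    gaps.push(sorted[i].timestampMs - sorted[i - 1].timestampMs);
  }

  // Perfectly uniform spacing hints at synthetic, scripted input.
  const allEqual = gaps.every((g) => g === gaps[0]);
  return allEqual ? "likely-automated" : "likely-human";
}

// Usage example
const session: InteractionEvent[] = [
  { type: "mousemove", timestampMs: 0 },
  { type: "mousemove", timestampMs: 137 },
  { type: "keydown", timestampMs: 402 },
];
console.log(classifySession(session));
```

A session with no input at all, or with input arriving at perfectly regular intervals, is flagged; anything else passes. A real detector would weigh many more signals than these two rules.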
So there you have it: yet one more reason why the F5 BIG-IP ASM is an excellent tool to include in your defense-in-depth lineup.