Thursday, 30 August 2018

file_get_contents() does not return anything for an HTML site

file_get_contents() returns the expected page contents for www.akaar.org, but nothing for www.ptsda.org.

The main difference is that akaar.org is a PHP project and ptsda.org is plain HTML.
Basically, I am building a web crawler in PHP. It failed to crawl that particular site, even though it has successfully crawled more than 150 other sites.

ptsda.org is returning this 403 (forbidden) error:
failed to open stream: HTTP request failed! HTTP/1.1 403 ModSecurity Action

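So it looks like they have Apache ModSecurity protection in place to stop their content from being scraped in this way. The PHP-versus-HTML difference is not the cause: the server rejects the request with a 403 before any page content is sent, and by default file_get_contents() returns false (with the warning above) on such a response, which is why nothing comes back.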
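For a crawler, it helps to detect this case explicitly instead of getting a bare `false` plus a warning. Below is a minimal sketch, not a definitive implementation. It uses the standard `http` stream context options `ignore_errors`, `user_agent` and `timeout`, plus the `$http_response_header` variable that PHP's HTTP wrapper fills in. The helper names `parse_status()` and `fetch_with_status()` and the crawler name in the User-Agent string are hypothetical placeholders.

```php
<?php
// Hypothetical helper: extract the final HTTP status code from the
// header lines collected by PHP's http:// wrapper. With redirects the
// array holds several responses, so the last "HTTP/..." line wins.
function parse_status(array $headers): int
{
    $status = 0;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: fetch a URL and report its status code instead
// of letting file_get_contents() fail with a warning on 4xx/5xx.
function fetch_with_status(string $url): array
{
    $context = stream_context_create([
        'http' => [
            // Return the body even for error statuses such as 403,
            // so the status line can be inspected.
            'ignore_errors' => true,
            // Identify the crawler honestly; this name is a placeholder.
            'user_agent'    => 'ExampleCrawler/0.1',
            'timeout'       => 10,
        ],
    ]);

    // @ suppresses warnings for hard failures (DNS errors, timeouts).
    $body = @file_get_contents($url, false, $context);

    // $http_response_header is set in this local scope by the wrapper
    // (it stays unset if no response was received at all).
    $headers = isset($http_response_header) ? $http_response_header : [];

    return [
        'status' => parse_status($headers),
        'body'   => $body === false ? '' : $body,
    ];
}
```

A crawler could then call `fetch_with_status('http://www.ptsda.org/')`, and on a `403` status log the URL and move on rather than retrying, since the site is deliberately refusing automated requests.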
