Monday, 24 September 2018

PHP – Fetch gzipped content over http with file_get_contents

The file_get_contents function is often used to quickly fetch an HTTP URL or other resource. Usage is simple:

$content = file_get_contents('http://www.google.com/');
However, file_get_contents does not ask for compressed content. By default it sends no Accept-Encoding header, so the server replies with an uncompressed body. Most websites can serve compressed content if the HTTP request headers ask for it. Compression saves bandwidth and speeds up the transfer.
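To get a rough feel for the savings, here is a small offline sketch. It assumes the zlib extension is loaded, and the sample markup is made up for illustration; real pages will compress by different amounts.

```php
<?php
// Hypothetical sample payload: repetitive markup, similar in spirit to an HTML page.
$html = str_repeat("<li class=\"item\"><a href=\"/page\">Example link</a></li>\n", 200);

// gzencode() produces the gzip format that a server sends with "Content-Encoding: gzip".
$compressed = gzencode($html);

$originalSize   = strlen($html);
$compressedSize = strlen($compressed);

printf("original: %d bytes, gzipped: %d bytes\n", $originalSize, $compressedSize);
```

Repetitive text such as HTML tends to shrink a lot, which is why asking for gzip is usually worth the extra decoding step.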
So the trick to getting compressed content with file_get_contents is to send an HTTP header that asks the remote server for compressed content, and then to uncompress the response body to recover the original. Here is a quick function that does that:
function get_url($url)
{
    // A User-Agent is necessary; without it some websites, such as google.com, won't send compressed content.
    // Only gzip is requested, because gzip is the only encoding this function decodes.
    $opts = array(
        'http' => array(
            'method' => "GET",
            'header' => "Accept-Language: en-US,en;q=0.8\r\n" .
                        "Accept-Encoding: gzip\r\n" .
                        "Accept-Charset: UTF-8,*;q=0.5\r\n" .
                        "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0 FirePHP/0.4\r\n"
        )
    );

    $context = stream_context_create($opts);
    $content = file_get_contents($url, false, $context);

    // Stop here if the request failed; there is nothing to decode
    if ($content === false)
    {
        return false;
    }

    // If the HTTP response headers say the content is gzipped, uncompress it
    foreach ($http_response_header as $h)
    {
        if (stristr($h, 'content-encoding') and stristr($h, 'gzip'))
        {
            // gzdecode() (PHP 5.4+) parses the gzip header and trailer itself,
            // so there is no need to strip them by hand before inflating
            $decoded = gzdecode($content);
            if ($decoded !== false)
            {
                $content = $decoded;
            }
            break;
        }
    }

    return $content;
}

echo get_url('https://www.google.com/');
The function first sends the "Accept-Encoding" header with the request. Then, if the server's response headers indicate that the content is gzip-encoded, it decompresses the body before returning it.
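The header check and the decompression step can also be tried without touching the network. The sketch below is only an illustration: final_content_encoding is a hypothetical helper name, and the header list and body are simulated stand-ins for what $http_response_header and file_get_contents would supply. It assumes the zlib extension is loaded. Because PHP collects the headers of every response in a redirect chain into $http_response_header, the helper resets at each status line so that only the final response counts.

```php
<?php
// Hypothetical helper: returns the Content-Encoding of the final response,
// given a header list shaped like $http_response_header.
function final_content_encoding(array $headers): string
{
    $encoding = '';
    foreach ($headers as $header) {
        if (stripos($header, 'HTTP/') === 0) {
            // A status line starts a new response (for example after a redirect),
            // so forget anything seen in earlier responses.
            $encoding = '';
        } elseif (stripos($header, 'Content-Encoding:') === 0) {
            $encoding = strtolower(trim(substr($header, strlen('Content-Encoding:'))));
        }
    }
    return $encoding;
}

// Simulated headers and body, standing in for a real gzip response.
$headers = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
    'Content-Encoding: gzip',
);
$body = gzencode('<html>hello</html>');

// Decode only when the final response is actually gzip-encoded.
if (final_content_encoding($headers) === 'gzip') {
    $body = gzdecode($body);
}

echo $body, "\n"; // prints <html>hello</html>
```

Matching on the header name at the start of the line is a little stricter than a substring search, since it cannot be triggered by "gzip" appearing in an unrelated header.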
