Monday, 2 February 2015

Get All URLs From Page

Ever needed to get all the URLs from a string or a web page? No? Well, here is a script that will do it anyway. It works by grabbing anything beginning with http or https, up to the next double quote (") or space. The results are returned in an array, from which you can extract them for your own use.

<?php

$string = '<a href="http://www.example.com">Example.com</a> has many links with
examples <a href="http://www.example.net/file.php">links</a> to many sites and
even urls without links like http://www.example.org just to fill the gaps and
not to forget this one http://phpro.org/tutorials/Introduction-to-PHP-Regex.html 
which has a space after it. The script has been modified from its original so now
it grabs ssl such as https://www.example.com/file.php also';

/**
 * Get URLs from a string (the string may itself be a URL).
 *
 * @param string $string
 *
 * @return array
 */
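function getUrls($string)
{
    // Match http:// or https://, then everything up to the next double quote or space.
    $regex = '/https?\:\/\/[^\" ]+/i';

    // $matches[0] holds every full match found in $string.
    preg_match_all($regex, $string, $matches);

    return $matches[0];
}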

 
$urls = getUrls($string);

foreach ($urls as $url) {
    echo $url . '<br />';
}
?>
The above script will output a list of URLs from the string, like this:
http://www.example.com
http://www.example.net/file.php
http://www.example.org
http://phpro.org/tutorials/Introduction-to-PHP-Regex.html
https://www.example.com/file.php
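
Since the script is meant to work on a web page as well as a string, here is a minimal sketch of how that might look. The names extractUrls() and getUrlsFromPage() are hypothetical helpers, not part of the original script. The pattern is a slight variation that stops at any whitespace (\s) rather than only a space, so a URL at the end of a line is not run into the next line, and duplicates are dropped with array_unique(). Fetching over HTTP with file_get_contents() assumes allow_url_fopen is enabled in your PHP configuration.

```php
<?php

/**
 * Hypothetical helper: pull the unique URLs out of a block of text or HTML.
 *
 * @param string $text
 *
 * @return array
 */
function extractUrls(string $text): array
{
    // Variation on the original pattern: stop at a double quote or ANY whitespace.
    preg_match_all('/https?:\/\/[^"\s]+/i', $text, $matches);

    // array_unique() drops repeats; array_values() reindexes the result from 0.
    return array_values(array_unique($matches[0]));
}

/**
 * Hypothetical helper: fetch a page and return the URLs found in it.
 * Assumes allow_url_fopen is enabled; returns an empty array if the fetch fails.
 *
 * @param string $pageUrl
 *
 * @return array
 */
function getUrlsFromPage(string $pageUrl): array
{
    $html = @file_get_contents($pageUrl);

    if ($html === false) {
        return [];
    }

    return extractUrls($html);
}

// Demonstration on a literal string, so no network access is needed.
$sample = '<a href="http://www.example.com">one</a> and http://www.example.com again, '
        . 'plus https://www.example.org/file.php here';

foreach (extractUrls($sample) as $url) {
    echo $url . "\n";
}
// Prints:
// http://www.example.com
// https://www.example.org/file.php
```

To run it against a live page you would call something like getUrlsFromPage('http://www.example.com') and loop over the result in the same way. Note that a pattern this simple will still pick up trailing punctuation or single-quoted attribute remnants, so treat the output as a rough list rather than validated URLs.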
