Scraper scripts often need to extract all links on a given page. This can be done in several ways, such as regular expressions or PHP's DOMDocument class.
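For comparison, the regex route might look roughly like the sketch below. The function name get_hrefs_with_regex and the pattern are illustrative assumptions, not code from a library. The pattern only handles quoted href values and is easily confused by unusual markup, which is one reason a DOM parser tends to be the safer choice.

```php
<?php

// Hypothetical helper for illustration: collect href values with a regex.
function get_hrefs_with_regex(string $html): array
{
    // Match href="..." or href='...' inside an opening <a ...> tag.
    // Unquoted attribute values are not handled by this pattern.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';

    preg_match_all($pattern, $html, $matches);

    // Capture group 2 holds the URL between the quotes
    return $matches[2];
}

print_r(get_hrefs_with_regex('<a href="/a">A</a> <a class="x" href=\'/b\'>B</a>'));
```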

Here is a simple code snippet that does this using DOMDocument.

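<?php

/*
Function to get all links on a certain URL using DOMDocument.
Returns an array that maps each href to the text of its link.
*/
function get_links($link)
{
    // return array
    $ret = array();

    /*** get the HTML (suppress errors); give up if the fetch fails ***/
    $html = @file_get_contents($link);
    if ($html === false || trim($html) === '')
    {
        return $ret;
    }

    /*** a new dom object ***/
    $dom = new DOMDocument;

    /*** remove silly white space (set before loading the HTML) ***/
    $dom->preserveWhiteSpace = false;

    /*** parse the HTML (suppress warnings about malformed markup) ***/
    @$dom->loadHTML($html);

    /*** get the links from the HTML ***/
    $links = $dom->getElementsByTagName('a');

    /*** loop over the links ***/
    foreach ($links as $tag)
    {
        $href = $tag->getAttribute('href');

        // skip anchors without an href, e.g. <a name="top">
        if ($href === '')
        {
            continue;
        }

        // textContent also works when the anchor has no child nodes
        $ret[$href] = trim($tag->textContent);
    }

    return $ret;
}

// Link to open and search for links
$link = "http://www.php.net";

/*** get the links ***/
$urls = get_links($link);

/*** check for results ***/
if (count($urls) > 0)
{
    foreach ($urls as $key => $value)
    {
        // escape the values because they come from a remote page
        echo htmlspecialchars($key) . ' - ' . htmlspecialchars($value) . '<br>';
    }
}
else
{
    echo "No links found at $link";
}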
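
The same idea can be tried without a network request by parsing an HTML string directly. The sketch below is a variation, not part of the original snippet: the function name extract_links_from_html and the sample markup are made up for illustration. It uses libxml_use_internal_errors() instead of the @ operator to keep parser warnings quiet, and a DOMXPath query to select only anchors that have an href attribute.

```php
<?php

// Hypothetical helper for illustration: returns href => link text pairs
// from an HTML string instead of fetching a URL.
function extract_links_from_html(string $html): array
{
    $ret = array();

    // Nothing to parse
    if (trim($html) === '') {
        return $ret;
    }

    // Collect parser warnings internally instead of silencing them with @
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select only anchors that carry an href attribute
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a[@href]') as $tag) {
        $ret[$tag->getAttribute('href')] = trim($tag->textContent);
    }

    return $ret;
}

// Sample markup (assumed): two real links and one named anchor without href
$sample = '<p><a href="/docs.php">Docs</a> <a name="top"></a> '
        . '<a href="/get"><b>Get</b> PHP</a></p>';

print_r(extract_links_from_html($sample));
```

Because the array is keyed by href, a URL that appears more than once on the page keeps only the text of its last occurrence, just as in get_links() above.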

Tech Blog

Monday, 24 September 2018

PHP : Get links on a page with DomDocument

About Me

I V RAMANA
