Get Text Between Tags ~ Tech Blog

This example provides a method of retrieving text from between tags. Regular expressions with the preg_match() or preg_match_all() functions can do the job, but the parser is made to work hard as PHP scans the text over and over to find matches, and the patterns become fragile as the markup grows more complex. By using the DOM functions instead, parsing is much cleaner and can be considerably faster. The first example shows how it might be done with preg_match().


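<?php

/**
 * Get the text between the first pair of matching tags.
 *
 * @param string $string  The string with tags
 * @param string $tagname The name of the tag
 *
 * @return string|null Text between the tags, or null if the tag is not found
 */
function getTextBetweenTags($string, $tagname)
{
    /*** escape the tag name so it is matched literally ***/
    $tag = preg_quote($tagname, '/');

    /*** the "s" modifier lets "." match newlines inside the tag ***/
    $pattern = "/<$tag>(.*?)<\/$tag>/s";

    if (preg_match($pattern, $string, $matches))
    {
        return $matches[1];
    }

    /*** no match, so there is nothing to return ***/
    return null;
}
?>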
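
The introduction also mentions preg_match_all(). As a minimal sketch of that variant, the helper below returns the text of every matching pair rather than only the first. The name getAllTextBetweenTags() is made up for this illustration and is not part of the original post.

```php
<?php

/**
 * Hypothetical helper (the name is an assumption, not from the original post):
 * returns the text of every <$tagname>...</$tagname> pair found by regex.
 */
function getAllTextBetweenTags($string, $tagname)
{
    // Escape the tag name so it is treated literally inside the pattern.
    $tag = preg_quote($tagname, '/');

    // "s" lets "." match newlines; the lazy ".*?" stops at the first closing tag.
    $pattern = "/<$tag>(.*?)<\/$tag>/s";

    preg_match_all($pattern, $string, $matches);

    // $matches[1] holds the captured text of every match (an empty array if none).
    return $matches[1];
}

$items = getAllTextBetweenTags('<li>one</li><li>two</li>', 'li');
print_r($items);
```

Like the single-match version, this only recognises tags written without attributes.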

The regex-based getTextBetweenTags() above is very basic: it does not handle nested tags, does not match tags that carry attributes, and does not check for broken tags. By making use of the PHP DOM extension these issues can be addressed.
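
To make those limits concrete, here is a small sketch that applies the same style of pattern to a nested element and to an unclosed one. The sample strings are invented for the illustration.

```php
<?php

// Same pattern style as the regex function, fixed to the <div> tag.
$pattern = '/<div>(.*?)<\/div>/';

// Nested tags: the lazy match stops at the first </div>, so the capture
// starts in the outer element and ends inside the inner one.
$nested = '<div>outer <div>inner</div> tail</div>';
preg_match($pattern, $nested, $matches);
var_dump($matches[1]); // the captured text is "outer <div>inner"

// Broken markup: with no closing tag there is nothing to match at all.
$broken = '<div>never closed';
$found = preg_match($pattern, $broken, $unused);
var_dump($found); // preg_match() returns 0 when nothing matches
```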

The DOM version of the function takes three arguments.

$tag: The tag to find the text between
$html: The HTML or XML to be searched
$strict: Set to 1 to load the input in XML mode; the default, 0, uses HTML mode

Setting the third parameter to 1 loads the input with loadXML() instead of loadHTML(), which is intended for custom tags as found in XML and some XHTML documents.



<?php
/**
 * Get the text between all occurrences of a tag.
 *
 * @param string $tag    The tag name
 * @param string $html   The HTML, XHTML or XML string
 * @param int    $strict Set to 1 to parse as XML, 0 (default) to parse as HTML
 *
 * @return array The text content of each matching element
 */
function getTextBetweenTags($tag, $html, $strict = 0)
{
    /*** a new dom object ***/
    $dom = new DOMDocument;

    /*** discard white space; only has an effect if set before loading ***/
    $dom->preserveWhiteSpace = false;

    /*** load the html into the object ***/
    if ($strict == 1)
    {
        $dom->loadXML($html);
    }
    else
    {
        $dom->loadHTML($html);
    }

    /*** the tag by its tag name ***/
    $content = $dom->getElementsByTagName($tag);

    /*** the array to return ***/
    $out = array();
    foreach ($content as $item)
    {
        /*** add node value to the out array ***/
        $out[] = $item->nodeValue;
    }

    /*** return the results ***/
    return $out;
}
?>

In this example plain HTML is used and no third argument is supplied to the function. This allows for invalid or broken HTML. The third paragraph is missing its closing </p> tag; however, loadHTML() tolerates this deviation. The example still parses the HTML and retrieves an array of the text inside every <a> anchor tag.


<?php

$html = '<body>
<h1>Heading</h1>
<a href="http://phpro.org">PHPRO.ORG</a>
<p>paragraph here</p>
<p>Paragraph with a <a href="http://phpro.org">LINK TO PHPRO.ORG</a></p>
<p>This is a broken paragraph
</body>';

$content = getTextBetweenTags('a', $html);

foreach ($content as $item)
{
    echo $item.'<br />';
}
?>
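
Depending on the input, loadHTML() may report parser warnings for markup it has to repair. One way to keep those out of the page output, sketched here under the assumption that you would rather collect the messages than display them, is libxml_use_internal_errors(). The sample markup and variable names are invented for the illustration.

```php
<?php

// Sample input with an unclosed <p>, similar in spirit to the example above.
$html = '<body><p>First <a href="http://phpro.org">PHPRO.ORG</a></p><p>Unclosed paragraph</body>';

// Collect libxml messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Keep whatever the parser recorded, then reset the error buffer and setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Gather the anchor text exactly as the DOM function does.
$links = array();
foreach ($dom->getElementsByTagName('a') as $node) {
    $links[] = $node->nodeValue;
}

print_r($links);
echo count($errors) . " parser message(s) recorded\n";
```

How many messages are recorded depends on the markup and the libxml version, so the sketch only reports the count.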

In this final example a custom <para> tag is used twice, such as may be found in XML or XHTML documents. The third parameter is set to 1, which tells the function to use XML mode and parse the custom tags.


<?php

$xhtml = '<html>
<body>
<para>This is a paragraph</para>
<para>This is another paragraph</para>
</body>
</html>';

$content2 = getTextBetweenTags('para', $xhtml, 1);

foreach ($content2 as $item)
{
    echo $item.'<br />';
}
?>
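
XML mode is stricter than HTML mode: input that is not well-formed does not load, and loadXML() returns false. The sketch below shows one way a caller might check for that before reading any elements. The sample string and variable names are invented for the illustration.

```php
<?php

// The second <para> is never closed, so this is not well-formed XML.
$broken = '<doc><para>First</para><para>Second</doc>';

// Collect libxml messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$loaded = $dom->loadXML($broken);

if ($loaded === false) {
    // In XML mode there is no repaired tree to fall back on.
    echo "XML mode rejected the input\n";
}

libxml_clear_errors();
```

A function that supports a strict mode could return an empty array or signal the failure in some other way at this point; which is better depends on the caller.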

Monday, 2 February 2015

I V RAMANA
