This example provides a method of retrieving the text between tags. When regular expressions are used with the preg_match() or preg_match_all() functions, the parser is made to work extremely hard as PHP loops over and over the text to find matches. By using the DOM functions instead, the speed is increased dramatically and the parsing is much cleaner. This example shows how it might be done with preg_match_all().
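A minimal sketch of the regular expression approach might look like the following. The function name getTextBetweenTagsRegex() and the pattern are assumptions made for illustration, not the original listing.

```php
<?php
// Hypothetical helper: the name and the pattern are illustrative assumptions.
function getTextBetweenTagsRegex($tag, $html)
{
    // Escape the tag name so it is safe to embed in the pattern.
    $name = preg_quote($tag, '/');

    // Match <tag ...>text</tag> non-greedily.
    // "s" lets "." span newlines, "i" makes the tag name case-insensitive.
    $pattern = '/<' . $name . '\b[^>]*>(.*?)<\/' . $name . '>/si';

    preg_match_all($pattern, $html, $matches);

    // $matches[1] holds the captured text of every match.
    return $matches[1];
}

$html = '<p>First</p><p class="x">Second</p>';
print_r(getTextBetweenTagsRegex('p', $html));
```

Because the pattern stops at the first closing tag it meets, an element nested inside another element of the same name is cut short.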
The above function is very basic, and will not handle nested tags or check for broken tags. By making use of the PHP DOM extension these issues can be addressed.
The function itself takes three arguments.
- $tag: The tag to find the text between
- $html: The HTML or XML to be searched
- $strict: Tells the function to load in HTML or XML mode; the default is HTML mode
If the third parameter is set to 1, the function loads the input in XML mode, which allows it to parse custom tags as found in XML and some XHTML documents.
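A function with the three parameters described above could be sketched as follows. This is an assumed implementation built on DOMDocument, not necessarily the original code. The libxml_use_internal_errors() calls are an addition to keep broken markup from emitting warnings.

```php
<?php
// Sketch of a DOM-based getTextBetweenTags(); the body is an assumption.
function getTextBetweenTags($tag, $html, $strict = 0)
{
    $dom = new DOMDocument();

    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    if ($strict == 1) {
        // XML mode: custom tags are accepted, but the markup must be well formed.
        $dom->loadXML($html);
    } else {
        // HTML mode: the parser tolerates invalid or broken markup.
        $dom->loadHTML($html);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Gather the text content of every matching element.
    $out = array();
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $out[] = $node->textContent;
    }
    return $out;
}
```

Note that textContent also includes the text of any child elements, so nested tags do not truncate the result.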
In this example plain HTML is used and no third argument is supplied to the function. This allows for invalid or broken HTML. The third paragraph is missing its closing </p> tag; however, the DOM loadHTML() method tolerates this deviation. The example will still parse the HTML and retrieve an array of the text between all <a> anchor tags.
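The HTML-mode behaviour can be tried with a short standalone sketch. The sample markup below is hypothetical, and the DOM calls are inlined rather than routed through getTextBetweenTags() so that the snippet runs on its own.

```php
<?php
// Hypothetical sample markup: the third paragraph has no closing </p> tag.
$html = '<html><body>
<p><a href="/one">First link</a></p>
<p><a href="/two">Second link</a></p>
<p><a href="/three">Third link</a>
</body></html>';

$dom = new DOMDocument();

// Keep the parser from emitting warnings about the broken markup.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Collect the text between every pair of <a> tags.
$links = array();
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $links[] = $anchor->textContent;
}

foreach ($links as $text) {
    echo $text . "\n";
}
```

All three anchors are found, because the HTML parser closes the unterminated paragraph itself.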
In this final example, two custom <para> tags are used, such as may be found in XML or XHTML documents. The third parameter is set to 1, which tells the function to use XML mode and parse the custom tags.
<?php
$xhtml = '<html>
<body>
<para>This is a paragraph</para>
<para>This is another paragraph</para>
</body>
</html>';

$content2 = getTextBetweenTags('para', $xhtml, 1);
foreach ($content2 as $item) {
    echo $item . '<br />';
}
?>