Thursday, 20 September 2018

Allowed memory size exhausted with the PHP Simple HTML DOM Parser

The Simple HTML DOM Parser is a useful tool for extracting content from web pages using jQuery-like syntax. If many pages are loaded into the parser in a single script, the available memory can eventually be exhausted, causing a fatal error.

Some example code

The example code below assumes an already defined array $urls containing a list of URLs whose content is to be downloaded and parsed.
require_once('/path/to/simple_html_dom.php');
foreach($urls as $url) {
    $dom = file_get_html($url);
    // do some stuff here
}
Depending on the number of pages downloaded, their size and the size of the associated DOM, and how much memory is available to PHP, the available memory may be exhausted. This results in a fatal error which halts your script at that point.

The error message 

The error message when memory is exhausted will be similar to this:
PHP Fatal error: Allowed memory size of XYZ bytes exhausted (tried to allocate XYZ bytes) in [path]/simple_html_dom.php on line XYZ
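The "allowed memory size" in that message is PHP's memory_limit setting. As a rough sketch, the snippet below reads the limit and the script's peak usage so the two can be compared while debugging. The helper name shorthand_to_bytes is made up for this illustration; ini_get and memory_get_peak_usage are standard PHP functions.

```php
<?php
// Hypothetical helper: converts a php.ini shorthand value such as "128M"
// into bytes. A value of "-1" (no limit) is returned unchanged.
function shorthand_to_bytes(string $value): int
{
    $value = trim($value);
    $number = (int) $value;                 // "128M" becomes 128
    $unit = strtolower(substr($value, -1)); // last character: k, m, g or a digit
    switch ($unit) {
        case 'g':
            $number *= 1024; // fall through
        case 'm':
            $number *= 1024; // fall through
        case 'k':
            $number *= 1024;
    }
    return $number;
}

$limit = shorthand_to_bytes((string) ini_get('memory_limit'));
$peak  = memory_get_peak_usage();
echo "memory_limit: $limit bytes, peak usage so far: $peak bytes\n";
```

If the peak figure climbs towards the limit as more pages are processed, the script is likely heading for the fatal error shown above.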

The solution

The Simple HTML DOM Parser does not free the memory used by a DOM each time file_get_html or str_get_html is called, so this needs to be done explicitly once you have finished with the current DOM. This is as simple as calling ->clear() on the DOM object at the end of each loop iteration, or whenever you've finished using it.
The revised example with clear() is as follows:
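require_once('/path/to/simple_html_dom.php');
foreach ($urls as $url) {
    $dom = file_get_html($url);
    if (!$dom) {
        // some versions of the parser return false when the page could not be loaded
        continue;
    }
    // do some stuff here
    $dom->clear(); // free the memory held by this DOM before loading the next page
}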
Assuming there is enough memory available to PHP to handle each individual page and its associated DOM, the process will no longer suffer from memory exhaustion.
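One way to check that memory is being released is to log usage after each iteration. The following is a minimal sketch only: track_memory is a hypothetical helper written for this post, not part of the parser, and the callback here is a stand-in workload so that the snippet runs without the library or network access.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): runs $work on each item
// and records PHP's memory usage, in bytes, after every iteration.
function track_memory(array $items, callable $work): array
{
    $readings = [];
    foreach ($items as $item) {
        $work($item);
        $readings[] = memory_get_usage();
    }
    return $readings;
}

// Stand-in workload so the sketch runs on its own.
// In a real script the callback would call file_get_html($url), do its work,
// and finish with $dom->clear().
$readings = track_memory(['a', 'b', 'c'], function ($item) {
    $scratch = str_repeat($item, 100000); // freed when the callback returns
});

foreach ($readings as $i => $bytes) {
    echo "after item $i: $bytes bytes\n";
}
```

With the parser in the callback, readings that stay roughly level from one URL to the next suggest the DOM is being freed, while readings that keep growing suggest a clear() call is missing.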

Download PHP Simple HTML DOM Parser / Examples

The PHP Simple HTML DOM Parser can be downloaded from SourceForge where there are also several examples of extracting content from pages.
