Tuesday, 18 September 2018

Get the first sentence with PHP

Sometimes you may need to get the first sentence from a block of content for use as e.g. the meta description for a page. This post looks at how to get the first sentence using PHP.

Warning...

As pointed out in one of the comments, the method presented in this post isn't actually very useful, because a sentence can contain the . character in places other than at the end, and it can also end with ", ! or ? as well as .
Even the first sentence in this post would fail because it would return "Sometimes you may need to get the first sentence from a block of content for use as e." breaking after the first . in e.g. instead of at the actual end of the sentence.
If anyone has any ideas how to solve this issue please let me know and I can update this post. For what it's worth, the rest of the original post follows...
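One possible direction, offered only as a rough sketch and not a complete solution: treat . ! or ? as the end of a sentence only when it is followed by whitespace or the end of the content, and skip over a short list of known abbreviations. The function name first_sentence_heuristic() and the abbreviation list are assumptions made up for this illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper, not part of the original post: a heuristic only.
// The default abbreviation list is an assumption and far from complete.
function first_sentence_heuristic(string $content, array $abbreviations = ['e.g.', 'i.e.', 'etc.', 'Mr.', 'Mrs.', 'Dr.', 'vs.']): string
{
    // Match a run of . ! ? plus any closing quotes or brackets,
    // but only when followed by whitespace or the end of the string.
    $pattern = '/[.!?]+["\')\]]*(?=\s|$)/';

    if (!preg_match_all($pattern, $content, $matches, PREG_OFFSET_CAPTURE)) {
        // No sentence-ending punctuation found: return everything
        return $content;
    }

    foreach ($matches[0] as [$text, $offset]) {
        // Candidate sentence: everything up to and including this match
        $candidate = substr($content, 0, $offset + strlen($text));

        // Skip this candidate if it ends in a known abbreviation
        foreach ($abbreviations as $abbr) {
            if (substr($candidate, -strlen($abbr)) === $abbr) {
                continue 2;
            }
        }

        return $candidate;
    }

    // Every candidate ended in an abbreviation: return everything
    return $content;
}

echo first_sentence_heuristic('This is useful as e.g. a meta description. More text here.');
// This is useful as e.g. a meta description.
```

This still gets things wrong: a sentence that genuinely ends with one of the listed abbreviations (such as etc.) will be skipped over, and any abbreviation not on the list will cut the sentence short, so treat it as a starting point only.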


Example text

The example text used is as follows, which comes from lipsum.com, a lorem ipsum generator:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque tellus. Quisque id eros sed lacus scelerisque convallis. Sed mattis, augue in ultricies tristique, metus justo placerat est, vitae vestibulum purus urna tempus sapien. Vestibulum tincidunt nisi elementum neque. Vestibulum tincidunt commodo diam. Maecenas vitae nisl ut justo aliquam gravida. Donec fringilla enim tincidunt risus.

The code

Using a combination of the PHP functions strpos() and substr() we can extract the first sentence from the above text by looking for the location of the first period / full stop in the content and returning everything up to and including it:
function first_sentence($content) {

    $pos = strpos($content, '.');
    return substr($content, 0, $pos+1);
   
}
Then, with $content containing the example text above, doing this:
echo first_sentence($content);
would output this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

What if there are no periods / full stops?

The first example assumes that there is at least one period / full stop in the content. If there isn't, the example code will simply return the first character of the passed-in string.
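To see why, here's a small sketch (the name first_sentence_naive() and the sample string are just for illustration). strpos() returns false when it can't find the character, and PHP treats false as 0 in arithmetic, so $pos+1 becomes 1 and substr() returns a single character:

```php
<?php
// Same logic as the first example, renamed here purely for illustration
function first_sentence_naive($content) {
    $pos = strpos($content, '.');          // false when there is no '.'
    return substr($content, 0, $pos + 1);  // false + 1 evaluates to 1
}

var_dump(strpos('No full stop here', '.'));      // bool(false)
echo first_sentence_naive('No full stop here');  // N
```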
This isn't ideal, so we can modify the first_sentence() function to use strpos() to check for a full stop, and if there isn't one just return the whole string instead:
function first_sentence($content) {

    $pos = strpos($content, '.');
       
    if($pos === false) {
        return $content;
    }
    else {
        return substr($content, 0, $pos+1);
    }
   
}

Automatically removing HTML code

And finally, we'll modify the code to remove any HTML tags and decode any entities. If the source content is always plain text then you won't need this step, but if it can contain HTML then you'll need to clean it up first.
You may not need the html_entity_decode() part (which converts e.g. &amp; to &) but you will need to strip the tags, otherwise with <p>blah blah blah.</p> you'd end up with <p>blah blah blah. without the closing </p> tag. It's also possible that your HTML tags contain . characters (in a URL or filename, for example), which would falsely indicate the end of the sentence.
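function first_sentence($content) {

    // Strip HTML tags, then convert entities such as &amp; back to characters
    $content = html_entity_decode(strip_tags($content));
    $pos = strpos($content, '.');

    // No full stop found: return the cleaned content unchanged
    if($pos === false) {
        return $content;
    }
    else {
        return substr($content, 0, $pos+1);
    }

}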
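To see what difference the clean-up step makes, here's a small sketch comparing the result with and without it. The sample HTML string is made up for this illustration:

```php
<?php
// Made-up sample content: the first '.' is inside the href attribute
$html = '<p>Fish &amp; chips are <a href="menu.html">on the menu</a>. Enjoy.</p>';

// Without cleaning: cuts inside the tag and leaves the entity encoded
$raw_cut = substr($html, 0, strpos($html, '.') + 1);

// With cleaning: tags removed and &amp; decoded before searching
$clean = html_entity_decode(strip_tags($html));
$clean_cut = substr($clean, 0, strpos($clean, '.') + 1);

echo $raw_cut, "\n";    // <p>Fish &amp; chips are <a href="menu.
echo $clean_cut, "\n";  // Fish & chips are on the menu.
```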

Conclusion

It's easy to extract the first sentence from some content using the PHP functions strpos() and substr() by looking for the first occurrence of a period / full stop. The final example function in this post combines this with a fallback in case the content does not contain a full stop, and strips HTML tags and decodes entities before searching.
