Saturday, 27 June 2015

Extract a number of characters and words from text

This tutorial shows how to extract a specified number of characters or words from a text in PHP.
- To get the first "n" characters from a string, use the substr() function.
Example:
$text = 'Extract the text with first 45 characters from this string. Free PHP-MySQL courses and tutorials - http://coursesweb.net/';

$text = trim($text);                  // to remove whitespace from the beginning and end
$txt = substr($text, 0, 45);
echo $txt;       // Extract the text with first 45 characters fro
- The trim() function is used to remove whitespace from the beginning and end of the string. It's good practice to use this function when working with strings.
- The first character has index 0.
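
One thing to keep in mind is that substr() counts bytes, not characters, so with UTF-8 text that contains accented letters it can cut a character in half. Below is a minimal sketch of a multibyte-safe variant. It assumes the mbstring extension is enabled and the text is UTF-8 encoded; the function name first_chars() is hypothetical, used here only for illustration.

```php
<?php
// Hypothetical helper (not a built-in PHP function): returns the first $n characters of $text.
// Assumption: the mbstring extension is loaded and $text is UTF-8 encoded.
function first_chars($text, $n) {
  $text = trim($text);                        // remove whitespace from the beginning and end
  return mb_substr($text, 0, $n, 'UTF-8');    // counts characters, not bytes
}

echo first_chars('Învățare PHP și MySQL', 8);       // Învățare
```

For plain ASCII text, such as the string in the example above, substr() and mb_substr() return the same result.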

Extract a specified number of words

As you can see in the example above, when we extract a specified number of characters, the last word can be cut off.
If you want to get a substring from a text and add a "Read more ..." link after it, it's better to keep the last word whole. In this case, instead of getting a number of characters, we'll extract a specified number of words.
Words are separated by spaces, so we can use a RegExp (regular expression) pattern that matches up to the desired number of words, each followed by at most one space, and then apply preg_match() with that pattern to get the substring.
- To make sure there is only one space between words, we first apply preg_replace('/\s\s+/', ' ', $text), which replaces each run of two or more whitespace characters with a single space.
Here's an example with the same text:
$text = 'Extract the text with first 45 characters from this string. Free PHP-MySQL courses and tutorials - http://coursesweb.net/';
$nrw = 10;                         // the number of words we want to extract

$rgxwords = '/([^ ]*[ ]{0,1}){1,'.$nrw.'}/i';            // pattern to get a number of words from string
$text = preg_replace('/\s\s+/', ' ', $text);             // replace multiple whitespaces with a single space
$text = trim($text);                                     // to remove whitespace from the beginning and end

// get the substring
if(preg_match($rgxwords, $text, $mtc)) $txt = $mtc[0];
else $txt = '';

echo $txt. '<a href="page_address" title="Read more">Read more...</a>';

/* Result:
 Extract the text with first 45 characters from this string. Read more...
*/
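
The same result can also be obtained without a word-counting RegExp, by splitting the text into words and joining the first ones back together. This is only a sketch of an alternative, not a replacement for the method above; the function name first_words() is hypothetical. Unlike the '/\s\s+/' replacement, preg_split() with '/\s+/' also treats a single tab or newline as a word separator, and the returned string has no trailing space.

```php
<?php
// Hypothetical helper: returns the first $nrw words of $text, joined by single spaces.
function first_words($text, $nrw) {
  // split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces
  $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
  return implode(' ', array_slice($words, 0, $nrw));   // keep only the first $nrw words
}

$text = 'Extract the text with first 45 characters from this string. Free PHP-MySQL courses and tutorials - http://coursesweb.net/';
echo first_words($text, 10);      // Extract the text with first 45 characters from this string.
```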

Stripping the tags

If the text content contains HTML tags, a substring extracted from it will very possibly contain tags that were opened but not closed, which will affect the formatting of the page where you add that substring. In this case it's better to strip the tags, using the strip_tags() function, before extracting the number of characters or words.

- Example that strips the tags and gets a specified number of characters:
$text = 'Text content <i>with HTML tags</i>. Free courses for <b>web masters</b>: http://coursesweb.net/';

$text = trim(strip_tags($text));        // strip the tags and remove whitespace from the beginning and end
$txt = substr($text, 0, 30);            // get the first 30 characters
echo $txt;       // Text content with HTML tags. F

- Example that strips the tags and gets a specified number of words:
$text = 'Text content <i>with HTML tags</i>. Free courses for <b>web masters</b>: http://coursesweb.net/';
$nrw = 10;                         // the number of words we want to extract

$rgxwords = '/([^ ]*[ ]{0,1}){1,'.$nrw.'}/i';            // pattern to get a number of words from string

$text = trim(strip_tags($text));        // strip the tags and remove whitespace from the beginning and end
$text = preg_replace('/\s\s+/', ' ', $text);             // replace multiple whitespaces with a single space

if(preg_match($rgxwords, $text, $mtc)) $txt = $mtc[0];
else $txt = '';

echo $txt;           // Text content with HTML tags. Free courses for web masters:
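
The steps from this tutorial (strip the tags, keep a number of words, add a "Read more" link) could be combined into a single function. Here is a minimal sketch, under the assumption that the link should be added only when the text was actually shortened; the function name excerpt_words() and its parameters are hypothetical, and 'page_address' is the same placeholder used in the earlier example.

```php
<?php
// Hypothetical helper: strips the tags, keeps the first $nrw words,
// and appends a "Read more" link only if the text was shortened.
// $url is a placeholder for the address of the full page.
function excerpt_words($html, $nrw, $url) {
  $words = preg_split('/\s+/', trim(strip_tags($html)), -1, PREG_SPLIT_NO_EMPTY);
  if (count($words) <= $nrw) return implode(' ', $words);   // short text, nothing to cut

  $txt = implode(' ', array_slice($words, 0, $nrw));        // the first $nrw words
  return $txt .' <a href="'. htmlspecialchars($url) .'" title="Read more">Read more...</a>';
}

$text = 'Text content <i>with HTML tags</i>. Free courses for <b>web masters</b>: http://coursesweb.net/';
echo excerpt_words($text, 10, 'page_address');
```

The stripped text in this example has 11 words, so with a limit of 10 the link is added after "masters:".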
