PHP: Truncating Text

A common problem when creating dynamic web pages (where content is sourced from a database, content management system or external source such as an RSS feed) is that the input text can be too long and cause the page layout to 'break'. One solution is to truncate the text so that it fits on the page. This sounds simple, but often the results aren't as expected due to words and sentences being cut off at inappropriate points.

Limiting or Truncating strings using PHP

Here's a simple function that avoids the usual pitfalls and offers some flexibility in formatting:

// Original PHP code by Chirp Internet: www.chirp.com.au
// Please acknowledge use of this code by including this header.
function myTruncate($string, $limit, $break=".", $pad="...")
{
    // return with no change if string is shorter than $limit
    if(strlen($string) <= $limit) return $string;

    // is $break present between $limit and the end of the string?
    if(false !== ($breakpoint = strpos($string, $break, $limit))) {
        if($breakpoint < strlen($string) - 1) {
            $string = substr($string, 0, $breakpoint) . $pad;
        }
    }

    return $string;
}
For the examples below we use the following paragraph consisting of four sentences of varying length:

$description = "The World Wide Web. When your average person on the street refers to
the Internet, they're usually thinking of the World Wide Web. The Web is basically a
series of documents shared with the world written in a coding language called Hyper Text
Markup Language or HTML. When you see a web page, like this one, you downloaded a document
from another computer which has been set up as a Web Server.";
Truncation at sentence breaks

The default action is to break on the first "." after $limit characters and then pad with "...". That means that truncated output will always be longer than $limit characters, but only as far as the next $break character. Further down the page you can find a function that instead cuts the text at the last $break before $limit.

$shortdesc = myTruncate($description, 300);
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written in a coding language called Hyper Text Markup Language or HTML. When you see a web page, like this one, you downloaded a document from another computer which has been set up as a Web Server. (not truncated)

$shortdesc = myTruncate($description, 200);
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written in a coding language called Hyper Text Markup Language or HTML...

$shortdesc = myTruncate($description, 100);
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web...

Truncation at word breaks

If your text consists of long sentences or you need precise control over the length, then breaking on a space might be better. Some clarity is lost as the sentences are broken up, but at least the words remain intact.

$shortdesc = myTruncate($description, 300, " ");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written in a coding language called Hyper Text Markup Language or HTML. When you see a web page, like this...

$shortdesc = myTruncate($description, 200, " ");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written...

$shortdesc = myTruncate($description, 100, " ");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking...

You'll notice that the final text length is now closer to $limit characters, which makes sense as there are a lot more spaces in the normal text than periods.

Truncating to a maximum length

As noted previously, the function presented above will always return a string slightly longer than $limit characters, up to the next $break character. For those with stricter requirements, here's an alternative function that will truncate text to the $break character before rather than after $limit:

// Original PHP code by Chirp Internet: www.chirp.com.au
// Please acknowledge use of this code by including this header.
function myTruncate2($string, $limit, $break=" ", $pad="...")
{
    // return with no change if string is shorter than $limit
    if(strlen($string) <= $limit) return $string;

    $string = substr($string, 0, $limit);
    if(false !== ($breakpoint = strrpos($string, $break))) {
        $string = substr($string, 0, $breakpoint);
    }

    return $string . $pad;
}
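Because the padding is appended after the cut, the function above can still return up to strlen($pad) characters more than $limit. Where the padding has to fit inside the limit as well, something along these lines might work. This is only a sketch: the name myTruncateStrict and the hard cut used for very small limits are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical stricter variant of myTruncate2: the name myTruncateStrict
// and the handling of very small limits are assumptions for illustration.
function myTruncateStrict($string, $limit, $break = " ", $pad = "...")
{
    // return with no change if the string already fits
    if (strlen($string) <= $limit) return $string;

    // reserve room for the padding so the final result fits in $limit
    $room = $limit - strlen($pad);

    // no room for any text plus padding: fall back to a hard cut
    if ($room <= 0) return substr($string, 0, max(0, $limit));

    $string = substr($string, 0, $room);

    // step back to the last $break inside the shortened text, if any
    if (false !== ($breakpoint = strrpos($string, $break))) {
        $string = substr($string, 0, $breakpoint);
    }

    return $string . $pad;
}

$text = "The World Wide Web. When your average person on the street";
echo myTruncateStrict($text, 30); // The World Wide Web. When...
```

The defaults mirror those of myTruncate2, so the only difference is that the result, padding included, never exceeds $limit bytes.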
Note that the default value for $break has changed to the space character. Using the "." character as the breakpoint is now dangerous as there might only be one or two sentences in your text and you could end up with very few words left over. Here you can see the output of this new function when "." is passed as the break character anyway:

$shortdesc = myTruncate2($description, 300, ".");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written in a coding language called Hyper Text Markup Language or HTML...

$shortdesc = myTruncate2($description, 200, ".");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web...

$shortdesc = myTruncate2($description, 100, ".");
echo "<p>$shortdesc</p>";
The World Wide Web...

Truncating by words

If you simply want to truncate text to a certain number of words, then you don't need any fancy functions; the code below will do the trick:

<?PHP
$numwords = 10;
preg_match("/([\S]+\s*){0,$numwords}/", $description, $regs);
$shortdesc = trim($regs[0]);
?>
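The snippet above can also be wrapped in a small reusable function. The following is a minimal sketch, assuming a hypothetical helper name truncateWords; the ltrim() call and the integer cast are additions for illustration and are not in the original snippet.

```php
<?php
// Hypothetical helper wrapping the regular expression shown above.
function truncateWords($text, $numwords)
{
    // leading whitespace would make the pattern match an empty string
    $text = ltrim($text);

    // the (int) cast keeps anything but a number out of the pattern
    $pattern = '/(?:\S+\s*){0,' . max(0, (int) $numwords) . '}/';

    if (preg_match($pattern, $text, $regs)) {
        return trim($regs[0]);
    }
    return $text;
}

$text = "The World Wide Web. When your average person on the street refers to the Internet";
echo truncateWords($text, 10); // The World Wide Web. When your average person on the
```

Asking for more words than the text contains simply returns the whole text, trimmed.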
In other words, match up to 10 occurrences of 'one or more non-space characters followed by zero or more whitespace characters':

The World Wide Web. When your average person on the (10 words)

For more information on truncating text, wrapping text to fit in a column, or splitting content evenly over two or more columns, see the related article on Word Wrapping.

Restoring tags in truncated HTML

A lot of people have asked questions about how to deal with HTML tags in truncated text. Obviously the simplest approach is to remove any tags from the string to be truncated, but that's not always good enough for real world applications where formatting is important. This function accepts a string that contains HTML tags and will attempt to automatically close any tags that have been left open. There is an assumption here that the HTML was valid before it was truncated, so all tags were opened and closed in a valid order:

// Original PHP code by Chirp Internet: www.chirp.com.au
// Please acknowledge use of this code by including this header.
function restoreTags($input)
{
    $opened = array();

    // loop through opened and closed tags in order
    if(preg_match_all("/<(\/?[a-z]+)>?/i", $input, $matches)) {
        foreach($matches[1] as $tag) {
            if(preg_match("/^[a-z]+$/i", $tag, $regs)) {
                // a tag has been opened (<br> never needs closing)
                if(strtolower($regs[0]) != 'br') $opened[] = $regs[0];
            } elseif(preg_match("/^\/([a-z]+)$/i", $tag, $regs)) {
                // a tag has been closed: forget its most recent opening.
                // array_pop() takes its argument by reference, so the
                // matching keys are stored in a variable first
                $keys = array_keys($opened, $regs[1]);
                if($keys) unset($opened[array_pop($keys)]);
            }
        }
    }

    // close tags that are still open, innermost first
    if($opened) {
        $tagstoclose = array_reverse($opened);
        foreach($tagstoclose as $tag) $input .= "</$tag>";
    }

    return $input;
}
Please Note: This function is experimental so please use it with caution and give feedback using the Feedback link at the bottom of the page. If you've found a case where it doesn't work, please provide an example so we can check it out.

Neither the truncation script nor the above function for restoring HTML tags is designed to handle text that contains image tags or hyperlinks. They are designed simply for truncating plain text and HTML tags with no attributes. For a more comprehensive solution see the links in the Feedback section below.
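For the simpler approach mentioned earlier, removing all tags before truncating, PHP's built-in strip_tags() can be combined with a word-boundary cut. The sketch below assumes a hypothetical helper name truncatePlain and collapses whitespace as an extra step; it is an illustration only and sidesteps the attribute problem by discarding the markup altogether.

```php
<?php
// Hypothetical helper: strip the markup first, then cut at a word boundary.
function truncatePlain($html, $limit, $pad = "...")
{
    // strip_tags() removes the tags; collapse leftover runs of whitespace
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));

    // return with no change if the plain text already fits
    if (strlen($text) <= $limit) return $text;

    // cut at $limit, then step back to the last space, if any
    $text = substr($text, 0, $limit);
    if (false !== ($breakpoint = strrpos($text, " "))) {
        $text = substr($text, 0, $breakpoint);
    }

    return $text . $pad;
}

$html = "<p>The <b>World Wide Web</b> is a series of documents.</p>";
echo truncatePlain($html, 20); // The World Wide Web...
```

Note that strip_tags() only deletes the tags themselves, so text from adjacent elements with no whitespace between them ends up joined together.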
Feedback and Questions

15 March 2007: Erik (VectorSector) says: Thanks for a great explanation of truncating text (separate words at that!). You are very smart.

27 November 2007: Jonathan P L Spooner says: Hi there-
A very common problem. Try something like the following to insert a 'soft' hyphen if a word is longer than a certain length.

23 May 2008: Nidhesh (DesignTeam) says: Great code. Simple but powerful. Thanks.

8 November 2008: Daniel Peraza says: Hi, excellent article!
An excellent question, but unfortunately no simple answer. When you truncate raw (X)HTML you always run the risk of removing a closing tag and unbalancing the code.

11 December 2008: Erik Spaan says: Hi, nice article. I'm using smarty.net as a template system within Zikula CMS. There is a truncate function there already that does more or less you myTruncate.

27 January 2009: Joey says: For truncating HTML code, CakePHP's TextHelper class has a function called truncate that works quite well. You can find it in this file: trac.cakephp.org/browser/tags/1.2.1.8004/cake/libs/view/helpers/text.php
Thanks for the link. That's a much more comprehensive script and is a better option if you're working with raw HTML. A bit long to show here though.

6 February 2009: Panayiotis Karabassis says: Great article thanks!
Thanks for the feedback. The code has been updated now to fix this problem - and is also much shorter.

30 March 2009: Colin McKinnon says: Neat code - thanks for publishing it. But there seems to be a bug in it.
Hi Colin, the code as written DOES NOT cater for HTML tags with attributes. Only for simple <p>, <br>, <b>, <i>, etc. For a more comprehensive solution check out the links here in the Feedback section.

24 October 2009: Gavin says: Thanks a lot, I was hoping to find a solution to this problem (regarding basic HTML tags and truncation) - this seems to work a charm. Thanks.
It seems this page is expanding horizontally because Firefox is still ignoring the soft hyphens (&shy;) in the long string example above, despite recent promises to the contrary. In all other browsers it should be ok.
© Copyright 2010 Chirp Internet
- Page Last Modified: 22 November 2009