PHP: Truncating Text
A common problem when creating dynamic web pages (where content is sourced from a database, content management system or external source such as an RSS feed) is that the input text can be too long and cause the page layout to 'break'.
One solution is to truncate the text so that it fits on the page. This sounds simple, but often the results aren't as expected due to words and sentences being cut off at inappropriate points.
Limiting or Truncating strings using PHP
Here's a simple function that avoids the usual pitfalls and offers some flexibility in formatting:
// Original PHP code by Chirp Internet: www.chirpinternet.eu
// Please acknowledge use of this code by including this header.
function myTruncate($string, $limit, $break = ".", $pad = "...")
{
    // return with no change if string is shorter than $limit
    if(strlen($string) <= $limit) return $string;
    // is $break present between $limit and the end of the string?
    if(false !== ($breakpoint = strpos($string, $break, $limit))) {
        if($breakpoint < strlen($string) - 1) {
            $string = substr($string, 0, $breakpoint) . $pad;
        }
    }
    return $string;
}
For the examples below we use the following paragraph consisting of four sentences of varying length:
$description = "The World Wide Web. When your average person on the street refers to
the Internet, they're usually thinking of the World Wide Web. The Web is basically a
series of documents shared with the world written in a coding language called Hyper Text
Markup Language or HTML. When you see a web page, like this one, you downloaded a document
from another computer which has been set up as a Web Server.";
Truncation at sentence breaks
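The default action is to break on the first "." after $limit characters and then pad with "...". That means that, whenever the text is truncated, the output will be at least $limit characters long (plus the padding), extending only as far as the next $break character. If no $break character is found after $limit, or the only one found is the last character of the string, the text is returned unchanged. Further down the page you can find a function that instead cuts the text off before $limit.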
$shortdesc = myTruncate($description, 300);
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written in a coding language called Hyper Text Markup Language or HTML. When you see a web page, like this one, you downloaded a document from another computer which has been set up as a Web Server. (not truncated)
$shortdesc = myTruncate($description, 200);
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written in a coding language called Hyper Text Markup Language or HTML...
$shortdesc = myTruncate($description, 100);
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web...
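To make the length behaviour concrete, here is a minimal sketch that repeats myTruncate from above, so that it runs on its own, and applies it to a short string. The sample text and the $limit of 10 are made up for illustration and are not part of the original article.

```php
<?php
// myTruncate as defined earlier in the article, repeated so the snippet is self-contained
function myTruncate($string, $limit, $break = ".", $pad = "...")
{
    if (strlen($string) <= $limit) return $string;
    if (false !== ($breakpoint = strpos($string, $break, $limit))) {
        if ($breakpoint < strlen($string) - 1) {
            $string = substr($string, 0, $breakpoint) . $pad;
        }
    }
    return $string;
}

// illustrative sample text (an assumption, not taken from the article)
$sample = "First sentence here. Second sentence is a little longer. Third.";

// the first "." at or after offset 10 sits at offset 19, so the cut happens there
$short = myTruncate($sample, 10);
echo $short, " (", strlen($short), " characters)\n"; // First sentence here... (22 characters)

// with no "." after the limit there is nothing to break on, so the text comes back whole
$nobreak = myTruncate("no full stop anywhere in this text", 10);
echo $nobreak, "\n";
```

The truncated result is 22 characters long even though $limit was 10, which is why a stricter function is offered later on.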
Truncation at word breaks
If your text consists of long sentences or you need precise control over the length then breaking on a space might be better. Some clarity is lost as the sentences are broken up, but at least the words remain intact.
$shortdesc = myTruncate($description, 300, " ");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written in a coding language called Hyper Text Markup Language or HTML. When you see a web page, like this...
$shortdesc = myTruncate($description, 200, " ");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written...
$shortdesc = myTruncate($description, 100, " ");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking...
You'll notice that the final text length is now closer to $limit characters, which makes sense as there are a lot more spaces in the normal text than periods.
Truncating to a maximum length
As noted previously, the function presented above cuts at the first $break character after $limit, so a truncated result is never shorter than $limit characters. For those with stricter requirements, here's an alternative function that truncates the text at the last $break character before $limit rather than the first one after it:
// Original PHP code by Chirp Internet: www.chirpinternet.eu
// Please acknowledge use of this code by including this header.
function myTruncate2($string, $limit, $break = " ", $pad = "...")
{
    // return with no change if string is shorter than $limit
    if(strlen($string) <= $limit) return $string;
    $string = substr($string, 0, $limit);
    if(false !== ($breakpoint = strrpos($string, $break))) {
        $string = substr($string, 0, $breakpoint);
    }
    return $string . $pad;
}
Note that the default value for $break has changed to the space character. Using the "." character as the breakpoint is now dangerous as there might only be one or two sentences in your text and you could end up with very few words left over.
Here you can see the output of this new function when "." is passed as the $break character, which shows how little text can be left over:
$shortdesc = myTruncate2($description, 300, ".");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web. The Web is basically a series of documents shared with the world written in a coding language called Hyper Text Markup Language or HTML...
$shortdesc = myTruncate2($description, 200, ".");
echo "<p>$shortdesc</p>";
The World Wide Web. When your average person on the street refers to the Internet, they're usually thinking of the World Wide Web...
$shortdesc = myTruncate2($description, 100, ".");
echo "<p>$shortdesc</p>";
The World Wide Web...
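Because myTruncate2 appends $pad after cutting, its result can still run a few characters past $limit. Where the padding has to fit inside the limit as well, one possible variation is sketched below. The name truncateStrict and the sample text are hypothetical. The function reserves room for $pad before looking for the break, and it counts bytes just like the functions above.

```php
<?php
// Hypothetical variation on myTruncate2: the returned string, padding included,
// is never longer than $limit bytes.
function truncateStrict($string, $limit, $break = " ", $pad = "...")
{
    // return with no change if the string already fits
    if (strlen($string) <= $limit) return $string;

    // reserve room for the padding
    $room = $limit - strlen($pad);
    if ($room <= 0) {
        // the limit is too small to hold any padding: hard cut, no padding
        return substr($string, 0, max(0, $limit));
    }

    $string = substr($string, 0, $room);
    // step back to the last $break inside the reserved room, if there is one
    if (false !== ($breakpoint = strrpos($string, $break))) {
        $string = substr($string, 0, $breakpoint);
    }
    return $string . $pad;
}

// illustrative sample text (an assumption, not taken from the article)
$sample = "The quick brown fox jumps over the lazy dog";
$short = truncateStrict($sample, 20);
echo $short, "\n"; // The quick brown...
```

The trade-off is that the visible text gets shorter by the length of the padding, so very small limits leave little or nothing besides the cut-off text.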
Truncating by words
If you simply want to truncate text to a certain number of words, then you don't need any fancy functions. The code below will do the trick:
<?PHP
$numwords = 10;
preg_match("/(\S+\s*){0,$numwords}/", $description, $regs);
$shortdesc = trim($regs[0]);
?>
In other words, match up to 10 occurrences of 'one or more non-space characters followed by zero or more whitespace characters':
The World Wide Web. When your average person on the (10 words)
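If the same expression is needed in several places, it can be wrapped in a small function. The sketch below is one way to do that. The name firstWords is hypothetical, and the ltrim() call is an addition: without it, text that starts with whitespace would make the pattern match an empty string at the very first position.

```php
<?php
// Hypothetical wrapper around the regular expression shown above.
function firstWords($text, $numwords)
{
    $numwords = (int) $numwords;
    if ($numwords < 1) return '';

    // drop leading whitespace so the match starts on the first word
    $text = ltrim($text);

    // up to $numwords runs of non-space characters, each with its trailing whitespace
    if (!preg_match('/(\S+\s*){0,' . $numwords . '}/', $text, $regs)) {
        return '';
    }
    return trim($regs[0]);
}

$sample = "  The World Wide Web. When your average person on the street refers to the Internet";
echo firstWords($sample, 10), "\n"; // The World Wide Web. When your average person on the
```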
For more information on truncating text, wrapping text to fit in a column, or splitting content evenly over two or more columns, see the related article on Word Wrapping.
An alternative solution has been provided by Bakaburg (see Feedback). Here is the same solution packaged into a function:
<?PHP
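function truncateWords($input, $numwords, $padding="")
{
    // take the first word, then keep adding words until the limit or the input runs out
    $output = (string) strtok($input, " \n");
    while(--$numwords > 0 && false !== ($word = strtok(" \n"))) {
        $output .= " " . $word;
    }
    // only add the padding if some words were left over
    if(false !== strtok(" \n")) $output .= $padding;
    return $output;
}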
?>
Usage should be fairly straightforward:
<?PHP
$shortdesc = truncateWords($description, 10, "...");
echo "<p>$shortdesc</p>";
?>
The World Wide Web. When your average person on the...
Sure enough, we're left with just the first 10 words of the text. Note that we've set the default delimiters for strtok to a space or line break. You can add other characters to this set as required, but not regular expressions as in the previous solution.
Restoring tags in truncated HTML
A lot of people have asked questions about how to deal with HTML tags in truncated text. Obviously the simplest approach is to remove any tags from the string to be truncated, but that's not always good enough for real world applications where formatting is important.
This function accepts a string that contains HTML tags and will attempt to automatically close any tags that have been left open. There is an assumption here that the HTML was valid before it was truncated, so all tags were opened and closed in a valid order:
<?PHP
// Original PHP code by Chirp Internet: www.chirpinternet.eu
// Please acknowledge use of this code by including this header.
function restoreTags($input)
{
    $opened = array();
    // loop through opened and closed tags in order
    if(preg_match_all("/<(\/?[a-z]+)>?/i", $input, $matches)) {
        foreach($matches[1] as $tag) {
            if(preg_match("/^[a-z]+$/i", $tag, $regs)) {
                // a tag has been opened
                if(strtolower($regs[0]) != 'br') $opened[] = $regs[0];
            } elseif(preg_match("/^\/([a-z]+)$/i", $tag, $regs)) {
                // a tag has been closed: remove its most recent opening
                // (array_pop() needs a variable, not a function result)
                $keys = array_keys($opened, $regs[1]);
                if($keys) unset($opened[array_pop($keys)]);
            }
        }
    }
    // close tags that are still open
    if($opened) {
        $tagstoclose = array_reverse($opened);
        foreach($tagstoclose as $tag) $input .= "</$tag>";
    }
    return $input;
}
?>
Please Note: This function is experimental so please use it with caution and give feedback using the Feedback link at the bottom of the page. If you've found a case where it doesn't work, please provide an example so we can check it out.
Neither the truncation script nor the above function for restoring HTML tags is designed to handle text that contains image tags or hyperlinks. They are designed simply for truncating plain text and HTML tags without any attributes. For a more comprehensive solution see the links below.
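As a rough illustration of how the two pieces fit together, the sketch below truncates a small HTML fragment with myTruncate and then passes the result through restoreTags. Both functions are repeated so that the snippet runs on its own, and restoreTags is written with an intermediate $keys variable. The sample markup is made up for the example and uses only attribute-free tags, as the article requires.

```php
<?php
// myTruncate as defined earlier in the article
function myTruncate($string, $limit, $break = ".", $pad = "...")
{
    if (strlen($string) <= $limit) return $string;
    if (false !== ($breakpoint = strpos($string, $break, $limit))) {
        if ($breakpoint < strlen($string) - 1) {
            $string = substr($string, 0, $breakpoint) . $pad;
        }
    }
    return $string;
}

// restoreTags following the article's approach (attribute-free tags only)
function restoreTags($input)
{
    $opened = array();
    if (preg_match_all('/<(\/?[a-z]+)>?/i', $input, $matches)) {
        foreach ($matches[1] as $tag) {
            if (preg_match('/^[a-z]+$/i', $tag, $regs)) {
                // a tag has been opened (br never needs closing)
                if (strtolower($regs[0]) != 'br') $opened[] = $regs[0];
            } elseif (preg_match('/^\/([a-z]+)$/i', $tag, $regs)) {
                // a tag has been closed: forget its most recent opening
                $keys = array_keys($opened, $regs[1]);
                if ($keys) unset($opened[end($keys)]);
            }
        }
    }
    // close whatever is still open, innermost first
    foreach (array_reverse($opened) as $tag) $input .= "</$tag>";
    return $input;
}

// illustrative markup (an assumption, not taken from the article)
$html = "<p>This is <b>bold text. And</b> more text here.</p>";
$short = restoreTags(myTruncate($html, 15));
echo $short, "\n"; // <p>This is <b>bold text...</b></p>
```

Note that the padding ends up inside the innermost open tag, so in this example the "..." is rendered in bold.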
Feedback
Perchy 12 January, 2016
Would it be possible to feed a URL into the function and use the ellipsis (...) as a link?
I don't recommend using "..." as the link, but you could pass both the link and some alternative text to the function using the $pad variable:
e.g.
myTruncate($string, $limit, ".", "<a href=\"$url\">[more]</a>");
Michael 21 February, 2014
I have modified it to handle utf-8
<?php
function myTruncate($string, $limit, $break=" ", $pad=" ...") {
    // return with no change if string is shorter than $limit
    if(mb_strlen($string, 'UTF-8') <= $limit) return $string;
    // is $break present between $limit and the end of the string?
    if(false !== ($breakpoint = mb_strpos($string, $break, $limit, "UTF-8"))) {
        if($breakpoint < mb_strlen($string, 'UTF-8') - 1) {
            // $string = substr($string, 0, $breakpoint) . $pad;
            $string = mb_substr($string, 0, $breakpoint, "UTF-8") . $pad;
        }
    }
    #put all opened tags into an array
    preg_match_all ( "#<([a-z]+)( .*)?(?!/)>#iU", $string, $result );
    $openedtags = $result[1];
    #put all closed tags into an array
    preg_match_all ( "#</([a-z]+)>#iU", $string, $result );
    $closedtags = $result[1];
    $len_opened = count ( $openedtags );
    # all tags are closed
    if( count ( $closedtags ) == $len_opened ) {
        return $string;
    }
    $openedtags = array_reverse ( $openedtags );
    # close tags
    for( $i = 0; $i < $len_opened; $i++ ) {
        if ( !in_array ( $openedtags[$i], $closedtags ) )
        {
            $string .= "</" . $openedtags[$i] . ">";
        }
        else
        {
            unset ( $closedtags[array_search ( $openedtags[$i], $closedtags)] );
        }
    }
    return $string;
}
?>
ht1080z 29 November, 2013
Thank you for the code. I upgraded it with the Multibyte extension, so the function works correctly with the Greek charset as well.
function maxChars($string, $limit, $break=".", $pad="...") {
    $charset = 'UTF-8';
    if(mb_strlen($string, $charset) <= $limit) return $string;
    // use mb_strpos() so the offset is counted in characters, matching mb_substr()
    if(false !== ($breakpoint = mb_strpos($string, $break, $limit, $charset))) {
        if($breakpoint < mb_strlen($string, $charset) - 1) {
            $string = mb_substr($string, 0, $breakpoint, $charset) . $pad;
        }
    }
    return $string;
}
Bakaburg 12 December, 2010
To truncate by words I would use strtok() with an iteration.
For example:
$string = strtok("blah blah", " \n");
for($i = $numberOfWords; $i > 1; $i--) $string .= " " . strtok(" \n");
echo $string . "...";
Cameron 11 November, 2010
I love it! Thanks!! It is EXACTLY what I was looking for and easy to use.
Gavin 25 October, 2009
Thanks a lot, I was hoping to find a solution to this problem (regarding basic HTML tags and truncation) - this seems to work a charm.
On an entirely different point, I have noticed that your webpage, in Firefox, expands way beyond the normal horizontal layout, so that I have to use the scrollbars to read each line.
Thanks. It seems that this page is expanding horizontally because Firefox is still ignoring the soft hyphens (&shy;) in the long string example above, despite recent promises to the contrary. In all other browsers it should be ok.
Colin McKinnon 31 March, 2009
Neat code - thanks for publishing it. But there seems to be a bug in it.
As it stands, the code will sometimes split input in the middle of a tag - and can't then close off the incomplete tag.
Hi Colin, the code as written DOES NOT cater for HTML tags with attributes. Only for simple <p>, <br>, <b>, <i>, etc. For a more comprehensive solution check out the links here in the Feedback section.
Panayiotis Karabassis 6 February, 2009
Great article thanks!
Regarding the last script for truncating html:
Take this example:
The script creates open and closed as follows:
open: (b, i, i, b)
closed: (b, i)
Then it goes on to "cancel out" one 'b' and one 'i' and prints the closing tags in the wrong order:
I think you can get around this by traversing closed in reverse order.
Thanks for the feedback. The code has now been updated to fix this problem, and is also much shorter.
Joey 28 January, 2009
For truncating HTML code, CakePHP's TextHelper class has a function called truncate that works quite well. You can find it in this file: trac.cakephp.org/browser/tags/1.2.1.8004/cake/libs/view/helpers/text.php
Thanks for the link. That's a much more comprehensive script and is a better option if you're working with raw HTML. A bit long to show here though.
Erik Spaan 11 December, 2008
Hi, nice article. I'm using smarty.net as a template system within Zikula CMS. There is a truncate function there already that does more or less what your myTruncate does.
I'm using this phpinsider.com/smarty-forum/viewtopic.php?t=533 html safe truncating. Just to let you know that there is more code out there.
Daniel Peraza 8 November, 2008
Hi, excellent article!
I don't know regular expressions very well, and I honestly don't completely understand how preg_match() is used here, but one question arose when I read your article.
I need to truncate a text according to the number of words it has, but what if I have (X)HTML formatted text? Will tags count as words as well? How could I prevent this in order to keep my (X)HTML code well formed?
An excellent question, but unfortunately no simple answer. When you truncate raw (X)HTML you always run the risk of removing a closing tag and unbalancing the code.
To get around this you either need to strip out all the tags first, or after the text has been truncated parse the string again to check for missing tags and add them back (in the correct order) at the end of the truncated text.
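The first option, stripping the tags before counting words, could look something like the sketch below. The function name truncateWordsPlain and the sample markup are hypothetical. It relies on PHP's strip_tags(), which simply deletes the markup, so words separated only by tags (for example in "<p>one</p><p>two</p>") end up joined together.

```php
<?php
// Hypothetical helper: count words in the visible text only, by removing tags first.
function truncateWordsPlain($html, $numwords)
{
    $numwords = (int) $numwords;
    if ($numwords < 1) return '';

    // strip_tags() removes the markup; preg_replace() collapses runs of whitespace
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));

    // same word-matching expression as in the article
    preg_match('/(\S+\s*){0,' . $numwords . '}/', $text, $regs);
    return trim($regs[0]);
}

// illustrative markup (an assumption, not taken from the article)
$html = "<p>The <b>World Wide</b> Web. When your average person</p>";
echo truncateWordsPlain($html, 4), "\n"; // The World Wide Web.
```

The result is plain text, so any formatting is lost. Keeping the formatting requires the second option, truncating first and then restoring the missing closing tags.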
Nidhesh 23 May, 2008
Great code. Simple but powerful. Thanks.
Jonathan P L Spooner 27 November, 2007
Hi there-
First off, thanks for the super helpful truncation description. I have used it to good effect on my site.
But one thing I am running into is how to handle very long words:
"This glitch-hop mix gives you the tools you need to bake a sweet romance with any particular individual alive. Brought to you by CANOPY RADIOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO"
A very common problem. Try something like the following to insert a 'soft' hyphen if a word is longer than a certain length.
$word_limit = 24;
$text = preg_replace("/(\w{{$word_limit}})/", "$1&shy;", $text);
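A slightly more defensive version of the same idea is sketched below as a function. The name softHyphenate is hypothetical, and the lookahead is an addition: it inserts the soft hyphen only when more word characters follow, so a word that is exactly $word_limit characters long is left alone. The output is meant for HTML, where &shy; is rendered as an optional break point.

```php
<?php
// Hypothetical helper: insert a soft hyphen entity after every $word_limit
// consecutive word characters, but only when more word characters follow.
function softHyphenate($text, $word_limit = 24)
{
    $word_limit = max(1, (int) $word_limit);
    return preg_replace('/(\w{' . $word_limit . '})(?=\w)/', '$1&shy;', $text);
}

echo softHyphenate("abcdefghij", 4), "\n"; // abcd&shy;efgh&shy;ij
echo softHyphenate("abcdefgh", 4), "\n";   // abcd&shy;efgh
```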
Erik 16 March, 2007
Thanks for a great explanation of truncating text (separate words at that!). You are very smart.