JavaScript: Search Keyword Highlighting

If you've arrived here via a search engine you may have noticed that the keywords you searched for appeared highlighted on the landing page. Here we explain how this is done using JavaScript.

There are two components - firstly identifying the keywords, and then highlighting them on the page using JavaScript. This article covers only the second part - using DHTML to dynamically modify the DOM.

Unfortunately extracting search strings from search engine referrals is now nigh on impossible, but the highlighting code by itself has proven extremely popular.

Working example

Enter some keywords in the box below and click the Apply button. Words in the page content that match the words you've entered will be instantly highlighted in different colours:

Highlight Words

To remove highlighting click the Remove button.

How to implement

First we take the form input (e.g. "search engine keywords") and convert it into a JavaScript regular expression (e.g. "/\b(search|engine|keywords)\b/"). This regular expression will match any of the entered words where they appear in the content area of the page.
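The conversion can be sketched in isolation. This is a simplified, standalone approximation of what the script does with its default settings; the function name keywordsToRegExp is made up for this illustration and is not part of hilitor.js.

```javascript
// Standalone sketch (hypothetical helper, not part of hilitor.js):
// turn a string of keywords into a whole-word, case-insensitive RegExp.
function keywordsToRegExp(input) {
  var cleaned = input
    .replace(/^[^\w]+|[^\w]+$/g, "")  // strip non-word characters from both ends
    .replace(/[^\w'-]+/g, "|");       // each run of separator characters becomes "|"
  if (!cleaned) return null;          // nothing left to search for
  return new RegExp("\\b(" + cleaned + ")\\b", "i");
}

var re = keywordsToRegExp("  search engine keywords ");
console.log(re.source);                        // \b(search|engine|keywords)\b
console.log(re.test("A Search for answers"));  // true
console.log(re.test("researching"));           // false, not a whole word
```

The keywords are inserted into the pattern unescaped, so the sketch assumes they contain no regular expression metacharacters.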

In practice what we do is generate the list of keywords to highlight using the search engine referer string, and then exclude common stop words such as 'is', 'and', 'are', etc, but that's another story.
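Stop word filtering could look something like the sketch below. The word list and the function name removeStopWords are assumptions made for this example; they are not the list or the code used on this site.

```javascript
// Hypothetical stop-word filter. The list is a small sample chosen for
// this illustration only.
var STOP_WORDS = ["a", "an", "and", "are", "is", "of", "the", "to"];

function removeStopWords(input) {
  return input
    .split(/\s+/)
    .filter(function (word) {
      // drop empty fragments and anything found in the stop list
      return word && STOP_WORDS.indexOf(word.toLowerCase()) === -1;
    })
    .join(" ");
}

console.log(removeStopWords("what is the art of web design"));
// what art web design
```

The filtered string can then be handed to the highlighter in place of the raw keywords.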

Here is the code we use to call the highlighting function:

<script type="text/javascript" src="hilitor.js"></script>
<script type="text/javascript">
  var myHilitor = new Hilitor("content"); // id of the element to parse
  myHilitor.apply("highlight words");
</script>

Please make a local copy of the hilitor.js script on your server rather than linking it from this website. We have some built-in protections which are triggered by hot-linking.

The first line creates an instance of the Hilitor class, specifying that the area to be scanned is the #content element (i.e. inside <main id="content">...</main>) on the page. If we don't supply a valid id then the script will scan document.body, but it's better to limit it to just your content area to avoid highlighting appearing in the page header, menu or footer.

The tag used by default to apply highlighting to matched words is the HTML5 MARK tag. We can change that if we want by passing a second parameter with a different tag name. You can use any tag that renders inline - so not a DIV or similar block element which would break the layout.

Calling the apply() method passes the words to be highlighted to the newly instantiated object, which traverses the DOM inside the selected node looking for text that matches any of the keywords.

When a match is found, the text node in question is split up and the matching word is wrapped in a tag with style settings applied. Each word is assigned a colour from the array supplied in the script, which is then used throughout the document for that word.

To remove the highlighting from the page we simply call (from a link, script or button):

<script type="text/javascript"> myHilitor.remove(); </script>

If the colours used for highlighting look familiar, it's because they were copied from a now defunct version of Google Cache View.

The hilitor.js JavaScript class

The JavaScript code for this class can be downloaded or copied below. The two main methods are apply() and remove() which respectively apply word highlighting on the page and then remove it, restoring the DOM (more or less) to its original state.

In the variables section you may want to change the default tag to use for highlighting (currently MARK), the list of tags to skip (SCRIPT, FORM and SPAN) or the array of colours.

Source code for hilitor.js:

// Original JavaScript code by Chirp Internet: www.chirp.com.au
// Please acknowledge use of this code by including this header.

function Hilitor(id, tag) {

  // private variables
  var targetNode = document.getElementById(id) || document.body;
  var hiliteTag = tag || "MARK";
  var skipTags = new RegExp("^(?:" + hiliteTag + "|SCRIPT|FORM|SPAN)$");
  var colors = ["#ff6", "#a0ffff", "#9f9", "#f99", "#f6f"];
  var wordColor = {}; // map of lowercased word to its assigned colour
  var colorIdx = 0;
  var matchRegExp = "";
  var openLeft = false;
  var openRight = false;

  // characters to strip from start and end of the input string
  var endRegExp = new RegExp('^[^\\w]+|[^\\w]+$', "g");

  // characters used to break up the input string into words
  var breakRegExp = new RegExp('[^\\w\'-]+', "g");

  this.setEndRegExp = function(regex) {
    endRegExp = regex;
    return endRegExp;
  };

  this.setBreakRegExp = function(regex) {
    breakRegExp = regex;
    return breakRegExp;
  };

  // choose which side(s) of a keyword may run on into a longer word
  this.setMatchType = function(type) {
    switch(type) {
      case "left":
        openLeft = false;
        openRight = true;
        break;

      case "right":
        openLeft = true;
        openRight = false;
        break;

      case "open":
        openLeft = openRight = true;
        break;

      default:
        openLeft = openRight = false;
    }
  };

  // convert the input string into a regular expression
  this.setRegex = function(input) {
    input = input.replace(endRegExp, "");
    input = input.replace(breakRegExp, "|");
    input = input.replace(/^\||\|$/g, "");
    if(input) {
      var re = "(" + input + ")";
      if(!openLeft) {
        re = "\\b" + re;
      }
      if(!openRight) {
        re = re + "\\b";
      }
      matchRegExp = new RegExp(re, "i");
      return matchRegExp;
    }
    return false;
  };

  // convert the regular expression back into a string of keywords
  this.getRegex = function() {
    var retval = matchRegExp.toString();
    retval = retval.replace(/(^\/(\\b)?|\(|\)|(\\b)?\/i$)/g, "");
    retval = retval.replace(/\|/g, " ");
    return retval;
  };

  // recursively apply word highlighting
  this.hiliteWords = function(node) {
    if(node === undefined || !node) return;
    if(!matchRegExp) return;
    if(skipTags.test(node.nodeName)) return;

    if(node.hasChildNodes()) {
      for(var i=0; i < node.childNodes.length; i++)
        this.hiliteWords(node.childNodes[i]);
    }
    if(node.nodeType == 3) { // NODE_TEXT
      var nv, regs; // declared locally to avoid leaking globals
      if((nv = node.nodeValue) && (regs = matchRegExp.exec(nv))) {
        if(!wordColor[regs[0].toLowerCase()]) {
          wordColor[regs[0].toLowerCase()] = colors[colorIdx++ % colors.length];
        }

        var match = document.createElement(hiliteTag);
        match.appendChild(document.createTextNode(regs[0]));
        match.style.backgroundColor = wordColor[regs[0].toLowerCase()];
        match.style.color = "#000";

        var after = node.splitText(regs.index);
        after.nodeValue = after.nodeValue.substring(regs[0].length);
        node.parentNode.insertBefore(match, after);
      }
    }
  };

  // remove highlighting
  this.remove = function() {
    var arr = document.getElementsByTagName(hiliteTag);
    var el;
    while(arr.length && (el = arr[0])) {
      var parent = el.parentNode;
      parent.replaceChild(el.firstChild, el);
      parent.normalize();
    }
  };

  // start highlighting at target node
  this.apply = function(input) {
    this.remove();
    if(input === undefined || !(input = input.replace(/(^\s+|\s+$)/g, ""))) {
      return;
    }
    if(this.setRegex(input)) {
      this.hiliteWords(targetNode);
    }
    return matchRegExp;
  };

}

Under the hood

When the apply() method is called it clears any existing highlighting on the page, generates a regular expression from the keywords, and then calls the hiliteWords() method, passing a reference to the selected start node.

The hiliteWords() method examines the first node to see if it has any child nodes in which case it calls itself recursively once for each child. When a text node is encountered its contents are tested against the regular expression to see if any of our keywords are present.

If there are one or more matches the text node is cut in two at the point where the first match was found. The matched word itself is wrapped in a MARK tag that also specifies the background colour. The process then continues with the remaining text.

It's a bit difficult to explain, but essentially what takes place is the following:

DOM before

Container
> Text Node

e.g. <p>Paragraph with highlighted word.</p>

DOM after

Container
> Text Node          // original text node, truncated
> MARK
  > Text Node        // var match
> Text Node          // var after

e.g. <p>Paragraph with <mark>highlighted</mark> word.</p>
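The same split can be followed on a plain string, without any DOM. This is only an analogue of what happens to the text node; splitAtFirstMatch is a hypothetical name used for this illustration, and the regular expression is assumed to have no "g" flag so that exec() always starts from the beginning.

```javascript
// String-only analogue of splitting a text node at the first match.
function splitAtFirstMatch(text, regex) {
  var regs = regex.exec(text);
  if (!regs) return null;
  return {
    before: text.substring(0, regs.index),              // stays in the original text node
    match: regs[0],                                     // ends up inside the new MARK element
    after: text.substring(regs.index + regs[0].length)  // becomes the new "after" text node
  };
}

var parts = splitAtFirstMatch("Paragraph with highlighted word.", /\b(highlighted)\b/i);
console.log(JSON.stringify(parts));
// {"before":"Paragraph with ","match":"highlighted","after":" word."}
```

In the real script the "after" part is then tested again, which is how second and subsequent matches in the same paragraph get picked up.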

As the script encounters and highlights different keywords the colour used for each is remembered so that the same keyword can be highlighted consistently down the page.
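The colour bookkeeping can be sketched on its own. The colour list is the one from the script; the function name colorFor and the use of Object.create(null) for the lookup table are choices made for this example.

```javascript
// Each distinct keyword gets the next colour in the list, and the list
// wraps around when it runs out.
var colors = ["#ff6", "#a0ffff", "#9f9", "#f99", "#f6f"];
var wordColor = Object.create(null);  // plain lookup table with no inherited keys
var colorIdx = 0;

function colorFor(word) {
  var key = word.toLowerCase();       // "Search" and "search" share a colour
  if (!wordColor[key]) {
    wordColor[key] = colors[colorIdx++ % colors.length];
  }
  return wordColor[key];
}

console.log(colorFor("search"));  // #ff6
console.log(colorFor("engine"));  // #a0ffff
console.log(colorFor("Search"));  // #ff6
```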

The for loop doesn't mind that one node has suddenly become three. It moves to the next node, which is the MARK, skips it and moves to the after Text Node containing the remainder of the node that was originally split in two.

Feel free to adapt and use this code as you see fit. We use it to highlight search engine query keywords after the keywords have been extracted from the HTTP Referer using a server-side script, but that's more than can be explained in a single article.

Changing the Match Type

We've added a new method to the Hilitor class, setMatchType(). You can set this to 'left', 'right' or 'open'. Any other value will restore the default behaviour of only matching whole words.

The effect is to change the regular expression as shown here:

default                    /\b(input|text)\b/i
.setMatchType("left");     /\b(input|text)/i
.setMatchType("right");    /(input|text)\b/i
.setMatchType("open");     /(input|text)/i
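To see what each variant actually matches, the four patterns can be tried against a few sample words. The helper buildPattern below is hypothetical and only mirrors the four expressions listed above; it is not a method of the Hilitor class.

```javascript
// Build the pattern for a given match type from an already-joined
// list of alternatives such as "input|text".
function buildPattern(alternatives, type) {
  var openLeft = (type === "right" || type === "open");  // no \b before the word
  var openRight = (type === "left" || type === "open");  // no \b after the word
  var re = "(" + alternatives + ")";
  if (!openLeft) re = "\\b" + re;
  if (!openRight) re = re + "\\b";
  return new RegExp(re, "i");
}

var samples = ["input", "inputs", "reinput"];
["default", "left", "right", "open"].forEach(function (type) {
  var re = buildPattern("input|text", type);
  var hits = samples.filter(function (word) { return re.test(word); });
  console.log(type + ": " + re.toString() + " matches " + hits.join(", "));
});
```

With these samples the default type matches only "input", "left" also matches "inputs", "right" matches "input" and "reinput", and "open" matches all three.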

Here is a simple working example:

Highlight keywords as you type:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus felis erat, facilisis vitae est sed, interdum luctus dui. Donec condimentum in neque ac consequat. Donec interdum quis massa molestie consequat. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum malesuada quam quis sapien volutpat placerat. Phasellus varius, enim vel fringilla pellentesque, felis velit mollis lacus, ac scelerisque sem sem sed eros.

This example has been coded to apply only to its own section of the page, unlike the main example which will highlight text throughout:

<div id="playground">
  <form>
    <p>Highlight keywords as you type: <input id="keywords" size="24"></p>
  </form>
  <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus felis erat, facilisis vitae est sed, interdum luctus dui. Donec condimentum in neque ac consequat. Donec interdum quis massa molestie consequat. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum malesuada quam quis sapien volutpat placerat. Phasellus varius, enim vel fringilla pellentesque, felis velit mollis lacus, ac scelerisque sem sem sed eros.</p>
</div>

<script src="hilitor.js"></script>
<script>
  window.addEventListener("DOMContentLoaded", function(e) {
    var myHilitor2 = new Hilitor("playground");
    myHilitor2.setMatchType("left");
    document.getElementById("keywords").addEventListener("keyup", function(e) {
      myHilitor2.apply(this.value);
    }, false);
  }, false);
</script>

Working with regular expressions

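The setRegex() and getRegex() methods convert the input string of keywords into a JavaScript regular expression and vice versa. Note in particular the regular expressions defined at the top of the script (endRegExp and breakRegExp).

When an input string is received, any characters matching endRegExp are stripped from its start and end, each run of characters matching breakRegExp is replaced by a pipe (|), and any leading or trailing pipe is removed. What remains is wrapped in parentheses, with \b word boundaries added according to the match type.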

The shorthand "\w" (written as \\w when instantiating a RegExp object) represents the set of alphanumeric characters (A-Za-z0-9) including the underscore (_). Our default regular expression also includes hyphens (-) and apostrophes (').

We change the behaviour of our script by changing these regular expressions. For example:

Search for strings that include spaces:

myHilitor.setBreakRegExp(new RegExp('[^\\w\' -]+', "g")); // expanded to include spaces
myHilitor.apply("input string,keywords,page");

Search for email addresses:

This regular expression will cover strings including "@" and ".":

myHilitor.setBreakRegExp(new RegExp('[^\\w\'@.-]+', "g")); // expanded to include email characters
myHilitor.apply("nobody@example.net");

Otherwise any email addresses (nobody@example.net) in the search keywords would be broken into separate words at each '@' or '.'.
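The difference between the two break expressions is easy to check outside the page. The helper splitKeywords is a hypothetical stand-in that splits on the break expression instead of replacing it with pipes, which gives the same list of words.

```javascript
// Split a keyword string into words using a given "break" expression.
function splitKeywords(input, breakRe) {
  return input.split(breakRe).filter(Boolean);  // drop empty fragments
}

var defaultBreak = new RegExp("[^\\w'-]+", "g");
var emailBreak = new RegExp("[^\\w'@.-]+", "g");

console.log(splitKeywords("contact nobody@example.net", defaultBreak).join(", "));
// contact, nobody, example, net
console.log(splitKeywords("contact nobody@example.net", emailBreak).join(", "));
// contact, nobody@example.net
```

One caveat: the words are placed into the final regular expression unescaped, so the "." in the address behaves as a wildcard and matches any character, not only a literal dot.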

Search for #hashtags:

Matching #hashtags is problematic. Rather than /\b(#hashtag)\b/ you need /(#hashtag)\b/, because there is no word boundary (\b) immediately before the '#' when it follows a space or starts the text, while there is one between '#' and 'h'. The code only produces that expression if we set the match type to "right" or "open":

myHilitor.setEndRegExp(new RegExp('^[^\\w#]+|[^\\w]+$', "g")); // must be a RegExp object, not a string
myHilitor.setBreakRegExp(new RegExp('[^\\w\'#-]+', "g"));
myHilitor.setMatchType("right");
myHilitor.apply("match,#hashtag,code");
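The word boundary behaviour is simple to verify with a couple of standalone tests. The sample strings are made up for this illustration.

```javascript
var text = "Tagged with #hashtag today";

// \b only matches where a word character meets a non-word character.
// Before "#" there is a space on one side and "#" on the other, and
// neither is a word character, so the leading \b can never succeed here.
console.log(/\b(#hashtag)\b/i.test(text));  // false
console.log(/(#hashtag)\b/i.test(text));    // true

// Side effect of the "right" match type: the other keywords lose their
// leading \b as well, so they can now match the end of a longer word.
console.log(/(match|#hashtag|code)\b/i.test("a rematch"));  // true
```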

Similar problems exist when there are accented characters, and for that we have a separate branch of the code with UTF-8 support which might be more useful in real world situations.
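The problem with accented characters can be reproduced in a few lines. The workaround shown is one possible approach and an assumption on our part, not the code from the UTF-8 branch; it relies on the "u" flag, lookbehind and \p{...} property escapes, so it needs a reasonably modern JavaScript engine. The helper name wholeWordUnicode is hypothetical, and the word passed in is assumed to contain no regular expression metacharacters.

```javascript
var text = "Un café noir";

// "é" is not matched by \w, so there is no \b between "é" and the space.
console.log(/\b(café)\b/i.test(text));  // false

// Replace \b with lookarounds that treat any Unicode letter, digit or
// underscore as part of a word.
function wholeWordUnicode(word) {
  return new RegExp("(?<![\\p{L}\\p{N}_])(" + word + ")(?![\\p{L}\\p{N}_])", "iu");
}

console.log(wholeWordUnicode("café").test(text));  // true
console.log(wholeWordUnicode("caf").test(text));   // false
```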

Using an onload event

Since it's very unfashionable to call JavaScript inline, here is an example of how you can apply highlighting to the page using an event listener that only fires once the page's HTML has been fully loaded and parsed:

<script type="text/javascript" src="hilitor.js"></script>
<script type="text/javascript">
  var myHilitor; // global variable
  window.addEventListener("DOMContentLoaded", function(e) {
    myHilitor = new Hilitor("content");
    myHilitor.apply("highlight these words");
  }, false);
</script>

The function call is identical to the example above, just delayed until the document has been parsed rather than being executed as soon as the function call is encountered.

We've declared myHilitor as a global variable to enable later calls to remove or update keyword highlighting on the page using the remove() and apply() methods.

Link to turn off highlighting

You might be wondering how we generate the link that appears at the top of the page showing the words that have been highlighted alongside a link to remove the highlighting.

It's a bit of a hack, but the code is as follows:

(function() {
  var targetNode = document.getElementsByTagName("h1")[0]; // first h1 on page
  var el = document.createElement("p"); // create new p
  el.id = "remove_hilite";
  el.style.fontSize = "10px"; // styles can better be applied using css
  el.innerHTML = "<b>Highlighted terms</b>: " + myHilitor.getRegex() + " | <a href=\"#\" onclick=\"myHilitor.remove(); document.getElementById('remove_hilite').style['display'] = 'none'; return false;\">remove</a>";
  targetNode.parentNode.insertBefore(el, targetNode.nextSibling); // insert p element after first h1
})();

This inserts a new paragraph (P) following the first H1 heading on the page. The text displays the keywords and also a 'remove' link which calls the remove method. You can see an example here.

These commands can also be bundled into the onload method above, in which case you no longer need to wrap them in an anonymous function. The complete code then is as follows:

<script src="hilitor.js"></script>
<script>
  window.addEventListener("DOMContentLoaded", function(e) {
    // highlight search terms on page
    var myHilitor = new Hilitor("content");
    myHilitor.apply("highlight these words");

    // (optional) display link to remove highlighting
    var targetNode = document.getElementsByTagName("h1")[0];
    var el = document.createElement("p");
    el.id = "remove_hilite";
    el.style.fontSize = "10px";
    el.innerHTML = "<b>Highlighted terms</b>: " + myHilitor.getRegex() + " | <a href=\"#\">remove</a>";

    // attach the handler here rather than through an inline onclick attribute,
    // which runs in the global scope and cannot see the local myHilitor
    el.querySelector("a").addEventListener("click", function(e) {
      e.preventDefault();
      myHilitor.remove();
      el.style.display = "none";
    }, false);

    targetNode.parentNode.insertBefore(el, targetNode.nextSibling);
  }, false);
</script>

Here we don't declare a global variable, as the on-page link is created from within the closure and can refer to the Hilitor object.

Identifying search keywords

To identify search keywords you need to parse the browser's HTTP_REFERER string. Depending on which search engine was used the keywords will usually appear after q= or query=, but there are also other possibilities.
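As a rough client-side sketch of that parsing step (we do it server-side, so this is not the code used on this site), the standard URL API can pull the q or query parameter out of a referrer string. The function name keywordsFromReferrer and the search.example.com address are made up for the example.

```javascript
// Extract search keywords from a referrer URL. Only the q and query
// parameters are checked; other search engines may use other names.
function keywordsFromReferrer(referrer) {
  var url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return "";  // empty or malformed referrer
  }
  var keywords = url.searchParams.get("q") || url.searchParams.get("query") || "";
  return keywords.trim();
}

console.log(keywordsFromReferrer("https://search.example.com/search?q=javascript+highlight"));
// javascript highlight
```

In a browser the value to pass in would be document.referrer, and an empty result simply means there is nothing to highlight.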

Since 2011, unfortunately, Google does not pass this information for logged in users making it impossible for us to identify their search keywords.

User Comments

Most recent 20 of 72 comments:

2 June, 2019

I am trying search for keyword inside html source code i placed html source code inside div as contentEditable but it renders html! How to stop html content from rendering inside div snd make your script work with it? Thanks

2 June, 2019

Thanks for cool demo. I placed a textarea inside playground but unfortunetly it does not higlight text inside text area! Could you tell me how to higlight texts inside textarea?

You can't add formatting inside a textarea element. What you need to do instead is hide the textarea and in its place display a synchronised HTML (or ContentEditable) version.

11 December, 2018

Hi,

It helped but I could no achieve what I want.

I would like to pass a string from a text field and then load another page highlighting the text matched and jumping to its position

And when highlighting on keyup listener event, I would like to match just the exact string searched, not each keyword, is it possible?

Almost all of that should be possible using the existing code. Some of the comments below include code snippets.

I don't have an example that does exactly what you want, but this may help (source).

10 December, 2018

Very nice feature.
I'm wondering if is it possible to jump to the position of the text found? Like jump to the searched text found.

Try the code from Alejandro's comment below

28 September, 2018

@Dave Hamstring

Hey Dave i was looking for the same functionality and your input gave me another alternative, see snippet below:


var after = node.splitText(regs.index);
after.nodeValue = after.nodeValue.substring(regs[0].length);
node.parentNode.insertBefore(match, after);
// smooth scroll into view of first instance of hiliteTag
document.querySelector(hiliteTag).scrollIntoView({behavior: "smooth", block: "center", inline: "center"});
}
};
}

I included a bit of the top and the closing brackets at the bottom for context and lexical positioning.

Hope it helps

15 June, 2018

This hilitor.js looks promising. Can you use several predefined expressions and assign a color to each one? The text to be highlighted will be in a <textarea> input.
Any feedback is appreciated.
Thanks!

30 April, 2018

Hi, hopefully this helps someone

I am using the open ended option, I wanted to make the first highlighted word scroll into view in the browser.

I added: node.parentNode.scrollIntoView(true);

line 97 of hilitor.js

Any feedback on whether this is a good way to do it would be great, thanks!

5 February, 2018

Adding to Phil's code to show the # of matches, you will need to make this change in order to clear out the "matches" if someone backspaces to clear out the search box:

if (input === undefined || !input)
{
document.getElementById('matches').innerHTML = "";
return;
}

Thanks, Phil! Your code worked great for me!

26 January, 2018

Hi,

I am really glad that i found this. But i wonder if its possible the separate the keywords by commas than spaces?

i use it on page load. i have set of keywords like myHilitor.apply("one two, three"). So i would like to have 2 highlighted keywords which are "one two" and "three".

Thanks!

16 January, 2018

I've implemented the Hilitor Search Engine and love it. Two things: Is there any documentation on customizing filters. I noted the one ones on this site do a better job then the default that comes with the code. Second I saw some time back a demo where separate tallies of each unique word was displayed in a window. And remarkably changing as you typed. I'd love to get my hands on that code.

Thanks so much to all involved in this project!

6 November, 2017

Hello,

The lib for hilitor.js has been replaced by a lib that rotates every div on page. Dunno if you've been trolled and/or hacked, be careful.

Here is the changed file:
www.the-art-of-web.com/hilitor.js

Maybe other files has been changed, I didn't looked further.

That only happens when the file is hot-linked from another website. The originals are unchanged and can be safely copied:

www.the-art-of-web.com/hilitor.js
www.the-art-of-web.com/hilitor-utf8.js

9 February, 2017

Good work, I face one problem i am not able to search for Supscript or superscript text in the HTML. Any suggestions please

8 February, 2017

Nothing to congratulate you on your work. I have the same problem, I use it to search for phrases.

There is one problem that I have and that is if a searched word has a punctuation mark (period,comma colon, etc) immediately after it, It doen't find the word to highlight it.

Any suggestions?

You will need to modify breakRegExp as described above. In your case, to include spaces and other punctuation in the [^...].

12 January, 2017

Thank you for your excellent script. I was wondering what is the regular expression for a case-sensitive match.

In this line:

matchRegExp = new RegExp(re, "i");

It's the "i" that makes it case-insensitive.

See the documentation.

21 December, 2016

Would it be possible to count the words you highlight, based on the html elements?

For example:
keyword found in <h1>: 1
keyword found in <h2>: 3
keyword found in <p>: 4
keyword found in <a>: 1

With that you can create a simple keyword density tool to create better seo content

I would first run the Hilitor, and then a separate script to identify all the highlighted words and step up to their containing elements.

22 October, 2016

Hi im new to javascript so maybe im wrong but it seems it doesnt match words starting with #foo #bar or any word starting with # which is used in hashtags.

2 September, 2016

Could the code be easily modified to change another style value of the text? For example, adding a strikethrough. Even better, changing text colour and adding a strikethrough, or bold and underline. Etc.

31 August, 2016

It doesn't work with languages other than English??

29 August, 2016

I found this tool pretty helpful.
Thanks a lot for coming up with this.
How can I highlight opening html tags using this.

if I give "<div" , it only highlights and matches all occurrences of div.

25 August, 2016

Hi! I finally got it to work in my JSF web application and like it very much.
There is one problem that I have and that is if a searched word has a punctuation mark (period,comma colon, etc) immediately after it, It doen't find the word to highlight it.
Any suggestions?
