JavaScript: Search Keyword Highlighting


If you've arrived here via a search engine you may have noticed that the keywords you searched for appeared highlighted on the landing page. Here we explain how this is done using JavaScript.

There are two components - firstly identifying the keywords, and then highlighting them on the page using JavaScript. This article covers only the second part - using DHTML to dynamically modify the DOM.

Unfortunately extracting search strings from search engine referrals is now nigh on impossible, but the highlighting code by itself has proven extremely popular.

Working example

Enter some keywords in the box below and click the Apply button. Words in the page content that match the words you've entered will be instantly highlighted in different colours:

To remove highlighting click the Remove button.

How to implement

First we take the form input (e.g. "search engine keywords") and convert it into a case-insensitive JavaScript regular expression (e.g. "/\b(search|engine|keywords)\b/i"). This regular expression will match any of the entered words where they appear in the content area of the page.

In practice what we do is generate the list of keywords to highlight using the search engine referer string, and then exclude common stop words such as 'is', 'and', 'are', etc, but that's another story.
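The stop-word step mentioned above might look something like the following. This is a minimal sketch and not the code used on this site: the STOP_WORDS list and the keywordsToRegex() helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical stop-word list; extend it to suit your own content.
var STOP_WORDS = ["a", "an", "and", "are", "is", "of", "the", "to"];

// Convert a raw search string into a whole-word, case-insensitive RegExp,
// or return null when nothing useful is left after filtering.
function keywordsToRegex(input) {
  var words = input
    .toLowerCase()
    .split(/[^\w'-]+/)   // same word characters as the default breakRegExp
    .filter(function(word) {
      return word && STOP_WORDS.indexOf(word) === -1;
    });
  if(!words.length) return null;
  return new RegExp("\\b(" + words.join("|") + ")\\b", "i");
}

console.log(keywordsToRegex("The search engine is the best"));
```

In a real page the filtered words could just as well be joined with spaces and passed to apply(), which builds the regular expression itself.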

Here is the code we use to call the highlighting function:

<script src="hilitor.js"></script>
<script>

  var myHilitor = new Hilitor("content"); // id of the element to parse
  myHilitor.apply("highlight words");

</script>

Please make a local copy of the hilitor.js script on your server rather than linking it from this website. We have some built-in protections which are triggered by hot-linking.

The first line creates an instance of the Hilitor class, specifying that the area to be scanned is the #content element (i.e. inside <main id="content">...</main>) on the page. If we don't supply a valid id then the script will scan document.body, but it's better to limit it to just your content area to avoid highlighting appearing in the page header, menu or footer.

The tag used by default to apply highlighting to matched words is the HTML5 MARK tag. We can change that if we want by passing a second parameter with a different tag name. You can use any tag that renders inline - so not a DIV or similar block element which would break the layout.
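One detail worth sketching here: the script compares the tag name against node.nodeName using a case-sensitive regular expression, and in HTML documents nodeName is reported in upper case. Passing "em" rather than "EM" would therefore mean the script no longer skips its own highlight elements. The buildSkipTags() helper below is hypothetical (it is not part of hilitor.js) and simply normalises the tag name before building the same expression.

```javascript
// Hypothetical helper: build the skip-list expression the same way
// hilitor.js does, but normalise the tag name to upper case first.
function buildSkipTags(tag) {
  var hiliteTag = (tag || "MARK").toUpperCase();
  return new RegExp("^(?:" + hiliteTag + "|SCRIPT|FORM|SPAN)$");
}

var skipTags = buildSkipTags("em");
console.log(skipTags.test("EM"));  // true: highlight elements are skipped
console.log(skipTags.test("P"));   // false: paragraphs are still scanned
```

With the script as listed, the simplest approach is to pass the tag name in upper case, for example new Hilitor("content", "EM").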

Calling the apply() method passes the words to be highlighted to the just instantiated object which traverses the DOM inside the selected node looking for text that matches any of the keywords.

When a match is found, the text node in question is split up and the matching word is wrapped in a tag with style settings applied. Each word is assigned a colour from the array supplied in the script, which is then used throughout the document for that word.
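The colour bookkeeping can be tried out on its own. The sketch below follows the same idea as the script (first come, first served, wrapping around when the colours run out), but colorFor() is a hypothetical stand-alone function and the lookup uses a plain object rather than the array used in hilitor.js.

```javascript
var colors = ["#ff6", "#a0ffff", "#9f9", "#f99", "#f6f"];
var wordColor = Object.create(null);  // lower-cased word -> colour
var colorIdx = 0;

// Return the colour for a word, assigning the next one in the list on first use.
function colorFor(word) {
  var key = word.toLowerCase();
  if(!wordColor[key]) {
    wordColor[key] = colors[colorIdx++ % colors.length];
  }
  return wordColor[key];
}

console.log(colorFor("search"), colorFor("engine"), colorFor("Search"));
// prints "#ff6 #a0ffff #ff6"
```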

To remove the highlighting from the page we simply call (from a link, script or button):

<script>

  myHilitor.remove();

</script>

If the colours used for highlighting look familiar, it's because they were copied from a now defunct version of Google Cache View.

The hilitor.js JavaScript class

The JavaScript code for this class can be downloaded or copied below. The two main methods are apply() and remove() which respectively apply word highlighting on the page and then remove it, restoring the DOM (more or less) to its original state.

In the variables section you may want to change the default tag to use for highlighting (currently MARK), the list of tags to skip (SCRIPT, FORM and SPAN, plus the highlighting tag itself) or the array of colours.

Source code for hilitor.js:

function Hilitor(id, tag)
{

  // Original JavaScript code by Chirp Internet: www.chirpinternet.eu
  // Please acknowledge use of this code by including this header.

  // private variables
  var targetNode = document.getElementById(id) || document.body;
  var hiliteTag = tag || "MARK";
  var skipTags = new RegExp("^(?:" + hiliteTag + "|SCRIPT|FORM|SPAN)$");
  var colors = ["#ff6", "#a0ffff", "#9f9", "#f99", "#f6f"];
  var wordColor = [];
  var colorIdx = 0;
  var matchRegExp = "";
  var openLeft = false;
  var openRight = false;

  // characters to strip from start and end of the input string
  var endRegExp = new RegExp('^[^\\w]+|[^\\w]+$', "g");

  // characters used to break up the input string into words
  var breakRegExp = new RegExp('[^\\w\'-]+', "g");

  this.setEndRegExp = function(regex) {
    endRegExp = regex;
    return endRegExp;
  };

  this.setBreakRegExp = function(regex) {
    breakRegExp = regex;
    return breakRegExp;
  };

  this.setMatchType = function(type)
  {
    switch(type)
    {
      case "left":
        this.openLeft = false;
        this.openRight = true;
        break;

      case "right":
        this.openLeft = true;
        this.openRight = false;
        break;

      case "open":
        this.openLeft = this.openRight = true;
        break;

      default:
        this.openLeft = this.openRight = false;

    }
  };

  this.setRegex = function(input)
  {
    input = input.replace(endRegExp, "");
    input = input.replace(breakRegExp, "|");
    input = input.replace(/^\||\|$/g, "");
    if(input) {
      var re = "(" + input + ")";
      if(!this.openLeft) {
        re = "\\b" + re;
      }
      if(!this.openRight) {
        re = re + "\\b";
      }
      matchRegExp = new RegExp(re, "i");
      return matchRegExp;
    }
    return false;
  };

  this.getRegex = function()
  {
    var retval = matchRegExp.toString();
    retval = retval.replace(/(^\/(\\b)?|\(|\)|(\\b)?\/i$)/g, "");
    retval = retval.replace(/\|/g, " ");
    return retval;
  };

  // recursively apply word highlighting
  this.hiliteWords = function(node)
  {
    if(node === undefined || !node) return;
    if(!matchRegExp) return;
    if(skipTags.test(node.nodeName)) return;

    if(node.hasChildNodes()) {
      for(var i=0; i < node.childNodes.length; i++)
        this.hiliteWords(node.childNodes[i]);
    }
    if(node.nodeType == 3) { // NODE_TEXT

      var nv, regs;

      if((nv = node.nodeValue) && (regs = matchRegExp.exec(nv))) {

        if(!wordColor[regs[0].toLowerCase()]) {
          wordColor[regs[0].toLowerCase()] = colors[colorIdx++ % colors.length];
        }

        var match = document.createElement(hiliteTag);
        match.appendChild(document.createTextNode(regs[0]));
        match.style.backgroundColor = wordColor[regs[0].toLowerCase()];
        match.style.color = "#000";

        var after = node.splitText(regs.index);
        after.nodeValue = after.nodeValue.substring(regs[0].length);
        node.parentNode.insertBefore(match, after);

      }
    }
  };

  // remove highlighting
  this.remove = function()
  {
    var arr = document.getElementsByTagName(hiliteTag), el;
    while(arr.length && (el = arr[0])) {
      var parent = el.parentNode;
      parent.replaceChild(el.firstChild, el);
      parent.normalize();
    }
  };

  // start highlighting at target node
  this.apply = function(input)
  {
    this.remove();
    if(input === undefined || !(input = input.replace(/(^\s+|\s+$)/g, ""))) {
      return;
    }
    if(this.setRegex(input)) {
      this.hiliteWords(targetNode);
    }
    return matchRegExp;
  };

}


Under the hood

When the apply() method is called it clears any existing highlighting on the page, generates a regular expression from the keywords, and then calls the hiliteWords() method, passing a reference to the selected start node.

The hiliteWords() method examines the first node to see if it has any child nodes in which case it calls itself recursively once for each child. When a text node is encountered its contents are tested against the regular expression to see if any of our keywords are present.

If there are one or more matches the text node is cut in two at the point where the first match was found. The matched word itself is wrapped in a MARK tag that also specifies the background colour. The process then continues.

It's a bit difficult to explain, but essentially what takes place is the following:

DOM before

Container > Text Node

e.g. <p>Paragraph with highlighted word.</p>

DOM after

Container > Text Node          // original text node, truncated
          > MARK > Text Node     // var match
          > Text Node          // var after

e.g. <p>Paragraph with <mark>highlighted</mark> word.</p>

As the script encounters and highlights different keywords the colour used for each is remembered so that the same keyword can be highlighted consistently down the page.

The for loop doesn't mind that there are suddenly two extra nodes, because childNodes is a live list and its length is re-read on every pass. It moves to the next node, which is the MARK, skips it and moves on to the 'after' Text Node containing the remainder of the node that was originally split in two.
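The same split-and-continue logic can be tried out on a plain string, without a DOM. This is only an illustrative sketch: segment() is a hypothetical helper, and where hilitor.js calls splitText() on a text node, the sketch slices a string instead.

```javascript
// Split text into plain and highlighted segments, one match at a time,
// mirroring how each "after" node is scanned again for the next match.
// Expects a regular expression without the "g" flag, as in hilitor.js.
function segment(text, regex) {
  var parts = [], regs;
  while(text && (regs = regex.exec(text))) {
    if(regs[0].length === 0) break;  // guard against empty matches
    if(regs.index > 0) {
      parts.push({ text: text.substring(0, regs.index), hilite: false });
    }
    parts.push({ text: regs[0], hilite: true });
    text = text.substring(regs.index + regs[0].length);  // the "after" part
  }
  if(text) parts.push({ text: text, hilite: false });
  return parts;
}

var parts = segment("Paragraph with highlighted word.", /\b(highlighted|word)\b/i);
var html = parts.map(function(p) {
  return p.hilite ? "<mark>" + p.text + "</mark>" : p.text;
}).join("");
console.log(html);
// prints "Paragraph with <mark>highlighted</mark> <mark>word</mark>."
```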

Feel free to adapt and use this code as you see fit. We use it to highlight search engine query keywords after the keywords have been extracted from the HTTP Referer using a server-side script, but that's more than can be explained in a single article.

Changing the Match Type

We've added a new method to the Hilitor class, setMatchType(). You can pass it 'left', 'right' or 'open'. Any other value will restore the default behaviour of only matching whole words.

The effect is to change the regular expression as shown here:

default: /\b(input|text)\b/i
setMatchType("left"): /\b(input|text)/i
setMatchType("right"): /(input|text)\b/i
setMatchType("open"): /(input|text)/i
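To see what each setting means for actual text, here is a small stand-alone sketch. buildPattern() is a hypothetical function that repeats the pattern-building logic of setRegex() outside the class so it can be run on its own.

```javascript
// Hypothetical stand-alone version of the pattern building in setRegex().
function buildPattern(words, type) {
  var openLeft = (type === "right" || type === "open");
  var openRight = (type === "left" || type === "open");
  var re = "(" + words.join("|") + ")";
  if(!openLeft) re = "\\b" + re;
  if(!openRight) re = re + "\\b";
  return new RegExp(re, "i");
}

var words = ["input", "text"];
console.log(buildPattern(words).test("context"));           // false: whole words only
console.log(buildPattern(words, "left").test("textual"));   // true: matches the start of a word
console.log(buildPattern(words, "right").test("context"));  // true: matches the end of a word
console.log(buildPattern(words, "open").test("contexts"));  // true: matches anywhere
```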

Here is a simple working example:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus felis erat, facilisis vitae est sed, interdum luctus dui. Donec condimentum in neque ac consequat. Donec interdum quis massa molestie consequat. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum malesuada quam quis sapien volutpat placerat. Phasellus varius, enim vel fringilla pellentesque, felis velit mollis lacus, ac scelerisque sem sem sed eros.

This example has been coded to apply only to its own section of the page, unlike the main example which will highlight text throughout:

<div id="playground">

<form>
<p>Highlight keywords as you type: <input id="keywords" size="24"></p>
</form>

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Vivamus felis erat, facilisis vitae est sed, interdum luctus dui. Donec 
condimentum in neque ac consequat. Donec interdum quis massa molestie 
consequat. Pellentesque habitant morbi tristique senectus et netus et 
malesuada fames ac turpis egestas. Vestibulum malesuada quam quis sapien 
volutpat placerat. Phasellus varius, enim vel fringilla pellentesque, 
felis velit mollis lacus, ac scelerisque sem sem sed eros.</p>

</div>

<script src="hilitor.js"></script>
<script>

  window.addEventListener("DOMContentLoaded", function(e) {
    var myHilitor2 = new Hilitor("playground");
    myHilitor2.setMatchType("left");
    document.getElementById("keywords").addEventListener("keyup", function(e) {
      myHilitor2.apply(this.value);
    }, false);
  }, false);

</script>

Working with regular expressions

The setRegex() and getRegex() methods convert the input string of keywords into a JavaScript regular expression and vice versa. Note in particular the regular expressions defined at the top of the script (endRegExp and breakRegExp).

When an input string is received:

endRegExp is used to trim unwanted characters from the start and end of the string; and then
breakRegExp is used to split the input string into separate words to be highlighted on the page.

The shorthand "\w" (written as \\w when instantiating a RegExp object) represents the set of alphanumeric characters (A-Za-z0-9) including the underscore (_). Our default regular expression also includes hyphens (-) and apostrophes (').
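The two steps can be followed on a sample string. In this sketch the two expressions are copied from the script, while splitKeywords() is a hypothetical helper that returns the resulting words as an array instead of building the final regular expression.

```javascript
var endRegExp = new RegExp('^[^\\w]+|[^\\w]+$', "g");
var breakRegExp = new RegExp('[^\\w\'-]+', "g");

// Hypothetical helper returning the list of words that would be highlighted.
function splitKeywords(input) {
  return input
    .replace(endRegExp, "")     // trim non-word characters from both ends
    .replace(breakRegExp, "|")  // each run of separators becomes a single "|"
    .split("|")
    .filter(Boolean);           // drop empty entries
}

console.log(splitKeywords("  search, engine; key-words! "));
// the array contains "search", "engine" and "key-words"
```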

We change the behaviour of our script by changing these regular expressions. For example:

Search for strings that include spaces:

  myHilitor.setBreakRegExp(new RegExp('[^\\w\' -]+', "g")); // expanded to include spaces
  myHilitor.apply("input string,keywords,page");

Search for email addresses:

This regular expression will cover strings including "@" and ".":

  myHilitor.setBreakRegExp(new RegExp('[^\\w\'@.-]+', "g")); // expanded to include email characters
  myHilitor.apply("nobody@example.net");

Otherwise any email addresses (nobody@example.net) in the search keywords would be broken into separate words at each '@' or '.'.
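One caveat when widening breakRegExp like this: a character such as "." now reaches the final regular expression, where it has a special meaning (a dot matches any character), and the script as listed does not escape it. The sketch below shows the effect together with a commonly used escaping function. escapeRegExp() is a hypothetical helper, not part of hilitor.js; to use it there it would have to be applied to each word inside setRegex(), after the input has been split.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var word = "nobody@example.net";
var unescaped = new RegExp("\\b(" + word + ")\\b", "i");
var escaped = new RegExp("\\b(" + escapeRegExp(word) + ")\\b", "i");

console.log(unescaped.test("nobody@example-net"));  // true: "." matches any character
console.log(escaped.test("nobody@example-net"));    // false: a literal dot is now required
console.log(escaped.test("nobody@example.net"));    // true
```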

Search for #hashtags:

Trying to match #hashtags in a string is problematic. Rather than /\b(#hashtag)\b/ you actually need just /(#hashtag)\b/ - because there is no \b before the #, while there is a \b between '#' and 'h' - which the code only supports if we make the search type "right" or "open":

  myHilitor.setEndRegExp(new RegExp('^[^\\w#]+|[^\\w]+$', "g")); // must be a RegExp object, not a string
  myHilitor.setBreakRegExp(new RegExp('[^\\w\'#-]+', "g"));
  myHilitor.setMatchType("right");
  myHilitor.apply("match,#hashtag,code");

Similar problems exist when there are accented characters, and for that we have a separate branch of the code with UTF-8 support which might be more useful in real world situations.
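To illustrate the problem with accented characters: \b is defined in terms of \w, so an accented letter at the edge of a word does not produce a word boundary. The sketch below shows the failure and one possible alternative using Unicode property escapes and lookarounds, which current browsers and Node.js support. It is an assumption for illustration only and not necessarily how the UTF-8 version of the script works; unicodeWordRegex() is a hypothetical name.

```javascript
// Build a whole-word pattern that treats any Unicode letter or digit
// (plus the underscore) as a word character.
function unicodeWordRegex(words) {
  var wordChar = "[\\p{L}\\p{N}_]";
  return new RegExp("(?<!" + wordChar + ")(" + words.join("|") + ")(?!" + wordChar + ")", "iu");
}

var text = "Un café noir";
console.log(/\b(café)\b/i.test(text));               // false: no \b between "é" and the space
console.log(unicodeWordRegex(["café"]).test(text));  // true
console.log(unicodeWordRegex(["caf"]).test(text));   // false: "caf" is followed by a letter
```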

Using an onload event

Since it's very unfashionable to call JavaScript inline, here is an example of how you can apply highlighting to the page using an event listener that only fires once the document has been fully loaded and parsed:

<script src="hilitor.js"></script>
<script>

  var myHilitor; // global variable
  window.addEventListener("DOMContentLoaded", function(e) {
    myHilitor = new Hilitor("content");
    myHilitor.apply("highlight these words");
  }, false);

</script>

The function call is identical to the example above, just delayed until the document has been parsed rather than being executed as soon as the function call is encountered.

We've declared myHilitor as a global variable to enable later calls to remove or update keyword highlighting on the page using the remove() and apply() methods.

Link to turn off highlighting

You might be wondering how we generate the link that appears at the top of the page showing the words that have been highlighted alongside a link to remove the highlighting.

It's a bit of a hack, but the code is as follows:

(function() {
  var targetNode = document.getElementsByTagName("h1")[0]; // first h1 on page
  var el = document.createElement("p"); // create new p
  el.id = "remove_hilite";
  el.style.fontSize = "10px"; // styles can better be applied using css
  el.innerHTML = "<b>Highlighted terms</b>: " + myHilitor.getRegex() + " | <a href=\"#\" onclick=\"myHilitor.remove(); document.getElementById('remove_hilite').style['display'] = 'none'; return false;\">remove</a>";
  targetNode.parentNode.insertBefore(el, targetNode.nextSibling); // insert p element after first h1
})();

This inserts a new paragraph (P) following the first H1 heading on the page. The text displays the keywords and also a 'remove' link which calls the remove method. You can see an example here.

These commands can also be bundled into the onload method above, in which case you no longer need to wrap them in an anonymous function. The complete code then is as follows:

<script src="hilitor.js"></script>
<script>

  window.addEventListener("DOMContentLoaded", function(e) {
    // highlight search terms on page
    var myHilitor = new Hilitor("content");
    myHilitor.apply("highlight these words");

    // (optional) display link to remove highlighting
    var targetNode = document.getElementsByTagName("h1")[0];
    var el = document.createElement("p");
    el.id = "remove_hilite";
    el.style.fontSize = "10px";
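    el.innerHTML = "<b>Highlighted terms</b>: " + myHilitor.getRegex() + " | <a href=\"#\">remove</a>";
    // attach the handler in script: an inline onclick attribute runs in global scope
    // and would not be able to see the local myHilitor variable
    el.querySelector("a").addEventListener("click", function(e) {
      e.preventDefault();
      myHilitor.remove();
      el.style.display = "none";
    }, false);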
    targetNode.parentNode.insertBefore(el, targetNode.nextSibling);
  }, false);

</script>

Here we don't declare a global variable as the on-page link is created from within the enclosure and can refer to the Hilitor object.

Identifying search keywords

To identify search keywords you need to parse the browser's HTTP_REFERER string. Depending on which search engine was used the keywords will usually appear after q= or query=, but there are also other possibilities.

Since 2011, unfortunately, Google has not passed this information for logged-in users, making it impossible for us to identify their search keywords.
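For referrers that do still carry a query string, the extraction itself is straightforward. The following is a minimal sketch using the standard URL API; keywordsFromReferrer() is a hypothetical helper, the example address is made up, and only the q and query parameters mentioned above are checked.

```javascript
// Hypothetical helper: pull the search terms out of a referrer URL.
// Assumes the keywords are in a "q" or "query" parameter; other engines differ.
function keywordsFromReferrer(referrer) {
  var url;
  try {
    url = new URL(referrer);
  } catch(e) {
    return "";  // empty or malformed referrer
  }
  var terms = url.searchParams.get("q") || url.searchParams.get("query") || "";
  return terms.trim();
}

console.log(keywordsFromReferrer("https://search.example.com/search?q=javascript+highlight"));
// prints "javascript highlight"
```

In the browser the value to parse would come from document.referrer, and the result could be passed straight to apply() when it is not empty.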

References

MDN: JavaScript Regular Expressions

JavaScript Search Keyword Highlighting
JavaScript Highlighting Words With UTF-8 Support
JavaScript UTF-8 Search Highlight Demo


User Comments

Most recent 20 of 78 comments:


MJP 9 November, 2021

Anyone have success using this script on a dynamically created DIV and its content? I'm having a heck of a time doing it.

Nicholas 20 August, 2020

I am having a strange issue where on some words the 'mark' tags close half way through the input word. For example, when searching 'help' only the 'he' (and all instances of 'he', such as in 'heard' also) become highlighted. It happens on a number of random characters combinations without any pattern that I can find, but usually on about the 2nd-4th character. Any idea what is happening here? Thanks!

Alejandro 2 May, 2020

@Bharathi

There is an array of colors declared on line 10 of the source file:

var colors = ["#ff6", "#a0ffff", "#9f9", "#f99", "#f6f"];

You could just leave a single item in the array representing the color that u want. The example below is yellow only while highlighting.

var colors = ["#ff6"];

John 30 April, 2020

I have inserted the code of hilitor in my page. It works but search only word how can I set hilitor to search any character? Thank you.

Bharathi 14 April, 2020

Hi,i wish to highlight a word with yellow always.
where i need to change code..

Lars 10 January, 2020

Great example. Works for me. Only I cannot figure out how to make it work with text in an iframe. Do you by chance have an example to share?

Best regards

Lars

There is some discussion and code for this in the comments section if you click [show all comments] below

David19 2 June, 2019

I am trying search for keyword inside html source code i placed html source code inside div as contentEditable but it renders html! How to stop html content from rendering inside div snd make your script work with it? Thanks

David19 2 June, 2019

Thanks for the cool demo. I placed a textarea inside the playground but unfortunately it does not highlight text inside the textarea! Could you tell me how to highlight text inside a textarea?

You can't add formatting inside a textarea element. What you need to do instead is hide the textarea and in its place display a synchronised HTML (or ContentEditable) version.

Bernardo 11 December, 2018

Hi,

It helped but I could not achieve what I want.

I would like to pass a string from a text field and then load another page highlighting the text matched and jumping to its position

And when highlighting on keyup listener event, I would like to match just the exact string searched, not each keyword, is it possible?

Most all of that should be possible using the existing code. Some of the comments below include code snippets.

I don't have an example that does exactly what you want, but this may help (source).

Bernardo 10 December, 2018

Very nice feature.
I'm wondering if is it possible to jump to the position of the text found? Like jump to the searched text found.

Try the code from Alejandro's comment below

Alejandro 28 September, 2018

@Dave Hamstring

Hey Dave i was looking for the same functionality and your input gave me another alternative, see snippet below:

var after = node.splitText(regs.index);
after.nodeValue = after.nodeValue.substring(regs[0].length);
node.parentNode.insertBefore(match, after);

// smooth scroll into view of first instance of hiliteTag
document.querySelector(hiliteTag).scrollIntoView({behavior: "smooth", block: "center", inline: "center"});

}
};
}

I included a bit of the top and the closing brackets at the bottom for context and lexical positioning.

Hope it helps

David Granberry 15 June, 2018

This hilitor.js looks promising. Can you use several predefined expressions and assign a color to each one? The text to be highlighted will be in a <textarea> input.
Any feedback is appreciated.
Thanks!

Dave Hamstring 30 April, 2018

Hi, hopefully this helps someone

I am using the open ended option, I wanted to make the first highlighted word scroll into view in the browser.

I added: node.parentNode.scrollIntoView(true);

line 97 of hilitor.js

Any feedback on whether this is a good way to do it would be great, thanks!

Pam Nelligan 5 February, 2018

Adding to Phil's code to show the # of matches, you will need to make this change in order to clear out the "matches" if someone backspaces to clear out the search box:

if (input === undefined || !input) { document.getElementById('matches').innerHTML = ""; return; }

Thanks, Phil! Your code worked great for me!

Romel 26 January, 2018

Hi,

I am really glad that i found this. But i wonder if its possible the separate the keywords by commas than spaces?

i use it on page load. i have set of keywords like myHilitor.apply("one two, three"). So i would like to have 2 highlighted keywords which are "one two" and "three".

Thanks!

Jerry 16 January, 2018

I've implemented the Hilitor Search Engine and love it. Two things: Is there any documentation on customizing filters. I noted the one ones on this site do a better job then the default that comes with the code. Second I saw some time back a demo where separate tallies of each unique word was displayed in a window. And remarkably changing as you typed. I'd love to get my hands on that code.

Thanks so much to all involved in this project!

Romain 6 November, 2017

Hello,

The lib for hilitor.js has been replaced by a lib that rotates every div on page. Dunno if you've been trolled and/or hacked, be careful.

Here is the changed file:
www.the-art-of-web.com/hilitor.js

Maybe other files has been changed, I didn't looked further.

That only happens when the file is hot-linked from another website. The originals are unchanged and can be safely copied:

www.the-art-of-web.com/hilitor.js
www.the-art-of-web.com/hilitor-utf8.js

Radhika 9 February, 2017

Good work, I face one problem i am not able to search for Supscript or superscript text in the HTML. Any suggestions please

Alex 8 February, 2017

Nothing to congratulate you on your work. I have the same problem, I use it to search for phrases.

There is one problem that I have, and that is if a searched word has a punctuation mark (period, comma, colon, etc.) immediately after it, it doesn't find the word to highlight it.

Any suggestions?

You will need to modify breakRegExp (using setBreakRegExp()) as described above. In your case, to include spaces and other punctuation in the [^...].

dedalus 12 January, 2017

Thank you for your excellent script. I was wondering what is the regular expression for a case-sensitive match.

In this line:

matchRegExp = new RegExp(re, "i");

It's the "i" that makes it case-insensitive.

See the documentation.