Regex for matching text that does not contain a given word

Often we would like to check whether a given word is present in a text or not. While regular expressions are not the best tool for this, it is useful to understand how it can be done with them. Here I present two regular expressions that match text which does not contain a given word. Both use negative lookahead, but they behave differently on long input text. The text to match is read from a file, so that the regex can be tested against arbitrarily long text. The word that must not occur is passed as the second command-line argument, since it is usually short.
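For comparison, the plain string check that the paragraph alludes to is a one-liner. The sketch below puts it next to a regex version built on the second pattern from this post. The helper names (escapeRegExp, lacksWordPlain, lacksWordRegex) are hypothetical, chosen for illustration. One assumption differs from the scripts in this post: instead of stripping line breaks, it uses the 's' (dotAll) flag from ES2018, which lets '.' match newlines.

```javascript
// Hypothetical helper names, for illustration only.

// Escape regex metacharacters so `word` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Plain string check: true when `word` does not occur anywhere in `text`.
function lacksWordPlain(text, word) {
  return !text.includes(word);
}

// Regex check using the second pattern, /^(?!.*word).*$/.
// The 's' (dotAll) flag lets '.' match line breaks, so multi-line text
// is handled without removing newlines first.
function lacksWordRegex(text, word) {
  var re = new RegExp('^(?!.*' + escapeRegExp(word) + ').*$', 's');
  return re.test(text);
}

['hello world', 'goodbye world', 'say\nhello', ''].forEach(function (s) {
  console.log(JSON.stringify(s), lacksWordPlain(s, 'hello'), lacksWordRegex(s, 'hello'));
});
```

For every sample the two columns agree. The plain check is simpler and needs no escaping, which is why a regex is rarely the right tool for this particular job.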
This is the first regex:

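var fs = require('fs');
// Read the text to test and strip every line break ('.' does not match '\n')
var inputStr = fs.readFileSync(process.argv[2], 'utf8')
                 .replace(/[\r\n]/g, '');
// Escape regex metacharacters so the word is matched literally
var word = process.argv[3].replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
//regex is /^((?!word).)*$/
var matcher = new RegExp('^((?!' + word + ').)*$');
console.log(matcher.test(inputStr));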

And this is the second regex:

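var fs = require('fs');
// Read the text to test and strip every line break ('.' does not match '\n')
var inputStr = fs.readFileSync(process.argv[2], 'utf8')
                 .replace(/[\r\n]/g, '');
// Escape regex metacharacters so the word is matched literally
var word = process.argv[3].replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
//regex is /^(?!.*word).*$/
var matcher = new RegExp('^(?!.*' + word + ').*$');
console.log(matcher.test(inputStr));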

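To see what each pattern actually checks, the sketch below emulates both by hand and compares the results with the real regexes. The function names are hypothetical, and the sample words are assumed to contain no regex metacharacters, so no escaping is needed. The first pattern tests the lookahead before consuming every character. The second runs a single lookahead at the start of the text that scans the whole line for the word.

```javascript
// Hand-written equivalents of the two patterns, for illustration only.
// Assumes `text` has no line breaks, as in the scripts above.

// /^((?!word).)*$/ : before consuming each character, check that `word`
// does not start at the current position (the lookahead runs once per character).
function firstPatternByHand(text, word) {
  for (var i = 0; i < text.length; i++) {
    if (text.startsWith(word, i)) return false; // (?!word) failed at position i
  }
  return true; // every character consumed, so $ matches
}

// /^(?!.*word).*$/ : one lookahead at position 0 scans the whole line
// for `word`; if it is absent, .*$ consumes the rest.
function secondPatternByHand(text, word) {
  return text.indexOf(word) === -1;
}

var cases = [['the quick brown fox', 'fox'], ['the quick brown fox', 'cat']];
cases.forEach(function (c) {
  var r1 = new RegExp('^((?!' + c[1] + ').)*$').test(c[0]);
  var r2 = new RegExp('^(?!.*' + c[1] + ').*$').test(c[0]);
  // prints: word, first by hand, first regex, second by hand, second regex
  console.log(c[1], firstPatternByHand(c[0], c[1]), r1, secondPatternByHand(c[0], c[1]), r2);
});
```

Both patterns accept exactly the same single-line texts. The difference lies in how much work the engine does to reach that answer.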
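The first regular expression causes a stack overflow error in some regex engines (tested with JavaScript and Java) when the input text is long and does not contain the given word. It also takes twice as many steps to find the match as the second regex. The reason is the repeated group: the first pattern repeats ((?!word).) once per character, so a backtracking engine keeps state for every iteration. The second pattern evaluates its lookahead only once, at the start of the text, and then lets .* consume the rest.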
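The sketch below makes the difference observable on a long input that does not contain the word. The input length and the word 'needle' are arbitrary assumptions. Whether the first pattern actually throws depends on the engine, its version, and its stack limits, so the sketch catches any error instead of assuming one. It also measures wall-clock time rather than the engine's internal step count.

```javascript
// Illustrative comparison; WORD and the input length are assumptions.
var WORD = 'needle';

// Run one regex, reporting either its result and elapsed time or the error thrown.
function tryMatch(label, regex, text) {
  var start = Date.now();
  try {
    var result = regex.test(text);
    console.log(label + ': ' + result + ' in ' + (Date.now() - start) + ' ms');
    return result;
  } catch (e) {
    // Engines may abort deep backtracking with a stack-overflow style error.
    console.log(label + ': threw ' + e.name + ' (' + e.message + ')');
    return e;
  }
}

// A long single-line text that does not contain WORD.
var longText = 'a'.repeat(1000000);
tryMatch('first  /^((?!word).)*$/', new RegExp('^((?!' + WORD + ').)*$'), longText);
tryMatch('second /^(?!.*word).*$/', new RegExp('^(?!.*' + WORD + ').*$'), longText);
```

If the first pattern does not fail on your engine, raising the length of longText is the simplest way to probe its limit. The second pattern is expected to return true in either case.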
