Regex for matching text that does not contain a given word

Often we want to check whether a given word is absent from a piece of text. Regular expressions are not the best tool for this (a plain substring search such as indexOf is simpler), but it is useful to understand how to do it with them. Here are two regular expressions that match text which does not contain a given word. Both use negative lookahead, but they behave differently on long input. The text to match is read from a file so that the regex can be tested against arbitrarily long input. The word that must not occur is passed as the second command-line argument, since it is generally short.
This is the first regex

var fs = require('fs');
var inputStr = fs.readFileSync(process.argv[2])
                 .toString().replace(/\n/g, ''); // strip every newline, since . does not match line breaks
//regex is /^((?!word).)*$/
var matcher = new RegExp('^((?!' + process.argv[3] + ').)*$');
console.log(matcher.test(inputStr));

and this is the second regex

var fs = require('fs');
var inputStr = fs.readFileSync(process.argv[2])
                 .toString().replace(/\n/g, ''); // strip every newline, since . does not match line breaks
//regex is /^(?!.*word).*$/
var matcher = new RegExp('^(?!.*' + process.argv[3] + ').*$');
console.log(matcher.test(inputStr));

The first regular expression causes a stack overflow error in some regex engines (tested with JavaScript and Java) when the input text is long and does not contain the given word, most likely because the group ((?!word).) is repeated once for every character of the input. It also takes about twice as many steps to find a match as the second regex.
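Both scripts splice the word into the pattern as-is, so a word containing regex metacharacters (a dot, for example) changes what the pattern means. Here is a minimal sketch of the two patterns with the word escaped first. The helper names escapeRegExp and buildMatchers are hypothetical and not part of the original scripts.

```javascript
// Hypothetical helper: escape regex metacharacters so the word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build both "does not contain word" patterns.
function buildMatchers(word) {
  var safe = escapeRegExp(word);
  return {
    perChar: new RegExp('^((?!' + safe + ').)*$'), // lookahead checked at every character
    upFront: new RegExp('^(?!.*' + safe + ').*$')  // lookahead checked once at the start
  };
}

var m = buildMatchers('a.b');
console.log(m.perChar.test('xx a.b yy')); // false, the word is present
console.log(m.upFront.test('xx a.b yy')); // false
console.log(m.perChar.test('xx aXb yy')); // true, the dot is matched literally
console.log(m.upFront.test('xx aXb yy')); // true
```

Without the escaping step, the word a.b would also reject text containing aXb.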


Using bind method of Function object

When new elements are dynamically added to the DOM, appropriate event handlers need to be attached to them. When attaching event handlers, the proper context needs to be set so that the handler routines can use the this object productively. Since JavaScript 1.8.5 (ECMAScript 5), this is simple to do with the bind method of the Function object. Before bind was introduced, Function.apply was used to set the this context. The following HTML/JS code illustrates a typical use of bind.

<!DOCTYPE html>
<html>
<head>
   <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.1/jquery.min.js"></script>
</head>
	<body>
		<div id="container"></div>
	</body>
	<script type="text/javascript">
		function changeBackgroundColor(){
			$(this).css('background-color', this.attr('data-backgroundcolor'));
		}
                //the following data is generally obtained from server as a result of some request
		var newDiv = $("<div id='newDiv' data-backgroundcolor='#abc'><h4>Sample content from server</h4></div>")
		newDiv.bind('click', changeBackgroundColor.bind(newDiv));
		$('#container').append(newDiv);
	</script>
</html>

Here changeBackgroundColor.bind(newDiv) creates a new bound function whose this context inside changeBackgroundColor is always newDiv, regardless of how the handler is invoked.
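The same behaviour can be seen without the DOM. The sketch below uses a hypothetical widget object (not from the page above) to compare an unbound call, a bound function, and the older apply-based wrapper.

```javascript
// Hypothetical object used only for illustration.
var widget = {
  color: '#abc',
  describe: function () { return 'color is ' + this.color; }
};
var other = { color: 'red' };

// bind returns a new function whose this is fixed to widget.
var bound = widget.describe.bind(widget);

// Pre-bind idiom: wrap the call and set this with apply.
var viaApply = function () {
  return widget.describe.apply(widget, arguments);
};

console.log(widget.describe.call(other)); // "color is red"
console.log(bound.call(other));           // "color is #abc"
console.log(viaApply.call(other));        // "color is #abc"
```

A bound function ignores the this value supplied by call, which is why it is safe to hand to an event system that sets this itself.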

Simple for-loop startsWith and endsWith in JavaScript

After looking at all the answers posted at http://stackoverflow.com/questions/280634/endswith-in-javascript about implementing String endsWith in JavaScript, I was wondering why there is no simple for-loop implementation in that list of answers. While we all want a concise solution, a for-loop endsWith is much more readable and, according to jsperf, very fast.

function endsWith(str, suffix) {
    for (var suffixLength = suffix.length - 1, stringLength = str.length - 1;
         suffixLength > -1; --suffixLength, --stringLength) {
        if (str.charAt(stringLength) === suffix.charAt(suffixLength)) continue;
        else return false;
    }
    return true;
}
endsWith('abcdefghijklm', 'lam') //false
endsWith('abcdefghijklm', 'klm') //true

While we are at it, here is the for-loop for startsWith

function startsWith(str, prefix) {
    for (var index = 0; index < prefix.length; ++index) {
        if (str.charAt(index) === prefix.charAt(index)) continue;
        else return false;
    }
    return true;
}
startsWith('abcdefghijklm', 'abc') //true
startsWith('abcdefghijklm', 'def') //false
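As a sanity check, the loop versions can be compared with the native String.prototype.endsWith and startsWith methods available since ES2015, including edge cases such as an empty or over-long argument. This is only a sketch: the functions are condensed copies renamed endsWithLoop and startsWithLoop to avoid clashing with the ones above, and the test cases are my own.

```javascript
// Condensed copies of the loop implementations, renamed for this sketch.
function endsWithLoop(str, suffix) {
  for (var i = suffix.length - 1, j = str.length - 1; i > -1; --i, --j) {
    if (str.charAt(j) !== suffix.charAt(i)) return false;
  }
  return true;
}

function startsWithLoop(str, prefix) {
  for (var i = 0; i < prefix.length; ++i) {
    if (str.charAt(i) !== prefix.charAt(i)) return false;
  }
  return true;
}

// Each pair is [string, candidate prefix/suffix].
var cases = [
  ['abcdefghijklm', 'klm'],
  ['abcdefghijklm', 'lam'],
  ['abc', ''],    // empty argument
  ['bc', 'abc'],  // argument longer than the string
  ['', 'a'],
  ['abc', 'abc']
];

var mismatches = 0;
cases.forEach(function (c) {
  if (endsWithLoop(c[0], c[1]) !== c[0].endsWith(c[1])) mismatches++;
  if (startsWithLoop(c[0], c[1]) !== c[0].startsWith(c[1])) mismatches++;
});
console.log(mismatches); // 0
```

The over-long case works because charAt returns an empty string for an out-of-range index, which never equals a real character.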

Trim a string using lookahead

I thought of implementing a simple trim operation using regular expression lookahead and came up with the following version for a start.

var str = "    sDAFSdasf afsd fads  sDAFSdasf afsd      ";
str = /(?!\s).*?(?=\s*$)/g.exec(str)[0];
//returns "sDAFSdasf afsd fads  sDAFSdasf afsd"

To compare the performance of this lookahead version, I picked trim1 and trim11 from http://blog.stevenlevithan.com/archives/faster-trim-javascript. The jsperf test (javascriptlookaheadtrim) indicates that the lookahead trim is quite fast on small strings and considerably slower on larger ones. This is expected, as this regex scans the entire string while the other trim versions do not.

Explanation of regex

(?!      # Assert that it is impossible to match the regex below starting at this position
   \s    # Match a single character that is a “whitespace character”
)
.        # Match any single character that is not a line break character
   *?    # Between zero and unlimited times, as few times as possible, expanding as needed
(?=      # Assert that the regex below can be matched, starting at this position
   \s    # Match a single character that is a “whitespace character”
      *  # Between zero and unlimited times, as many times as possible, giving back as needed
   $     # Assert position at the end of a line
)
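Because . does not match line break characters, the regex above stops at the first newline inside the text. One possible variation, sketched here under the assumption that multi-line input should be supported, swaps . for [\s\S]. The function name lookaheadTrim is hypothetical.

```javascript
// Hypothetical wrapper around a multi-line variant of the lookahead trim.
function lookaheadTrim(str) {
  // [\s\S] matches any character, including line breaks.
  var match = /(?!\s)[\s\S]*?(?=\s*$)/.exec(str);
  // Defensive fallback in case exec finds no match.
  return match ? match[0] : '';
}

console.log(JSON.stringify(lookaheadTrim('  a\nb  '))); // "a\nb"
console.log(lookaheadTrim('   ') === '');               // true
console.log(lookaheadTrim(' x y ') === ' x y '.trim()); // true
```

This variant has the same performance profile as the original, so the native trim remains the better choice outside of experiments like this one.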

Deep cloning an object in JavaScript

A JavaScript object can be cloned like this

var newObject = JSON.parse(JSON.stringify(objToBeCloned));

A JSON-based clone does not retain the methods of the input object. Methods are properties of an object that are functions: if var obj = {add : function(a,b){return a+b;}} then add is a method of obj. JSON.stringify also drops properties whose value is undefined and turns Date objects into strings. This solution is good enough if your object contains only plain data.

var x = {a:{b:{c:{'d':'e'}}}};
//create y, a clone of x
var y = JSON.parse(JSON.stringify(x));
console.log(y.a.b.c.d) //prints e
console.log(y === x) //prints false
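The two properties of the JSON approach described above (nested data is copied, methods are lost) can be checked with a small sketch. The object named original is made up for this example.

```javascript
// Hypothetical input object with nested data and one method.
var original = {
  n: 1,
  nested: { list: [1, 2] },
  add: function (a, b) { return a + b; }
};

var copy = JSON.parse(JSON.stringify(original));

// The nested array is a separate array, so changing the copy
// leaves the original untouched.
copy.nested.list.push(3);

console.log(original.nested.list.length); // 2
console.log(copy.nested.list.length);     // 3
console.log(typeof copy.add);             // "undefined", the method is gone
```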

If you are using a framework such as jQuery, you can use its extend method, passing true as the first argument to request a deep copy. This also retains the methods of the original object.

var target = {};
jQuery.extend(true, target, source); // true requests a deep (recursive) copy

The following is the code of the extend method used by jQuery (it is self-explanatory):

jQuery.extend = function() {
	var options, name, src, copy, copyIsArray, clone,
		target = arguments[0] || {},
		i = 1,
		length = arguments.length,
		deep = false;

	// Handle a deep copy situation
	if ( typeof target === "boolean" ) {
		deep = target;
		target = arguments[1] || {};
		// skip the boolean and the target
		i = 2;
	}

	// Handle case when target is a string or something (possible in deep copy)
	if ( typeof target !== "object" && !jQuery.isFunction(target) ) {
		target = {};
	}

	// extend jQuery itself if only one argument is passed
	if ( length === i ) {
		target = this;
		--i;
	}

	for ( ; i < length; i++ ) {
		// Only deal with non-null/undefined values
		if ( (options = arguments[ i ]) != null ) {
			// Extend the base object
			for ( name in options ) {
				src = target[ name ];
				copy = options[ name ];

				// Prevent never-ending loop
				if ( target === copy ) {
					continue;
				}

				// Recurse if we're merging plain objects or arrays
				if ( deep && copy && ( jQuery.isPlainObject(copy) || (copyIsArray = jQuery.isArray(copy)) ) ) {
					if ( copyIsArray ) {
						copyIsArray = false;
						clone = src && jQuery.isArray(src) ? src : [];

					} else {
						clone = src && jQuery.isPlainObject(src) ? src : {};
					}

					// Never move original objects, clone them
					target[ name ] = jQuery.extend( deep, clone, copy );

				// Don't bring in undefined values
				} else if ( copy !== undefined ) {
					target[ name ] = copy;
				}
			}
		}
	}

	// Return the modified object
	return target;
};
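For readers without jQuery at hand, the recursive idea can be sketched in a few lines. This is not jQuery's implementation: deepClone is a hypothetical helper that assumes the input consists only of arrays, plain objects, primitives and functions, and has no cyclic references.

```javascript
// Hypothetical helper: recursive clone for plain data plus functions.
function deepClone(value) {
  if (Array.isArray(value)) {
    return value.map(deepClone);
  }
  if (value !== null && typeof value === 'object') {
    var out = {};
    for (var key in value) {
      // Copy own properties only, recursing into nested values.
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        out[key] = deepClone(value[key]);
      }
    }
    return out;
  }
  // Primitives and functions are returned as-is (functions by reference).
  return value;
}

var source = { a: { b: [1, 2] }, add: function (x, y) { return x + y; } };
var target = deepClone(source);
target.a.b.push(3);

console.log(source.a.b.length); // 2
console.log(target.add(1, 2));  // 3, the method is retained
```

Unlike the JSON approach, methods survive here, at the cost of not handling Date, RegExp or cyclic structures.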

Regular expression generator for matching the anagrams of a given string

This post is related to the answer I gave to the Stack Overflow question "Is it possible to generate a (compact) regular expression for an anagram of an arbitrary string?"
The following JavaScript code generates a regex that matches all the anagrams of a given input string. The regex length increases linearly with the length of the input. The regex uses positive lookahead: the lookahead part makes sure that all the characters are present in the test string regardless of their order, and the matching part ensures that the test string has the same length as the input string for which the regex was constructed. Note that characters are inserted into the pattern unescaped (only a space is mapped to \s), so input containing regex metacharacters is not handled.

function anagramRegexGenerator(input) {
	var lookaheadPart = '', matchingPart = '^';
	var positiveLookaheadPrefix='(?=', positiveLookaheadSuffix=')';
	var inputCharacterFrequencyMap = {}
	for ( var i = 0; i < input.length; i++ )
	{
	    !inputCharacterFrequencyMap[input[i]] 
                  ? inputCharacterFrequencyMap[input[i]] = 1
	          : ++inputCharacterFrequencyMap[input[i]];
	}
	for ( var j in inputCharacterFrequencyMap) {
	    lookaheadPart += positiveLookaheadPrefix;
	    for (var k = 0; k < inputCharacterFrequencyMap[j]; k++) {
	        lookaheadPart += '.*';
	        if (j == ' ') {
	            lookaheadPart += '\\s';
	        } else {
	            lookaheadPart += j;
	        }
	        matchingPart += '.';
	    }
	    lookaheadPart += positiveLookaheadSuffix;
	}
	matchingPart += '$';
	return lookaheadPart + matchingPart;
}

Sample input and output is the following

anagramRegexGenerator('aaadaaccc')
//generates the following string.
"(?=.*a.*a.*a.*a.*a)(?=.*d)(?=.*c.*c.*c)^.........$"
anagramRegexGenerator('abcdef ghij'); 
//generates the following string.
"(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)(?=.*\s)(?=.*g)(?=.*h)(?=.*i)(?=.*j)^...........$"
//test run returns true
/(?=.*a)(?=.*b)(?=.*c)(?=.*d)(?=.*e)(?=.*f)(?=.*\s)(?=.*g)(?=.*h)(?=.*i)(?=.*j)^...........$/.test('acdbefghij ')
//or using the RegExp object
//this returns true
new RegExp(anagramRegexGenerator('abcdef ghij')).test('acdbefghij ') 
//this returns false
new RegExp(anagramRegexGenerator('abcdef ghij')).test('acdbefghijj') 
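A simple way to cross-check results like these is a non-regex anagram test that sorts the characters of both strings. This is a different technique from the generator above, shown only as a baseline; sortChars and isAnagram are hypothetical names.

```javascript
// Hypothetical baseline: two strings are anagrams when their sorted
// characters are identical.
function sortChars(text) {
  return text.split('').sort().join('');
}

function isAnagram(a, b) {
  return sortChars(a) === sortChars(b);
}

console.log(isAnagram('abcdef ghij', 'acdbefghij ')); // true
console.log(isAnagram('abcdef ghij', 'acdbefghijj')); // false
console.log(isAnagram('a.c', 'abc'));                 // false
```

The last case is one where the generated regex would likely disagree, since an unescaped dot in the input matches any character.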

Inheritance in JavaScript

JavaScript does not provide class-based inheritance as a built-in feature like Java/C++, but it can be easily programmed by using the prototype property of a constructor function.

Every JavaScript object has two properties of interest here, 'constructor' and '__proto__'. In this post I will confine myself to the '__proto__' property, as it is more instrumental in understanding the prototypal inheritance provided by JavaScript. The constructor property is quite intricate and deserves an entire post of its own. In the ECMAScript specification, __proto__ corresponds to the internal [[Prototype]] slot.

Let us see what happens when we create an object invoking new operator on a constructor function.

var Animal = function() {
    // Private variable, visible only inside this function and its closures.
    // Each object created with the Animal constructor gets its own copy.
    var species = '';
    // Public own properties. Each object created with the Animal constructor
    // gets its own copy, so changing one instance does not affect the others.
    this.age = 0;
    this.eat = function(){console.log('EATING')};
    this.setSpecies = function(spec) {species = spec;}
    this.getSpecies = function(){return species;}
}
// Properties shared by all objects created with the Animal constructor.
Animal.prototype = { 
    // Shared through the prototype chain: changing a value on
    // Animal.prototype at runtime is visible to every instance
    // that has not set its own property of the same name.
    currentState : 'SLEEPING',
    sleep : function(){console.log('SLEEPING')} 
}
var animal = new Animal();

When the new operator is invoked, the __proto__ property of animal object is set to the prototype property of Animal constructor function which means

animal.__proto__ === Animal.prototype //returns true.
animal.__proto__.__proto__ === Object.prototype //returns true.

When we call animal.sleep(), the interpreter checks whether the animal object has its own 'sleep' property. Since it does not, the interpreter searches for 'sleep' in the animal.__proto__ object and finds it there.

Similarly, when we call animal.toString(), the interpreter looks for a toString property on the animal object and finds nothing. It then checks animal.__proto__ and again finds nothing. It does not stop there: it checks animal.__proto__.__proto__, which points to Object.prototype, finally finds toString there, and executes it. Since Animal did not override the toString method of Object.prototype, we get the raw output "[object Object]".
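The lookup described above can be imitated by hand with Object.getPrototypeOf and hasOwnProperty. The sketch below uses a reduced Animal and a hypothetical helper, findOwnerDepth, that reports how many prototype links had to be followed before the property was found.

```javascript
// Reduced version of the Animal constructor, for illustration only.
function Animal() {
  this.age = 0;
}
Animal.prototype.sleep = function () { return 'SLEEPING'; };

var animal = new Animal();

// Hypothetical helper: walk the prototype chain the way property lookup does.
// Returns 0 for an own property, 1 for the first prototype, and so on,
// or -1 when the chain ends without finding the name.
function findOwnerDepth(obj, name) {
  var depth = 0;
  for (var o = obj; o !== null; o = Object.getPrototypeOf(o), depth++) {
    if (Object.prototype.hasOwnProperty.call(o, name)) return depth;
  }
  return -1;
}

console.log(findOwnerDepth(animal, 'age'));      // 0, own property
console.log(findOwnerDepth(animal, 'sleep'));    // 1, Animal.prototype
console.log(findOwnerDepth(animal, 'toString')); // 2, Object.prototype
console.log(findOwnerDepth(animal, 'fly'));      // -1, not found
```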

This chaining of prototypes forms the basis of inheritance in JavaScript. Let us create a Dog constructor to see how we can build an inheritance hierarchy:

var Dog = function(){
    Animal.call(this); //initialize super class in the context of this.
    this.bark = function(){
        console.log('BARKING')
    }
}

A dog is an animal, so Dog needs to inherit from Animal. All we need to do is set the prototype of the Dog constructor to an Animal object, and set the constructor of that prototype to Dog itself. We reset the constructor so that all instances of Dog report Dog as their constructor function rather than the one inherited through Animal.prototype.

Dog.prototype = new Animal(); 
Dog.prototype.constructor = Dog;

The first statement above sets Dog.prototype.__proto__ to Animal.prototype, which is what we wanted. We now instantiate dog objects using the Dog function we created.

var dog = new Dog();

We can now call dog.bark(), dog.sleep(), dog.eat() and dog.toString() seamlessly.
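An alternative way to link the prototypes, available since ECMAScript 5, is Object.create, which sets up the chain without running the Animal constructor just to build a prototype object. This is a sketch of that variation, not the approach used above, with reduced Animal and Dog constructors.

```javascript
// Reduced constructors, for illustration only.
function Animal() {
  this.age = 0;
}
Animal.prototype.sleep = function () { return 'SLEEPING'; };

function Dog() {
  Animal.call(this); // initialize the Animal part of this instance
}

// Link Dog.prototype to Animal.prototype without calling new Animal().
Dog.prototype = Object.create(Animal.prototype);
Dog.prototype.constructor = Dog;
Dog.prototype.bark = function () { return 'BARKING'; };

var dog = new Dog();

console.log(dog.bark());            // "BARKING"
console.log(dog.sleep());           // "SLEEPING"
console.log(dog instanceof Animal); // true
console.log(dog.constructor === Dog); // true
// The prototype object carries no instance state of its own.
console.log(Object.prototype.hasOwnProperty.call(Dog.prototype, 'age')); // false
```

With new Animal() as the prototype, Dog.prototype would carry its own age property; with Object.create it does not.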