Precision vs Recall

In all applications involving information retrieval, one of the main concerns would be to balance Precision and Recall parameters. The general application involves a scenario where a query is submitted by the user and the documents that match the query in the corpus are retrieved using some information retrieval model. Well known information retrieval models include Vector space model, Latent semantic indexing and several other probabilistic models.

Precision refers to the ratio –> Number of relevant documents retrieved / Total number of retrieved documents.
Recall refers to the ratio –> Number of relevant documents retrieved / Total number of relevant documents.

Now one can observe that Recall will be 1 if one returns all the documents in the corpus but the whole application’s performance will be in jeopardy if one returns all the documents in the corpus, thus rendering the application useless. On the other extreme is when we return no documents, then the Precision is infinity. But even then the application is unusable because no documents are rendered to the user.

Balance between precision and recall has to be established in the applications involving Information Retrieval. Due to the ambiguities in the natural language, many irrelevant documents are returned resulting in a text search applications with low precision. One of the main reasons that Google has established its ground as a giant is due to its Page rank algorithm which increases the precision a lot. While developing the applications one can improve the performance by giving the proper weights to precision and recall as required by the specifications of a particular application.

For example, there might be some applications where all the relevant documents need to be displayed irrespective of the total number of the retrieved documents. In this case one needs to give a high weight to the recall parameter while designing the search query. Commercial search engines require a high degree of precision as the number of web pages  (each web page can be viewed as a document and the set of all web pages is the corpus) they have to index is in zillions. There are several measures like F1_score in which precision and recall are evenly weighted.


Regex for matching text that does not contain a given word

Often we would like to see if a input string is present in the given text or not. While, regular expressions are not the best way to do this, it is useful to understand how we can achieve this using regular expressions. I will present here two different regular expressions which match the text that does not contain a given input word. Both use negative lookahead but behave differently on long input text. We read the text to match from a file, so that we can test our regex against arbitrarily long text. The word that should not occur is passed as the second argument as this is generally short in length.
This is the first regex

var fs = require('fs');
var inputStr = fs.readFileSync(process.argv[2])
//regex is /^((?!word).)*$/
var matcher = new RegExp('^((?!' + process.argv[3] + ').)*$');

and this is the second regex

var fs = require('fs');
var inputStr = fs.readFileSync(process.argv[2])
//regex is /^(?!.*word).*$/
var matcher = new RegExp('^(?!.*' + process.argv[3] + ').*$');

download (3)
First regular expression causes stack overflow error by some regex engines (tested with JavaScript/Java) when input text is long and does not contain the given word. It takes twice the number of steps to find the match when compared to the steps taken by second regex.

Parsing a sms thread in a readable way on Android

First get SMS Backup & Restore app from Play store and backup all the sms and then take the xml and run the following snippet to generate a readable conversation in a text file.

from __future__ import print_function
from xml.dom import minidom

xmldoc = minidom.parse('/Users/narendrayadala/Downloads/sms-20130317085953.xml')
sms_file = open('/Users/narendrayadala/Downloads/Jyothi.txt', 'w')
smslist = xmldoc.getElementsByTagName('sms')

for sms in smslist:
    if sms.attributes['contact_name'].value == 'Jyothi Nerella':
        if sms.attributes['type'].value == '1':
            print(sms.attributes['readable_date'].value + u' - Jyothi: ' + sms.attributes['body'].value, file=sms_file)
            print(sms.attributes['readable_date'].value + u' - Narendra: ' + sms.attributes['body'].value, file=sms_file)

Indenting blocks of text easily on MacVim

Being a windows user previously, I prefer selecting text using Shift+Arrow keys and increase/decrease indent of the text using Tab/Shift-Tab keys. Doing this on MacVim requires a little fiddling with the .vimrc file. The following needs to be added to the .vimrc file.

vnoremap <Tab> >gv
vnoremap <S-Tab> <gv

if has("gui_macvim")
    let macvim_hig_shift_movement = 1

First section of the code causes vim to use Tab key as a mapping for >gv key sequence and second section of the code causes vim to use Shift+Arrow keys to select the text.

Using bind method of Function object

When new elements are dynamically added to the DOM, appropriate event handlers need to be attached to them. When attaching event handlers proper context needs to be set so that the event handler routines can use this object in a productive way. Since JavaScript 1.8.5 doing this is made simple using bind method of the Function object. Prior to the introduction of bind method, Function.apply method is used to set the this context. The following html/js code illustrates a typical usage of bind method.

<!DOCTYPE html>
   <script src=""></script>
		<div id="container"></div>
	<script type="text/javascript">
		function changeBackgroundColor(){
			$(this).css('background-color', this.attr('data-backgroundcolor'));
                //the following data is generally obtained from server as a result of some request
		var newDiv = $("<div id='newDiv' data-backgroundcolor='#abc'><h4>Sample content from server</h4></div>")
		newDiv.bind('click', changeBackgroundColor.bind(newDiv));

Here changeBackgroundColor.bind(newDiv) creates a new bound function with newDiv as this context inside changeBackgroundColor method.

A simple shell script to verify checksums

Often I find myself comparing checksums manually which is a bit cumbersome, so I wrote this trivial shell script to automate the verification.

if [ $# -lt 3 ] ; then
        echo 'usage -- hashfunction filename checksum'
        exit 1
elif [ `$1 $2 | awk '{print $1}'` = $3 ] ; then
        echo 'File not corrupted. We are saved.';
        echo 'File corrupted. We are doomed.';

Usage: sha1sum foobar.txt 988881adc9fc3655077dc2d4d757d480b5ea0e11

If there are multiple files and their corresponding checksums, then one can write them in a file in the following format

988881adc9fc3655077dc2d4d757d480b5ea0e11  foobar1.txt
988881adc9fc3655077dc2d4d757d480b5ea0e11  foobar2.txt
988881adc9fc3655077dc2d4d757d480b5ea0e11  foobar3.txt
988881adc9fc3655077dc2d4d757d480b5ea0e11  foobar4.txt

and run the following command to verify all of the checksums at once

sha1sum -c checksum.txt

Generating all possible postfix expressions

The following Java program returns the list of all possible postfix expressions for a given list of operands and operators. The number of all postfix expressions grows exponentially with the number of operands and is around 5 billion for 10 operands and 4 operators (*,+,/,-). The precise number of all possible postfix expressions is C(n-1)*(noOfOperators^(n-1)) where n is the number of operands and C(n-1) is the n-1th Catalan number.

import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

public class PostfixExpressionGeneratorSansRecursion {
	private static final String [] OPERATORS = new String[] {"*", "+", "/", "-"};
	public static void main(String[] args) {
		List<String> x = new ArrayList<String>();
		for (String ex : generateAllPostfixExpressions(x)){
	 * Assumes input is a list of single digit positive numbers.
	 * @param operands
	 * @return list of all possible postfix expressions for the given list of operands
	private static List<String> generateAllPostfixExpressions(List<String> operands) {
		List<String> allPossiblePostFixExprs = new ArrayList<String>();
		Stack<String> stack = new Stack<String>();
		stack.push(""); //start with a empty partial expression
		int finalPFExprLength = 2*operands.size() - 1;
		int sizeOfOperands = operands.size();
		while (!stack.isEmpty()) {
			String partialExpr = stack.pop();
			int partialExprLength = partialExpr.length();
			if (partialExprLength == finalPFExprLength) {
			int noOfOprdsConsumed = 0;
			for (char c : partialExpr.toCharArray())
				noOfOprdsConsumed += Character.isDigit(c) ? 1 : 0;
			if (2*noOfOprdsConsumed - partialExprLength > 1)
				for (String op : OPERATORS)
					stack.push(partialExpr + op);
			if (noOfOprdsConsumed != sizeOfOperands)
				stack.push(partialExpr + operands.get(noOfOprdsConsumed));
	    return allPossiblePostFixExprs;