News Round Up for the Week Ending 26th March 2010

There has been a lot of interesting news in the world of analytics this week. Let's see what some of it means for us as analytics professionals.

Have you ever been a little taken aback by the recommendations Amazon makes for you, or by the advertisements that pop up in your web-based email? Write to your friend in Dubai asking whether she is happy in her job, and on the right-hand side of the page you will notice a little link for “UAE Recruitment Consultant”. The computer, tagging words like “jobs” and “Dubai” together, is attempting to out-think you!

In the last couple of years, the analytics industry has been struggling with the complicated question of how much information is too much. When does the concern for privacy outweigh any potentially useful application of additional personal details? As time passes and we leave an ever bigger, more indelible trail of our online lives across multiple email accounts, social media interactions and browser histories, it is time to take a step back and rethink some of these considerations.

In the last fifteen days, two companies have done just that.

  1. Netflix: the popular US-based mail-order video rental service ran an open competition (the Netflix Prize) among statisticians to improve its recommendation engine. Put simply, if someone has rented Dilwale Dulhania Le Jayenge three times, is he more likely a Shahrukh Khan fan, a fan of romantic movies, or just someone who was born in the 1980s? The better you understand this, the more likely you are to recommend the next best selection to him. Hundreds of amateur and professional statisticians worked on existing Netflix user data to improve these algorithms. On the 12th of March this year, Netflix decided to stop sharing this data for further analysis amid fears that sharing private customer information such as video preferences was illegal.
  2. Google Opt-Out: People have long worried that, between our search histories and emails, Google knows everything about us. Facing increasing censure about how it could potentially use this information, Google has announced that it will soon offer an opt-out feature that ensures our data will not be used by Google Analytics. Radical change or lip service? Only time will tell.

As analysts, we can do our bit to ensure that we don’t violate any of the written (or unwritten) codes of data privacy. Here is how:

  1. Analyze, Don’t Speculate: Remember the time you looked at those credit card transactions and wondered how much that person really earned? The first rule of ethical data management is to analyze and not speculate: look for patterns rather than individual cases and outliers, and use only the information that the analysis actually requires.
  2. Don’t Share Information, Don’t Share Anecdotes: While most analysts respect data privacy and don’t share data with anybody else, the cocktail-hour anecdote about the information at your disposal is just as unethical. Respect your unnamed client’s privacy and avoid talking about it.
  3. Question Yourself: Is My Analysis Really Helping the Customer? When in doubt, stop, and start again. Any analysis that leads to meaningless, non-actionable insight is nothing more than an intrusion on privacy.

Data mining and analysis are extremely powerful tools. But as our venerable friend Spider-Man once said, “With great power comes great responsibility.” Be responsible with the data you have, and you will never have to spend a sleepless night over it!


Gunjan Thakuria, one of our finest consultants, educates us on Churn Analytics

The landscape and dynamics of the telecommunications industry have changed drastically with so many service providers entering the market. The Indian telecommunications industry is one of the fastest growing in the world, and India is projected to become the second largest telecom market globally. According to TRAI, the number of telecom subscribers in the country increased to 562.21 million in December 2009, up 3.5 per cent from 543.20 million in November 2009. With so many service providers fighting it out for the same customer base, a lot of focus is being placed on churn reduction and customer retention. Because acquiring a new customer is very expensive, companies are placing more emphasis on customer retention strategies.

To design customer retention strategies, it is very important first to identify the customers who are most likely to churn, and then to devise strategies based on that. Logistic regression is extensively used to predict churn. It can determine not only who is likely to churn, but also what the drivers of churn are. It models the log of the odds of churning as a linear function of the customers' characteristics. The logistic regression equation is:

$$\log\left(\frac{p}{1-p}\right) = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3$$


p is the probability of churning

X1, X2 and X3 are the covariates affecting churn

B0 is the intercept, and B1, B2 and B3 are the coefficients
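Inverting the equation gives the churn probability itself: p = 1 / (1 + e^-(B0 + B1X1 + B2X2 + B3X3)). The Python sketch below does exactly that for one customer. The coefficient values and the meaning given to X1, X2 and X3 (monthly bill, dropped calls, tenure) are hypothetical, chosen only for illustration and not taken from any fitted model.

```python
import math


def churn_probability(b0, coefs, covariates):
    """Invert the logit: p = 1 / (1 + exp(-(B0 + B1*X1 + ... + Bk*Xk)))."""
    if len(coefs) != len(covariates):
        raise ValueError("need one coefficient per covariate")
    log_odds = b0 + sum(b * x for b, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients, for illustration only:
# X1 = monthly bill (hundreds of rupees), X2 = dropped calls last month,
# X3 = tenure in years.
B0 = -2.0
B = [0.3, 0.4, -0.5]
customer = [5.0, 3.0, 2.0]

p = churn_probability(B0, B, customer)
print(f"log-odds = {math.log(p / (1 - p)):.2f}, p(churn) = {p:.3f}")
# prints: log-odds = -0.30, p(churn) = 0.426
```

A negative coefficient (here on tenure) lowers the log-odds and therefore the churn probability, which is how the coefficients double as "drivers of churn".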

The intercept and coefficient values are estimated by maximum likelihood. Before fitting the logistic regression, the data set is divided into two parts: training and validation. The model is developed on the training set, and the fitted equation is then applied to the validation set to check how well it predicts churn on data it has not seen. To validate the model, check the decile analysis, the lift chart and the confusion matrix.
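The training/validation workflow can be sketched in plain Python. This is a minimal illustration under several assumptions: the customer data is synthetic (drawn from a known logistic model so we can check the estimates), the split is 70/30, a customer is flagged as a churner when p ≥ 0.5, and the maximum likelihood estimates are found by simple gradient ascent for transparency. Statistical packages typically use Newton-type methods such as iteratively reweighted least squares instead.

```python
import math
import random


def predict(weights, x):
    """Churn probability for covariates x; weights[0] is B0, weights[1:] are B1..Bk."""
    z = weights[0] + sum(b * xi for b, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, iterations=300):
    """Maximum likelihood estimates via plain gradient ascent on the mean log-likelihood."""
    k = len(rows[0])
    weights = [0.0] * (k + 1)
    n = len(rows)
    for _ in range(iterations):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            residual = y - predict(weights, x)  # derivative of log-likelihood w.r.t. the linear predictor
            grad[0] += residual
            for j, xj in enumerate(x):
                grad[j + 1] += residual * xj
        weights = [b + lr * g / n for b, g in zip(weights, grad)]
    return weights


def confusion_matrix(weights, rows, labels, threshold=0.5):
    """Count true/false positives/negatives when flagging p >= threshold as churn."""
    tp = fp = tn = fn = 0
    for x, y in zip(rows, labels):
        flagged = predict(weights, x) >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Hypothetical customers: two standardized covariates, churn drawn from
# a known logistic model with B0 = -1.0, B1 = 1.5, B2 = -1.0.
random.seed(42)
TRUE_B = [-1.0, 1.5, -1.0]
data = []
for _ in range(1000):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    y = 1 if random.random() < predict(TRUE_B, x) else 0
    data.append((x, y))

random.shuffle(data)
cut = int(0.7 * len(data))  # 70% training, 30% validation
train, valid = data[:cut], data[cut:]

w = fit_logistic([x for x, _ in train], [y for _, y in train])
cm = confusion_matrix(w, [x for x, _ in valid], [y for _, y in valid])
print("estimated B:", [round(b, 2) for b in w])
print("validation confusion matrix:", cm)
```

The estimates should land close to the true coefficients, and the confusion matrix is computed only on the 30% of customers the model never saw during fitting.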

The customers are then grouped according to their propensity to churn. Once this is done, it becomes easier for the CRM team to devise customer retention strategies. Usually, different types of retention campaigns are run for different groups of customers.

This approach to churn reduction has proved highly productive for telecommunications companies, helping them reduce churn and hence increase profitability.
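One common way to do this grouping is a decile analysis: rank customers by predicted churn probability, cut them into ten equal groups, and compare each group's actual churn rate with the overall rate (the ratio is the "lift"). The sketch below assumes the scores have already come from a fitted model; here they are synthetic, hypothetical values in which higher scores make churn more likely.

```python
import random


def decile_table(scores, labels):
    """Rank customers by churn score (highest first), cut into 10 equal groups,
    and report each decile's size, actual churn rate and lift over the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(labels) / n
    table = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        rate = sum(y for _, y in group) / len(group)
        table.append((d + 1, len(group), rate, rate / overall_rate))
    return table


# Hypothetical validation scores and outcomes.
random.seed(7)
scores = [random.random() for _ in range(1000)]
labels = [1 if random.random() < s else 0 for s in scores]

for decile, size, rate, lift in decile_table(scores, labels):
    print(f"decile {decile:2d}  n={size:4d}  churn rate={rate:.2f}  lift={lift:.2f}")
```

Plotting the lift column against the decile number gives the lift chart; a CRM team might, for example, reserve its most expensive retention offers for the top one or two deciles.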

Our Kolkata Director, Priyadarshini Bishnu, advises young students to spread their wings and fly!

Kolkata, the city of joy, is definitely the best place to be in the whole wide world. I was born and brought up in this city, and even though I have visited other countries, I have never had the inclination to settle abroad or even in some other part of this country.

I simply love the laid-back attitude, the ‘phuchkas’, the ‘misti’ and the warm smell of the summer sun. I would in all probability never be able to settle anywhere else. Here lies the problem! Though the city is absolutely amazing, many things have not quite worked out: job opportunities are scarce, infrastructure is limited, and the professional rat race, which, though psychologically challenging at times, is extremely essential if one wants to do something more with his or her life, is largely missing.

I would, therefore, ask my fellow citizens to move base if required. Do not be scared to take up a job elsewhere. I have travelled extensively across India, and cities like Mumbai and Bangalore are so much better in terms of securing a job. So don’t be shy or scared to step out.

Today, almost all the big and medium-sized MNCs have their headquarters in Mumbai, Bangalore or Delhi, so even if not now, tomorrow you will have to relocate in order to work at a senior managerial level. So why not now?

Open your eyes: opt for courses that will enable you to get a nice cushy job, and don’t think twice before relocating. After all, your home will always remain your home, and you can always come back to it once everything is done and you have made pots full of gold, rubies and diamonds!

Hoor, our Bangalore Excel trainer, shares some little-known text functions with you

FIXED function:

The FIXED function rounds a value to a specified precision and then converts the rounded value to text. The function uses the following syntax:

=FIXED(number, decimals, no_commas)

The number argument supplies the value that you want to round and convert to text. The optional decimals argument tells Excel how many places to the right of the decimal point you want to round to; if you leave it out, Excel uses 2. The optional no_commas argument is a logical value: TRUE (or 1) returns the text without commas, while FALSE (or 0, the default) includes them.

For example, to round the value 1234.56789 to a whole number and convert it to text with commas, use the following formula:

=FIXED(1234.56789,0,0)

The function returns the text 1,235.
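To make FIXED's rules concrete (round half away from zero, then format as text, with commas unless no_commas is TRUE), here is a rough Python analog. It is an illustration, not Excel's implementation: the function name fixed is hypothetical, and this sketch only handles decimals of 0 or more, whereas Excel also accepts negative values.

```python
from decimal import Decimal, ROUND_HALF_UP


def fixed(number, decimals=2, no_commas=False):
    """Rough analog of Excel's FIXED for decimals >= 0: round half away
    from zero, then return the result as text."""
    if decimals < 0:
        raise ValueError("this sketch only handles decimals >= 0")
    # Going through str() rounds 1234.56789 as written, not its binary float value.
    rounded = Decimal(str(number)).quantize(
        Decimal(1).scaleb(-decimals), rounding=ROUND_HALF_UP
    )
    spec = f".{decimals}f" if no_commas else f",.{decimals}f"
    return format(rounded, spec)


print(fixed(1234.56789, 0))        # 1,235
print(fixed(1234.56789, 0, True))  # 1235
print(fixed(1234.56789))           # 1,234.57
```

Decimal with ROUND_HALF_UP is used rather than Python's built-in round, because round applies banker's rounding (round(2.5) gives 2), while Excel rounds 2.5 up to 3.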

REPT function:

The REPT function repeats a text string. The function uses the following syntax:

=REPT(text, number_times)

The text argument either supplies the text string or references the cell holding the text string. The number_times argument tells Excel how many times you want to repeat the text. For example, the following formula:

=REPT("Bora",2)

returns the text string BoraBora.

VALUE function:

The VALUE function converts a text string that looks like a number into a numeric value.

The function uses the following syntax:

=VALUE(text)

The text argument either supplies the text string that you want to convert or references the cell holding the text string. For example, to convert $123,456.78 when it is stored as a text string rather than as a value, you can use the following formula:

=VALUE("$123,456.78")

The function returns the value 123456.78.
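The essence of what VALUE does with a string like "$123,456.78" (drop the currency symbol and thousands separators, then read the number) can be sketched in Python. This is a deliberately simplified, hypothetical analog assuming US-style formatting: unlike Excel, it does not understand dates, times, percentages or other locales, and it removes commas wherever they appear.

```python
def value(text):
    """Simplified analog of Excel's VALUE for US-style strings:
    strips surrounding spaces, dollar signs and thousands commas."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        # Excel would show the #VALUE! error in this situation.
        raise ValueError(f"#VALUE! cannot convert {text!r}") from None


print(value("$123,456.78"))  # 123456.78
```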

Welcome New Batches

February 1, 2010

The last few days have seen new batches in Gurgaon, Hyderabad, Bangalore and Kolkata. And what wonderful batches they are: we have a doctor with us this time, and people with extensive experience across MR, Operations, Technology and Sales!

A huge welcome to all our new students. Please use the comments space in this thread to suggest analytical questions that you would like answers to, and we will try to see how we can help you!

Warm Regards

The ATI Team

How TV Ratings Work

January 15, 2010

In light of all the brouhaha about the late-night shows in the US, it would be interesting to understand how television ratings really work.

Nielsen, the company that conducts statistical research, began operations in 1923 selling engineering performance surveys. The company soon moved on to market research, eventually expanding its repertoire to include a national radio rating survey (yes, those were the days). It moved on to television ratings in the 1950s, and its approach has since become the pattern used by TV rating agencies across the world.

The key component of the Nielsen ratings is, of course, statistical sampling. Most of what we refer to as television ratings comes from a little box called the daily meter, which captures what channel the household tunes in to, and the People Meter, which captures which members of the household watch the show. The total number of households participating in the Nielsen daily meter system each year is 25,000, out of a grand total of more than 110 million American households that own a television set. In addition, Nielsen sends out millions of paper diaries during November, February, May and July (called “sweeps” in industry parlance) for households to record all their viewing decisions during these periods.

A combination of these two data sets is used to create TV ratings. A TV rating of 8.5/12 means that 8.5% of the total households watched the show, and that 12% of the households watching TV at that time watched it. The numbers most networks look at, though, are the ratings in the 18-49 age group, as (presumably) that’s where the big spenders lie.
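The two numbers in a rating such as 8.5/12 differ only in their denominator: the rating divides by all TV households, the share divides by households that had a TV on at the time. The household counts below are hypothetical, picked so the arithmetic reproduces 8.5/12; real ratings are projected from weighted sample data, which this sketch ignores.

```python
def rating_and_share(watching_show, watching_any_tv, total_households):
    """Rating: % of all TV households watching the show.
    Share: % of households with a TV on at that time that were watching the show."""
    rating = 100 * watching_show / total_households
    share = 100 * watching_show / watching_any_tv
    return rating, share


# Hypothetical counts: 12,000 households, 8,500 with a TV on, 1,020 on this show.
rating, share = rating_and_share(1020, 8500, 12000)
print(f"{rating:.1f}/{share:.0f}")  # 8.5/12
```

The share is always at least as large as the rating, because fewer households have a TV on than own one.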

And how do we measure television ratings in India? The process is very simple. Sample households across India’s 75 largest towns are given the People Meter to capture their viewing decisions. This sample, of course, comes from the estimated 130 million Indian households that have a television. The company that conducts this research in India is an industry body called TAM (Television Audience Measurement).

The key issue in both cases is obviously the sampling decision. Are these sample sizes enough? Do they accurately cover all ethnicities and economic sections? In countries with vast economic and cultural diversity like India and the US, can any sample short of the entire population be representative? A key part of both TAM’s and Nielsen’s strategy, therefore, is sample design. And while these ratings may not be perfect, they ARE representative. If there’s one thing our many years in the analytics industry have taught us, it is that some information, however little, is ALWAYS better than no information.

The problem, however, is the lack of options. For better or for worse, with minor changes, the Nielsen way of doing things has remained the established industry norm, largely because of the huge size and cost of this endeavour. With media spends on television advertising increasing by the day, and television viewing patterns altering dramatically with the arrival of internet video and the DVR, it is only a matter of time before someone comes up with a better algorithm to tell us which television programs are really being watched by most people.
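One way to put a number on "are these sample sizes enough?" is the margin of error of an estimated proportion. The sketch below assumes a simple random sample, which real rating panels are not (they rely on designed, weighted samples), so it only gives a rough sense of scale. The 2,500-household demographic slice is a hypothetical figure used to show why narrow groups such as 18-49 viewers are estimated less precisely.

```python
import math


def margin_of_error(p, n, population=None, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from a
    simple random sample of n units, with an optional finite-population correction."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


# An 8.5 rating measured on 25,000 of roughly 110 million households.
moe = margin_of_error(0.085, 25_000, population=110_000_000)
# A hypothetical demographic slice of only 2,500 sample households.
moe_demo = margin_of_error(0.085, 2_500, population=110_000_000)
print(f"full sample: +/- {100 * moe:.2f} rating points")
print(f"2,500-household slice: +/- {100 * moe_demo:.2f} rating points")
```

Because the error shrinks with the square root of the sample size, a sample ten times smaller has roughly three times the margin of error, while the 110 million population size barely matters.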

Websites We Love: Econtalk

December 20, 2009

The Library of Economics and Liberty has put together an excellent collection of talks by major economists and statisticians on myriad topics on this website. Host Russ Roberts talks to featured guests, professors, authors and Nobel Prize winners about the economics behind current events, markets, the Great Depression, free trade and the curiosities of everyday decision-making. New talks are added almost every week, and while you may not always agree with the experts, it is always fun to listen to them. Some of our favorites are Nassim Taleb on the current financial crisis, the great Milton Friedman himself on money, and the always entertaining Michael Lewis on the economics of sports. Add a couple of these talks to your iPod and get an excellent mental and physical workout the next time you go for a walk.

Pay special attention to the related links beneath each talk for further reading, and to the comments, which (like everywhere else on the internet) occasionally get lively! Pay a visit to the website right now, and we promise you won’t be disappointed!