May 22, 2009

Data Preparation - An important step people always forget.

Welcome back to Practice of Good Econometrics. I hope my posts are helping you. I believe in quality over quantity, so I post only occasionally.

Today's topic is "Data Preparation".


Data preparation is different from data cleansing, although people often use the two terms interchangeably. We have already covered what to do in data cleansing in my earlier posts; now let's look at what data preparation is and what it involves.

Data preparation can be described as understanding the data well enough to build the right model, right the first time. It helps us understand the information contained in the data, whether that is a relationship between two independent variables or between the dependent variable and an independent variable. Once a relationship is identified and traceable, the predictor variable is re-expressed to reflect the uncovered relationship and is then tested for inclusion in the model.
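As a minimal illustration of re-expressing a predictor, the Python sketch below uses made-up data in which y depends on log(x) rather than on x itself. The data, the hypothetical `pearson_r` helper, and the choice of a log transform are assumptions for illustration only; in practice the right re-expression depends on the relationship you actually uncover.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y is an exact linear function of log(x), not of x.
x = list(range(1, 51))
y = [2.0 + 3.0 * math.log(v) for v in x]

# Correlation with the raw predictor understates the relationship...
r_raw = pearson_r(x, y)
# ...while the re-expressed predictor log(x) is perfectly linear in y.
log_x = [math.log(v) for v in x]
r_log = pearson_r(log_x, y)

print(f"r(x, y)     = {r_raw:.3f}")
print(f"r(log x, y) = {r_log:.3f}")
```

In this sketch, the re-expressed variable `log_x` is the one that would then be tested for inclusion in the model.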

The first and foremost methods of data preparation are “correlation analysis” and “scatter plots”.


  1. Correlation Analysis:
  • Correlation analysis provides the “correlation coefficient”, which measures the strength and direction of the linear relationship between two variables.
  • Guidelines for interpreting the correlation coefficient:
  1. Zero (0) indicates no linear relationship.
  2. +1 indicates a perfect positive linear relationship: as one variable increases, the other also increases according to an exact linear rule.
  3. -1 indicates a perfect negative linear relationship: as one variable increases, the other decreases according to an exact linear rule.
  4. Values between 0 and 0.3 (0 and -0.3) indicate a weak positive (negative) linear relationship.
  5. Values between 0.3 and 0.7 (-0.3 and -0.7) indicate a moderate positive (negative) linear relationship.
  6. Values between 0.7 and 1.0 (-0.7 and -1) indicate a strong positive (negative) linear relationship.
  • Cautions with the correlation coefficient:
  • a) The correlation coefficient is a reliable measure only if the underlying variables have a linear relationship. If the underlying relationship is known to be non-linear, the correlation coefficient is misleading or, at best, questionable.
  • b) Hence, one needs to test the linearity assumption behind the correlation coefficient, which can be done with a scatter plot.
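The guidelines above can be sketched in Python as follows. The helper names `pearson_r` and `describe_r` are hypothetical, the sample data are invented, and because the bands above share their endpoints (0.3 and 0.7), the sketch assumes each endpoint belongs to the stronger band.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def describe_r(r):
    """Label r using the rule-of-thumb bands from the guidelines.

    Assumption: shared endpoints go to the stronger band, i.e.
    |r| < 0.3 weak, 0.3 <= |r| < 0.7 moderate, |r| >= 0.7 strong.
    """
    if math.isclose(r, 0.0, abs_tol=1e-12):
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if math.isclose(size, 1.0):
        return f"perfect {direction} linear relationship"
    if size < 0.3:
        strength = "weak"
    elif size < 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction} linear relationship"


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}: {describe_r(r)}")  # prints "r = 0.775: strong positive linear relationship"
```

Remember that these labels are only meaningful once the linearity assumption has been checked.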



2. Scatter Plot Analysis:


a. A scatter plot is a graph that plots each paired observation (Xi, Yi) as a point.

b. If the points appear to follow a straight line, the linearity assumption is satisfied and the correlation coefficient provides a meaningful measure.

c. If not, the linearity assumption is not satisfied and the correlation coefficient is questionable.

d. Hence, it is desirable to examine a scatter plot before relying on the correlation coefficient.
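To see why the scatter plot matters, the sketch below builds a hypothetical data set in which y = x² exactly. The correlation coefficient comes out as zero even though y is completely determined by x, while a rough text-mode scatter plot reveals the curve. The `text_scatter` helper is a hypothetical stand-in for a real plotting library, used here only to keep the example dependency-free.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def text_scatter(xs, ys, width=21, height=11):
    """Return a rough character-grid scatter plot of (x, y) pairs as lines.

    Assumes xs and ys each contain at least two distinct values.
    """
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - min_x) / (max_x - min_x) * (width - 1))
        # Row 0 is the top of the plot, so larger y maps to a smaller row index.
        row = round((max_y - y) / (max_y - min_y) * (height - 1))
        grid[row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)
    return lines


# Hypothetical data: y is completely determined by x, but not linearly.
x = list(range(-5, 6))
y = [v * v for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # prints "r = 0.000", i.e. no *linear* relationship
for line in text_scatter(x, y):
    print(line)
```

A correlation of zero here means only "no linear relationship", not "no relationship"; the U shape in the plot makes that obvious at a glance.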


Mar 30, 2009

Analytics Companies Should Avoid the Following Practices for Better Growth

Hi,

Welcome back after a gap. Yes, I was busy collecting detailed information on analytics practices in Indian companies.

I found the following 'common wrong practices' at several companies that say they offer analytics services.

1) Appointing a person without an analytics background as manager or head of the analytics team:
a) This first creates confusion about where exactly analytics fits into the business solutions proposed to clients.
b) In particular, when deciding what work to offer or propose, most managers without an analytics background get confused and often end up offering data warehousing or data management work instead.

2) One thing all companies and managers should note: analytics starts with data, not with data management.

Other common wrong practices will be covered in future posts.

Jan 19, 2009

What is a trend in a given time series

  • A trend is a long-term growth or decay in the data, i.e., a tendency to increase or decrease over a time period.
  • Most time series data exhibit some form of trend.
  • The most widely used technique for removing a trend is to include a time index as an independent (explanatory) variable in the regression (or ARIMA) model.
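A minimal sketch of removing a trend with a time index, assuming a plain linear trend fitted by ordinary least squares with a single regressor. The series is invented, and `fit_linear_trend` and `detrend` are hypothetical helper names; a real analysis would typically use a statistics package, and within an ARIMA model one might instead difference the series.

```python
def fit_linear_trend(ys):
    """Fit y_t = a + b*t by ordinary least squares, with time index t = 0, 1, ..., n-1."""
    n = len(ys)
    ts = range(n)
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    b = s_ty / s_tt          # slope: average change per period
    a = mean_y - b * mean_t  # intercept: fitted level at t = 0
    return a, b


def detrend(ys):
    """Return the residuals after removing the fitted linear trend."""
    a, b = fit_linear_trend(ys)
    return [y - (a + b * t) for t, y in enumerate(ys)]


# Hypothetical series: upward trend of 1.5 per period plus small fluctuations.
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
series = [5.0 + 1.5 * t + e for t, e in enumerate(noise)]

a, b = fit_linear_trend(series)
residuals = detrend(series)
print(f"intercept = {a:.3f}, slope per period = {b:.3f}")
print("detrended:", [round(res, 3) for res in residuals])
```

Because the time index is in the model, the residuals carry no remaining linear trend: they sum to zero and are uncorrelated with t.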