Monday, August 4, 2008

Log Transformation

One of the most commonly used data transformations is taking the natural logarithm of the original values. A log transformation works well for data where the errors (residuals) get larger as the values of the variable(s) get larger. This pattern is common because the error in, or change of, a variable is often a percentage of its value rather than a fixed absolute amount. For the same percent error, a larger value means a larger absolute error.

For example, a 5% error translates into an error that is 5% of the value of the variable. If the original value is 100, the error is 5% x 100, or 5. If the original value is 500, the error becomes 5% x 500, or 25.
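The arithmetic above can be checked with a tiny Python sketch. The helper name `absolute_error` is hypothetical, introduced only for illustration; it assumes the percent error is given as a fraction (0.05 for 5%).

```python
def absolute_error(value, percent_error):
    """Absolute error implied by a percent error given as a fraction (0.05 = 5%)."""
    return value * percent_error

# The same 5% error gives a larger absolute error for a larger value.
for x in (100, 500):
    print(x, absolute_error(x, 0.05))
# 100 5.0
# 500 25.0
```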

When we take logs, this multiplicative error becomes an additive one, because the log of a product is the sum of the logs. Writing the error as a multiplicative factor (for example, 1.05 for a 5% error):

log(X * error) = log(X) + log(error)
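A short sketch of this identity, assuming the 5% error is applied as the factor 1.05 and using natural logs as in the post. The function name `log_shift` is hypothetical. On the log scale the error shifts every value by the same amount, log(1.05), whatever the size of X.

```python
import math

def log_shift(x, factor):
    """How far a multiplicative error factor moves x on the natural-log scale."""
    return math.log(x * factor) - math.log(x)

factor = 1.05  # a 5% error expressed as a multiplicative factor
for x in (100, 500):
    print(x, round(log_shift(x, factor), 4))
# 100 0.0488
# 500 0.0488
```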

The percent error therefore becomes the same additive error, log(error), regardless of the original value of the variable. In other words, the non-uniform errors become uniform. That is why taking logs of the variable(s) often helps meet the equal-variance (uniform error) requirement of many statistical analyses.
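A small simulation can illustrate the effect. This is a minimal sketch under assumed conditions, not taken from the source: observations are the true value times (1 + e), where e is normally distributed with a standard deviation of 5%, and the function name `simulate` is hypothetical. The spread on the original scale grows with the value, while the spread of the logged values stays about the same.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def simulate(true_value, n=10_000, percent_sd=0.05):
    """Draw n observations whose error is a percent of true_value (assumed model)."""
    return [true_value * (1 + rng.gauss(0, percent_sd)) for _ in range(n)]

low = simulate(100)
high = simulate(500)

# On the original scale the spread grows with the value (roughly 5 vs roughly 25).
raw_sd_low = statistics.stdev(low)
raw_sd_high = statistics.stdev(high)

# After natural logs the spread is roughly the same at both levels (about 0.05).
log_sd_low = statistics.stdev([math.log(y) for y in low])
log_sd_high = statistics.stdev([math.log(y) for y in high])

print(f"raw SD: {raw_sd_low:.2f} vs {raw_sd_high:.2f}")
print(f"log SD: {log_sd_low:.4f} vs {log_sd_high:.4f}")
```

With real data the errors will rarely be this clean, so it is worth plotting residuals before and after the transformation rather than assuming it worked.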

A New View of Statistics website
