Geographical and Statistical Thoughts

Monday, April 28, 2008

Inflation

Stephen Few is a consultant who specializes in graphic design of business intelligence. He's a practical Edward Tufte. He did a two-day training at Coldwater Creek, and I keep up with his blog. This week he posted an article by a friend of his, Jonathan G. Koomey.

The article discusses the need to adjust for inflation when analyzing monetary trends over time. It is somewhat obvious - but who actually does it? Economists, yes - it's what they do. But do we do it for our statistical and geographic endeavours? Thanks to Alan Greenspan and our low inflation rates, it has really been almost a moot point for the past handful of years.

I started thinking. What other inflationary type effects exist that should routinely be controlled for? If your business has changed pricing strategy, then how have you adjusted forecasted sales? Is your product mix constant? Has your percentage of sales per channel evolved?

An average order value, year over year, may have changed significantly. A probability to purchase a swimsuit, when we are now focusing on dresses, has lost historical meaning. And if you are doing lifetime RFM - is a $50 purchase 10 years ago still $50? Do you correct that for inflation or decrease it since its NPV is less? Would you increase costs as well if you are forecasting profit?

Thus, in the RFM mix, only one component is changing, and small M changes could mix things up quite a bit. It could work against recency, as an inflation adjustment would increase the value of older sales - which would be the wrong thing to do. From a store sales forecasting/analysis perspective, when looking over 10 years, correcting for the changing nature of monetary values would obviously be necessary, but for year-over-year statistical models - I'd have to play first and see.
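
To make the $50 question concrete, here is a minimal Python sketch of restating an old purchase in today's dollars. The 3% annual rate and the two index values are made-up assumptions for illustration (they are not real CPI figures), and the function names are hypothetical.

```python
def inflate_by_rate(amount, annual_rate, years):
    """Restate a past amount in today's dollars, assuming a constant annual rate."""
    return amount * (1 + annual_rate) ** years


def inflate_by_index(amount, index_then, index_now):
    """Restate a past amount using the ratio of two price index values."""
    return amount * index_now / index_then


# Hypothetical inputs: a $50 purchase made 10 years ago, 3% assumed inflation.
purchase = 50.00
adjusted = inflate_by_rate(purchase, annual_rate=0.03, years=10)
print(f"${purchase:.2f} ten years ago is about ${adjusted:.2f} today")

# Same idea with made-up index values (NOT real CPI numbers).
adjusted_idx = inflate_by_index(purchase, index_then=160.0, index_now=215.0)
print(f"Index-based adjustment: ${adjusted_idx:.2f}")
```

Under these assumptions the old $50 grows by roughly a third, which is enough to move a customer across an M cutoff in an RFM score. Whether that move is desirable is exactly the open question.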

Sunday, April 20, 2008

Drive Time Errors

Drive time software is very popular in real estate research. Click on the map and it uses computerized roads to estimate how long it will take to drive from where you clicked. Typically, the software will estimate travel time bands, so you'll end up with an area on the map that represents travel time of at most 10 minutes, 20 minutes...

It's interesting to note that these estimates are taken as exact. Of course, they are not. When you click on the map, you are not asked what day, or what time of day, you are estimating travel for. We all know about rush hour and we all know that Saturday is different than Tuesday. In some areas, summer is different than winter.

It would be a lot to ask the drive time software to estimate this. They'd have to have a tremendous amount of data for each road segment to make it happen. The cost alone would surpass the benefit of knowing this precisely.

But one could estimate it. When making a drive time analysis, increase and decrease the estimate by 10, 20, and 30%. Have 10-, 9-, and 8-minute drive time bands. Then conduct sensitivity analysis - what different decision(s) would you make if the travel time is really 8 or 11 minutes?

If the decision is the same, then it is probably straightforward. If the decision is different, then the right decision is sensitive to the underlying assumptions. More than likely, drive time is not the only data point that is confusing the decision, but not completely trusting a drive time estimate may help point out uncertainty.
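
A rough Python sketch of that sensitivity check follows. Everything in it is an assumption for illustration: the drive time estimates are invented, the "open the store if at least 6 customers fall inside the 10-minute band" rule is a hypothetical decision rule, and the function names are mine. Scaling every estimate up by a factor is equivalent to shrinking the band by the same factor.

```python
def customers_within(drive_minutes, limit, factor=1.0):
    """Count locations inside `limit` minutes when every estimate is scaled by `factor`."""
    return sum(1 for minutes in drive_minutes if minutes * factor <= limit)


def sensitivity(drive_minutes, limit, minimum_needed,
                factors=(0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3)):
    """For each scaling factor, return (count inside the band, go/no-go decision)."""
    results = {}
    for factor in factors:
        count = customers_within(drive_minutes, limit, factor)
        results[factor] = (count, count >= minimum_needed)
    return results


# Invented drive time estimates (minutes) from the site to ten customers.
estimates = [4.0, 6.5, 7.2, 8.1, 8.8, 9.4, 9.9, 10.6, 11.3, 13.0]

table = sensitivity(estimates, limit=10, minimum_needed=6)
for factor, (count, go) in table.items():
    print(f"estimates x {factor:.1f}: {count} customers inside 10 min, go={go}")

# The decision is robust only if it is the same under every scaling factor.
decision_is_robust = len({go for _, go in table.values()}) == 1
print("Decision robust to drive time error:", decision_is_robust)
```

With these invented numbers the decision flips once the estimates run 10% long, which is the signal that the drive time assumption deserves a closer look before committing.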

Sunday, April 13, 2008

So You're a Statistician...

I was out for a bike ride on Saturday with a new group of friends. I always enjoy the conversation when people learn that I am a statistician - the first look is always the I-remember-that-class-from-college look, and then comes a what-do-you-really-do look.

After explaining, they usually get excited and have some understanding of the enjoyment I get helping people make outstanding business decisions. But almost inevitably, the comment comes up about being precise. The comment may be about my balancing our checkbook to the penny (ha!), or being intolerant of mistakes at work.

I always laugh and explain that they were not paying attention in college! Statistics is not about being precise - it's the exact opposite. It's understanding that a precise estimate is not realistic, so you have to include wiggle room when guessing the future. Rather than Tiger Woods will win the Masters today, it's he'll place in the top 10.

It's interesting to see when a "statistical" viewpoint is newsworthy - how many articles do you read that say we are, or are not, currently in a recession? Here, it is advantageous to not be precise. Politicians try not to be precise as they enjoy having wiggle room to work with.

But it'd also be great to hear more news delivered the way a statistician would say it - not mission accomplished, but we're approaching the expected value of the engagement. Rather than earnings of 7 cents per share, it's we'll do well this quarter.

Actually, that'd be interesting - more wiggle room will drive the precise people I know crazy! And that's fun to watch!

Sunday, April 6, 2008

Right Question

Another sports post this week. 60 Minutes did a piece on Bill James, the Boston Red Sox data nerd. He’s credited with helping them win 2 World Series. Even if you don’t like baseball, it is an interesting data nerd piece, since Bill has developed innovative metrics to measure baseball performance. Bill says the secret is that you have to ask the right question.

As a data nerd, I appreciate his point. I have often found that the best way to answer a particular question is to answer another question. The trick is not answering either question, but tying them together to provide proper context for a decision. Every decent data nerd learns this trick, but let's consider 2 situations where I know the asked question is wrong.

First, consider direct marketing response rates. When I was in the credit card industry, I created very successful statistical response models that had response rates over 2%. Today, those response rates are nearly ¼ of a percent! At 98% wrong, I was successful. Now, their models' responses are damn near 100% wrong!

As the linked article says, these basis point response rates are still a success for the banks. And this is where the direct marketing field needs its Bill James - how can you be 100% wrong and be right?

For a while, I've thought that the response distribution needs academic attention - it's nearly binomial - you either responded or you didn't - but it also has a continuous piece since each customer is a measurable profit stream. To address this special case binomial, the industry combines logistic regression (for response) with linear regression (for the profit stream). This combination results in profitable successes - that are 98% wrong.
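
Here is a small Python sketch of how the two pieces combine, assuming both models have already been fit and scored. The prospects, probabilities, profit figures, and the 50-cent mailing cost are all hypothetical, and the function name is mine. Only the 0.25% response rate echoes the figure above.

```python
def expected_profit_per_piece(p_response, profit_if_response, cost_per_piece):
    """Expected profit of mailing one prospect.

    p_response         -- probability of response (the logistic regression score)
    profit_if_response -- predicted profit given a response (the linear regression score)
    cost_per_piece     -- cost of mailing, paid whether or not they respond
    """
    return p_response * profit_if_response - cost_per_piece


# Hypothetical scored prospects; none of these numbers come from real data.
prospects = [
    {"id": "A", "p_response": 0.0025, "profit_if_response": 400.0},
    {"id": "B", "p_response": 0.0040, "profit_if_response": 100.0},
    {"id": "C", "p_response": 0.0010, "profit_if_response": 900.0},
]
COST_PER_PIECE = 0.50  # assumed mailing cost in dollars

to_mail = [
    p["id"]
    for p in prospects
    if expected_profit_per_piece(
        p["p_response"], p["profit_if_response"], COST_PER_PIECE
    ) > 0
]
print("Mail to:", to_mail)
```

In this made-up example, the prospect with the highest response probability is the one not worth mailing. That is one way to be "wrong" more than 99% of the time on response and still be right on profit.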

For site selection, success is equally complex. If a store does well, is it because of the location, the merchandise, or the specific store management? Is success viewed just within the one store or across the network of stores in a market? ROIC may be the ultimate success measure from a real estate strategy point of view, but the numerator in the equation is driven by sales that are dependent on the humans actually running the cash register, providing customer service, and selecting the merchandise.

As the economy continues to evolve, we are seeing more retailers changing their minds about their store strategies. Lots of closed stores for Talbots, Ann Taylor, Sigrid Olsen, and perhaps more are coming. How did Ann Taylor measure success? They are closing 117 out of 850 stores - that's more than 1 mistake for every 8 decisions (13.8%). That's a huge capital investment error.

What questions should we be asking? Who is our Bill James? Is it you?