Interesting short post from John D. Cook this week:
Data scientists often complain that the bulk of their work is data cleaning. But if you see data cleaning as the work, not just an obstacle to the work, it can be interesting. You could think of it as data pathology, a kind of analysis before the intended analysis.
It’s a point I try to make a lot in my courses: cleaning your data is not just necessary, it also teaches you about what you’re studying. I think the data pathologist is a nice concept, and not one I would have reached for myself.