Five Dogmas of Statistics

29.3.2014

This is a programmatic post, compiling some of my most fundamental personal beliefs about statistics. I of course appreciate comments and discussion on all of its content, despite the partially polemic manifesto style that kept me having fun while writing it.

The text is structured as follows: It states five dogmas:

  1. Modeling is at the core of statistics.
  2. Statistics is a toolbox.
  3. What is processed in statistics is information, not numbers.
  4. All research is exploratory.
  5. There are situations when statistics is totally useless. Sometimes it is even dangerous.

For each of these claims, I discuss a.) what it means to me, b.) why I believe it is right, and c.) what its general and practical implications would be. I am not claiming that all these thoughts are particularly new or ground-breaking, nor are the thoughts presented here meant to be exhaustive. I do think, however, that they should receive more attention and discussion in the daily business of statistics. That's why I am posting this. Apart from the fact that I think it's not bad to have such a programmatic positioning as one of the first entries in a new blog.

Oh, and a quick meta-note: in the entire blog and especially in this article some text is not black but gray. These are mostly additional thoughts, arguments or illustrations which I find interesting, but which are not necessary for the core points of the post. They can be safely skipped if you don't feel like reading tons of text today.

1. Modeling is at the core of statistics (not inference or estimation).

a.) What does that mean? When people in research-oriented fields like the social sciences have their first encounter with statistics, it is usually in a class that starts with teaching them practical procedures: a method for estimating the mean of a normal distribution, or how to check for a group difference with a t-test. That is, the usual way of teaching statistics makes students conceptualize it as a set of procedures. The models are treated rather as an afterthought, often in terms of lists of conditions under which such and such procedure can be applied. I firmly believe that this is just the reverse of a sensible ordering. I think the skill that is 1. logically primary and 2. more important to convey is grasping a situation or structure as a data-generating mechanism, which one then tries to cast into a formal model. A model means a decision about which variables and relations are of interest right now, what these relations could look like in general, and which aspects of the situation are neglected in spite of better knowledge. Only then comes the second step of thinking about how to use this model to extract certain information from the data, i.e. estimate a rate or mean parameter, or check whether two variables influence each other in a relevant manner. For example, a t-test should not be conceptualized and taught as a method for finding a group difference. You should start by thinking about the situation of interest, which might reasonably be modeled as independent draws from two different normal distributions. Having identified this, it turns out in the second step that using a t-test might be a way to answer your target question. Of course, this second step might involve backtracking and correcting the model in order to make certain tools applicable, since everywhere in science the models you build are also influenced by the tools you have available, and I don't think this is a bad thing.
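To make that two-step ordering concrete, here is a minimal sketch in Python (assuming NumPy and SciPy); the group means, spread and sample sizes are invented purely for illustration. The data-generating mechanism is written down first, and only then does the t-test enter as one tool that happens to answer the question under this model.

```python
# A minimal sketch of the two-step ordering. Step 1 is the model: each group is
# assumed to consist of independent draws from its own normal distribution.
# All means, spreads and sample sizes below are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Step 1: write down the assumed data-generating mechanism explicitly.
group_a = rng.normal(loc=100.0, scale=15.0, size=30)  # N(100, 15^2), n = 30
group_b = rng.normal(loc=108.0, scale=15.0, size=30)  # N(108, 15^2), n = 30

# Step 2: only now pick a procedure that answers the question under this model.
# For "do the two group means differ?", the two-sample t-test happens to fit.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```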

b.) Why? Although it sounds theoretical, this is a pragmatic dogma. I believe the best reason for it is that embracing it makes you do more useful statistics, because you start out by getting into the real-world situation you are confronted with, instead of trying to find the one procedure from your always limited repertoire that fits best, or least badly. This, I think, results in more adequate models of the situations, in turn enabling you to better extract information from the generated data, i.e. do better estimation and testing. The downside is that you might have constructed a model for which you don't know of any established procedures, or for which none exist. But then you can still go on to research or develop them. Or modify your model slightly so you can use technology you know for a related model -- which gives you, if you do it this way round, a clear feeling for where the assumptions in your model deviate from what you actually think the situation is, a valuable aid in interpreting analysis results and judging their credibility. Also, thanks to computers and the huge progress made in numerical as well as Monte Carlo techniques, nowadays immensely more models are tractable than a few decades ago -- including millions of situation-specific models nobody has ever set up, used or analyzed before.
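As a minimal sketch of that last point, here is a situation-specific model for which I know of no off-the-shelf procedure, but which Monte Carlo simulation handles easily. The mechanism (Poisson counts with a fixed fraction of observations recorded as zero due to dropout) and every number in it are invented for illustration only.

```python
# A minimal Monte Carlo sketch for an invented, situation-specific model: Poisson
# counts of which a fixed fraction are recorded as zero due to dropout.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, rate, dropout, rng):
    """Draw one data set of size n from the assumed mechanism."""
    counts = rng.poisson(rate, size=n)
    dropped = rng.random(n) < dropout      # these observations are recorded as zero
    return np.where(dropped, 0, counts)

observed_mean = 2.1  # made-up summary statistic of a made-up data set of size 50

# Monte Carlo approximation of the sampling distribution of the mean under the model.
sims = np.array([simulate(50, rate=2.5, dropout=0.1, rng=rng).mean()
                 for _ in range(10_000)])
print(f"one-sided Monte Carlo p-value: {np.mean(sims <= observed_mean):.3f}")
```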

c.) Implications An important factual consequence of people embracing this dogma is that they will create and use more domain-specific and, in total, more diverse models. They will stop assuming everything to be normally distributed, and in general will use standard procedures like the t-test less often. These procedures have become so much of a default that it sometimes seems like nobody actually remembers the model behind the approach -- you are claiming that your data were generated by drawing independent samples from two normally distributed variables.

An important normative consequence of the dogma is that statistics cannot be done without stochastics. In order to model situations properly, a lot of knowledge from stochastics about how certain processes and probability distributions relate to each other is of great help. This is, put more generally, because stochastics contains and creates conceptual knowledge, in contrast to the procedural knowledge that is often labeled "statistics" in undergraduate classes.

2. Statistics is a toolbox.

a.) What does that mean? Often, statistics comes, and is conveyed, with a lot of authority attached to it. Procedures like Neyman-Pearson tests or maximum likelihood estimation are communicated as the optimal way to decide or estimate in certain situations, if not the only rational way. Again, I think this is a terribly mistaken attitude. Statistics is a diverse set of knowledge, models and procedures, each of them more or less apt for the special purposes you might think of using "statistics" for. For a given problem, there are often many different ways of modeling it, in turn resulting in many different ways to proceed in solving your problem (i.e. extracting your information or making your decision). Hard and general criteria for which model or procedure is "better" (what this means again depends on your specific purpose) rarely, or as I would say, never exist.

b.) Why? The main reason for this belief of mine is actually a consequence of dogma 1: statistics is about modeling. And modeling involves people's decisions about how to model something for some specific purpose. As some philosophers of science I greatly admire (e.g. Nancy Cartwright and Ian Hacking) point out, there are no generally optimal ways of modeling things. Some theorems in statistics tell us that, given a certain formal situation and a specified loss or risk function, such-and-such procedure is optimal in some sense. But when we apply statistics to the real world, there are no formal situations, and which loss function to choose is our responsibility -- a choice no mathematical theorem helps with. Hence, optimality does not transfer. The experience of some model and/or procedure being a useful tool for certain types of situations, however, possibly does transfer.

A second reason stems from another, even more fundamental philosophical conviction of mine: science in general is a toolbox. Science is not about or aimed at knowing stuff and being perfectly rational, it is about constructing ways of doing stuff that work. Most often it is a lot more pragmatic and less rational than claimed. Statistics, so to say, inherits these traits whenever it is used for science. When it is not, well, it is a mere tool almost by definition, not endowed with mysterious scientific or rational super-powers.

c.) Implications The most important consequence of this conviction, I think, regards how statistics is perceived from within the sciences: often, things are claimed to be proven by statistics. Even if not openly expressed in this wording, I still impute to many people the implicit belief that statistics is able to prove things, maybe even that only statistics is. It is clear that with toolboxes you don't prove things. Toolboxes might help you in solving specific problems. That is, with this dogma statistics loses a lot of its authority. It becomes debatable, as the choice of tools always is, and a lot of responsibility is transferred back from the formalism to its users. That might sound like a bad thing to some people, but I'm actually convinced that it is a terribly good thing. It makes for better science and, on a more general level, I believe, a better society.

A second consequence of this dogma is more internal to statistics: embracing it makes statistics a more open field. Suddenly, a lot of what is otherwise classified as artificial intelligence, machine learning, bioinformatics, or even database theory or mathematical visualization, becomes part of statistics. Of course that is nothing but one of the core ideas of "Data Science", and what I want to say here is that I like just this idea. A classical example, I guess, are support vector machines. They don't have a straightforward interpretation in terms of probability and hence are seldom considered part of "statistics". Despite that, they are awesomely useful classifiers. So why not put them into the toolbox, since often enough classifying is just what you want to do? I think integrating all these tools into one disciplinary framework furthers dialogue and exchange of ideas, in the end leading to a better understanding of the existing tools and a more fruitful development of new ones. Continuing with the SVM example: since they do have a pretty straightforward geometric interpretation, including them in statistics might point us towards interpreting other statistical ideas from a geometric angle as well (no pun intended), leading to interesting insights, as e.g. regression or the entire theory of concentration of measure exemplify.
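As a minimal sketch of that SVM-in-the-toolbox point, here is how little ceremony it takes to use one, assuming scikit-learn is available; the bundled breast-cancer toy data set and the settings stand in for "a classification problem you actually care about" and are chosen for illustration only.

```python
# A minimal sketch of an SVM as just another tool in the box, using scikit-learn
# on one of its bundled toy data sets.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An RBF-kernel SVM: no explicit probability model behind it, but a very useful classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```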

3. What is processed in statistics is information, not numbers.

a.) What does that mean? I am inclined to formulate this dogma even more provocatively: statistics is not quantitative. We should stop thinking of statistics as transforming numbers into other numbers, which then might be interpreted -- a last step of analysis often conceptualized to be outside of statistics. I think, on the contrary, we should understand statistics as transforming information in all its diverse forms. Information can be numerical, it can be logical, it can be graphical, it can be qualitative ("this is somehow more like this than it is like that") and much more. We should not forget 1. that we can process all this information statistically, not only things that naturally come in the form of numbers, and 2. that what we process is only this information, not the additional structure that enters when we use e.g. real numbers to encode the information. What we often do in statistics, mostly for convenience and out of habit, is encode information numerically. But that does not mean that the information is numerical. Quite the contrary, it rather means that the interpretation, the decoding of the numbers back into information, is nothing but a step in the information processing that I want to call "statistics".

b.) Why? What is modeled, almost everywhere in and outside of the sciences, is not numbers but structures, i.e. entities and their relations. This is because numbers are not out there in the world; quite the contrary, they are one of our means of modeling things out there in the world. So what we do in modeling and research is process information about these things. Sometimes variables may well be measurable as real numbers, but only once we decide to treat them that way in modeling. Taking the content to be those numbers, instead of, more abstractly, information, is nothing but a fundamental confusion.

c.) Implications There is one very general consequence of this dogma: it tells you to really make sure that you know where in the analysis you can interpret which aspects of the results (mostly encoded numerically) -- the story of the different scales of measurement. It is repeated ever so often in teaching, but neglected in practice at least twice as often. To me the dogma even suggests, as a remedy, that we shouldn't represent non-numerical information numerically in all intermediate steps of processing, just to make sure we don't get confused. Why, after all, should you encode "smoker" and "non-smoker" as 0 and 1 if that doesn't give you any additional information about the data, apart from embedding it into a lot of structure you cannot sensibly interpret (e.g. that the distance between "smoker" and "non-smoker" is the same as that between 3 and 4, or even worse that "smoker" times "non-smoker" = "non-smoker")? When I'm in the mood, I sometimes contend that this tendency towards numerical encoding comes from an inferiority complex of the "soft" sciences that makes them want to look more like "hard" physics. My response would be an empowering one: dear soft sciences, you're cool enough by yourselves and not in need of that cheap disguise!
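As a minimal sketch of that point, here is the smoker example in pandas (the tiny data set is invented): the categorical dtype keeps exactly the information we have, while the 0/1 codes happily support arithmetic that has no interpretation.

```python
# A minimal sketch of the smoker/non-smoker example in pandas.
import pandas as pd

status = pd.Series(["smoker", "non-smoker", "smoker", "non-smoker"], dtype="category")

# The categorical dtype carries exactly the information we have: labels and counts.
print(status.value_counts())

# A 0/1 re-encoding adds structure we cannot interpret: "distances" and "products"
# of smoking statuses compute without complaint, even though they mean nothing.
codes = status.cat.codes              # non-smoker -> 0, smoker -> 1 (alphabetical order)
print(codes.diff())                   # the "distance" between consecutive smoking statuses
print(codes.iloc[0] * codes.iloc[1])  # "smoker" times "non-smoker" = 0 = "non-smoker"
```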

I also interpret this dogma as an imperative to devote more attention to non-parametric methods, graphical modeling and approaches which make a lot less use of the structure of the real numbers than, for example, parametric regression does. We should also make sure we know what's going on in the theory of signal processing.
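To illustrate what "less use of the structure of the real numbers" can mean in practice, here is a minimal sketch with a rank-based test; the skewed toy data are invented. Only the ordering of the observations enters the procedure, so any monotone re-encoding of the measurements leaves the result untouched.

```python
# A minimal sketch of a procedure that uses less of the real numbers' structure:
# the Mann-Whitney U (Wilcoxon rank-sum) test only looks at the ordering of the data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=40)   # skewed, made-up data
y = rng.lognormal(mean=0.4, sigma=1.0, size=40)

# Any monotone re-encoding of the measurements (logs, ranks, grades) gives the same result.
u_stat, p_value = stats.mannwhitneyu(x, y, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```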

4. All research is exploratory.

a.) What does that mean? In traditional philosophy of science, a distinction is often made between the "context of discovery" and the "context of justification". That distinction of contexts refers to there being different ways and norms of scientific behavior when "looking for hypotheses" and exploring the world than when "rigorously" "testing" or "establishing" claims. As you can tell from the amount of inverted commas, I believe this distinction to be mistaken all the way down to the concepts used in its definition. Not for "analytical", theoretical reasons, but because I think it presents a vastly distorted picture of the research process. Most of the time, empirical research simply behaves in an "exploratory" way. It is looking for phenomena or effects in all kinds of data that were generated before the formulation of a hypothesis or testing procedure. Only later on do researchers modify the story they tell about how they arrived at certain conclusions, in order to meet the proclaimed standards of the "context of justification" -- like that of first formulating a very particular hypothesis which is then tested empirically in an experiment designed specifically for that purpose. Don't get me wrong: I don't think that violating these standards is a problem. I only believe it is a problem that we still proclaim certain rules of the game which are practically never obeyed. I think we should provide many more statistical tools that are designed to suit this "exploratory" use by scientists, instead of tools based on "rules" our community of users seems to have silently dismissed.

Maybe the previous paragraph requires another clarification: yes, I know that very often in empirical research you do have two "stages", one more exploratory, in psychology sometimes called "piloting", and another one, the "study", where you have a fixed experimental setup which you repeat a number of times. I do not claim that there is no difference at all between those two stages. But I do claim that also the second stage is basically an exploratory one, much more so than claimed: at the latest when analysing the results and writing up the paper, you will most probably change or reword your hypothesis, because now, in the face of the larger amount of data you have, a slightly different one looks more reasonable. You will also most probably play around with different methods of statistical analysis, and not use one you defined a priori (I doubt you did that after all), independently of its results and the results of other procedures. Again, to make sure I'm not misunderstood: I do not think at all that this is a bad way to proceed. I only say that 1) we should be more honest about the way we proceed, and 2) we should develop statistical tools with that way of proceeding in mind, and not with the weird and at the same time unrealistic conception that underlies the popular interpretations of so-called frequentist statistics.

b.) Why? The reason I'd give for this dogma is, so to say, empirical: whenever I have seen people doing empirical research, it was in fact exploratory in the above sense, and only later on couched in terms of hypothesis testing. Of course, we can wait for a black swan to appear, but for the moment I'd take the dogma to be a well-working hypothesis, which can be tested quite well by letting it guide our design decisions for statistical research tools and just seeing what happens. As in empirical research.

Secondly, I am of course appealing to the critical arguments exchanged in the debate about the context distinction in the philosophy of science. I feel unable to discuss them in brief here, so I'll simply refer to the SEP article linked already above, and the book "Revisiting Discovery and Justification" by Schickore and Steinle (2006), of which I have just read the introduction, but which sounds interesting and rather comprehensive to me.

c.) Implications Maybe this is the dogma which would have the most severe consequences if followed consistently. Most of traditional (non-Bayesian, that is) statistics is designed to test pre-specified hypotheses with purposefully conducted experiments. The entire idea of hypothesis testing would be rendered dubious by this dogma. At the moment, I am indeed inclined to be that consistent and take these doubts seriously. Hypothesis testing is, in its underlying assumptions, mind-blowingly complex, ambiguous and specific at the same time anyway. Why not throw it overboard and replace it with slicker methods? If these slicker methods do not yet exist, that's a good motivation to devote some energy to their development. Since hypothesis testing has been the driving force of statistical research for more than 50 years now, it would be interesting to see which results derived in the course of that are still of relevance outside the framework. Is there, for example, any use of Student's t distribution apart from it being the sampling distribution of the test statistic in a t-test situation?

Of course, the idea of taking all research to be exploratory points somewhat in the direction of Bayesian belief updating. It also suggests making the research process part of the models of the data generation, including and making explicit phenomena like researcher degrees of freedom, stopping rules and error accumulation in multiple comparisons. This is for sure not easy, but interesting, and some people like E.J. Wagenmakers have actually already started to do it.
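To give one concrete reason why modeling such phenomena matters, here is a minimal sketch (my own illustration, with invented settings, not anyone's published code) of a single researcher degree of freedom, optional stopping: peeking at the p-value as data come in and stopping at the first "significant" result inflates the false-positive rate well beyond the nominal level, even though every individual test is computed correctly.

```python
# A minimal sketch of optional stopping. Both groups are simulated under the null of
# no difference; the "researcher" peeks at the p-value every 10 observations per
# group and stops at the first p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def peeking_study(max_n=200, peek_every=10, alpha=0.05):
    """Return True if the peeking researcher ever declares significance."""
    a = rng.normal(size=max_n)
    b = rng.normal(size=max_n)
    for n in range(peek_every, max_n + 1, peek_every):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < alpha:
            return True   # stop early and report the "effect"
    return False

rate = np.mean([peeking_study() for _ in range(2000)])
print(f"false-positive rate with optional stopping: {rate:.2f}")  # well above the nominal 0.05
```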

5. There are situations when statistics is totally useless. Sometimes it is even dangerous.

a.) What does that mean? Sometimes I have the impression that it is an obligation in certain sciences, especially psychology, to perform quantitative, statistical analyses at all costs. Otherwise your results are not taken seriously by a large community of fellow researchers and you're denied publication in mainstream journals. I don't think this is a pleasant state of affairs or development. There are other methodological perspectives and traditions which provide just as useful tools for many situations. And sometimes these are even vastly superior for a certain purpose. We should not forget this, and we should reflect a bit on the capabilities and limitations of statistical approaches quite generally.

We should also not forget that relying exclusively on statistical reasoning in decision making can do serious harm to people. For some cultural reasons, quantitative methods have gained a strong authority. Much is believed as long as it is presented together with some numbers and in particular p-values. And most often, where there is authority, there is the danger of its abuse. Also, some core ideas of statistical theory are inherently discriminatory and not a good idea to use when dealing with persons rather than, uhm, inanimate carbon rods.

b.) Why? First, when is statistics useless? Much of physics barely uses hypothesis testing or other decidedly statistical methods, while using differential equations and stochastic modeling all over the place. Not to speak of the humanities, in which, I guess, a lot of progress would not have been possible if people had cared about how to quantify or "formalize" their experiences, impressions and theories. Of course, this also motivates an analysis of why statistical methods are not used here, what their perceived shortcomings are, and what alternative methods could be devised to fit the needs. This is related to the third dogma, I think, of statistics not being exclusively "quantitative".

Second, when is statistical thinking dangerous? I guess racial profiling is an almost canonical example here. It shows, to me, that it does not suffice to reason internally to statistics only, and to conclude from data on previous (and worse: subjective!) experience that the expected value of such and such loss function is minimized when using such and such decision rule. We need to think beyond this and also reflect on what the use of this decision rule does to a society and its individual members. This, put formally, cannot simply be introduced into the loss function, since a loss function does not evaluate decision rules and their general properties, like systematic bias, but only point-wise decisions, weighted with their probability. Thinking in terms of expected values is the crucial decision here, and it leads to missing important general consequences of one's own behavior and decisions.
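To make that formal point tangible, here is a minimal, entirely stylized sketch with invented numbers (a toy, not a model of any real practice): a rule that minimizes expected loss point-wise, using group membership alone, still places a grossly unequal burden on innocent members of different groups, and the loss computation itself never registers this.

```python
# A stylized toy: expected-loss-minimizing searches based on group membership alone.
import numpy as np

rng = np.random.default_rng(3)

base_rate = {"A": 0.02, "B": 0.05}   # hypothetical group-specific rates of "carrying"
cost_search, cost_miss = 1.0, 30.0   # hypothetical losses per search / per missed carrier

# Expected-loss-minimizing rule: search a person iff base_rate[group] * cost_miss > cost_search.
rule = {g: p * cost_miss > cost_search for g, p in base_rate.items()}
print("search decision per group:", rule)   # {'A': False, 'B': True}

# Simulate a population and look at the burden this rule places on innocent people.
n = 100_000
groups = rng.choice(["A", "B"], size=n)
carries = rng.random(n) < np.vectorize(base_rate.get)(groups)
searched = np.vectorize(rule.get)(groups)

for g in ("A", "B"):
    innocent = (groups == g) & ~carries
    print(f"group {g}: share of innocent people searched = {searched[innocent].mean():.2f}")
# Expected loss is minimal, yet innocent members of group B are always searched and
# those of group A never -- a systematic disparity invisible to the point-wise loss.
```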

c.) Implications We should treat alternative approaches with respect rather than make fun of them, as is often done within the "hard", "quantitative" sciences. I think good statisticians should even have a basic idea of non-quantitative methods, in order to consult other experts when confronted with problems beyond the limits of sensible applicability of state-of-the-art statistics. Returning to the tool metaphor: what's the use of having a powerful tool in hand if you are misguided about which situations call for it and which don't? And what if the power attributed to this tool makes you throw away all the other tools you have used successfully before?

Also, as you can tell from the overtone the "why?" paragraph carried: I think we should drop the dogma of scientific neutrality (which is based on the illusion of scientific objectivity anyway) and always also reflect on what behavior and structures will receive support from our results, and whether these are consequences we want. In particular, be careful whenever connecting numbers and probabilities to persons, and whenever reasoning exclusively on the basis of expected or modal values.

Now, that was a lot of stuff I said. What do you think?