# Five Dogmas of Statistics

29.3.2014

This is a programmatic post, compiling some of my most fundamental personal beliefs about statistics. I of course appreciate comments and discussion on all of its content, despite the partly polemical manifesto style that made writing it fun.

The text is structured as follows: It states five dogmas:

- Modeling is at the core of statistics.
- Statistics is a toolbox.
- What is processed in statistics is information, not numbers.
- All research is exploratory.
- There are situations when statistics is totally useless. Sometimes it is even dangerous.

For all of these claims, I discuss a.) what they mean to me, b.) why I believe they're right, and c.) what their general and practical implications would be. I am not claiming that all these thoughts are particularly new or ground-breaking, nor are the thoughts presented here meant to be exhaustive. I do think, however, that they should receive more attention and discussion in the daily business of statistics. That's why I am posting this. Apart from the fact that I think it's not bad to have such a programmatic positioning as one of the first entries in a new blog.

Oh, and a quick meta-note: in the entire blog, and especially in this article, some text is not black but gray. These are mostly additional thoughts, arguments or illustrations which I find interesting, but which are not necessary for the core points of the post. They can be safely skipped if you don't feel like reading tons of text today.

## 1. Modeling is at the core of statistics (not inference or estimation).

**a.) What does that mean?** When people in research-oriented studies
like the social sciences have their first encounter with statistics, it is
usually in a class that starts by teaching them practical procedures:
a method for estimating the mean of a normal, or how to check for a
group difference with a t-test. That is, the usual way of teaching
statistics makes students conceptualize it as a set of procedures. The
**models** are treated as an afterthought, often in terms of lists
of conditions under which such and such procedure can be applied. I
firmly believe that this is just the reverse of a sensible ordering. I
think that what is 1. logically primary and 2. the more important skill
to convey is grasping a situation or structure as a data-generating
mechanism, which one then tries to cast into a formal model. A model means
a decision about what variables and relations are of interest right now,
what these relations could look like in general, and what aspects of the
situation are neglected in spite of better knowledge. **Only then** comes
the second step of thinking about how to use this model to extract certain
information from the data, i.e. estimate a rate or mean parameter or
check whether two variables influence each other in a relevant manner.
For example, a t-test should not be conceptualized and taught as
a method for finding a group difference. You should start by thinking about
the situation of interest, which might be reasonably modeled as
independent draws from two different normal distributions. Having
identified this, it turns out in the second step that using a t-test
might be a way to answer your target question. Of course, this second
step might involve backtracking and correcting the model in order to
make certain tools applicable; everywhere in science, the models
you build are also influenced by the tools you have available, and I
don't think this is a bad thing.
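
As a minimal sketch of this ordering in Python, with purely hypothetical, simulated data: the model comes first (two groups as independent draws from two normal distributions), and the t-test only appears afterwards, as one tool for asking a question about that model.

```python
import numpy as np
from scipy import stats

# Step 1, the model: assume the two groups are independent draws
# from two normal distributions (hypothetical data, simulated here).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=50)
group_b = rng.normal(loc=1.0, scale=1.0, size=50)

# Step 2, the tool: given this model, a t-test is one way to ask
# whether the two means differ.
result = stats.ttest_ind(group_a, group_b)
print(result.statistic, result.pvalue)
```

Had the modeling step suggested heavy tails or dependence instead, the second step would point to a different tool, not to forcing the t-test.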

**b.) Why?** Although it sounds theoretical, this is a pragmatic dogma.
I believe the best reason for it is that embracing it makes you do more
useful statistics. You start out by engaging with the
real-world situation you are confronted with, instead of trying to find the
one procedure from your always-limited repertoire of procedures that fits
best, or least badly. This, I think, results in more adequate models of the
situations, in turn enabling you to better extract information from the
generated data, i.e. do better estimation and testing. The downside is
that you might have constructed a model for which you don't know any, or
there are no, established procedures. But then you can still go on to
research or develop them. Or modify your model slightly to be able to
use technology you know from a related model -- which gives you, if you
do it this way round, a clear feeling for where the assumptions in your
model deviate from what you actually think the situation is -- valuable
information when interpreting analysis results and judging their
credibility. Also, thanks to computers and the huge progress made in
numerical as well as Monte Carlo techniques, nowadays immensely more
models are tractable than a few decades ago -- including millions of
situation-specific models nobody has ever set up, used or analyzed
before.

**c.) Implications** An important factual consequence of people embracing
this dogma is that they will create and use more domain-specific, and in
total more diverse, models. They will stop assuming everything to be
normally distributed, and in general less often use standard procedures
like the t-test. These procedures have become so much of a default that it
sometimes seems like nobody actually remembers the model behind the
approach -- you are claiming that your data were generated by drawing
independent samples from two normally distributed variables.

An important normative consequence of the dogma is that it means that
statistics cannot be done without stochastics. In order to model
situations properly, a lot of knowledge from stochastics about how
certain processes and probability distributions relate to each other is
of great help. This is, put more generally, because stochastics contains
and creates **conceptual knowledge**, in contrast to the **procedural
knowledge** that is often labeled „statistics“ in undergraduate classes.

## 2. Statistics is a **toolbox**.

**a.) What does that mean?** Often, statistics comes, and is conveyed,
with a lot of authority attached to it. Procedures like Neyman-Pearson
tests or maximum likelihood estimation are communicated as **the optimal
way to decide or estimate** in certain situations, if not **the
only rational way**. Again, I think this is a terribly mistaken
attitude. Statistics is a diverse set of knowledge, models and procedures,
each more or less apt for the particular purposes you might
come to use „statistics“ for. For a given problem, there are
often many different ways of modeling it, in turn resulting in many
different ways to proceed for solving your problem (i.e. extracting your
information or making your decision). Hard and general criteria for which
model or procedure is „better“ (what this means again depends on your
specific purpose) rarely, or as I would say, never exist.

**b.) Why?** The main reason for this belief of mine is actually a
consequence of dogma 1: statistics is about modeling. And modeling
involves people's decisions about **how** to model something for some
specific purpose. As some philosophers of science I greatly admire
(e.g. Nancy Cartwright
and Ian Hacking) point out, there are no
generally optimal ways of modeling things. Some theorems in statistics
tell us that, given a certain **formal** situation and a specified loss
or risk function, procedure xy is optimal in some sense. But when we
apply statistics to the real world, there are no **formal** situations,
and which loss function to choose is our responsibility, in which no
mathematical theorems help. Hence, **optimality does not transfer**.
The experience that some model and/or procedure is a useful tool for
certain types of situations, however, possibly does transfer.

A second reason stems from another, even more fundamental philosophical
conviction of mine: science in general is a toolbox. Science is not
about or aimed at **knowing** stuff and being perfectly rational; it is
about constructing ways of **doing** stuff that works. Most often it is a
lot more pragmatic and less rational than claimed. Statistics, so to
say, inherits these traits whenever it is used for science. When it is
not, well, it is a mere tool almost by definition, not endowed with
mysterious scientific or rational superpowers.

**c.) Implications** The most important consequence of this conviction,
I think, regards how statistics is perceived from within the sciences:
often, things are claimed to be **proven** by statistics. Even if not
openly expressed in this wording, I still impute to many people the
implicit belief that statistics is able to prove things, maybe even that
**only** statistics is. It is clear that with toolboxes you don't prove
things. Toolboxes might help you in solving specific problems. That is,
with this dogma statistics loses a lot of its authority. It becomes
debatable, like the choice of tools always is, and retransfers a lot of
responsibility from the formalism to its users. That might sound like a
bad thing to some people, but I'm actually convinced that it is a
terribly good thing. It makes for better science, and on a more general
level, I believe, a better society.

A second consequence of this dogma is more internal to statistics: embracing it makes statistics a more open field. Suddenly, a lot of what is otherwise classified as artificial intelligence, machine learning, bioinformatics, or even database theory or mathematical visualization, becomes part of statistics. Of course that is nothing but one of the core ideas of „Data Science“, and what I want to say here is that I like just this idea. A classical example, I guess, are support vector machines. They don't have a straightforward interpretation in terms of probability and hence are seldom considered part of „statistics“. Despite that, they are awesomely useful classifiers. So why not put them into the toolbox, since often enough classifying is just what you want to do? I think integrating all these tools into one disciplinary framework furthers dialogue and the exchange of ideas, in the end leading to a better understanding of the existing tools and more fruitful development of new ones. Continuing with the SVM example: since they do have a pretty straightforward geometric interpretation, including them in statistics might prompt us to interpret other statistical ideas from a geometric angle as well (no pun intended), leading to interesting insights, as e.g. regression or the entire theory of concentration of measure exemplifies.
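
To make the toolbox point concrete, here is a minimal sketch in Python, using scikit-learn and synthetic data (both a choice of convenience on my part): an SVM used as a plain classification tool, justified geometrically by its maximum-margin hyperplane rather than by any probabilistic model.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic data: two point clouds, no distributional assumptions made.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# A linear SVM fits a maximum-margin separating hyperplane --
# a purely geometric notion, yet a perfectly good tool for the toolbox.
clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))  # fraction of training points classified correctly
```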

## 3. What is processed in statistics is information, not numbers.

**a.) What does that mean?** I am inclined to formulate this dogma even
more provocatively: **statistics is not quantitative**. We should stop
thinking of statistics as transforming numbers into other numbers, which
then might be interpreted -- a last step of analysis often
conceptualized to be outside of statistics. I think, on the contrary, we
should understand statistics as transforming **information** in all its
diverse forms. Information can be numerical, it can be logical, it can be
graphical, it can be qualitative („this is somehow more like this than
it is like that“) and much more. We should not forget 1. that we can
process all this information statistically, not only things that come in
the form of numbers naturally, and 2. that what we process is **only this
information**, not the additional structure that enters when we use
e.g. real numbers to encode the information. What we often do in
statistics, mostly for convenience and out of habit, is to
**encode** information numerically. But that does not mean that it **is**
numerical. Quite the contrary, it rather means that the interpretation,
the decoding of the numbers back, is nothing but a step in the
information processing that I want to call „statistics“.

**b.) Why?** What is modeled, almost everywhere in and outside of the
sciences, is not numbers but structures, i.e. entities and their
relations. This is because numbers are not out there in the world; quite
the contrary, they are one of our **means** of modeling things out
there in the world. So what we do in modeling and research is process
**information** about these things. Sometimes variables may be very well
measurable in real numbers, but only because we decide to model them that
way. Taking the content to be those numbers, instead of, more abstractly,
information, is nothing but a fundamental
confusion.

**c.) Implications** There is one very general consequence of this
dogma: it tells you to really make sure that you know where in the
analysis you can interpret which aspects of the results (mostly encoded
numerically) -- the story of the different scales of measurement. It is
repeated ever so often in teaching, but neglected in practice at least
twice as often. To me the dogma even suggests as a remedy that we
shouldn't represent non-numerical information numerically in all
intermediate steps of processing, just to make sure we don't get
confused. Why, after all, should you encode „smoker“ and „non-smoker“ as
0 or 1 if that doesn't give you any additional information about the data,
apart from embedding it into a lot of structure you cannot sensibly
interpret (e.g. that the distance between „smoker“ and „non-smoker“ is
the same distance as that between 3 and 4, or even worse that „smoker“
times „non-smoker“ = „non-smoker“)? When I'm in the
mood, I sometimes contend that this tendency towards numerical encoding
comes from an inferiority complex of the „soft“ sciences that makes them
want to look more like „hard“ physics. My response would be an empowering
one: dear soft sciences, you're cool enough by yourselves and not in need
of that cheap disguise!
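
A small sketch of what this means in practice, in Python with pandas and hypothetical data: keeping „smoker“/„non-smoker“ as categorical labels preserves exactly the information we have, while the spurious numerical structure is refused outright.

```python
import pandas as pd

# Hypothetical data, kept as categories rather than encoded as 0/1.
status = pd.Series(["smoker", "non-smoker", "smoker", "smoker"],
                   dtype="category")

# All the information we actually have is available: counting works fine.
print(status.value_counts()["smoker"])

# Meaningless operations like "smoker" times "non-smoker" fail loudly,
# which is exactly what we want.
try:
    status * status
except TypeError:
    print("no arithmetic on categories")
```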

I also interpret this dogma as an imperative to devote more attention to non-parametric methods, graphical modeling and approaches which make a lot less use of the structure of the real numbers than for example parametric regression does. We also should make sure we know what's going on in the theory of signal processing.

## 4. All research is **exploratory**.

**a.) What does that mean?** In traditional philosophy of science, a
distinction is often made between the "context of discovery" and the "context of justification".
That distinction of contexts refers to there being different ways and
norms of scientific behavior when "looking for hypotheses" and exploring
the world than when "rigorously" "testing" or "establishing" claims. As
you can tell from the amount of inverted commas, I believe this
distinction to be mistaken all the way down to the concepts used in its
definition. Not for "analytical", theoretical reasons, but because I
think it presents a vastly distorted picture of the research process.
Most of the time, empirical research simply behaves "exploratorily". It is
looking for phenomena or effects in all kinds of data that were
generated **before** the formulation of a hypothesis or testing
procedure. Only later on do researchers modify the story they tell about
how they arrived at certain conclusions, in order to meet the proclaimed
standards of the "context of justification" -- like that of first
formulating a very particular hypothesis which is then tested
empirically in an experiment designed specifically for that purpose.
Don't get me wrong: I don't think that violating these standards is a
problem. I only believe it is a problem that we still proclaim certain
rules of the game which are practically never obeyed. I think we should
provide many more statistical tools that are designed to suit this
"exploratory" use by scientists, instead of tools based on "rules" our
community of users seems to have silently dismissed.

Maybe the previous paragraph requires another clarification: yes, I know that very often in empirical research you do have two "stages", one more exploratory, in psychology sometimes called "piloting", and another one, the "study", where you have a fixed experimental setup which you repeat a number of times. I do not claim that there is no difference at all between those two stages. But I do claim that the second stage is also basically an exploratory one, much more so than claimed: at the latest when analysing the results and writing up the paper, you will most probably change or reword your hypothesis, because now, in the face of the larger amount of data you have, a slightly different one looks more reasonable. You will also most probably play around with different methods of statistical analysis, and not use one you defined a priori (I doubt you did that after all), independent of its results and the results of other procedures. Again, to make sure I'm not misunderstood: I do not think at all that this is a bad way to proceed. I only say that 1) we should be more honest about the way we proceed, and 2) we should develop statistical tools with that way of proceeding in mind, and not a weird and at the same time unrealistic conception that underlies the popular interpretations of so-called frequentist statistics.

**b.) Why?** The reason I'd give for this dogma is, so to say,
empirical: whenever I have seen people doing empirical research, it was in
fact exploratory in the above sense, and only later on couched in
terms of hypothesis testing. Of course, we can wait for a black swan to
appear, but for the moment I'd take the dogma to be a well-working
hypothesis, which can be tested by letting it guide our design
decisions for statistical research tools and just seeing what happens. As
in empirical research.

Secondly, I am of course appealing to the critical arguments exchanged in the debate about the context distinction in the philosophy of science. I feel unable to discuss them in brief here, so I'll simply refer to the SEP article linked already above, and the book "Revisiting Discovery and Justification" by Schickore and Steinle (2006), of which I have just read the introduction, but which sounds interesting and rather comprehensive to me.

**c.) Implications** Maybe this is the dogma which would have the most
severe consequences if followed consistently. Most of traditional
(non-Bayesian, that is to say) statistics is designed to test
pre-specified hypotheses with purposefully conducted experiments. The
entire idea of hypothesis testing would be rendered dubious by this
dogma. At the moment, I am indeed inclined to be that consistent and
take these doubts seriously. Hypothesis testing is, in its underlying
assumptions, mind-blowingly complex, ambiguous and specific at the same
time anyways. Why not throw it overboard and replace it with slicker
methods? If these slicker methods do not yet exist, that's a good
motivation to devote some energy to their development. Since hypothesis
testing has been the driving force for statistical research for more
than 50 years now, it would be interesting to see which results derived
in the course of that are still of relevance outside the framework. Is
there, for example, any use of Student's t distribution apart from it
being the sampling distribution of the test statistic in a t-test
situation?

Of course, the idea of taking all research to be exploratory points somewhat into the direction of Bayesian belief updating. It also suggests to make the research process part of the models of the data generation, including and making explicit phenomena like researcher degrees of freedom, stopping rules and error accumulation in multiple comparisons. This is for sure not easy, but interesting, and some people like E.J. Wagenmakers actually have already started to do it.

## 5. There are situations when statistics is totally useless. Sometimes it is even dangerous.

**a.) What does that mean?** Sometimes I have the impression that it is
an obligation in certain sciences, especially psychology, to perform
quantitative, statistical analyses at all costs. Otherwise your results
are not taken seriously by a large community of co-researchers and
you're denied publication in mainstream journals. I don't think this is
a pleasant state or development. There are other methodological
perspectives and traditions which are just as useful tools for many
situations. And sometimes these are even vastly superior for a certain
purpose. We should not forget this, and should reflect a bit on the
capabilities and limitations of statistical approaches very generally.

We should also not forget that using exclusively statistical reasoning
in decision making can do serious harm to people. For some cultural
reasons, quantitative methods have gained a strong authority. Much is
believed as long as it is presented together with some numbers and in
particular *p*-values. And most often where there is authority, there is
the danger of its abuse. Also, some core ideas of statistical theory are
inherently discriminating and not a good idea to use when dealing with
persons rather than, uhm, inanimate carbon rods.

**b.) Why?** First, when is statistics useless? Much of physics barely
uses hypothesis testing or other decidedly statistical methods, while
using differential equations and stochastic modeling all over the
place. Not to speak of the humanities, in which I guess a lot of
progress would not have been possible if people had cared about how to
quantify or "formalize" their experiences, impressions and theories. Of
course, this also motivates an analysis of why statistical methods are
not used here, what their perceived shortcomings are, and what
alternative methods could be devised to fit the needs. This is related,
I think, to the third dogma, of statistics not being exclusively
"quantitative".

Second, when is statistical thinking dangerous? I guess racial
profiling is an almost canonical example here. It shows, to me, that it
does not suffice to reason internally to statistics only, and conclude
from data on previous (and worse: subjective!) experience that the
expected value of such and such loss function is minimized when using
such and such decision rule. We need to think beyond this and reflect
also on what the use of this decision rule does to a society and its
individual members. This, put formally, cannot simply be introduced into
the loss function, since a loss function does not evaluate decision
rules and their general properties, like systematic bias, but only
point-wise **decisions**, weighted with their probability. Thinking in
terms of expected values is the crucial decision here, which leads to
missing important general consequences of one's own behavior and
decisions.

**c.) Implications** We should treat alternative approaches with respect
rather than make fun of them, as is often done within the "hard",
"quantitative" sciences. I think good statisticians should even have a
basic idea of non-quantitative methods, in order to consult other
experts when confronted with problems beyond the limits of sensible
applicability of state-of-the-art statistics. Returning to the tool
metaphor: what's the use of having a powerful tool in hand if you are
misguided about when it is a good situation to use it and when it is
not? And what if the power attributed to this tool makes you throw away
all the other tools you have used successfully before?

Also, as you can tell from the overtone the "why?" paragraph carried:
I think we should drop the dogma of scientific neutrality (which is
based on the illusion of scientific objectivity anyways) and always also
reflect on what behavior and structures will receive support from our
results and whether these are consequences we want. In particular, be
careful whenever connecting numbers and probabilities to **persons**,
and whenever reasoning exclusively on the basis of **expected or modal
values**.

Now, that was a lot of stuff I said. What do you think?