# Thinking About the P-Value

An important aspect of any research is not only how the researcher carried it out, but also how you, as a consumer, read it. In this post, I am going to reflect on what the P-value is and what it implies when you read journal articles. Before you can say anything about a study, you want to know how much confidence its findings deserve.

## Introducing P-Values and Research

First, **what is a P-value, and what is its place in reporting research?**

To answer this, I need to start at the foundations of all research. Research exists because we have questions. These questions arise not only from our subjects of interest but also from our daily interactions outside academia. You can see this in how curious we are about new information from:

- Gossip,
- Offline and online news,
- Social media,
- In fact, any source that promises to tell us what we did not know before, any source that promises to give us answers.

So we look for answers: real answers based on facts. We want to know what our world is really like.

Second, for this to happen, we need a way to gather information, analyse it and report it.

This is where the field of statistics comes in: it helps us correctly interpret what we have collected and report it to the world. Statistics, in turn, rests on the mathematical branch of probability.

- Will it be sunny tomorrow?

- Probably, yes.
- Based on past experience and the time of year, it should be sunny tomorrow.
- Based on this week’s patterns, it is likely to be cloudy.

The answers presented above all involve probability. So *answers generated through research are based on probability.* I will reflect on why we are never 100% certain in another post; for now, note that any conclusion you reach is subject to revision when fresh information comes in.^{1}

**But why are we not fully certain, and how confident, then, can we be in our findings?** Because statistics works within practical constraints:

- You can usually study only a subset of the population or group of interest;
- You want to learn about the whole population from that subset (also known as a sample);
- Most importantly, you want to generalise the results you got to the whole population.
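To make the sample-versus-population point concrete, here is a minimal Python sketch. It builds a hypothetical population of 10,000 made-up heights (the numbers are invented purely for illustration), draws several random samples of 50 people, and compares each sample mean with the true population mean. The helper `sample_means` is my own illustrative function, not part of any library.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (without replacement) and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: 10,000 invented heights in centimetres.
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]

true_mean = statistics.mean(population)
estimates = sample_means(population, sample_size=50, n_samples=5)

print(f"population mean: {true_mean:.1f} cm")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i} mean:  {est:.1f} cm")
```

Each sample gives a slightly different estimate, and none is guaranteed to equal the population mean. That gap between what a sample says and what the whole population is like is the uncertainty that probability, and later the P-value, helps us describe.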

Okay, now back to the P-value: what does it mean?

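A P-value is a **probability** you report alongside your results. It is the probability of getting results at least as extreme as the ones you observed, **provided that the null hypothesis is true.** It is not the probability that your results are correct, nor the probability that the null hypothesis is true.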

This brings us to the next question: what is a hypothesis? When you carry out research, you are guided by a set of assumptions about the population you are studying.

If you assume that the population has a particular characteristic, such as a certain average or proportion, you start with a hypothesis that states that value. That hypothesis is known as the null hypothesis. As a researcher, though, you want to find out more: you are never content until you learn more about that population, so you do not simply settle for what the null hypothesis says.

Now you collect the information and analyse it, and you find something different from what the null hypothesis says. For example, you start off believing that the world is flat, because that is what everyone else is saying.

But then you measure the world and discover that it is not flat. How do you report your results?

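You have to think like this:

- Suppose the null hypothesis (the world is flat) were true;
- How likely would I be to get measurements at least as far from “flat” as the ones I got?
- If that probability, the P-value, is 0.002, then my measurements would be very surprising in a flat world.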

Put another way: “Assuming the null hypothesis is true, the probability of getting results at least as extreme as mine is so and so.” The P-value therefore shows the degree of inconsistency between your findings and the null hypothesis: the smaller it is, the more your results clash with what the null hypothesis predicts.
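Here is a minimal Python sketch of that reasoning, using a hypothetical study rather than the flat-world example: a coin assumed to be fair (the null hypothesis) is flipped 100 times and lands heads 60 times. The function `simulated_p_value` is my own illustrative helper, not from a library. It simulates many experiments in a world where the null hypothesis holds and counts how often the result is at least as extreme as the one observed (here, one-sided: 60 or more heads).

```python
import random

def simulated_p_value(observed_heads, n_flips, n_simulations=20_000, seed=1):
    """Estimate a one-sided P-value by simulating a fair coin (the null hypothesis).

    Returns the fraction of simulated experiments that produce at least
    `observed_heads` heads, i.e. results at least as extreme as ours.
    """
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_simulations):
        # One simulated experiment under the null: n_flips fair coin flips.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            as_extreme += 1
    return as_extreme / n_simulations

# Hypothetical study: 60 heads in 100 flips of a coin assumed to be fair.
p = simulated_p_value(observed_heads=60, n_flips=100)
print(f"estimated P-value: {p:.3f}")
```

The estimate lands near 0.03: if the coin really were fair, only about 3 in 100 such experiments would give 60 or more heads. A two-sided version, which also counts 40 or fewer heads as “extreme”, would double that.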

The lower the P-value, the harder it is to uphold the null hypothesis. The P-value is expressed as a probability or proportion,^{2} but it is not a measure of how much faith to place in the null hypothesis; it describes how surprising your data would be if the null hypothesis were true.

So a low P-value means your results would be unusual in a world where the null hypothesis holds, which gives you reason to doubt it. A high P-value does not prove the null hypothesis true; it only means your data do not provide strong evidence against it.
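To see “the lower the P-value, the harder it is to uphold the null hypothesis” in numbers, the sketch below computes exact one-sided P-values for a hypothetical coin assumed to be fair (the null hypothesis) and flipped 100 times, for increasingly lopsided numbers of heads. It uses the binomial distribution via Python's standard `math.comb`; the helper name `exact_p_value` is my own.

```python
from math import comb

def exact_p_value(observed_heads, n_flips):
    """One-sided exact P-value: P(X >= observed_heads) when X ~ Binomial(n_flips, 0.5)."""
    favourable = sum(comb(n_flips, k) for k in range(observed_heads, n_flips + 1))
    return favourable / 2 ** n_flips  # every sequence of fair flips is equally likely

# The further the (hypothetical) result is from 50 heads, the smaller the P-value.
for heads in (50, 55, 60, 65, 70):
    print(f"{heads} heads out of 100 -> P-value = {exact_p_value(heads, 100):.5f}")
```

The P-value shrinks steadily as the result moves away from what a fair coin predicts. Note that even 50 heads, exactly what the null hypothesis expects, does not give a P-value of 1, and a large P-value there still would not prove the coin is fair; it only means the data give no reason to doubt it.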

### Why a Null Hypothesis?

A null hypothesis is important in justifying a piece of research. It reminds us that we should never be content with handed-down wisdom, whether it comes from other researchers, popular beliefs or any other source. The purpose of research is for people to systematically unearth facts and build a fuller picture of the world they live in.

So a null hypothesis gives us a starting point, and your findings help us understand a phenomenon better. How strongly your results speak against the null hypothesis is reflected in the reported P-value. For that value to mean anything, researchers should state their significance level (often written as α, and commonly set at 0.05) before doing the research, and then stand by it whatever P-value they end up with. I will talk about P-hacking in another post.
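Why fix the significance level in advance? The significance level is the rate at which you are willing to reject a null hypothesis that is actually true. The sketch below is a hypothetical simulation: it assumes normally distributed data with a known standard deviation of 1 and uses a two-sided z-test, a test I chose for simplicity (the post does not name one). The helpers `z_test_p_value` and `false_rejection_rate` are my own. It runs 10,000 studies in which the null hypothesis (population mean 0) is true and counts how often P < 0.05.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05  # significance level, fixed BEFORE any data are seen

def z_test_p_value(sample, null_mean=0.0, known_sd=1.0):
    """Two-sided z-test P-value, assuming the population SD is known."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - null_mean) / (known_sd / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_rejection_rate(n_studies=10_000, sample_size=30, seed=7):
    """Run many hypothetical studies in which the null hypothesis is TRUE
    (data really come from a population with mean 0 and SD 1) and return
    the fraction that nevertheless reject it at level ALPHA."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        if z_test_p_value(sample) < ALPHA:
            rejections += 1
    return rejections / n_studies

rate = false_rejection_rate()
print(f"null hypothesis true, yet rejected in {rate:.1%} of studies")
```

The rate comes out close to 5%, which is what a significance level of 0.05 promises, but only when it is chosen before looking at the results. Picking the level after seeing the P-value, or re-running analyses until P dips below it, breaks that guarantee.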

## Key Takeaways and Conclusion

In conclusion, I know that the explanation of a P-value I gave in this post was simplified, but I hope it makes enough sense for anyone to start interpreting published research.

The key takeaways from this post are as follows:

- A P-value measures how consistent your findings are with an initial hypothesis (the null hypothesis): it is the probability of getting results at least as extreme as yours if the null hypothesis were true;
- It is expressed as a probability value. You can think of it as saying: “If the null hypothesis were true, results at least as extreme as mine would turn up with this probability.”
- A P-value is compared to the significance level you set for yourself at the start of the research;
- If the P-value is lower than your significance level, you can reject the null hypothesis; otherwise you fail to reject it, which is not the same as proving it true.
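The decision rule in the last two points can be written as a tiny function. This is a hypothetical helper of my own (`decide` is not from any statistics library); it rejects only when the P-value is strictly below the significance level, matching the rule as stated in the list.

```python
def decide(p_value, alpha):
    """Compare a P-value with a significance level chosen before the study.

    Returns a plain-language decision. Failing to reject the null
    hypothesis is not the same as proving it true.
    """
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical reported results, each judged at a pre-set alpha of 0.05.
for p in (0.002, 0.03, 0.20):
    print(f"P = {p}: {decide(p, alpha=0.05)}")
```

Some texts reject when the P-value is less than *or equal to* the significance level; the difference only matters for a P-value that lands exactly on the threshold.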

In future posts, I will revisit this topic when I address null hypothesis testing in detail and its bearing on the reports we consume every day.

^{1} There is a branch of statistics built on this reasoning, known as Bayesian statistics. Unlike the frequentist statistics we are taught at school, where probability is based on long-run frequencies, Bayesian statistics works with prior and posterior beliefs: you start with a prior belief and update it into a posterior belief as new data come in.

^{2} Probability values lie between 0 and 1. You can convert a probability, or proportion, to a percentage by multiplying it by 100, or equivalently by moving the decimal point two places to the right: a P-value of 0.002 is 0.2%.