Dealing With Maths Accessibility, Part 3

Doing Statistics

There are some lessons we learn later in life, one being that much of what you learn in school, though it seems irrelevant at the time, proves handy when you are an adult. One such skill is maths:

  • We learn it as a game,
  • We enjoy it when solving puzzles (as in algebra),
  • We use it even when buying clothes.

There is also a practical side to maths, which we are going to discuss in this post: using it for research.

I hope to keep this post short (hmm, I remember making this same promise in the previous post and breaking it anyway!). The good news is that doing statistics on your computer is easy.

Using Statistical Software

The skills we learnt in the previous two posts on reading and writing maths are useful in themselves, particularly for handling general maths, but statistics has its own rules of the game that you have to know.

We will not be teaching statistics in this post; instead, we will demonstrate statistical computation using the R language. With that ground rule laid, we need to answer this question: how can a blind person do statistics using a computer?

The answer is the same as for anyone, blind or sighted. He or she relies on two things:

  1. His or her own competence in analysing data; and
  2. An appropriate statistical program to help with the calculations.

Statistical computing is concerned with the development and use of specialised programs that help in calculating statistics.

The first skill, gathering data that answer your research question, understanding what you are looking for, and interpreting any output from your statistical program, is entirely your domain. A program can tell you the regression coefficient, a p-value, or the result of running this or that test, but at the end of the day, you have to know how to interpret those results and put them in the final report.
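As a minimal sketch of what such output looks like, the lines below fit a simple linear regression of mileage on weight using mtcars, a dataset built into R that this post uses again later, and pull out the coefficient and its p-value. The choice of model is only an illustration; deciding whether it suits a research question is still your job.

```r
# Fit a simple linear regression: mileage (mpg) explained by weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)

# Table of estimates, standard errors, t values and p-values
coefs <- summary(fit)$coefficients
print(coefs)

# The slope for weight and its p-value, picked out by row and column name
slope <- coefs["wt", "Estimate"]
p_value <- coefs["wt", "Pr(>|t|)"]
cat("Slope for weight:", round(slope, 3), "\n")
cat("p-value:", signif(p_value, 3), "\n")
```

R reports a negative slope of roughly -5.34, but it is up to you to say what that means: in this dataset weight is recorded in thousands of pounds, so each extra 1000 lb goes with about 5.3 fewer miles per gallon.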

As indicated in the previous post on writing maths, the job of understanding lies with you. The program is there to help you calculate statistics and save you a lot of time.

In mainstream statistics, there are some popular packages such as IBM SPSS, SAS and Stata. Unfortunately, I have never tried them personally, except of course for SPSS, whose accessibility I investigated some two years ago. However, you may find Dr. Jonathan Godfrey’s review of these and other programs in the R Journal enlightening.

Thus, I will be recommending one statistical package which has taken the scientific world by storm: the R Statistical Software.

What is R?

R is one of many statistics programs around, and is a descendant of the S program. Thus, between S and R, we are talking about some 44 years of history!

Just like LaTeX, R works through the command line. Of course, you can install a program like RStudio, which comes in either a desktop or a server version. The desktop version, while mostly accessible when it comes to menus and some editing windows, is not yet fully accessible in its dialog boxes. You may consider running the RStudio Server edition, which uses your browser to display its controls. Both versions are free.

For now, let’s talk about R on the command line; a discussion of RStudio will be for another day. So,

  • Download R from the CRAN website, then install it the usual way: select the downloaded file, run it and follow the prompts.
  • Then launch it from your desktop.

When launched, R first displays some notices, copyright information and credits. This is followed by its prompt, >, meaning that you can start entering your expressions.

Expressions and Data Entry

As with all statistical programs, in R you work with data. This can be values you type in, data you load from files on your computer, or data you fetch from the Internet. R uses functions to handle all of this.

However, to get familiar with the R interface, you can first play around on the command line like this:

> 3+6
[1] 9

This means that when you type “3+6” at the > prompt and press ENTER, R gives you the answer, 9.

But note that before that answer, there is a number in square brackets, [1]. This says that 9 is the first value produced.

Why? Because statistical programs like R work with series of values, so even a single answer is treated as a series of one. To keep it short: at the start of each output line, R shows the position of that line’s first value within the whole output. Meaning?

  • If there is only one value in the output, it is simply labelled [1], and that’s it.
  • If R produces many values (as is the case most of the time), each output line starts with the index of its first value. For instance, if an expression produces nine values:
    1. If one line fits five values, the first line shows values 1 to 5, so it starts with [1].
    2. The second line then shows values 6 to 9, so it starts with [6], reminding you that the first value on this line is the sixth in the output.
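To see these bracketed indices for yourself, the sketch below creates nine values and temporarily narrows the console width so that the output wraps; the width of 20 characters is an arbitrary assumption, and the exact line breaks depend on your settings. Each printed line starts with the position of its first value, and that same position is what you put in square brackets to pick a value out.

```r
# Nine values: 10, 20, ..., 90
x <- seq(10, 90, by = 10)

# Narrow the console so the output wraps onto several lines;
# each line starts with the [index] of its first value
old <- options(width = 20)
print(x)
options(old)  # restore the previous console width

# The bracketed number is the same position used to pick out a value
sixth <- x[6]
print(sixth)  # → [1] 60
```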

This brings me to the issue of data entry in R. Entering data is easy, as R assumes that a variable usually holds more than one value. R stores such a series of values in a vector, which you create with the c() function (c stands for combine). For example:

x <- c(1,4,9,12,15)

Here we passed five values to the c() function and stored the resulting vector in x.
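Once the values are in a vector, R applies most operations to all of them at once. A short sketch using the same x:

```r
# The five values from the example above
x <- c(1, 4, 9, 12, 15)

length(x)   # how many values: 5
x[2]        # the second value: 4
x * 2       # every value doubled: 2 8 18 24 30
x[x > 8]    # only the values greater than 8: 9 12 15
```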

If you know Excel, you will enter functions in R in much the same way:

  • Just remember the name of the function you want.
  • Type it and pass it the appropriate arguments in parentheses.
  • The function will do all the work and you get the result.
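Applied to the vector x from above, that recipe looks like this; mean(), median(), sum(), sd() and summary() are standard R functions, each taking the vector as its argument.

```r
x <- c(1, 4, 9, 12, 15)

mean(x)          # arithmetic mean: 8.2
median(x)        # middle value: 9
sum(x)           # total: 41
round(sd(x), 2)  # sample standard deviation, rounded to 2 decimal places: 5.72
summary(x)       # minimum, quartiles, median, mean and maximum in one go
```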

In the above example, we typed raw data into the program. However, we can also load data which is saved in a file.

For example, to load a CSV data file called covid19-stats.csv, do the following while R is open:

  • Type read.csv('covid19-stats.csv') and press ENTER. R looks for the file in its working directory, which you can check by typing getwd().
  • You can also save the contents of your file into a new variable, which is handy when you need the data later in other operations such as plotting. For instance:

x <- read.csv('covid19-stats.csv')

This way,

  1. You created a variable named x.
  2. Then you read the contents of covid19-stats.csv into that variable.
  3. The <- symbol assigns the value on its right-hand side to the variable name on its left-hand side.
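The covid19-stats.csv file is not included with this post, so the sketch below makes its own CSV instead: it writes a few rows of the built-in mtcars data to a temporary file and reads them back with read.csv(). The temporary path from tempfile() is only for illustration. Functions such as names(), nrow(), head() and str() then let you inspect the loaded data as text, which a screen reader can read out.

```r
# Create a temporary CSV from the first five rows of mtcars (weight and mileage only)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars[1:5, c("wt", "mpg")], csv_path, row.names = FALSE)

# Read it back into a new variable, just as with your own file
y <- read.csv(csv_path)

names(y)   # the column names: "wt" "mpg"
nrow(y)    # the number of rows: 5
head(y)    # the first rows of the data
str(y)     # the type of each column plus a preview of its values
```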

Plotting

Displaying your data in graphical form is called plotting. It is often done when you want a quick visual overview of the data.

You can plot the variables. To plot, you need to consider what type of variables you have and the final graph you wish to have. You plot using the plot() function; to get a histogram, you use the hist() function.

To demonstrate plotting, we will use one of the datasets that come already with your R program. These datasets were collected by others and are provided so that we can use them for educational purposes.

We are going to use the one called mtcars, which holds data on 32 car models, such as their fuel consumption and weight.
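Before plotting, it can help to explore mtcars as text, since a screen reader reads console output directly. A few base R calls that do this:

```r
dim(mtcars)                     # rows and columns: 32 11
names(mtcars)                   # the variable (column) names, including "mpg" and "wt"
head(mtcars[, c("mpg", "wt")])  # the first six cars, mileage and weight only
summary(mtcars$wt)              # minimum, quartiles, median, mean and maximum of weight
```

Typing ?mtcars opens the dataset’s help page, which explains each variable; for instance, wt is the weight in thousands of pounds.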

Suppose we want to plot the weight of the cars. Weight is a continuous variable, so a histogram, which shows how its values are distributed, is an appropriate choice. So we type:

hist(mtcars[,'wt'], main = "Car Weights")

In the above case, I passed two arguments to the hist() function. The first is the data: from the mtcars dataset, I extracted the weight column, ‘wt’, by indexing it within square brackets (leaving the part before the comma empty selects all rows). The second, main, is the title that is printed with the histogram.
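A histogram on its own is hard to check with a screen reader. One option, sketched below, is to ask hist() for the numbers behind the graph by setting plot = FALSE; the returned object holds the bin boundaries (breaks) and how many cars fall into each bin (counts).

```r
# Compute the histogram without drawing it
h <- hist(mtcars[, "wt"], plot = FALSE)

h$breaks   # boundaries of the bins
h$counts   # number of cars in each bin

# mtcars$wt is a shorter way of writing mtcars[, "wt"]
identical(mtcars[, "wt"], mtcars$wt)  # TRUE
```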

The final graph is shown in the figure below:

Car Weights Histogram

To draw a scatterplot,1 we use the plot() function. The plot() function can draw a variety of graphics; what it produces depends mainly on the arguments passed to it and the variables it is plotting.

To see in a scatterplot how two variables relate to each other, we will again use the mtcars data. This time, though, I want to see how weight affects a car’s miles per gallon (recorded as ‘mpg’ in the mtcars dataset).

So this time, I have to pass two variables: one plotted on the x-axis and the other on the y-axis. I will put the weight on the x-axis and the mpg on the y-axis. To simplify matters, I first select just the two columns I need by passing their names as a vector inside the mtcars index.

cars <- mtcars[,c('wt','mpg')]
plot(x = cars$wt, y = cars$mpg, xlab = "Weight", ylab = "Mileage", main = "Weight versus Mileage")
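To put a number on the relationship the scatterplot shows, and to draw a straight-line fit on top of it, something like the following would work. It redraws the same scatterplot first so that it stands alone; cor() gives the correlation coefficient, and abline() adds the least-squares line from lm(). Whether a straight line is a sensible summary is, again, your call.

```r
cars <- mtcars[, c("wt", "mpg")]
plot(x = cars$wt, y = cars$mpg, xlab = "Weight", ylab = "Mileage",
     main = "Weight versus Mileage")

# Correlation between weight and mileage (always between -1 and 1)
r <- cor(cars$wt, cars$mpg)
print(round(r, 2))

# Add the least-squares regression line to the open plot
abline(lm(mpg ~ wt, data = cars))
```

For mtcars the correlation comes out at about -0.87: heavier cars tend to cover fewer miles per gallon.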

Every time you plot like this, R displays the plot in a separate window titled “R Graphics: Device 2”.

You can save this graph by opening the File menu of the R Graphics window, pressing DOWN ARROW until you reach the save option, and choosing it.

You can save in PNG, JPG, BMP, SVG or a number of other graphic formats.
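If the menus are awkward with your screen reader, you can also save a plot entirely from the command line. The sketch below opens a PNG file as the graphics device with png(), draws into it, and closes it with dev.off(), which writes the file. The file name car-weights.png and its temporary folder are hypothetical; replace them with your own path. pdf(), jpeg(), bmp() and svg() work the same way, although which of them are available can depend on how R was built on your system.

```r
# Hypothetical output file; replace with your own path
out_file <- file.path(tempdir(), "car-weights.png")

png(filename = out_file, width = 800, height = 600)  # size in pixels
hist(mtcars[, "wt"], main = "Car Weights")           # drawn into the file, not a window
dev.off()                                            # close the device and write the file

file.exists(out_file)  # TRUE if the image was saved
```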

After the last plot command, I got this scatterplot, with my screen reader even reading out the title I passed as the value of the “main” argument:

A Cars’ Weights vs Mileage Scatterplot

You can then link this final plot into your report for sharing with others.

More?

There is a lot you can do with R. This post has only scratched the surface, and in the interest of space and time, I could not discuss all its possibilities. The key here was to show that even as a blind person, you can do statistical analysis using mainstream applications like R.

I discussed R because it is the one I am familiar with, but I understand that SAS is another accessible statistical package. You can even use its web interface in Chrome after installing an add-on, and from there you can enter data and plot. So it may be worth investing your time in learning it. The advantage of SAS is that you may not need to do a lot of coding or work on the command line.

With R, though, you can extend its functionality by installing packages. But for now, let us conclude this session.
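Installing and loading a package takes two functions: install.packages(), which downloads it from CRAN, and library(), which loads it into your session. The helper below wraps both steps; use_package() is a hypothetical name introduced here for illustration, not part of R, and it only downloads when the package is missing.

```r
# Hypothetical helper: install a package if it is missing, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN; needs an Internet connection
  }
  library(pkg, character.only = TRUE)  # load the package by its name as a string
  invisible(pkg)
}

# splines ships with R, so this loads it without downloading anything
use_package("splines")
```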

Conclusion

Using R as a statistical package is simple. To enjoy it, invest your time in learning not so much the program as the principles of statistics themselves. To learn more about R, I recommend Dr. Godfrey’s excellent tutorial, specially designed for blind people.

I hope you enjoyed reading this series of posts just as I enjoyed writing it. In the meantime, happy calculating!


  1. A scatterplot is used when you want to see the relationship between two variables.
