tidystats taxonomy


The purpose of tidystats is to take multiple statistical tests, extract their statistics, and organize the statistics into a file. It is the organization of the statistics that is probably the most difficult and opinionated aspect of tidystats. Below I show what I’ve come up with. Please note that this is a work in progress and that it might change when I add support for new statistical tests (or after receiving feedback on the taxonomy).


A collection of statistics (e.g., t-value, p-value) belonging to a test. Every test will have Statistics. Sometimes a test only has one set of Statistics (e.g., a t-test) and other tests have multiple sets of Statistics. For example, the Statistics of a regression analysis done with lm() consists of the model fit Statistics and the Statistics belonging to each Term in the model. Statistics have a Name to help indicate what the statistics refer to.


This category is a bit of a general category that refers to different components depending on the exact test. For example, in a regression analysis it refers to the coefficients, but in an ANOVA test of multiple models, it refers to the models being compared. A Term consists of a Name and a set of Statistics.


Refers to Statistics consisting of two variables, such as correlations and covariances. However, this is only used when the number of variables can be more than 2, such as in multilevel models or when you tidy a correlation matrix. If you simply run a correlation test (e.g., cor.test()) and tidy the results, you will get a list of Statistics, but not a list of Statistics including a Pair. The reason for this category to exist is to tidy correlation or covariance matrices. Pairs have two Names and a set of Statistics.


Groups is also a rather broad category. It can refer to grouped descriptives (e.g., created with describe_data()), to within-subjects components in ANOVA, or to Groups in the Random Effects of a multilevel model. A Group has a Name and can consist of Statistics, Terms, and/or Pairs.


Effects is a category for multilevel models, which consists of Random Effects and Fixed Effects. Random Effects consists of Groups and Fixed Effects consists of Terms and Pairs.


Sometimes tests return multiple Models, each containing Statistics. This category contains those models. Currently it is only used when tidying certain tests from the BayesFactor package.


The name provides information about what the statistics refer to. It can be the name of a set of Statistics, a Term, a Group, and so on. Pairs have two names.