The tidystats
package is designed to address two
problems common in scientific research: incomplete and incorrect
statistics reporting. The problem of incomplete statistics reporting
likely stems from a fundamental trade-off between wanting to be
comprehensive on the one hand and providing a clear narrative on the
other. The problem of incorrect statistics reporting is likely caused by
manually copy-pasting statistical output from the output window into a
text editor. tidystats
addresses these two problems by
enabling researchers to combine their statistical analyses into a single
file, from which a subset of the analyses can then be reported using a
Microsoft Word add-in.
tidystats
is designed to easily fit in your data
analysis workflow. In fact, tidystats
can simply be tacked
on at the end of a data analysis session, with only one minor
requirement. This requirement is that all analyses are stored in a
variable. For example, if you run a regression analysis using the
lm()
function, you store the result of that analysis in a
variable: model1 <- lm(extra ~ group, data = sleep)
. By
storing each analysis in a variable, you can later add each analysis to
a list using the add_stats()
function from
tidystats
. Once all the analyses are gathered together, you
save the analyses to a .json file using the write_stats()
function. This .json file can then be read by a Word add-in to report
your analyses in Word, or shared with others and read into R to extract
statistics from your analyses (e.g., for meta-analyses).
Below follows an example of a few analyses conducted on the
quote_source
data contained within the
tidystats
package. The data is from a large-scaled
replication of Lorge & Curtiss (1936). More details can be found in
the paper of the replication effort (Klein et al., 2014). In short,
participants saw the following quote:
“I have sworn to only live free, even if I find bitter the taste of death.”
The quote was attributed to either George Washington, a liked individual, or Osama Bin Laden, a disliked individual. Participants were asked to what extent they agree with the quote, on a 9-point Likert scale ranging from 1 (disagreement) to 9 (agreement).
We start with a bit of setup.
# Load packages
library(tidystats)
library(dplyr)
# Load data
<- tidystats::quote_source data
The main hypothesis is that people will like the quote more when it is attributed to George Washington compared to Osama Bin Laden. We test this hypothesis by first looking at the descriptives and then by conducting a t-test.
<- data %>%
descriptives group_by(source) %>%
describe_data(response, short = TRUE)
descriptives
var | source | N | M | SD |
---|---|---|---|---|
response | Bin Laden | 3083 | 5.23 | 2.11 |
response | Washington | 3242 | 5.93 | 2.21 |
<- t.test(response ~ source, data = data)
t_test
t_test#>
#> Welch Two Sample t-test
#>
#> data: response by source
#> t = -13, df = 6323, p-value <2e-16
#> alternative hypothesis: true difference in means between group Bin Laden and group Washington is not equal to 0
#> 95 percent confidence interval:
#> -0.802 -0.589
#> sample estimates:
#> mean in group Bin Laden mean in group Washington
#> 5.23 5.93
Participants appear to rate the quote a bit more positively when it is attributed to George Washington.
We can also perform some additional tests. For instance, does it
matter if the participant is from the US? And does age matter? To answer
these questions, we can perform interaction tests using
lm()
.
The interaction with the participant being from the U.S. or not:
<- lm(response ~ source * us_or_international, data = data)
lm_us_or_not summary(lm_us_or_not)
#>
#> Call:
#> lm(formula = response ~ source * us_or_international, data = data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.005 -1.228 -0.228 1.772 3.772
#>
#> Coefficients:
#> Estimate Std. Error t value
#> (Intercept) 5.2278 0.0437 119.50
#> sourceWashington 0.7769 0.0613 12.67
#> us_or_internationalinternational 0.0210 0.0955 0.22
#> sourceWashington:us_or_internationalinternational -0.3717 0.1323 -2.81
#> Pr(>|t|)
#> (Intercept) <2e-16 ***
#> sourceWashington <2e-16 ***
#> us_or_internationalinternational 0.826
#> sourceWashington:us_or_internationalinternational 0.005 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.16 on 6321 degrees of freedom
#> (18 observations deleted due to missingness)
#> Multiple R-squared: 0.0275, Adjusted R-squared: 0.027
#> F-statistic: 59.5 on 3 and 6321 DF, p-value: <2e-16
The interaction is significant, so it appears to matter whether the participant is from the U.S. or not. In fact, participants from the U.S. show a stronger effect than those from outside the U.S.
The interaction with the participant’s age:
<- lm(response ~ source * age, data = data)
lm_age summary(lm_age)
#>
#> Call:
#> lm(formula = response ~ source * age, data = data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.743 -1.202 -0.152 1.793 3.883
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.93370 0.09589 51.45 < 2e-16 ***
#> sourceWashington 0.55737 0.13558 4.11 4e-05 ***
#> age 0.01147 0.00336 3.41 0.00065 ***
#> sourceWashington:age 0.00545 0.00478 1.14 0.25433
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.16 on 6308 degrees of freedom
#> (31 observations deleted due to missingness)
#> Multiple R-squared: 0.0308, Adjusted R-squared: 0.0304
#> F-statistic: 66.9 on 3 and 6308 DF, p-value: <2e-16
No significant interaction effect, so we do not have evidence for age changing the size of the effect.
Let’s say these are the analyses we want to save the output of and
report later. This is where tidystats
comes in. The steps
to perform are to first create an empty list and then to use the
add_stats()
function to add analyses to the list. This is
why we stored each analysis into a variable. The
add_stats()
function takes an analysis, extracts the
statistics, and adds the result to a list. Optionally, you can add
additional information about each analysis, such as whether it was
preregistered, whether it was a primary, secondary, or exploratory
analysis, or simply add some notes.
# Create an empty list to store the analyses in
<- list()
results
# Add the analyses
<- results %>%
results add_stats(t_test, preregistered = TRUE, type = "primary",
notes = "A t-test comparing the effect of source on the quote rating.") %>%
add_stats(lm_us_or_not, preregistered = FALSE, type = "exploratory",
notes = "Interaction effect with being from the U.S. or not.") %>%
add_stats(lm_age)
You can see that I added quite some information to the first and second analysis. This is recommended because it is easy to forget which analysis is which; and you might accidentally report the wrong analysis if you have many of them. It’s also nice to add some documentation so that others who are not as familiar with your data can also better understand each analysis.
To save these analyses to a file, you can use the
write_stats()
function.
write_stats(results, "lorge-curtiss-1936-replication.json")
Note the file extension: .json. These types of files are simply text files, but in a format that is machine-readable (unfortunately, not very human-readable). This file can be used to share with others so that they can read it back into R and extract statistics (e.g., for meta-analyses) or by you to report the statistics in Word.
Lorge, I., & Curtiss, C. C. (1936). Prestige, suggestion, and attitudes. The Journal of Social Psychology, 7, 386-402. https://doi.org/10.1080/00224545.1936.9919891
Klein, R.A. et al. (2014) Investigating Variation in Replicability: A “Many Labs” Replication Project. Social Psychology, 45(3), 142-152. https://dx.doi.org/10.1027/1864-9335/a000178