The `vtable` package serves the purpose of outputting automatic variable documentation that can be easily viewed while you continue to work with your data.

`vtable` contains four main functions: `vtable()` (or `vt()`), `sumtable()` (or `st()`), `labeltable()`, and `dftoHTML()`/`dftoLaTeX()`.

This vignette focuses on some bonus helper functions that come with `vtable` and have been exported because they may be handy to you. They can save you a little time, and can help you avoid writing an unnamed (anonymous) function when you need to pass a function as an argument.

`vtable` includes four shortcut functions. These are generally intended for use with the `summ` option in `vtable` and `sumtable`, because nested functions don’t look very nice in a `vtable`, or in a `sumtable` unless you explicitly set `summ.names`.

`nuniq`

`nuniq(x)` returns `length(unique(x))`, the number of unique values in the vector.
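Since `nuniq(x)` is documented as `length(unique(x))`, the base-R expression below should give the same count. One detail worth knowing: `unique()` keeps `NA` as a value of its own, so missing values add one to the count.

```r
# A character vector with a repeated value and two missing values
x <- c("a", "b", "a", NA, "c", NA)

# Base-R equivalent of nuniq(x): unique() returns "a", "b", NA, "c"
n_unique <- length(unique(x))
print(n_unique)
```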

`countNA`, `propNA`, and `notNA`

These three functions are shortcuts for dealing with missing data. You have probably written out the nested versions of these many times!

| Function | Short For |
|---|---|
| `countNA()` | `sum(is.na())` |
| `propNA()` | `mean(is.na())` |
| `notNA()` | `sum(!is.na())` |

Note that `notNA()` also has some additional formatting options, which you would probably ignore if using it interactively.
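For reference, here are the nested base-R expressions from the table above, evaluated on a small vector. Each line is the long form that the named shortcut abbreviates; the values in the comments follow directly from the two `NA`s among five elements.

```r
x <- c(1, NA, 3, NA, 5)

# Long forms of the shortcuts in the table above
n_missing  <- sum(is.na(x))    # what countNA(x) abbreviates: 2
share_miss <- mean(is.na(x))   # what propNA(x) abbreviates: 0.4
n_present  <- sum(!is.na(x))   # what notNA(x) abbreviates: 3

print(c(n_missing, share_miss, n_present))
```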

`is.round`

This function is a shortcut for `!any(!(x == round(x,digits)))`.

It takes two arguments: a vector `x` and a number of `digits` (0 by default). It checks whether every value in `x` can be rounded to `digits` decimal places without losing any information.
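The expression that `is.round()` abbreviates can be wrapped in a small base-R function to see how `digits` changes the answer. The name `round_check()` is hypothetical and used only for illustration; note that `!any(!(...))` is the same as `all(...)`.

```r
# Hypothetical stand-in for the expression is.round(x, digits) abbreviates
round_check <- function(x, digits = 0) !any(!(x == round(x, digits)))

x <- c(1.5, 2.25, 3)

round_check(x)     # FALSE: 1.5 and 2.25 change when rounded to whole numbers
round_check(x, 1)  # FALSE: 2.25 changes when rounded to 1 decimal place
round_check(x, 2)  # TRUE: every value already has at most 2 decimal places
```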

`formatfunc`

`formatfunc()` is a function that returns a function, which itself helps format numbers using the `format()` function, in the same spirit as the `label_` functions in the scales package. It is largely used for the `numformat` argument of `sumtable()`.

`formatfunc()` for the most part takes the same arguments as `format()`, and so `help(format)` can be a guide for using it. However, there are some differences.

Some defaults are changed. By default, `scientific = FALSE` and `trim = TRUE`.

There are four new arguments as well. `percent = TRUE` will format the number as a percentage by multiplying it by 100 and adding a % at the end. You can instead set `percent` equal to some number, and that number will be taken as 100% instead of 1. So `percent = 100`, for example, will just add a % at the end without doing any multiplying.

`prefix` and `suffix` will, naturally, add prefixes or suffixes to the formatted number. So `prefix = '$', suffix = 'M'`, for example, will produce a function that will turn `3` into `$3M`.

`scale` will multiply the number by `scale` before formatting it. So `prefix = '$', suffix = 'M', scale = 1/1000000` will turn `3000000` into `$3M`.

```
library(vtable)
# Build a formatter that shows numbers as percentages with thousands separators
my_formatter_func <- formatfunc(percent = TRUE, digits = 3, nsmall = 2, big.mark = ',')
my_formatter_func(523.2355987)
```

`## [1] "52,323.56%"`
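To make the roles of `percent`, `prefix`, `suffix`, and `scale` concrete, here is a minimal base-R sketch of a formatter factory in the same spirit. The name `make_formatter()` is hypothetical and this is not `vtable`'s source. The order in which `scale` and the percent conversion are applied, and where the `%` sits relative to `suffix`, are assumptions of this sketch.

```r
# Hypothetical formatter factory in the spirit of formatfunc(); not vtable's code
make_formatter <- function(percent = FALSE, prefix = "", suffix = "", scale = 1, ...) {
  # Changed defaults for format(); any matching argument passed via ... overrides them
  format_args <- modifyList(list(scientific = FALSE, trim = TRUE), list(...))
  function(x) {
    x <- x * scale
    pct <- ""
    if (isTRUE(percent)) {
      # percent = TRUE: 1 is taken as 100%
      x <- x * 100
      pct <- "%"
    } else if (is.numeric(percent)) {
      # percent = some number: that number is taken as 100%
      x <- x / percent * 100
      pct <- "%"
    }
    formatted <- do.call(format, c(list(x), format_args))
    paste0(prefix, formatted, pct, suffix)
  }
}

dollars_m <- make_formatter(prefix = "$", suffix = "M", scale = 1/1000000)
dollars_m(3000000)   # "$3M"

as_pct <- make_formatter(percent = TRUE)
as_pct(0.25)         # "25%"

already_pct <- make_formatter(percent = 100)
already_pct(50)      # "50%"
```

Returning a closure means the formatting choices are fixed once and the resulting function can be handed to any argument that expects a function, which is how `numformat` is used.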

`pctile`

`pctile(x)` is short for `quantile(x,1:100/100)`. So in one sense this is another shortcut function. But it also lets you interact with percentiles a bit differently.

While `quantile()` has you specify which percentiles you want in the function call, `pctile()` returns an object with all integer percentiles, and you can pull out the ones you want afterwards. `pctile(x)[50]` is the 50th percentile, and so on. This can be convenient in several applications, an obvious one being `sumtable`.

```
library(vtable)
#Some random normal data, and its percentiles
d <- rnorm(1000)
pc <- pctile(d)

#25th, 50th, 75th percentile
pc[c(25,50,75)]
```

```
## 25% 50% 75%
## -0.690855543 -0.006592872 0.663079331
```

```
#Inverse normal CDF with 100 points of articulation
plot(pc)
```
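Because `pctile(x)` is documented as `quantile(x, 1:100/100)`, the same compute-once, pick-later workflow can be sketched in base R alone. `set.seed()` is added only so the draw is reproducible; the numbers will not match the output shown above.

```r
set.seed(1)
d <- rnorm(1000)

# Base-R equivalent of pctile(d): all 100 integer percentiles at once
pc <- quantile(d, 1:100/100)

# Pull out the percentiles you want afterwards, by position
pc[c(25, 50, 75)]

# The 50th element matches the median: with n = 1000, quantile()'s default
# type 7 averages the two middle order statistics, just as median() does
all.equal(unname(pc[50]), median(d))
```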

`weighted.sd`

`weighted.sd(x, w)` is a function to calculate a weighted standard deviation of `x` using `w` as weights, much like the base `weighted.mean()` does for means. It is mostly used as a helper function for `sumtable()` when `group.weights` is specified. However, you can use it on its own if you like. Unlike `weighted.mean()`, setting `na.rm = TRUE` will account for missing values in both `x` and `w`.

The weighted standard deviation is calculated as

\[ \sqrt{\frac{\sum_i w_i (x_i-\bar{x}_w)^2}{\frac{N_w-1}{N_w}\sum_i w_i}} \]

where \(\bar{x}_w\) is the weighted mean of \(x\), and \(N_w\) is the number of observations with a nonzero weight.

```
x <- 1:100
w <- 1:100
weighted.mean(x, w)
```

`## [1] 67`

```
sd(x)
```

`## [1] 29.01149`

```
weighted.sd(x, w)
```

`## [1] 23.80476`
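As a worked check of the formula, the base-R computation below plugs the same `x` and `w` into the weighted-variance fraction and takes its square root, which reproduces the `weighted.sd(x, w)` value shown. This is a hand calculation of the formula, not `vtable`'s source code.

```r
x <- 1:100
w <- 1:100

x_bar_w <- weighted.mean(x, w)   # weighted mean: 338350 / 5050 = 67
N_w     <- sum(w != 0)           # observations with a nonzero weight: 100

# Weighted sum of squared deviations over the bias-corrected total weight
wvar <- sum(w * (x - x_bar_w)^2) / ((N_w - 1) / N_w * sum(w))
wsd  <- sqrt(wvar)
wsd  # about 23.80476
```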

`independence.test`

`independence.test` is a helper function for `sumtable(group.test=TRUE)` that tests for independence between a categorical variable `x` and another variable `y` that may be categorical or numerical.

It then returns a *formatted string*, with significance stars, for printing.

The function takes the format

```
independence.test(x,y,w=NA,
factor.test = NA,
numeric.test = NA,
star.cutoffs = c(.01,.05,.1),
star.markers = c('***','**','*'),
digits = 3,
fixed.digits = FALSE,
format = '{name}={stat}{stars}',
opts = list())
```

`factor.test` and `numeric.test`

These are the functions that actually perform the independence test. `numeric.test` is used when `y` is numeric, and `factor.test` is used in all other instances.

Specifically, these functions should take only `x`, `y`, and `w=NULL` as arguments, and should return a list with three elements: the name of the test statistic, the test statistic itself, and the p-value of the test.

By default, these are the internal functions `vtable:::chisq.it` for `factor.test` and `vtable:::groupf.it` for `numeric.test`, so you can take a look at those (just type `vtable:::chisq.it` into the R console and it will show you the function’s code) if you’d like to make your own test functions.
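As a sketch of the interface described above, here is a hypothetical `numeric.test` replacement built on base R's `kruskal.test()`. The name `kw_test` and the label `"KW"` are illustrative. The function accepts `w` only to match the expected signature and ignores it, since `kruskal.test()` has no weights argument, and it returns the three elements in the order listed above.

```r
# Hypothetical replacement for numeric.test: a Kruskal-Wallis rank test of
# whether numeric y differs across the groups of categorical x.
# w is accepted to match the expected signature but ignored (unweighted test).
kw_test <- function(x, y, w = NULL) {
  res <- kruskal.test(y, as.factor(x))
  # name of the statistic, the statistic itself, and its p-value
  list("KW", unname(res$statistic), res$p.value)
}

out <- kw_test(iris$Species, iris$Sepal.Length)
str(out)
```

If this matches the interface `vtable` expects, it could then be supplied as `independence.test(iris$Species, iris$Sepal.Length, numeric.test = kw_test)`.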

`star.cutoffs` and `star.markers`

These are numeric and character vectors, respectively, used for p-value cutoffs and to create significance markers.

`star.cutoffs` indicates the cutoffs, and `star.markers` indicates the markers to be used with each cutoff, in the same order. So with `star.cutoffs = c(.01,.05,.1)` and `star.markers = c('***','**','*')`, each p-value below .01 will get marked with `'***'`, each from .01 to .05 will get `'**'`, and each from .05 to .1 will get `'*'`.

Defaults are set to “economics defaults” (.1, .05, .01). But these are of course easy to change.

```
data(iris)
independence.test(iris$Species,
                  iris$Sepal.Length,
                  star.cutoffs = c(.05,.01,.001))
```

`## [1] "F=119.265*"`
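The cutoff-to-marker matching can be sketched in base R. The helper `assign_stars()` is hypothetical, not `vtable`'s internal code, and it assumes that markers pair with cutoffs by position and that each p-value gets the marker of the smallest cutoff it falls below. Under that assumption, the example above pairs `.001` with `'*'` (the third default marker), which would account for the single star on a very small p-value.

```r
# Hypothetical helper: pair each cutoff with the marker in the same position,
# then give each p-value the marker of the smallest cutoff it is below.
assign_stars <- function(p, cutoffs = c(.01, .05, .1),
                         markers = c("***", "**", "*")) {
  ord <- order(cutoffs)
  cutoffs <- cutoffs[ord]
  markers <- markers[ord]
  vapply(p, function(pv) {
    hit <- which(pv < cutoffs)
    if (length(hit) == 0) "" else markers[hit[1]]
  }, character(1))
}

assign_stars(c(0.001, 0.03, 0.07, 0.5))
# "***" "**" "*" ""

# Reordered cutoffs with the default markers, as in the iris example
assign_stars(1e-20, cutoffs = c(.05, .01, .001))
# "*"
```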

`digits` and `fixed.digits`

`digits` indicates how many digits after the decimal point should be displayed for the test statistics and p-values. `fixed.digits` determines whether trailing zeros are kept.

```
independence.test(iris$Species,
                  iris$Sepal.Width,
                  digits=1)
```

`## [1] "F=49.2***"`

```
independence.test(iris$Species,
                  iris$Sepal.Width,
                  digits=4,
                  fixed.digits = TRUE)
```

`## [1] "F=49.1600***"`
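The two display styles can be reproduced with base R's `round()` and `formatC()`. This only illustrates dropped versus kept trailing zeros, using the rounded statistic from the example above (49.16); it is not necessarily how `independence.test()` formats its output internally.

```r
stat <- 49.16

# Trailing zeros dropped (the fixed.digits = FALSE style)
dropped <- as.character(round(stat, 4))
dropped   # "49.16"

# Padded to exactly 4 decimal places (the fixed.digits = TRUE style)
kept <- formatC(stat, digits = 4, format = "f")
kept      # "49.1600"
```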

`format`

This is the printing format that the output will produce, incorporating the name of the test statistic `{name}`, the test statistic `{stat}`, the significance markers `{stars}`, and the p-value `{pval}`.

If your `independence.test` output is heading out to another format besides being printed in the R console, you may want to add additional markup like `'{name}$={stat}^{stars}$'` in LaTeX or `'{name}={stat}<sup>{stars}</sup>'` in HTML. If you do this, be sure to think carefully about escaping or not escaping characters as appropriate when you print!

```
independence.test(iris$Species,
                  iris$Sepal.Width,
                  format = 'Pr(>{name}): {pval}{stars}')
```

`## [1] "Pr(>F): <0.001***"`
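The placeholder substitution can be sketched with base R's `gsub()` and `fixed = TRUE`, so the braces are matched literally rather than as a regular expression. `fill_format()` is a hypothetical helper for illustration, not the function `vtable` uses, and the statistic and p-value strings are supplied by hand.

```r
# Hypothetical helper: replace each {placeholder} with its (already formatted) value
fill_format <- function(template, name, stat, stars, pval) {
  out <- template
  out <- gsub("{name}", name, out, fixed = TRUE)
  out <- gsub("{stat}", stat, out, fixed = TRUE)
  out <- gsub("{stars}", stars, out, fixed = TRUE)
  out <- gsub("{pval}", pval, out, fixed = TRUE)
  out
}

fill_format("{name}={stat}{stars}", "F", "49.16", "***", "<0.001")
# "F=49.16***"
fill_format("Pr(>{name}): {pval}{stars}", "F", "49.16", "***", "<0.001")
# "Pr(>F): <0.001***"
```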

`opts`

You can create a named list where the names are the above options and the values are the settings for those options, and input it into `independence.test` using `opts=`. This is an easy way to set the same options for many `independence.test` calls.
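One common way a named options list can override individual arguments is base R's `modifyList()`, sketched below. The defaults mirror the `independence.test()` signature shown earlier, but the merging step is an assumption about the general pattern, not a description of `vtable`'s internals.

```r
# Defaults taken from the independence.test() signature above
defaults <- list(star.cutoffs = c(.01, .05, .1),
                 star.markers = c("***", "**", "*"),
                 digits = 3,
                 fixed.digits = FALSE,
                 format = "{name}={stat}{stars}")

# One shared settings list, reusable across many calls
my_opts <- list(digits = 4, fixed.digits = TRUE)

# Entries in my_opts replace the matching defaults; the rest are kept
settings <- modifyList(defaults, my_opts)
str(settings)
```

With `vtable` loaded, the same `my_opts` list would be passed as `opts = my_opts` in each `independence.test()` call.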