sphunif: Uniformity Tests on the Circle, Sphere, and Hypersphere

Suppose that you want to test if a sample of circular data is uniformly distributed. For example, the following circular uniform sample in radians:
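A sample of this kind can be drawn as follows (a minimal sketch; the seed and the sample size n = 30 are illustrative assumptions, not necessarily those behind the outputs shown below):
# Circular data: angles uniformly distributed on [0, 2 * pi)
set.seed(42)  # illustrative seed
cir_data <- runif(n = 30, min = 0, max = 2 * pi)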
Then you can call the main function in the sphunif package, unif_test, specifying the type of test to be performed. For example, the Watson (omnibus) test:
library(sphunif)
#> Loading required package: Rcpp
unif_test(data = cir_data, type = "Watson") # An htest object
#>
#> Watson test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity
By default, the calibration of the test statistic is done by Monte Carlo. This can be changed with p_value = "asymp" to use the asymptotic distribution:
unif_test(data = cir_data, type = "Watson", p_value = "MC") # Monte Carlo
#> Loading required package: foreach
#> Loading required package: future
#>
#> Watson test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.03701, p-value = 0.8592
#> alternative hypothesis: any alternative to circular uniformity
unif_test(data = cir_data, type = "Watson", p_value = "asymp") # Asymp. distr.
#>
#> Watson test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity
You can perform several tests within a single call to unif_test, choosing from the available circular uniformity tests:
avail_cir_tests
#> [1] "Ajne" "Bakshaev" "Bingham" "Cressie"
#> [5] "CCF09" "FG01" "Gine_Fn" "Gine_Gn"
#> [9] "Gini" "Gini_squared" "Greenwood" "Hermans_Rasson"
#> [13] "Hodges_Ajne" "Kuiper" "Log_gaps" "Max_uncover"
#> [17] "Num_uncover" "PAD" "PCvM" "PRt"
#> [21] "Pycke" "Pycke_q" "Range" "Rao"
#> [25] "Rayleigh" "Riesz" "Rothman" "Vacancy"
#> [29] "Watson" "Watson_1976"
For example, you can use the Projected Anderson–Darling test (García-Portugués, Navarro-Esteban, and Cuesta-Albertos 2020; also an omnibus test) together with the Watson test:
# A *list* of htest objects
unif_test(data = cir_data, type = c("PAD", "Watson"))
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 1.137e-08).
#> $PAD
#>
#> Projected Anderson-Darling test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.54217, p-value = 0.8403
#> alternative hypothesis: any alternative to circular uniformity
#>
#>
#> $Watson
#>
#> Watson test of circular uniformity
#>
#> data: cir_data
#> statistic = 0.03701, p-value = 0.8584
#> alternative hypothesis: any alternative to circular uniformity
García-Portugués and Verdebout (2018) give a review of different tests of uniformity on the circle. Section 5.1 in Pewsey and García-Portugués (2021) also gives an overview of recent advances.
Suppose now that you want to test if a sample of spherical data is uniformly distributed. Consider a non-uniformly-generated[1] sample of directions in Cartesian coordinates, such as:
# Sample data on S^2
set.seed(987202226)
theta <- runif(n = 30, min = 0, max = 2 * pi)
phi <- runif(n = 30, min = 0, max = pi)
sph_data <- cbind(cos(theta) * sin(phi), sin(theta) * sin(phi), cos(phi))
The available spherical uniformity tests:
avail_sph_tests
#> [1] "Ajne" "Bakshaev" "Bingham" "CJ12" "CCF09"
#> [6] "Gine_Fn" "Gine_Gn" "PAD" "PCvM" "PRt"
#> [11] "Pycke" "Rayleigh" "Rayleigh_HD" "Riesz"
See again García-Portugués and Verdebout (2018) for a review of tests of uniformity on the sphere, complemented by Section 5.1 in Pewsey and García-Portugués (2021).
The default type = "all" equals type = avail_sph_tests, which might be too much for a standard analysis:
unif_test(data = sph_data, type = "all", p_value = "MC", M = 1e3)
#> $Ajne
#>
#> Ajne test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.057937, p-value = 0.991
#> alternative hypothesis: any non-axial alternative to spherical uniformity
#>
#>
#> $Bakshaev
#>
#> Bakshaev (2010) test of spherical uniformity
#>
#> data: sph_data
#> statistic = 1.0215, p-value = 0.622
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Bingham
#>
#> Bingham test of spherical uniformity
#>
#> data: sph_data
#> statistic = 17.573, p-value = 0.003
#> alternative hypothesis: scatter matrix different from constant
#>
#>
#> $CJ12
#>
#> Cai and Jiang (2012) test of spherical uniformity
#>
#> data: sph_data
#> statistic = 19.442, p-value = 0.78
#> alternative hypothesis: unclear consistency
#>
#>
#> $CCF09
#>
#> Cuesta-Albertos et al. (2009) test of spherical uniformity with k = 50
#>
#> data: sph_data
#> statistic = 1.1355, p-value = 0.764
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Gine_Fn
#>
#> Gine's Fn test of spherical uniformity
#>
#> data: sph_data
#> statistic = 1.5391, p-value = 0.392
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Gine_Gn
#>
#> Gine's Gn test of spherical uniformity
#>
#> data: sph_data
#> statistic = 1.3074, p-value = 0.003
#> alternative hypothesis: any axial alternative to spherical uniformity
#>
#>
#> $PAD
#>
#> Projected Anderson-Darling test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.91653, p-value = 0.5
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PCvM
#>
#> Projected Cramer-von Mises test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.12769, p-value = 0.622
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PRt
#>
#> Projected Rothman test of spherical uniformity with t = 0.333
#>
#> data: sph_data
#> statistic = 0.15523, p-value = 0.666
#> alternative hypothesis: any alternative to spherical uniformity if t is irrational
#>
#>
#> $Pycke
#>
#> Pycke test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.042839, p-value = 0.299
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $Rayleigh
#>
#> Rayleigh test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.13345, p-value = 0.98
#> alternative hypothesis: mean direction different from zero
#>
#>
#> $Rayleigh_HD
#>
#> HD-standardized Rayleigh test of spherical uniformity
#>
#> data: sph_data
#> statistic = -1.1703, p-value = 0.98
#> alternative hypothesis: mean direction different from zero
#>
#>
#> $Riesz
#>
#> Warning! This is an experimental test not meant to be used
#>
#> data: sph_data
#> statistic = 1.0215, p-value = 0.622
#> alternative hypothesis: unclear, experimental test
unif_test(data = sph_data, type = "Rayleigh", p_value = "asymp")
#>
#> Rayleigh test of spherical uniformity
#>
#> data: sph_data
#> statistic = 0.13345, p-value = 0.9875
#> alternative hypothesis: mean direction different from zero
The hyperspherical setting is treated analogously to the spherical setting, and the available tests are exactly the same (avail_sph_tests). Here is an example of testing uniformity with a sample of size 100 on the \(9\)-sphere:
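A minimal sketch, assuming the built-in sampler r_unif_sph and taking the Rayleigh test with its asymptotic calibration as an illustrative choice:
# Sample data on S^9 (p = 10 Cartesian coordinates)
hyp_data <- r_unif_sph(n = 100, p = 10, M = 1)[, , 1]
# Test uniformity on the 9-sphere (the choice of test is illustrative)
unif_test(data = hyp_data, type = "Rayleigh", p_value = "asymp")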
The following snippet partially reproduces the data application in García-Portugués, Navarro-Esteban, and Cuesta-Albertos (2021) on testing the uniformity of known Venusian craters. Incidentally, it also illustrates how to monitor the progress of a Monte Carlo simulation when p_value = "MC" using progressr.
# Load spherical data
data(venus)
head(venus)
#> name diameter theta phi
#> 1 Janice 10.0 4.5724136 1.523672
#> 2 HuaMulan 24.0 5.8939769 1.514946
#> 3 Tatyana 19.0 3.7070793 1.490511
#> 4 Landowska 33.0 1.2967796 1.476025
#> 5 Ruslanova 44.3 0.2897247 1.465029
#> 6 Sveta 21.0 4.7684140 1.439181
nrow(venus)
#> [1] 967
# Compute Cartesian coordinates from polar coordinates
venus$X <- cbind(cos(venus$theta) * cos(venus$phi),
                 sin(venus$theta) * cos(venus$phi),
                 sin(venus$phi))
# Test uniformity using the Projected Cramér-von Mises and Projected
# Anderson-Darling tests on S^2, both using asymptotic distributions
unif_test(data = venus$X, type = c("PCvM", "PAD"), p_value = "asymp")
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 6.249e-14).
#> Series truncated from 10000 to 10000 terms (difference <= 0 with the HBE tail probability; last weight = 4.999e-13).
#> $PCvM
#>
#> Projected Cramer-von Mises test of spherical uniformity
#>
#> data: venus$X
#> statistic = 0.25844, p-value = 0.1272
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PAD
#>
#> Projected Anderson-Darling test of spherical uniformity
#>
#> data: venus$X
#> statistic = 1.5077, p-value = 0.1149
#> alternative hypothesis: any alternative to spherical uniformity
# Define a handler for progressr
require(progress)
#> Loading required package: progress
require(progressr)
#> Loading required package: progressr
handlers(handler_progress(
  format = ":spin [:bar] :percent Total: :elapsedfull End \u2248 :eta",
  clear = FALSE))
# Test uniformity using Monte-Carlo approximated null distributions
with_progress(
  unif_test(data = venus$X, type = c("PCvM", "PAD"),
            p_value = "MC", chunks = 1e2, M = 5e2, cores = 2)
)
#> $PCvM
#>
#> Projected Cramer-von Mises test of spherical uniformity
#>
#> data: venus$X
#> statistic = 0.25844, p-value = 0.126
#> alternative hypothesis: any alternative to spherical uniformity
#>
#>
#> $PAD
#>
#> Projected Anderson-Darling test of spherical uniformity
#>
#> data: venus$X
#> statistic = 1.5077, p-value = 0.116
#> alternative hypothesis: any alternative to spherical uniformity
unif_stat allows the computation of several statistics for different samples within a single call, thus facilitating Monte Carlo experiments:
# M samples of size n on S^2
samps_sph <- r_unif_sph(n = 30, p = 3, M = 10)
# Apply all statistics to the M samples
unif_stat(data = samps_sph, type = "all")
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn
#> 1 0.47609328 2.2119954 2.382230 18.36313 1.5714737 2.2702163 0.3658432
#> 2 0.05231638 0.5266412 4.984710 21.26823 0.8961785 0.7372848 0.5280193
#> 3 0.21601031 1.0974739 3.752754 18.42823 1.4807671 1.2776073 0.4135661
#> 4 0.16994745 0.8920866 2.710392 26.76415 1.1073186 1.0173293 0.3375395
#> 5 0.24320463 1.2717785 3.607297 23.14723 1.3761164 1.4079420 0.4351235
#> 6 0.31846184 1.5920190 4.297304 21.63874 1.3718298 1.7129064 0.4390590
#> 7 0.29956640 1.5662143 4.377052 21.03363 1.5190590 1.7113360 0.5130704
#> 8 0.24576739 1.1694726 1.151481 22.37936 1.1051464 1.2430711 0.2600015
#> 9 0.30644656 1.5440129 4.982531 21.16927 1.4369394 1.6971020 0.4713157
#> 10 0.29664792 1.5808092 5.911191 24.17573 1.4809133 1.7620336 0.5754419
#> PAD PCvM PRt Pycke Rayleigh Rayleigh_HD
#> 1 1.5396354 0.27649943 0.39262094 0.157845184 6.50306437 1.43012004
#> 2 0.4863028 0.06583015 0.08353385 -0.140634304 0.09011958 -1.18795371
#> 3 0.8766092 0.13718424 0.16400787 0.019631134 1.87776213 -0.45815169
#> 4 0.6948689 0.11151082 0.14774812 -0.103164830 1.75644170 -0.50768055
#> 5 0.9503006 0.15897231 0.21864541 -0.006193974 2.95653601 -0.01774410
#> 6 1.1442056 0.19900238 0.27062129 0.007535607 4.18444451 0.48354745
#> 7 1.1468233 0.19577679 0.27158541 0.055528018 3.88963126 0.36319044
#> 8 0.8602975 0.14618407 0.19759185 -0.066992075 2.93592539 -0.02615835
#> 9 1.1340723 0.19300161 0.25856393 0.037310492 3.76467886 0.31217884
#> 10 1.1687326 0.19760115 0.27044780 0.065156710 3.75199894 0.30700228
#> Riesz
#> 1 2.2119954
#> 2 0.5266412
#> 3 1.0974739
#> 4 0.8920866
#> 5 1.2717785
#> 6 1.5920190
#> 7 1.5662143
#> 8 1.1694726
#> 9 1.5440129
#> 10 1.5808092
Additionally, unif_stat_MC is a utility for performing the previous simulation through a convenient parallel wrapper:
# Break the simulation into 10 chunks of tasks to be divided between 2 cores
sim <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e4, cores = 2,
                    chunks = 10)
# Critical values for test statistics
sim$crit_val_MC
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn PAD
#> 0.1 0.4496785 2.196705 9.025914 26.01065 1.647233 2.342101 0.7567629 1.550902
#> 0.05 0.5524659 2.621310 10.847585 26.72631 1.767765 2.731491 0.8646518 1.815025
#> 0.01 0.7630362 3.567303 15.001728 27.91766 2.001870 3.662148 1.1247111 2.408752
#> PCvM PRt Pycke Rayleigh Rayleigh_HD Riesz
#> 0.1 0.2745881 0.3814613 0.1485489 6.164469 1.291889 2.196705
#> 0.05 0.3276638 0.4586759 0.2158455 7.797687 1.958648 2.621310
#> 0.01 0.4459129 0.6330852 0.3694422 11.235679 3.362202 3.567303
# Test statistics
head(sim$stats_MC)
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn
#> 1 0.07731898 0.6733451 6.4629125 16.48550 1.1562364 0.9231689 0.6138929
#> 2 0.12829023 0.6640230 1.5313998 22.16624 0.9716421 0.7897359 0.2765750
#> 3 0.37101536 1.6889680 0.7746345 19.10714 1.4180425 1.7274690 0.2434075
#> 4 0.39459572 1.9804764 4.8542543 23.13971 1.4792212 2.1076112 0.5292283
#> 5 0.08893105 0.8522425 9.6273995 24.74282 1.1618723 1.1923836 0.8366594
#> 6 0.14701046 0.8374124 3.7637907 25.14477 1.2318834 1.0074790 0.4194372
#> PAD PCvM PRt Pycke Rayleigh Rayleigh_HD Riesz
#> 1 0.6031523 0.08416814 0.09909001 -0.08114254 0.3037207 -1.1007514 0.6733451
#> 2 0.5595737 0.08300288 0.09857764 -0.12664657 0.8429050 -0.8806304 0.6640230
#> 3 1.1906011 0.21112100 0.28955223 0.05017846 4.8530448 0.7565024 1.6889680
#> 4 1.4047872 0.24755955 0.35056836 0.09179785 5.4328088 0.9931900 1.9804764
#> 5 0.7607502 0.10653031 0.12523474 -0.02107874 0.3352458 -1.0878814 0.8522425
#> 6 0.6849339 0.10467655 0.13017402 -0.07467730 1.2446565 -0.7166160 0.8374124
# Power computation using a pre-built sampler for the alternative
pow <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e4, cores = 2,
                    chunks = 10, r_H1 = r_alt, crit_val = sim$crit_val_MC,
                    alt = "vMF", kappa = 1)
pow$power_MC
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn PAD PCvM PRt
#> 0.1 0.8263 0.8226 0.1460 0.1001 0.7769 0.8126 0.1442 0.8191 0.8226 0.8243
#> 0.05 0.7208 0.7197 0.0768 0.0552 0.6593 0.7134 0.0784 0.7159 0.7197 0.7240
#> 0.01 0.4774 0.4647 0.0168 0.0110 0.3961 0.4512 0.0171 0.4624 0.4647 0.4676
#> Pycke Rayleigh Rayleigh_HD Riesz
#> 0.1 0.7924 0.8282 0.8282 0.8226
#> 0.05 0.6890 0.7246 0.7246 0.7197
#> 0.01 0.4337 0.4729 0.4729 0.4647
# How to use a custom sampler according to ?unif_stat_MC
r_H1 <- function(n, p, M, l = 1) {
  samp <- array(dim = c(n, p, M))
  for (j in 1:M) {
    samp[, , j] <- mvtnorm::rmvnorm(n = n, mean = c(l, rep(0, p - 1)),
                                    sigma = diag(rep(1, p)))
    samp[, , j] <- samp[, , j] / sqrt(rowSums(samp[, , j]^2))
  }
  return(samp)
}
pow <- unif_stat_MC(n = 30, type = "all", p = 3, M = 1e2, cores = 2,
                    chunks = 5, r_H1 = r_H1, crit_val = sim$crit_val_MC)
pow$power_MC
#> Ajne Bakshaev Bingham CJ12 CCF09 Gine_Fn Gine_Gn PAD PCvM PRt Pycke
#> 0.1 1.00 1.00 0.44 0.13 0.99 1.00 0.43 1.00 1.00 1.00 0.99
#> 0.05 0.99 0.99 0.30 0.06 0.97 0.99 0.29 0.99 0.99 0.99 0.99
#> 0.01 0.97 0.98 0.07 0.00 0.94 0.97 0.08 0.98 0.98 0.98 0.96
#> Rayleigh Rayleigh_HD Riesz
#> 0.1 1.00 1.00 1.00
#> 0.05 0.99 0.99 0.99
#> 0.01 0.97 0.97 0.98
unif_stat_MC can be used to construct the Monte Carlo calibration required by unif_test, either by providing a rejection rule based on $crit_val_MC or by approximating the \(p\)-value with $stats_MC.
# Using precomputed critical values
ht1 <- unif_test(data = samps_sph[, , 1], type = "Rayleigh",
                 p_value = "crit_val", crit_val = sim$crit_val_MC)
ht1
#>
#> Rayleigh test of spherical uniformity
#>
#> data: samps_sph[, , 1]
#> statistic = 6.5031, p-value = NA
#> alternative hypothesis: mean direction different from zero
ht1$reject
#> 0.1 0.05 0.01
#> FALSE TRUE TRUE
# Using precomputed Monte Carlo statistics
ht2 <- unif_test(data = samps_sph[, , 1], type = "Rayleigh",
                 p_value = "MC", stats_MC = sim$stats_MC)
ht2
#>
#> Rayleigh test of spherical uniformity
#>
#> data: samps_sph[, , 1]
#> statistic = 6.5031, p-value = 0.999
#> alternative hypothesis: mean direction different from zero
ht2$reject
#> 0.1 0.05 0.01
#> TRUE TRUE TRUE
# Faster than recomputing the Monte Carlo statistics from scratch:
unif_test(data = samps_sph[, , 1], type = "Rayleigh", p_value = "MC")
#>
#> Rayleigh test of spherical uniformity
#>
#> data: samps_sph[, , 1]
#> statistic = 6.5031, p-value = 0.09281
#> alternative hypothesis: mean direction different from zero
[1] Uniformly-distributed polar coordinates do not translate into uniformly-distributed Cartesian coordinates!
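For reference, a genuinely uniform spherical sample can be generated either with the package sampler r_unif_sph or by normalizing standard Gaussian vectors (a minimal sketch):
# Uniform sample on S^2 via the package sampler
unif_sph <- r_unif_sph(n = 30, p = 3, M = 1)[, , 1]
# Equivalent construction: normalize rows of standard Gaussian vectors
z <- matrix(rnorm(30 * 3), nrow = 30, ncol = 3)
unif_sph_gauss <- z / sqrt(rowSums(z^2))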