9 Wilcoxon Signed Rank Test for One Sample
9.1 Introduction to Wilcoxon Signed Rank Test
The one-sample Wilcoxon Signed Rank Test is a nonparametric test used to compare the location (median) of a single sample to a hypothesized population value.
9.2 Mathematical definition of the Wilcoxon Signed Rank Test
See the maths here: https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
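As a rough illustration of the definition linked above: the statistic \(V\) is the sum of the ranks of \(|x_i - \mu_0|\) taken over the observations with \(x_i > \mu_0\). The sketch below computes it by hand and compares it with the value from wilcox.test(). The data are simulated, and the seed and the names x, mu0, V_manual, and V_r are assumptions made for this example only.

```r
# Minimal sketch on simulated data (NOT the Golub p-values)
set.seed(1)                 # hypothetical seed, for reproducibility only
x   <- runif(30)            # simulated stand-in sample
mu0 <- 0.5                  # hypothesized location

d <- x - mu0                # signed differences from mu0
d <- d[d != 0]              # zero differences are dropped
r <- rank(abs(d))           # ranks of the absolute differences
V_manual <- sum(r[d > 0])   # sum of ranks belonging to positive differences

# The same statistic as reported by wilcox.test()
V_r <- unname(wilcox.test(x, mu = mu0)$statistic)
c(manual = V_manual, wilcox.test = V_r)
```

The two values should agree, which shows that the V reported by wilcox.test() is this positive-rank sum.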
9.3 Data source and description
We will use gene-level \(p\)-values derived from the Golub et al. (1999) data set, available in the R package multtest (https://rdrr.io/bioc/multtest/man/golub.html). The original is a data set of gene expression values for leukemia, but here we have gene-specific \(p\)-values from a gene-level hypothesis test. We created these \(p\)-values in the script R/create_golub_data_20240523.R, but they do not represent any real analysis results.
9.4 Cleaning the data to create a model data frame
Because our method requires only one sample, we have very little work to do. We import the data set of \(p\)-values.
golub_pVals_num <- readRDS(file = "../data/02_golub_pVals_20240523.rds")
There are 3051 \(p\)-values. Under the null hypothesis there are no real effects in the data, so these \(p\)-values should follow a Uniform(0, 1) distribution. Our null hypothesis is therefore that the population location is 0.5, which is both the mean and the median of a Uniform(0, 1) distribution.
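To make the value 0.5 concrete, here is a small simulation sketch: we draw 3051 values from a Uniform(0, 1) distribution and look at their mean and median. These are simulated numbers, not the Golub \(p\)-values, and the seed and the name u are assumptions for illustration.

```r
# Simulated "null" p-values: Uniform(0, 1), same count as the data set
set.seed(2024)              # hypothetical seed
u <- runif(3051)

# Both summaries should be close to 0.5
c(mean = mean(u), median = median(u))
```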
9.5 Assumptions of the Wilcoxon Signed Rank Test
To use a one-sample Wilcoxon Signed Rank Test, we make the following assumptions:
- The data are from a random sample
- The observations are independent of each other
- The values can be “ranked”
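If these assumptions hold, then the test statistic is asymptotically normal. The signed rank test also assumes that the distribution is symmetric about its median; under our null hypothesis this holds, because a Uniform(0, 1) distribution is symmetric about 0.5.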
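The asymptotic normality can be sketched numerically. Under the null hypothesis, with \(n\) non-zero differences and no ties, \(V\) has mean \(n(n+1)/4\) and variance \(n(n+1)(2n+1)/24\), so a standardized \(V\) can be compared with a standard normal distribution. The example below uses simulated data; the seed, the sample size of 200, and all object names are assumptions for illustration. It turns off the exact test and the continuity correction in wilcox.test() so that the two \(p\)-values are directly comparable.

```r
# Normal approximation for the signed rank statistic, on simulated data
set.seed(42)                # hypothetical seed
x   <- runif(200)           # simulated stand-in sample (assumed size)
mu0 <- 0.5

d <- x - mu0
n <- length(d)              # continuous data: no zero differences expected
V <- sum(rank(abs(d))[d > 0])

# Standardize V using its null mean and variance (no ties)
z <- (V - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_approx <- pnorm(z)        # one-sided p-value for H1: location < mu0

# Same quantity from wilcox.test() without the continuity correction
p_r <- wilcox.test(
  x, mu = mu0, alternative = "less", exact = FALSE, correct = FALSE
)$p.value
c(by_hand = p_approx, wilcox.test = p_r)
```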
9.6 Checking the assumptions
9.6.1 Independence and Randomness
These are gene-level \(p\)-values, so the observations are not truly independent. However, because this is a pedagogical example, we will take a random sample of these genes to test, stored in gene_sample (this random sample should be independent enough, but we have no guarantee of this).
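One way such a random sample could be drawn is sketched below. The source does not state the seed or the sample size, so the seed and size = 200 are assumptions, and pvals_standin is a simulated placeholder used so the sketch runs on its own; in this chapter the input would be golub_pVals_num.

```r
# Sketch: draw a simple random sample of p-values without replacement
set.seed(20240523)                # hypothetical seed
pvals_standin <- runif(3051)      # placeholder for golub_pVals_num

gene_sample <- sample(
  x = pvals_standin,
  size = 200,                     # assumed sample size, not from the source
  replace = FALSE
)
length(gene_sample)
```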
What does the data distribution look like?
hist(gene_sample)
Remember, this is a “fake” analysis: all 38 samples in these data are leukemia cases, and I tested one half against the other, so there should absolutely NOT be any real biological signal in these data.
9.6.2 Type of Data
These values are \(p\)-values, so they can be ranked.
9.7 Code to run a Wilcoxon Signed Rank Test
Now that we have checked our assumptions, we can perform the Wilcoxon Signed Rank Test on a random sample of the genes to test whether their \(p\)-values are centered at 0.5.
wilcox.test(
  x = gene_sample,
  mu = 0.5, # average from all theoretical p-values under H0
  alternative = "less" # H1: random p-values < 0.5
)
Wilcoxon signed rank test with continuity correction
data: gene_sample
V = 5859, p-value = 1.584e-07
alternative hypothesis: true location is less than 0.5
9.8 Brief interpretation of the output
The \(p\)-value for this test (1.584e-07) is less than 0.05, so we reject the null hypothesis that the location of the gene-specific \(p\)-values in this set of results is greater than or equal to 0.5 (the theoretical average of \(p\)-values under the null hypothesis), in favor of the alternative that it is less than 0.5.
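The quantities used in this interpretation can also be pulled out of the object that wilcox.test() returns, rather than read from the printed output. The sketch below runs on a simulated stand-in sample (gene_sample_demo, an assumed name and size), so its numbers will not match the output above. Setting conf.int = TRUE additionally asks for an estimate of the (pseudo)median and a confidence interval.

```r
# Sketch: extract pieces of the test result from the returned object
set.seed(7)                           # hypothetical seed
gene_sample_demo <- runif(200)        # simulated stand-in for gene_sample

res <- wilcox.test(
  x = gene_sample_demo,
  mu = 0.5,
  alternative = "less",
  conf.int = TRUE                     # also estimate location and interval
)

res$statistic   # the signed rank statistic V
res$p.value     # p-value to compare with the chosen significance level
res$estimate    # estimated (pseudo)median of the sample
res$conf.int    # one-sided confidence interval for the location
```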