Identify \(p_{path}\) significant features, extract principal components (PCs) from those specific features to construct a data matrix, predict the response with this data matrix, and record the model fit statistic of this prediction.

superpc.st(
  fit,
  data,
  n.threshold = 20,
  threshold.ignore = 0,
  n.PCs = 1,
  min.features = 3,
  epsilon = 1e-06
)

Arguments

fit

An object of class superpc returned by the function superpc.train.

data

A list of test data:

  • x : A "tall" pathway data frame (\(p_{path} \times N\)).

  • y : A response vector corresponding to type.

  • censoring.status : If type = "survival", the censoring indicator (\(1 - \) the observed event indicator). Otherwise, NULL.

  • featurenames : A character vector of the measured -Omes in x.

n.threshold

The number of bins into which to split the feature scores returned in the fit object.

threshold.ignore

Calculate the model only for feature scores above this percentile of the threshold values. We have observed that the smallest threshold values (0% - 40%) have almost no effect on model \(t\)-scores. Defaults to 0.00 (0%).

n.PCs

The number of PCs to extract from the pathway.

min.features

What is the smallest number of genes allowed in each pathway? This argument must be kept constant across all calls to this function which use the same pathway list. Defaults to 3.

epsilon

A small numeric tolerance used when comparing the absolute score values to each value of the threshold vector. Why this tolerance is needed is not documented. Defaults to \(10^{-6}\).
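The way n.threshold, threshold.ignore, and epsilon interact can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the simulated scores, the quantile grid running from threshold.ignore to 1, and the form of the comparison (adding epsilon to the absolute scores) are all assumptions made for the example.

```r
# Minimal sketch under stated assumptions; not the pathwayPCA source code.
set.seed(1)
scores <- rnorm(10)          # hypothetical feature scores (stand-in for the fit object)
n.threshold <- 5
threshold.ignore <- 0.4      # skip the lowest 40% of the quantile grid
epsilon <- 1e-06

abs_scores <- abs(scores)

# Assumed grid: n.threshold quantiles between threshold.ignore and 100%
probs <- seq(from = threshold.ignore, to = 1, length.out = n.threshold)
thresholds <- quantile(abs_scores, probs = probs)   # labelled by percentile

# Assumed comparison: epsilon keeps a score that ties with a threshold
# from being dropped because of floating-point rounding
keep_mat <- sapply(thresholds, function(thr) abs_scores + epsilon >= thr)

# Number of features retained at each threshold (one column per threshold)
n_kept <- colSums(keep_mat)
print(n_kept)
```

Under these assumptions the number of retained features can only shrink as the threshold rises, and the feature with the largest absolute score survives even the highest threshold.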

Value

A list containing:

  • thresholds : A labelled vector of quantile values of the score vector in the fit object.

  • n.threshold : The number of splits to make in the score vector.

  • scor : A matrix of model fit statistics. Each column is the threshold level of predictors allowed into the model, and each row is a PC included. Which genes are included in the matrix before PC extraction is governed by comparing their model score to the quantile value of the scores at each threshold value.

  • tscor : A matrix of model \(t\)-statistics for each PC included (rows) at each threshold level (columns).

  • type : Which model was called? Options are survival, regression, or binary.
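To make the layout of the returned matrices concrete, the following sketch builds a matrix shaped like tscor for a simulated regression response. It is a hedged illustration, not the function's implementation: the data are simulated, the correlation-based scores stand in for the scores stored in the fit object, and prcomp() with lm() are substitutes chosen for the example. A survival or binary response would need a different model.

```r
# Minimal sketch under stated assumptions; not the pathwayPCA source code.
set.seed(2)
p <- 8; N <- 30; n.PCs <- 2
x <- matrix(rnorm(p * N), nrow = p)   # "tall" layout: features in rows, samples in columns
y <- rnorm(N)                         # hypothetical continuous response

# Hypothetical stand-in for the feature scores held in the fit object
scores <- apply(x, 1, function(feature) cor(feature, y))
thresholds <- quantile(abs(scores), probs = seq(0, 0.5, length.out = 4))

# Rows are PCs, columns are threshold levels
tscor <- matrix(
  NA_real_, nrow = n.PCs, ncol = length(thresholds),
  dimnames = list(paste0("PC", seq_len(n.PCs)), names(thresholds))
)

for (j in seq_along(thresholds)) {
  keep <- abs(scores) >= thresholds[j]          # features passing this threshold
  x_sub <- x[keep, , drop = FALSE]
  # prcomp() expects samples in rows, so transpose the tall matrix
  pcs <- prcomp(t(x_sub), center = TRUE)$x[, seq_len(n.PCs), drop = FALSE]
  fit_lm <- lm(y ~ pcs)                         # regress the response on the PCs
  tscor[, j] <- summary(fit_lm)$coefficients[-1, "t value"]  # drop the intercept row
}

print(round(tscor, 3))
```

Each column corresponds to one threshold level and each row to one extracted PC, which matches the orientation described for scor and tscor above.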

Details

NOTE: the number of thresholds at which to test (n.threshold) can be larger than the number of features to bin. This will result in constant \(t\)-statistics for the first few bins because the model isn't changing.
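The effect described in the note can be reproduced with a toy example. The sketch below assumes that thresholds are quantiles of the absolute scores and that a feature is kept when its absolute score meets the threshold; the four score values are invented for illustration.

```r
# Minimal sketch under stated assumptions; not the pathwayPCA source code.
scores <- c(0.2, -1.1, 0.7, 1.5)   # only four hypothetical feature scores
n.threshold <- 20                  # many more thresholds than features

thresholds <- quantile(
  abs(scores),
  probs = seq(0, 1, length.out = n.threshold)
)

# Count the features passing each threshold
n_kept <- sapply(thresholds, function(thr) sum(abs(scores) >= thr))

# Many neighbouring thresholds keep the same features, so any model
# fitted on them would return the same statistics
print(table(n_kept))
```

With four features there can be at most four distinct feature sets, so most of the 20 thresholds repeat a model that has already been fitted.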

See https://web.stanford.edu/~hastie/Papers/spca_JASA.pdf.

Examples

# DO NOT CALL THIS FUNCTION DIRECTLY.
# Use SuperPCA_pVals() instead

if (FALSE) {
  data("colon_pathwayCollection")
  data("colonSurv_df")

  colon_OmicsSurv <- CreateOmics(
    assayData_df = colonSurv_df[, -(2:3)],
    pathwayCollection_ls = colon_pathwayCollection,
    response = colonSurv_df[, 1:3],
    respType = "surv"
  )

  asthmaGenes_char <-
    getTrimPathwayCollection(colon_OmicsSurv)[["KEGG_ASTHMA"]]$IDs

  data_ls <- list(
    x = t(getAssay(colon_OmicsSurv))[asthmaGenes_char, ],
    y = getEventTime(colon_OmicsSurv),
    censoring.status = getEvent(colon_OmicsSurv),
    featurenames = asthmaGenes_char
  )

  superpcFit <- superpc.train(
    data = data_ls,
    type = "surv"
  )

  superpc.st(
    fit = superpcFit,
    data = data_ls
  )
}