Layer modeling correlation of statistics

layer_stat_cor(
  stats,
  modeling_results = fetch_data(type = "modeling_results"),
  model_type = names(modeling_results)[1],
  reverse = FALSE,
  top_n = NULL
)

Arguments

stats

A query data.frame where the row names are ENSEMBL gene IDs, the column names are labels for clusters of cells or cell types, and where each cell contains the given statistic for that gene and cell type. These statistics should be computed similarly to the modeling results from the data we provide. For example, like the enrichment t-statistics that are derived from comparing one layer against the rest. The stats will be matched and then correlated with the reference statistics.

If using the output of registration_wrapper() then use $enrichment to access the results from registration_stats_enrichment(). This function will automatically extract the statistics and assign the ENSEMBL gene IDs to the row names of the query matrix.

modeling_results

Defaults to the output of fetch_data(type = 'modeling_results'). This is a list of tables with the columns f_stat_* or t_stat_* as well as p_value_* and fdr_* plus ensembl. The column name is used to extract the statistic results, the p-values, and the FDR adjusted p-values. Then the ensembl column is used for matching in some cases. See fetch_data() for more details. Typically this is the set of reference statistics used in layer_stat_cor().

model_type

A named element of the modeling_results list. By default that is either enrichment for the model that tests one human brain layer against the rest (one group vs the rest), pairwise which compares two layers (groups) denoted by layerA-layerB such that layerA is greater than layerB, and anova which determines if any layer (group) is different from the rest adjusting for the mean expression level. The statistics for enrichment and pairwise are t-statistics while the anova model ones are F-statistics.

reverse

A logical(1) indicating whether to multiply by -1 the input statistics and reverse the layerA-layerB column names (using the -) into layerB-layerA.

top_n

An integer(1) specifying whether to filter to the top n marker genes. The default is NULL in which case no filtering is done.

Value

A correlation matrix between the query stats and the reference statistics using only the ENSEMBL gene IDs present in both tables. The columns are sorted using hierarchical clustering.

Details

Check https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/Layer_Guesses/dlpfc_snRNAseq_annotation.R for a full analysis from which this family of functions is derived from.

Author

Andrew E Jaffe, Leonardo Collado-Torres

Examples