Layer modeling correlation of statistics

layer_stat_cor(
  stats,
  modeling_results = fetch_data(type = "modeling_results"),
  model_type = names(modeling_results)[1],
  reverse = FALSE,
  top_n = NULL
)

Arguments

stats

A query data.frame whose row names are ENSEMBL gene IDs, whose column names are labels for clusters of cells or cell types, and where each entry contains the given statistic for that gene and cell type. These statistics should be computed in the same way as the modeling results from the data we provide, for example, like the enrichment t-statistics that are derived from comparing one layer against the rest. The stats are matched to the reference statistics by gene ID and then correlated with them.

If using the output of registration_wrapper(), use $enrichment to access the results from registration_stats_enrichment(). layer_stat_cor() will then automatically extract the statistics and assign the ENSEMBL gene IDs to the row names of the query matrix.
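A query built by hand, rather than taken from registration_wrapper(), needs the shape described above. The following is a minimal sketch of a pre-flight check; the helper check_query(), the ^ENS prefix test, and the toy gene IDs are illustrative assumptions, not checks that layer_stat_cor() itself performs. Named conditions in stopifnot() require R 4.0 or later.

```r
## Hypothetical helper (not part of spatialLIBD): check that a query object
## has ENSEMBL gene IDs as row names, cluster labels as column names, and a
## numeric statistic in every entry
check_query <- function(stats) {
    stats <- as.matrix(stats)
    ## Conditions are checked in order; the first FALSE one raises its name
    ## as the error message
    stopifnot(
        "query must be numeric" = is.numeric(stats),
        "row names must be present" = !is.null(rownames(stats)),
        "row names should be ENSEMBL IDs" = all(grepl("^ENS", rownames(stats))),
        "column names must label the clusters" = !is.null(colnames(stats)),
        "gene IDs must be unique" = !anyDuplicated(rownames(stats))
    )
    invisible(TRUE)
}

## Toy query: placeholder ENSEMBL-style IDs and made-up t-statistics
query_ok <- matrix(
    c(1.5, -0.2, 3.1, 0.4),
    nrow = 2,
    dimnames = list(c("ENSG_A", "ENSG_B"), c("Astro", "Oligo"))
)
check_query(query_ok)
```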

modeling_results

Defaults to the output of fetch_data(type = 'modeling_results'). This is a list of tables with the columns f_stat_* or t_stat_*, as well as p_value_*, fdr_*, and ensembl. The column names are used to extract the statistics, the p-values, and the FDR-adjusted p-values, and the ensembl column is used for matching in some cases. See fetch_data() for more details. Typically this is the set of reference statistics used in layer_stat_cor().
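To make the column layout concrete, here is a minimal sketch that splits one such table into matrices of statistics, p-values, and FDR values keyed by ENSEMBL ID. The helper split_modeling_table() and the toy table (made-up values, placeholder gene IDs) are hypothetical; this is not how spatialLIBD parses the data internally.

```r
## Hypothetical helper (not part of spatialLIBD): split one modeling_results
## table into matrices of statistics, p-values and FDR values, all indexed
## by ENSEMBL gene ID
split_modeling_table <- function(tab) {
    grab <- function(pattern) {
        cols <- grep(pattern, colnames(tab), value = TRUE)
        m <- as.matrix(tab[, cols, drop = FALSE])
        ## Strip the prefix so every matrix shares the same group labels
        colnames(m) <- sub(pattern, "", cols)
        rownames(m) <- tab$ensembl
        m
    }
    list(
        stat = grab("^[ft]_stat_"),
        p_value = grab("^p_value_"),
        fdr = grab("^fdr_")
    )
}

## Toy enrichment-style table (made-up values, placeholder gene IDs)
tab <- data.frame(
    t_stat_Layer1 = c(4.0, -0.5),
    t_stat_WM = c(-1.2, 7.3),
    p_value_Layer1 = c(0.001, 0.6),
    p_value_WM = c(0.2, 1e-6),
    fdr_Layer1 = c(0.01, 0.7),
    fdr_WM = c(0.3, 1e-5),
    ensembl = c("ENSG_A", "ENSG_B")
)
res <- split_modeling_table(tab)
res$stat
```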

model_type

A named element of the modeling_results list. By default, that is one of: enrichment, for the model that tests one human brain layer against the rest (one group vs the rest); pairwise, which compares two layers (groups) denoted by layerA-layerB such that layerA is greater than layerB; and anova, which determines whether any layer (group) is different from the rest while adjusting for the mean expression level. The statistics for enrichment and pairwise are t-statistics, while those for the anova model are F-statistics.
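Because the default model_type is names(modeling_results)[1], it is evaluated lazily from whatever list is passed in, so a custom list whose first element is not enrichment changes the default. A minimal sketch of that lookup follows; pick_model() and the toy list (including its column names) are hypothetical.

```r
## Toy stand-in for modeling_results: made-up values, placeholder gene IDs,
## and illustrative column names for the three models
modeling_results_toy <- list(
    enrichment = data.frame(t_stat_WM = c(3.1, -0.4), ensembl = c("ENSG_A", "ENSG_B")),
    pairwise = data.frame(
        `t_stat_WM-Layer1` = c(2.2, -1.5),
        ensembl = c("ENSG_A", "ENSG_B"),
        check.names = FALSE
    ),
    anova = data.frame(f_stat_full = c(12.0, 0.8), ensembl = c("ENSG_A", "ENSG_B"))
)

## Hypothetical helper with the same default as layer_stat_cor(): when
## model_type is missing, the first element name of the list is used
pick_model <- function(modeling_results, model_type = names(modeling_results)[1]) {
    ## Fail early if the name is not exactly one of the list elements
    stopifnot("model_type must name an element of modeling_results" =
        model_type %in% names(modeling_results))
    modeling_results[[model_type]]
}

names(pick_model(modeling_results_toy))
```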

reverse

A logical(1) indicating whether to multiply the input statistics by -1 and reverse the layerA-layerB column names (split at the -) into layerB-layerA.
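A minimal sketch of the transformation described for reverse, assuming each column name contains exactly one -; reverse_pairwise() and the toy matrix are hypothetical, not spatialLIBD code.

```r
## Hypothetical helper: flip the sign of pairwise statistics and rename
## layerA-layerB columns to layerB-layerA
reverse_pairwise <- function(stats) {
    stats <- -stats
    colnames(stats) <- vapply(
        strsplit(colnames(stats), "-", fixed = TRUE),
        function(parts) paste(rev(parts), collapse = "-"),
        character(1)
    )
    stats
}

## Toy pairwise statistics (made-up values, placeholder gene IDs)
pairwise <- matrix(
    c(2.5, -1.0, 0.7, 3.3),
    nrow = 2,
    dimnames = list(c("ENSG_A", "ENSG_B"), c("Layer1-Layer2", "WM-Layer6"))
)
r <- reverse_pairwise(pairwise)
r
```

Flipping both the sign and the label keeps the statistics interpretable: a positive value still favors the first-named layer in the new column name.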

top_n

An integer(1) specifying whether to filter to the top n marker genes. The default is NULL, in which case no filtering is done.
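How the top genes are chosen is not spelled out here. The sketch below assumes that, for each query column, the n genes with the largest statistics are kept and the union across columns is used; filter_top_n() is a hypothetical helper, so check the spatialLIBD source before relying on this rule.

```r
## Hypothetical helper: keep only the genes that rank in the top n of at
## least one query column (largest statistic first)
filter_top_n <- function(stats, top_n) {
    ## Never ask for more genes than exist, which would introduce NA indices
    n_keep <- min(top_n, nrow(stats))
    top_idx <- unique(as.vector(apply(
        stats, 2,
        function(col) order(col, decreasing = TRUE)[seq_len(n_keep)]
    )))
    ## Keep the original gene order in the filtered result
    stats[sort(top_idx), , drop = FALSE]
}

## Toy query statistics (made-up values, placeholder gene IDs)
toy_stats <- matrix(
    c(5, 1, 3, 0, -2, 0, 4, 6),
    nrow = 4,
    dimnames = list(paste0("ENSG_", LETTERS[1:4]), c("c1", "c2"))
)
top <- filter_top_n(toy_stats, 2)
rownames(top)
```

Here c1 contributes ENSG_A and ENSG_C, c2 contributes ENSG_D and ENSG_C, so only ENSG_B is dropped.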

Value

A correlation matrix between the query stats and the reference statistics using only the ENSEMBL gene IDs present in both tables. The columns are sorted using hierarchical clustering.
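A minimal sketch of that computation on toy matrices, assuming Pearson correlation (the cor() default) and complete-linkage hclust() on 1 - correlation profiles; correlate_stats() is hypothetical and the package may differ in these choices. Following the description above, the sketch reorders the reference columns.

```r
## Hypothetical sketch: correlate query statistics with reference statistics
## over the shared genes, then order the reference columns by clustering
correlate_stats <- function(query, reference) {
    ## Keep only ENSEMBL gene IDs present in both tables, aligned by name
    shared <- intersect(rownames(query), rownames(reference))
    cor_res <- cor(
        query[shared, , drop = FALSE],
        reference[shared, , drop = FALSE]
    )
    ## Cluster the reference columns by their (1 - correlation) profiles
    ## across the query clusters
    hc <- hclust(dist(t(1 - cor_res)))
    cor_res[, hc$order, drop = FALSE]
}

## Toy data (random values, placeholder gene IDs)
set.seed(1)
genes <- paste0("ENSG_", LETTERS[1:6])
reference <- matrix(
    rnorm(18), nrow = 6,
    dimnames = list(genes, c("Layer1", "Layer2", "WM"))
)
## The query shares four genes with the reference; ENSG_Z is dropped
query <- matrix(
    rnorm(10), nrow = 5,
    dimnames = list(c(genes[c(5, 1, 3, 2)], "ENSG_Z"), c("c1", "c2"))
)
res <- correlate_stats(query, reference)
round(res, 2)
```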

Details

Check https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/Layer_Guesses/dlpfc_snRNAseq_annotation.R for a full analysis from which this family of functions is derived.

See also

Other Layer correlation functions: annotate_registered_clusters(), layer_stat_cor_plot()

Author

Andrew E Jaffe, Leonardo Collado-Torres

Examples


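## Load spatialLIBD, which provides fetch_data(), layer_stat_cor(),
## and the example query statistics
library("spatialLIBD")

## Obtain the necessary data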
if (!exists("modeling_results")) {
    modeling_results <- fetch_data(type = "modeling_results")
}
#> 2024-12-16 21:51:56.023728 loading file /github/home/.cache/R/BiocFileCache/5db39a09009_Human_DLPFC_Visium_modeling_results.Rdata%3Fdl%3D1

## Compute the correlations
cor_stats_layer <- layer_stat_cor(
    tstats_Human_DLPFC_snRNAseq_Nguyen_topLayer,
    modeling_results,
    model_type = "enrichment"
)

## Explore the correlation matrix
head(cor_stats_layer[, seq_len(3)])
#>               WM       Layer6     Layer5
#> 22 (3) 0.6824669 -0.009192291 -0.1934265
#> 3 (3)  0.7154122 -0.070042729 -0.2290574
#> 23 (3) 0.6637885 -0.031467704 -0.2018306
#> 17 (3) 0.6364983 -0.094216046 -0.2026147
#> 21 (3) 0.6281443 -0.050336358 -0.1988774
#> 7 (4)  0.1850724 -0.197283175 -0.2716890
summary(cor_stats_layer)
#>        WM               Layer6              Layer5             Layer4         
#>  Min.   :-0.46352   Min.   :-0.197283   Min.   :-0.27169   Min.   :-0.253477  
#>  1st Qu.:-0.26653   1st Qu.:-0.071919   1st Qu.:-0.14318   1st Qu.:-0.152714  
#>  Median :-0.19813   Median :-0.039558   Median : 0.03858   Median : 0.003875  
#>  Mean   :-0.02982   Mean   : 0.004476   Mean   : 0.01355   Mean   : 0.019215  
#>  3rd Qu.: 0.12482   3rd Qu.: 0.018964   3rd Qu.: 0.16692   3rd Qu.: 0.160288  
#>  Max.   : 0.71541   Max.   : 0.457031   Max.   : 0.30194   Max.   : 0.425598  
#>      Layer3             Layer2             Layer1        
#>  Min.   :-0.36105   Min.   :-0.31419   Min.   :-0.29670  
#>  1st Qu.:-0.10007   1st Qu.:-0.06673   1st Qu.:-0.12129  
#>  Median : 0.05998   Median : 0.01664   Median :-0.01590  
#>  Mean   : 0.02022   Mean   : 0.01056   Mean   :-0.01750  
#>  3rd Qu.: 0.13638   3rd Qu.: 0.11008   3rd Qu.: 0.03523  
#>  Max.   : 0.56413   Max.   : 0.50734   Max.   : 0.63940  

## Repeat with top_n set to 10
summary(layer_stat_cor(
    tstats_Human_DLPFC_snRNAseq_Nguyen_topLayer,
    modeling_results,
    model_type = "enrichment",
    top_n = 10
))
#>        WM                Layer6              Layer5         
#>  Min.   :-0.419078   Min.   :-0.245585   Min.   :-0.309621  
#>  1st Qu.:-0.223879   1st Qu.:-0.148746   1st Qu.:-0.177987  
#>  Median :-0.104689   Median :-0.034822   Median : 0.049559  
#>  Mean   :-0.003598   Mean   :-0.008681   Mean   :-0.004698  
#>  3rd Qu.: 0.040547   3rd Qu.: 0.036607   3rd Qu.: 0.145510  
#>  Max.   : 0.733922   Max.   : 0.586829   Max.   : 0.393224  
#>      Layer4              Layer3               Layer2         
#>  Min.   :-0.333112   Min.   :-0.3983700   Min.   :-0.206511  
#>  1st Qu.:-0.119284   1st Qu.:-0.1264219   1st Qu.:-0.101970  
#>  Median :-0.004004   Median :-0.0000972   Median :-0.017683  
#>  Mean   : 0.007484   Mean   : 0.0055855   Mean   : 0.009139  
#>  3rd Qu.: 0.116301   3rd Qu.: 0.1143531   3rd Qu.: 0.092592  
#>  Max.   : 0.458987   Max.   : 0.6718714   Max.   : 0.472751  
#>      Layer1         
#>  Min.   :-0.263452  
#>  1st Qu.:-0.149696  
#>  Median :-0.064278  
#>  Mean   :-0.001268  
#>  3rd Qu.: 0.032686  
#>  Max.   : 0.728782