Layer modeling correlation of statistics
layer_stat_cor(
    stats,
    modeling_results = fetch_data(type = "modeling_results"),
    model_type = names(modeling_results)[1],
    reverse = FALSE,
    top_n = NULL
)
stats
A query data.frame where the row names are ENSEMBL gene IDs, the column names are labels for clusters of cells or cell types, and each cell contains the given statistic for that gene and cell type. These statistics should be computed similarly to the modeling results from the data we provide. For example, they could be enrichment t-statistics derived from comparing one layer against the rest. The stats will be matched and then correlated with the reference statistics.
If using the output of registration_wrapper(), then use $enrichment to access the results from registration_stats_enrichment(). This function will automatically extract the statistics and assign the ENSEMBL gene IDs to the row names of the query matrix.
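As a minimal sketch of the expected input shape, the following builds a toy query data.frame in base R. The gene IDs, cluster labels, and values are made up for illustration and do not come from any dataset.

```r
# Toy query statistics: 4 genes (rows) by 2 hypothetical clusters (columns).
# The ENSEMBL IDs, cluster labels, and values are invented for illustration.
query_stats <- data.frame(
    cluster_A = c(2.1, -0.4, 0.3, 1.7),
    cluster_B = c(-1.2, 3.0, 0.1, -0.6),
    row.names = c(
        "ENSG00000000001", "ENSG00000000002",
        "ENSG00000000003", "ENSG00000000004"
    )
)

# Row names carry the gene IDs and column names carry the cluster labels
rownames(query_stats)
colnames(query_stats)
```

A table shaped like this is what the stats argument describes: one statistic per gene and cluster.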
modeling_results
Defaults to the output of fetch_data(type = 'modeling_results'). This is a list of tables with the columns f_stat_* or t_stat_*, as well as p_value_* and fdr_*, plus ensembl. The column names are used to extract the statistics, the p-values, and the FDR-adjusted p-values. The ensembl column is then used for matching in some cases. See fetch_data() for more details. Typically this is the set of reference statistics used in layer_stat_cor().
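The column naming convention described above can be illustrated with a toy table. This is a sketch in base R under the assumption that statistic columns can be picked out by their f_stat_ or t_stat_ prefix; the table contents are invented and the code is not taken from the package.

```r
# Toy reference table following the documented column naming convention.
# All values and gene IDs are invented for illustration.
reference <- data.frame(
    t_stat_WM = c(5.2, -1.1, 0.4),
    t_stat_Layer1 = c(-2.0, 3.3, 0.2),
    p_value_WM = c(0.001, 0.30, 0.70),
    p_value_Layer1 = c(0.05, 0.004, 0.85),
    fdr_WM = c(0.003, 0.45, 0.70),
    fdr_Layer1 = c(0.075, 0.012, 0.85),
    ensembl = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003"),
    stringsAsFactors = FALSE
)

# Keep only the statistic columns, identified by their prefix
stat_cols <- grep("^[ft]_stat_", colnames(reference), value = TRUE)
ref_stats <- as.matrix(reference[, stat_cols, drop = FALSE])

# Use the ensembl column as row names and drop the prefix from column names
rownames(ref_stats) <- reference$ensembl
colnames(ref_stats) <- sub("^[ft]_stat_", "", stat_cols)
ref_stats
```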
model_type
A named element of the modeling_results list. By default that is one of: enrichment, for the model that tests one human brain layer against the rest (one group vs the rest); pairwise, which compares two layers (groups) denoted by layerA-layerB such that layerA is greater than layerB; or anova, which determines if any layer (group) is different from the rest, adjusting for the mean expression level. The statistics for enrichment and pairwise are t-statistics, while those for the anova model are F-statistics.
reverse
A logical(1) indicating whether to multiply the input statistics by -1 and reverse the layerA-layerB column names (splitting on the -) into layerB-layerA.
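The effect described for reverse can be sketched in base R on a toy matrix. The matrix values and column names below are hypothetical, and this illustrates the described behavior rather than the package's own code.

```r
# Toy pairwise statistics with layerA-layerB style column names (invented values)
pairwise <- matrix(
    c(1.5, -0.7, 2.2, 0.4),
    nrow = 2,
    dimnames = list(
        c("ENSG00000000001", "ENSG00000000002"),
        c("Layer1-Layer2", "Layer1-WM")
    )
)

# Flip the sign of every statistic
reversed <- -1 * pairwise

# Swap the two sides of each column name around the "-"
colnames(reversed) <- vapply(
    strsplit(colnames(pairwise), "-", fixed = TRUE),
    function(parts) paste(rev(parts), collapse = "-"),
    character(1)
)
reversed
```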
top_n
An integer(1) specifying how many of the top marker genes to keep. The default is NULL, in which case no filtering is done.
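The text above does not spell out how the top marker genes are chosen. One plausible reading, shown here purely as an assumption, is to take the top_n genes with the largest statistic in each reference column and keep the union of those genes. The toy matrix and variable names are hypothetical.

```r
# Toy reference statistics: 20 genes by 2 layers (random values for illustration)
set.seed(1)
ref_stats <- matrix(
    rnorm(40),
    nrow = 20,
    dimnames = list(sprintf("ENSG%011d", 1:20), c("WM", "Layer1"))
)
top_n <- 3

# ASSUMPTION: "top" means the largest statistic within each reference column.
# Collect the row indices of the top_n genes per column, then take the union.
top_idx <- unique(as.vector(apply(ref_stats, 2, function(x) {
    order(x, decreasing = TRUE)[seq_len(top_n)]
})))

# Keep only the selected marker genes
ref_top <- ref_stats[top_idx, , drop = FALSE]
nrow(ref_top)
```

Under this reading the filtered table has between top_n and top_n times the number of columns rows, depending on how many marker genes are shared across layers.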
A correlation matrix between the query stats and the reference statistics using only the ENSEMBL gene IDs present in both tables. The columns are sorted using hierarchical clustering.
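The matching and correlation steps described above can be sketched with base R on toy matrices. The data are random, and the clustering settings (Euclidean distance with the hclust() default linkage) are assumptions, since the text only states that hierarchical clustering is used.

```r
# Toy query and reference statistics that only partially share gene IDs
set.seed(20)
genes <- sprintf("ENSG%011d", 1:30)
query <- matrix(
    rnorm(25 * 3),
    nrow = 25,
    dimnames = list(genes[1:25], c("c1", "c2", "c3"))
)
reference <- matrix(
    rnorm(25 * 4),
    nrow = 25,
    dimnames = list(genes[6:30], c("WM", "Layer1", "Layer2", "Layer3"))
)

# Keep only the ENSEMBL gene IDs present in both tables
shared <- intersect(rownames(query), rownames(reference))

# Rows are query clusters, columns are reference layers
cor_mat <- cor(query[shared, ], reference[shared, ])

# ASSUMPTION: order the columns by hierarchical clustering of their
# correlation profiles, using Euclidean distance and the default linkage
col_order <- hclust(dist(t(cor_mat)))$order
cor_mat <- cor_mat[, col_order, drop = FALSE]
dim(cor_mat)
```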
See https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/Layer_Guesses/dlpfc_snRNAseq_annotation.R for the full analysis from which this family of functions is derived.
Other Layer correlation functions: annotate_registered_clusters(), layer_stat_cor_plot()
## Obtain the necessary data
if (!exists("modeling_results")) {
    modeling_results <- fetch_data(type = "modeling_results")
}
#> 2024-12-13 19:41:44.340269 loading file /github/home/.cache/R/BiocFileCache/5c656d46b9_Human_DLPFC_Visium_modeling_results.Rdata%3Fdl%3D1
## Compute the correlations
cor_stats_layer <- layer_stat_cor(
    tstats_Human_DLPFC_snRNAseq_Nguyen_topLayer,
    modeling_results,
    model_type = "enrichment"
)
## Explore the correlation matrix
head(cor_stats_layer[, seq_len(3)])
#>               WM       Layer6     Layer5
#> 22 (3) 0.6824669 -0.009192291 -0.1934265
#> 3 (3)  0.7154122 -0.070042729 -0.2290574
#> 23 (3) 0.6637885 -0.031467704 -0.2018306
#> 17 (3) 0.6364983 -0.094216046 -0.2026147
#> 21 (3) 0.6281443 -0.050336358 -0.1988774
#> 7 (4)  0.1850724 -0.197283175 -0.2716890
summary(cor_stats_layer)
#>        WM               Layer6              Layer5             Layer4
#>  Min.   :-0.46352   Min.   :-0.197283   Min.   :-0.27169   Min.   :-0.253477
#>  1st Qu.:-0.26653   1st Qu.:-0.071919   1st Qu.:-0.14318   1st Qu.:-0.152714
#>  Median :-0.19813   Median :-0.039558   Median : 0.03858   Median : 0.003875
#>  Mean   :-0.02982   Mean   : 0.004476   Mean   : 0.01355   Mean   : 0.019215
#>  3rd Qu.: 0.12482   3rd Qu.: 0.018964   3rd Qu.: 0.16692   3rd Qu.: 0.160288
#>  Max.   : 0.71541   Max.   : 0.457031   Max.   : 0.30194   Max.   : 0.425598
#>      Layer3             Layer2             Layer1
#>  Min.   :-0.36105   Min.   :-0.31419   Min.   :-0.29670
#>  1st Qu.:-0.10007   1st Qu.:-0.06673   1st Qu.:-0.12129
#>  Median : 0.05998   Median : 0.01664   Median :-0.01590
#>  Mean   : 0.02022   Mean   : 0.01056   Mean   :-0.01750
#>  3rd Qu.: 0.13638   3rd Qu.: 0.11008   3rd Qu.: 0.03523
#>  Max.   : 0.56413   Max.   : 0.50734   Max.   : 0.63940
## Repeat with top_n set to 10
summary(layer_stat_cor(
    tstats_Human_DLPFC_snRNAseq_Nguyen_topLayer,
    modeling_results,
    model_type = "enrichment",
    top_n = 10
))
#>        WM                Layer6              Layer5
#>  Min.   :-0.419078   Min.   :-0.245585   Min.   :-0.309621
#>  1st Qu.:-0.223879   1st Qu.:-0.148746   1st Qu.:-0.177987
#>  Median :-0.104689   Median :-0.034822   Median : 0.049559
#>  Mean   :-0.003598   Mean   :-0.008681   Mean   :-0.004698
#>  3rd Qu.: 0.040547   3rd Qu.: 0.036607   3rd Qu.: 0.145510
#>  Max.   : 0.733922   Max.   : 0.586829   Max.   : 0.393224
#>      Layer4              Layer3               Layer2
#>  Min.   :-0.333112   Min.   :-0.3983700   Min.   :-0.206511
#>  1st Qu.:-0.119284   1st Qu.:-0.1264219   1st Qu.:-0.101970
#>  Median :-0.004004   Median :-0.0000972   Median :-0.017683
#>  Mean   : 0.007484   Mean   : 0.0055855   Mean   : 0.009139
#>  3rd Qu.: 0.116301   3rd Qu.: 0.1143531   3rd Qu.: 0.092592
#>  Max.   : 0.458987   Max.   : 0.6718714   Max.   : 0.472751
#>      Layer1
#>  Min.   :-0.263452
#>  1st Qu.:-0.149696
#>  Median :-0.064278
#>  Mean   :-0.001268
#>  3rd Qu.: 0.032686
#>  Max.   : 0.728782