This function is provided for convenience. It runs all the functions
required for computing the modeling_results
. This can be useful for
finding marker genes on a new spatially-resolved transcriptomics dataset
and thus using it for run_app()
. The results from this function can also be
used for performing spatial registration through layer_stat_cor()
and
related functions of sc/snRNA-seq datasets.
registration_wrapper(
sce,
var_registration,
var_sample_id,
covars = NULL,
gene_ensembl = NULL,
gene_name = NULL,
suffix = "",
min_ncells = 10,
pseudobulk_rds_file = NULL
)
A SingleCellExperiment-class object or one that inherits its properties.
A character(1)
specifying the colData(sce)
variable of interest against which will be used for computing the relevant
statistics. This should be a categorical variable, with all categories
syntaticly valid (could be used as an R variable, no special characters or
leading numbers), ex. 'L1.2', 'celltype2' not 'L1/2' or '2'.
A character(1)
specifying the colData(sce)
variable
with the sample ID.
A character()
with names of sample-level covariates.
A character(1)
specifying the rowData(sce_pseudo)
column with the ENSEMBL gene IDs. This will be used by layer_stat_cor()
.
A character(1)
specifying the rowData(sce_pseudo)
column with the gene names (symbols).
A character(1)
specifying the suffix to use for the
F-statistics column. This is particularly useful if you will run this
function more than once and want to be able to merge the results.
An integer(1)
greater than 0 specifying the minimum
number of cells (for scRNA-seq) or spots (for spatial) that are combined
when pseudo-bulking. Pseudo-bulked samples with less than min_ncells
on
sce_pseudo$ncells
will be dropped.
A character(1)
specifying the path for saving
an RDS file with the pseudo-bulked object. It's useful to specify this since
pseudo-bulking can take hours to run on large datasets.
A list()
of data.frame()
with the statistical results. This is
similar to fetch_data("modeling_results")
.
We chose a default of min_ncells = 10
based on OSCA from section 4.3
at
http://bioconductor.org/books/3.15/OSCA.multisample/multi-sample-comparisons.html.
They cite https://doi.org/10.1038/s41467-020-19894-4 as the paper where
they came up with the definition of "very low" being 10. You might want
to use registration_pseudobulk()
and manually explore sce_pseudo$ncells
to choose the best cutoff.
Other spatial registration and statistical modeling functions:
registration_block_cor()
,
registration_model()
,
registration_pseudobulk()
,
registration_stats_anova()
,
registration_stats_enrichment()
,
registration_stats_pairwise()
## Ensure reproducibility of example data
set.seed(20220907)
## Generate example data
sce <- scuttle::mockSCE()
## Add some sample IDs
sce$sample_id <- sample(LETTERS[1:5], ncol(sce), replace = TRUE)
## Add a sample-level covariate: age
ages <- rnorm(5, mean = 20, sd = 4)
names(ages) <- LETTERS[1:5]
sce$age <- ages[sce$sample_id]
## Add gene-level information
rowData(sce)$ensembl <- paste0("ENSG", seq_len(nrow(sce)))
rowData(sce)$gene_name <- paste0("gene", seq_len(nrow(sce)))
## Compute all modeling results
example_modeling_results <- registration_wrapper(
sce,
var_registration = "Cell_Cycle",
var_sample_id = "sample_id",
covars = c("age"),
gene_ensembl = "ensembl",
gene_name = "gene_name",
suffix = "wrapper"
)
#> 2024-12-16 21:52:11.719185 make pseudobulk object
#> 2024-12-16 21:52:11.814341 dropping 9 pseudo-bulked samples that are below 'min_ncells'.
#> 2024-12-16 21:52:11.827254 drop lowly expressed genes
#> 2024-12-16 21:52:11.872247 normalize expression
#> 2024-12-16 21:52:11.929133 create model matrix
#> 2024-12-16 21:52:11.939552 run duplicateCorrelation()
#> 2024-12-16 21:52:14.318972 The estimated correlation is: -0.0783081238514531
#> 2024-12-16 21:52:14.321565 computing enrichment statistics
#> 2024-12-16 21:52:14.436945 extract and reformat enrichment results
#> 2024-12-16 21:52:14.462522 running the baseline pairwise model
#> 2024-12-16 21:52:14.480766 computing pairwise statistics
#> 2024-12-16 21:52:14.55617 computing F-statistics
## Explore the results from registration_wrapper()
class(example_modeling_results)
#> [1] "list"
length(example_modeling_results)
#> [1] 3
names(example_modeling_results)
#> [1] "anova" "enrichment" "pairwise"
lapply(example_modeling_results, head)
#> $anova
#> NULL
#>
#> $enrichment
#> t_stat_G0 t_stat_G1 t_stat_G2M t_stat_S p_value_G0 p_value_G1
#> ENSG1 -0.1547374 1.5323608 -0.2348981 -1.29289250 0.88011484 0.15652594
#> ENSG2 0.5630611 1.4947773 -0.5105565 -1.76319136 0.58582739 0.16593179
#> ENSG3 0.4228896 -0.8317725 -0.4890891 1.03785001 0.68134877 0.42500953
#> ENSG4 -0.8090376 0.8315512 1.5976031 -1.75950259 0.43736743 0.42512872
#> ENSG5 -2.0758087 0.2188378 0.2130925 1.15619668 0.06472348 0.83119090
#> ENSG6 0.3913959 -1.8396065 1.6689318 0.07483167 0.70373612 0.09574789
#> p_value_G2M p_value_S fdr_G0 fdr_G1 fdr_G2M fdr_S logFC_G0
#> ENSG1 0.8190364 0.2252116 0.9852255 0.9966809 0.9953748 0.9681139 -0.1236594
#> ENSG2 0.6207620 0.1084407 0.9612662 0.9966809 0.9899698 0.9392089 0.5488682
#> ENSG3 0.6353470 0.3238615 0.9648540 0.9966809 0.9921856 0.9681139 0.1430331
#> ENSG4 0.1412712 0.1090902 0.9612662 0.9966809 0.9399839 0.9392089 -0.9237743
#> ENSG5 0.8355440 0.2745611 0.8631058 0.9966809 0.9953748 0.9681139 -2.1724966
#> ENSG6 0.1261463 0.9418292 0.9648540 0.9966809 0.9399839 0.9906646 0.3550924
#> logFC_G1 logFC_G2M logFC_S ensembl gene
#> ENSG1 0.8654897 -0.1777803 -0.84406114 ENSG1 gene1
#> ENSG2 1.0494805 -0.4733543 -1.34394213 ENSG2 gene2
#> ENSG3 -0.2158331 -0.1565191 0.29666354 ENSG3 gene3
#> ENSG4 0.7435077 1.5938565 -1.59726925 ENSG4 gene4
#> ENSG5 0.2144746 0.2525529 1.19887499 ENSG5 gene5
#> ENSG6 -1.1400513 1.2795838 0.06030075 ENSG6 gene6
#>
#> $pairwise
#> t_stat_G0-G1 t_stat_G0-G2M t_stat_G0-S t_stat_G1-G2M t_stat_G1-S
#> ENSG1 -0.6671464 0.2089510 0.6599089 0.91003998 1.5430022
#> ENSG2 -0.1147743 0.9762769 1.5297496 1.24964041 2.0208873
#> ENSG3 0.5773407 0.4415661 -0.2753587 -0.06404534 -0.9687469
#> ENSG4 -0.9551094 -1.4267069 0.3614961 -0.70335585 1.4860622
#> ENSG5 -1.6015394 -1.5242463 -2.1000859 -0.17030985 -0.8625983
#> ENSG6 1.1007274 -0.8249394 0.1399109 -2.05967232 -1.0228113
#> t_stat_G2M-S p_value_G0-G1 p_value_G0-G2M p_value_G0-S p_value_G1-G2M
#> ENSG1 0.4932384 0.5234599 0.8397112 0.52785817 0.38940816
#> ENSG2 0.5901220 0.9114537 0.3575335 0.16462386 0.24676596
#> ENSG3 -0.8005264 0.5795972 0.6705001 0.79002220 0.95050622
#> ENSG4 2.0036167 0.3675007 0.1915203 0.72709410 0.50179611
#> ENSG5 -0.6032018 0.1479422 0.1659702 0.06894394 0.86899666
#> ENSG6 1.0822474 0.3030434 0.4333238 0.89219028 0.07341321
#> p_value_G1-S p_value_G2M-S fdr_G0-G1 fdr_G0-G2M fdr_G0-S fdr_G1-G2M
#> ENSG1 0.16142199 0.6351035 0.9984915 0.9764704 0.9742266 0.9854617
#> ENSG2 0.07796735 0.5714040 0.9984915 0.9531915 0.9683756 0.9854617
#> ENSG3 0.36105562 0.4465356 0.9984915 0.9764704 0.9742266 0.9977609
#> ENSG4 0.17558652 0.0800825 0.9984915 0.8839550 0.9742266 0.9901259
#> ENSG5 0.41347911 0.5630883 0.9984915 0.8492243 0.8695062 0.9947237
#> ENSG6 0.33634080 0.3106958 0.9984915 0.9531915 0.9864940 0.9854617
#> fdr_G1-S fdr_G2M-S logFC_G0-G1 logFC_G0-G2M logFC_G0-S logFC_G1-G2M
#> ENSG1 0.9983017 0.9742209 -0.5605347 0.2040786 0.6323864 0.76461332
#> ENSG2 0.9860485 0.9695084 -0.1085720 1.0735394 1.6504823 1.18211143
#> ENSG3 0.9983017 0.9651275 0.2209283 0.1964203 -0.1201808 -0.02450793
#> ENSG4 0.9983017 0.9297596 -1.0540875 -1.8303321 0.4550345 -0.77624467
#> ENSG5 0.9983017 0.9695084 -1.9220357 -2.1264276 -2.8746043 -0.20439186
#> ENSG6 0.9983017 0.9651275 0.9692943 -0.8444414 0.1405221 -1.81373577
#> logFC_G1-S logFC_G2M-S ensembl gene
#> ENSG1 1.1929212 0.4283078 ENSG1 gene1
#> ENSG2 1.7590544 0.5769429 ENSG2 gene2
#> ENSG3 -0.3411090 -0.3166011 ENSG3 gene3
#> ENSG4 1.5091219 2.2853666 ENSG4 gene4
#> ENSG5 -0.9525685 -0.7481767 ENSG5 gene5
#> ENSG6 -0.8287723 0.9849635 ENSG6 gene6
#>