This convenience function runs all the functions required for computing the
modeling_results. This is useful for finding marker genes in a new
spatially-resolved transcriptomics dataset and then exploring them with
run_app(). The results can also be used for spatial registration of
sc/snRNA-seq datasets through layer_stat_cor() and related functions.
registration_wrapper(
  sce,
  var_registration,
  var_sample_id,
  covars = NULL,
  gene_ensembl = NULL,
  gene_name = NULL,
  suffix = "",
  min_ncells = 10,
  pseudobulk_rds_file = NULL
)
sce: A SingleCellExperiment-class object or one that inherits its properties.
var_registration: A character(1) specifying the colData(sce) variable of
interest that will be used for computing the relevant statistics.
var_sample_id: A character(1) specifying the colData(sce) variable with the
sample ID.
covars: A character() with the names of sample-level covariates.
gene_ensembl: A character(1) specifying the rowData(sce_pseudo) column with
the ENSEMBL gene IDs. This will be used by layer_stat_cor().
gene_name: A character(1) specifying the rowData(sce_pseudo) column with the
gene names (symbols).
suffix: A character(1) specifying the suffix to use for the F-statistics
column. This is particularly useful if you will run this function more than
once and want to be able to merge the results.
min_ncells: An integer(1) greater than 0 specifying the minimum number of
cells (for scRNA-seq) or spots (for spatial) that are combined when
pseudo-bulking. Pseudo-bulked samples with fewer than min_ncells in
sce_pseudo$ncells will be dropped.
pseudobulk_rds_file: A character(1) specifying the path for saving an RDS file
with the pseudo-bulked object. Specifying this is useful because pseudo-bulking
can take hours to run on large datasets.
A list() of data.frame() with the statistical results. This is similar to
fetch_data("modeling_results").
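To illustrate why a distinct suffix helps when merging results from more than one run, here is a minimal base R sketch. The two data frames and their column names (ensembl, f_stat_runA, f_stat_runB) are hypothetical stand-ins for F-statistics tables produced with different suffix values; the actual column names returned by the function may differ.

```r
## Hypothetical F-statistics tables, as if from runs with
## suffix = "runA" and suffix = "runB" (names are assumptions)
stats_run_a <- data.frame(
    ensembl = c("ENSG1", "ENSG2", "ENSG3"),
    f_stat_runA = c(1.2, 5.4, 0.3)
)
stats_run_b <- data.frame(
    ensembl = c("ENSG2", "ENSG3", "ENSG4"),
    f_stat_runB = c(4.9, 0.8, 2.1)
)

## Distinct suffixes keep the statistic columns from clashing
## when the tables are merged by gene ID (inner join by default)
merged_stats <- merge(stats_run_a, stats_run_b, by = "ensembl")
merged_stats
```

Only genes present in both tables are kept here; passing all = TRUE to merge() would keep every gene and fill the gaps with NA.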
We chose a default of min_ncells = 10 based on section 4.3 of OSCA at
http://bioconductor.org/books/3.15/OSCA.multisample/multi-sample-comparisons.html.
OSCA cites https://doi.org/10.1038/s41467-020-19894-4 as the source of the
definition of "very low" as 10. You might want to use
registration_pseudobulk() and manually explore sce_pseudo$ncells to choose
the best cutoff.
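The effect of different cutoffs can be sketched in base R. The example below uses a made-up table of pseudo-bulked samples: the pseudo_info data frame and its values are hypothetical, standing in for the ncells column of a pseudo-bulked object. It counts how many samples would be dropped at a few candidate cutoffs. This illustrates the filtering rule described above under the assumption that samples with ncells below the cutoff are dropped; it is not the package's internal code.

```r
## Hypothetical per-sample cell counts, standing in for sce_pseudo$ncells
pseudo_info <- data.frame(
    sample_group = paste0("S", 1:8),
    ncells = c(3, 8, 10, 12, 25, 40, 7, 150)
)

## Look at the distribution before picking a cutoff
summary(pseudo_info$ncells)

## Count how many pseudo-bulked samples fall below each candidate cutoff
candidate_cutoffs <- c(5, 10, 20)
n_dropped <- vapply(
    candidate_cutoffs,
    function(cutoff) sum(pseudo_info$ncells < cutoff),
    integer(1)
)
names(n_dropped) <- paste0("min_ncells_", candidate_cutoffs)
n_dropped

## Assumed rule: keep samples with at least min_ncells cells
min_ncells <- 10
kept <- pseudo_info[pseudo_info$ncells >= min_ncells, ]
kept
```

With real data, the same summary could be computed on the ncells column of the object returned by registration_pseudobulk() before settling on a value for min_ncells.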
Other spatial registration and statistical modeling functions:
registration_block_cor(), registration_model(), registration_pseudobulk(),
registration_stats_anova(), registration_stats_enrichment(),
registration_stats_pairwise()
## Load the package (this also makes rowData() available)
library("spatialLIBD")

## Ensure reproducibility of example data
set.seed(20220907)

## Generate example data
sce <- scuttle::mockSCE()

## Add some sample IDs
sce$sample_id <- sample(LETTERS[1:5], ncol(sce), replace = TRUE)

## Add a sample-level covariate: age
ages <- rnorm(5, mean = 20, sd = 4)
names(ages) <- LETTERS[1:5]
sce$age <- ages[sce$sample_id]

## Add gene-level information
rowData(sce)$ensembl <- paste0("ENSG", seq_len(nrow(sce)))
rowData(sce)$gene_name <- paste0("gene", seq_len(nrow(sce)))

## Compute all modeling results
example_modeling_results <- registration_wrapper(
    sce,
    var_registration = "Cell_Cycle",
    var_sample_id = "sample_id",
    covars = c("age"),
    gene_ensembl = "ensembl",
    gene_name = "gene_name",
    suffix = "wrapper"
)
#> 2024-07-26 23:49:13.890602 make pseudobulk object
#> 2024-07-26 23:49:14.048728 dropping 9 pseudo-bulked samples that are below 'min_ncells'.
#> 2024-07-26 23:49:14.070448 drop lowly expressed genes
#> 2024-07-26 23:49:14.125018 normalize expression
#> 2024-07-26 23:49:14.211229 create model matrix
#> 2024-07-26 23:49:14.221755 run duplicateCorrelation()
#> 2024-07-26 23:49:16.590766 The estimated correlation is: -0.0783081238514532
#> 2024-07-26 23:49:16.592969 computing enrichment statistics
#> 2024-07-26 23:49:16.720585 extract and reformat enrichment results
#> 2024-07-26 23:49:16.746398 running the baseline pairwise model
#> 2024-07-26 23:49:16.76428 computing pairwise statistics
#> 2024-07-26 23:49:16.837624 computing F-statistics