Spatial registration: wrapper function

This function is provided for convenience. It runs all the functions required for computing the modeling_results. This can be useful for finding marker genes on a new spatially-resolved transcriptomics dataset and thus using it for run_app(). The results from this function can also be used for performing spatial registration through layer_stat_cor() and related functions of sc/snRNA-seq datasets.

registration_wrapper(
  sce,
  var_registration,
  var_sample_id,
  covars = NULL,
  gene_ensembl = NULL,
  gene_name = NULL,
  suffix = "",
  min_ncells = 10,
  pseudobulk_rds_file = NULL
)

Arguments

sce: A SingleCellExperiment-class object or one that inherits its properties.
var_registration: A character(1) specifying the colData(sce) variable of interest against which will be used for computing the relevant statistics. This should be a categorical variable, with all categories syntaticly valid (could be used as an R variable, no special characters or leading numbers), ex. 'L1.2', 'celltype2' not 'L1/2' or '2'.
var_sample_id: A character(1) specifying the colData(sce) variable with the sample ID.
covars: A character() with names of sample-level covariates.
gene_ensembl: A character(1) specifying the rowData(sce_pseudo) column with the ENSEMBL gene IDs. This will be used by layer_stat_cor().
gene_name: A character(1) specifying the rowData(sce_pseudo) column with the gene names (symbols).
suffix: A character(1) specifying the suffix to use for the F-statistics column. This is particularly useful if you will run this function more than once and want to be able to merge the results.
min_ncells: An integer(1) greater than 0 specifying the minimum number of cells (for scRNA-seq) or spots (for spatial) that are combined when pseudo-bulking. Pseudo-bulked samples with less than min_ncells on sce_pseudo$ncells will be dropped.
pseudobulk_rds_file: A character(1) specifying the path for saving an RDS file with the pseudo-bulked object. It's useful to specify this since pseudo-bulking can take hours to run on large datasets.

Value

A list() of data.frame() with the statistical results. This is similar to fetch_data("modeling_results").

Details

We chose a default of min_ncells = 10 based on OSCA from section 4.3 at http://bioconductor.org/books/3.15/OSCA.multisample/multi-sample-comparisons.html. They cite https://doi.org/10.1038/s41467-020-19894-4 as the paper where they came up with the definition of "very low" being 10. You might want to use registration_pseudobulk() and manually explore sce_pseudo$ncells to choose the best cutoff.

Examples

## Ensure reproducibility of example data
set.seed(20220907)

## Generate example data
sce <- scuttle::mockSCE()

## Add some sample IDs
sce$sample_id <- sample(LETTERS[1:5], ncol(sce), replace = TRUE)

## Add a sample-level covariate: age
ages <- rnorm(5, mean = 20, sd = 4)
names(ages) <- LETTERS[1:5]
sce$age <- ages[sce$sample_id]

## Add gene-level information
rowData(sce)$ensembl <- paste0("ENSG", seq_len(nrow(sce)))
rowData(sce)$gene_name <- paste0("gene", seq_len(nrow(sce)))

## Compute all modeling results
example_modeling_results <- registration_wrapper(
    sce,
    var_registration = "Cell_Cycle",
    var_sample_id = "sample_id",
    covars = c("age"),
    gene_ensembl = "ensembl",
    gene_name = "gene_name",
    suffix = "wrapper"
)
#> 2025-05-09 16:31:20.238745 make pseudobulk object
#> 2025-05-09 16:31:20.382519 dropping 9 pseudo-bulked samples that are below 'min_ncells'.
#> 2025-05-09 16:31:20.403444 drop lowly expressed genes
#> 2025-05-09 16:31:20.451254 normalize expression
#> 2025-05-09 16:31:20.510644 create model matrix
#> 2025-05-09 16:31:20.522455 run duplicateCorrelation()
#> 2025-05-09 16:31:22.942661 The estimated correlation is: -0.0783081238514531
#> 2025-05-09 16:31:22.945319 computing enrichment statistics
#> 2025-05-09 16:31:23.068589 extract and reformat enrichment results
#> 2025-05-09 16:31:23.094926 running the baseline pairwise model
#> 2025-05-09 16:31:23.113743 computing pairwise statistics
#> 2025-05-09 16:31:23.189617 computing F-statistics

## Explore the results from registration_wrapper()
class(example_modeling_results)
#> [1] "list"
length(example_modeling_results)
#> [1] 3
names(example_modeling_results)
#> [1] "anova"      "enrichment" "pairwise"  
lapply(example_modeling_results, head)
#> $anova
#>   f_stat_wrapper p_value_wrapper fdr_wrapper AveExpr_wrapper ensembl  gene
#> 1      0.8288512       0.5142823    0.994456        5.571242   ENSG1 gene1
#> 2      1.5544879       0.2743019    0.994456        4.031342   ENSG2 gene2
#> 3      0.4152755       0.7468249    0.994456       10.658169   ENSG3 gene3
#> 4      1.7268258       0.2385473    0.994456        5.803268   ENSG4 gene4
#> 5      1.5278744       0.2803794    0.994456        2.855961   ENSG5 gene5
#> 6      1.4861428       0.2902358    0.994456        5.308988   ENSG6 gene6
#> 
#> $enrichment
#>        t_stat_G0  t_stat_G1 t_stat_G2M    t_stat_S p_value_G0 p_value_G1
#> ENSG1 -0.1547374  1.5323608 -0.2348981 -1.29289250 0.88011484 0.15652594
#> ENSG2  0.5630611  1.4947773 -0.5105565 -1.76319136 0.58582739 0.16593179
#> ENSG3  0.4228896 -0.8317725 -0.4890891  1.03785001 0.68134877 0.42500953
#> ENSG4 -0.8090376  0.8315512  1.5976031 -1.75950259 0.43736743 0.42512872
#> ENSG5 -2.0758087  0.2188378  0.2130925  1.15619668 0.06472348 0.83119090
#> ENSG6  0.3913959 -1.8396065  1.6689318  0.07483167 0.70373612 0.09574789
#>       p_value_G2M p_value_S    fdr_G0    fdr_G1   fdr_G2M     fdr_S   logFC_G0
#> ENSG1   0.8190364 0.2252116 0.9852255 0.9966809 0.9953748 0.9681139 -0.1236594
#> ENSG2   0.6207620 0.1084407 0.9612662 0.9966809 0.9899698 0.9392089  0.5488682
#> ENSG3   0.6353470 0.3238615 0.9648540 0.9966809 0.9921856 0.9681139  0.1430331
#> ENSG4   0.1412712 0.1090902 0.9612662 0.9966809 0.9399839 0.9392089 -0.9237743
#> ENSG5   0.8355440 0.2745611 0.8631058 0.9966809 0.9953748 0.9681139 -2.1724966
#> ENSG6   0.1261463 0.9418292 0.9648540 0.9966809 0.9399839 0.9906646  0.3550924
#>         logFC_G1  logFC_G2M     logFC_S ensembl  gene
#> ENSG1  0.8654897 -0.1777803 -0.84406114   ENSG1 gene1
#> ENSG2  1.0494805 -0.4733543 -1.34394213   ENSG2 gene2
#> ENSG3 -0.2158331 -0.1565191  0.29666354   ENSG3 gene3
#> ENSG4  0.7435077  1.5938565 -1.59726925   ENSG4 gene4
#> ENSG5  0.2144746  0.2525529  1.19887499   ENSG5 gene5
#> ENSG6 -1.1400513  1.2795838  0.06030075   ENSG6 gene6
#> 
#> $pairwise
#>       t_stat_G0-G1 t_stat_G0-G2M t_stat_G0-S t_stat_G1-G2M t_stat_G1-S
#> ENSG1   -0.6671464     0.2089510   0.6599089    0.91003998   1.5430022
#> ENSG2   -0.1147743     0.9762769   1.5297496    1.24964041   2.0208873
#> ENSG3    0.5773407     0.4415661  -0.2753587   -0.06404534  -0.9687469
#> ENSG4   -0.9551094    -1.4267069   0.3614961   -0.70335585   1.4860622
#> ENSG5   -1.6015394    -1.5242463  -2.1000859   -0.17030985  -0.8625983
#> ENSG6    1.1007274    -0.8249394   0.1399109   -2.05967232  -1.0228113
#>       t_stat_G2M-S p_value_G0-G1 p_value_G0-G2M p_value_G0-S p_value_G1-G2M
#> ENSG1    0.4932384     0.5234599      0.8397112   0.52785817     0.38940816
#> ENSG2    0.5901220     0.9114537      0.3575335   0.16462386     0.24676596
#> ENSG3   -0.8005264     0.5795972      0.6705001   0.79002220     0.95050622
#> ENSG4    2.0036167     0.3675007      0.1915203   0.72709410     0.50179611
#> ENSG5   -0.6032018     0.1479422      0.1659702   0.06894394     0.86899666
#> ENSG6    1.0822474     0.3030434      0.4333238   0.89219028     0.07341321
#>       p_value_G1-S p_value_G2M-S fdr_G0-G1 fdr_G0-G2M  fdr_G0-S fdr_G1-G2M
#> ENSG1   0.16142199     0.6351035 0.9984915  0.9764704 0.9742266  0.9854617
#> ENSG2   0.07796735     0.5714040 0.9984915  0.9531915 0.9683756  0.9854617
#> ENSG3   0.36105562     0.4465356 0.9984915  0.9764704 0.9742266  0.9977609
#> ENSG4   0.17558652     0.0800825 0.9984915  0.8839550 0.9742266  0.9901259
#> ENSG5   0.41347911     0.5630883 0.9984915  0.8492243 0.8695062  0.9947237
#> ENSG6   0.33634080     0.3106958 0.9984915  0.9531915 0.9864940  0.9854617
#>        fdr_G1-S fdr_G2M-S logFC_G0-G1 logFC_G0-G2M logFC_G0-S logFC_G1-G2M
#> ENSG1 0.9983017 0.9742209  -0.5605347    0.2040786  0.6323864   0.76461332
#> ENSG2 0.9860485 0.9695084  -0.1085720    1.0735394  1.6504823   1.18211143
#> ENSG3 0.9983017 0.9651275   0.2209283    0.1964203 -0.1201808  -0.02450793
#> ENSG4 0.9983017 0.9297596  -1.0540875   -1.8303321  0.4550345  -0.77624467
#> ENSG5 0.9983017 0.9695084  -1.9220357   -2.1264276 -2.8746043  -0.20439186
#> ENSG6 0.9983017 0.9651275   0.9692943   -0.8444414  0.1405221  -1.81373577
#>       logFC_G1-S logFC_G2M-S ensembl  gene
#> ENSG1  1.1929212   0.4283078   ENSG1 gene1
#> ENSG2  1.7590544   0.5769429   ENSG2 gene2
#> ENSG3 -0.3411090  -0.3166011   ENSG3 gene3
#> ENSG4  1.5091219   2.2853666   ENSG4 gene4
#> ENSG5 -0.9525685  -0.7481767   ENSG5 gene5
#> ENSG6 -0.8287723   0.9849635   ENSG6 gene6
#>

Arguments

Value

Details

See also

Examples