This convenience function runs every step required to compute the modeling_results. It is useful for finding marker genes in a new spatially-resolved transcriptomics dataset, and the results can then be used with run_app(). The results can also be used for spatial registration of sc/snRNA-seq datasets through layer_stat_cor() and related functions.

registration_wrapper(
  sce,
  var_registration,
  var_sample_id,
  covars = NULL,
  gene_ensembl = NULL,
  gene_name = NULL,
  suffix = "",
  min_ncells = 10,
  pseudobulk_rds_file = NULL
)



sce
A SingleCellExperiment-class object or one that inherits its properties.

var_registration
A character(1) specifying the colData(sce) variable of interest against which the relevant statistics will be computed.

var_sample_id
A character(1) specifying the colData(sce) variable with the sample ID.

covars
A character() with the names of sample-level covariates.

gene_ensembl
A character(1) specifying the rowData(sce_pseudo) column with the ENSEMBL gene IDs. This will be used by layer_stat_cor().

gene_name
A character(1) specifying the rowData(sce_pseudo) column with the gene names (symbols).

suffix
A character(1) specifying the suffix to use for the F-statistics column. This is particularly useful if you will run this function more than once and want to be able to merge the results.

min_ncells
An integer(1) greater than 0 specifying the minimum number of cells (for scRNA-seq) or spots (for spatial) that are combined when pseudo-bulking. Pseudo-bulked samples with fewer than min_ncells in sce_pseudo$ncells will be dropped.

pseudobulk_rds_file
A character(1) specifying the path for saving an RDS file with the pseudo-bulked object. Specifying it is useful because pseudo-bulking can take hours to run on large datasets.
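
The value of saving the pseudo-bulked object is that a later session can reload it instead of recomputing it. The following is a minimal base R sketch of that reuse pattern, not the internals of registration_wrapper(). The helper names slow_pseudobulk() and load_or_compute() are hypothetical, and a small data.frame stands in for the pseudo-bulked object.

```r
## Hypothetical stand-in for an expensive pseudo-bulking step
slow_pseudobulk <- function() {
    data.frame(sample = c("A", "B"), ncells = c(12L, 30L))
}

## Path for the cached object (a temporary file for this sketch)
rds_file <- tempfile(fileext = ".rds")

## Hypothetical helper: read the RDS file if it exists, otherwise
## compute the object and save it for next time
load_or_compute <- function(path) {
    if (file.exists(path)) {
        return(readRDS(path))
    }
    res <- slow_pseudobulk()
    saveRDS(res, file = path)
    res
}

first <- load_or_compute(rds_file) ## computes and saves
second <- load_or_compute(rds_file) ## reads from disk
```

With a real dataset, the file written to pseudobulk_rds_file could be read back with readRDS() in the same way.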


A list() of data.frame() with the statistical results. This is similar to fetch_data("modeling_results").
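
Because the suffix is attached to the F-statistics column names, tables from separate runs can be combined without name clashes. The sketch below uses base R merge() on two small hypothetical tables. The column names f_stat_runA and f_stat_runB and the gene IDs are assumptions made for illustration, not output from this function.

```r
## Hypothetical F-statistic tables from two runs with different suffixes
run_a <- data.frame(
    ensembl = c("ENSG1", "ENSG2", "ENSG3"),
    f_stat_runA = c(1.2, 5.6, 0.3)
)
run_b <- data.frame(
    ensembl = c("ENSG2", "ENSG3", "ENSG4"),
    f_stat_runB = c(4.9, 0.8, 2.1)
)

## Keep the genes present in both runs; the distinct suffixes keep
## the statistic columns apart
merged <- merge(run_a, run_b, by = "ensembl")
merged
```

Only genes shared by both tables are kept here; passing all = TRUE to merge() would keep the union instead.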


We chose the default of min_ncells = 10 based on section 4.3 of OSCA, which cites the paper that introduced the definition of "very low" as 10. To choose a cutoff suited to your data, you might want to run registration_pseudobulk() yourself and manually explore sce_pseudo$ncells.
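
One way to explore a cutoff is to count how many pseudo-bulked samples each candidate value would drop. This is a minimal sketch in base R: the ncells vector is made-up data standing in for sce_pseudo$ncells, and the candidate cutoffs are arbitrary.

```r
## Hypothetical cell counts per pseudo-bulked sample
## (stand-in for sce_pseudo$ncells)
ncells <- c(3, 8, 10, 12, 25, 40, 7, 150)

## Candidate values for min_ncells
cutoffs <- c(5, 10, 20)

## Samples with fewer than the cutoff would be dropped
n_dropped <- vapply(cutoffs, function(k) sum(ncells < k), integer(1))
names(n_dropped) <- paste0("min_ncells_", cutoffs)
n_dropped

## Distribution of the counts, useful for spotting a natural gap
summary(ncells)
```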



## Ensure reproducibility of example data

## Generate example data
sce <- scuttle::mockSCE()

## Add some sample IDs
sce$sample_id <- sample(LETTERS[1:5], ncol(sce), replace = TRUE)

## Add a sample-level covariate: age
ages <- rnorm(5, mean = 20, sd = 4)
names(ages) <- LETTERS[1:5]
sce$age <- ages[sce$sample_id]

## Add gene-level information
rowData(sce)$ensembl <- paste0("ENSG", seq_len(nrow(sce)))
rowData(sce)$gene_name <- paste0("gene", seq_len(nrow(sce)))

## Compute all modeling results
example_modeling_results <- registration_wrapper(
    sce,
    "Cell_Cycle", "sample_id", c("age"), "ensembl", "gene_name", "wrapper"
)
#> 2024-04-09 14:21:20.851814 make pseudobulk object
#> 2024-04-09 14:21:21.030539 dropping 9 pseudo-bulked samples that are below 'min_ncells'.
#> 2024-04-09 14:21:21.050947 drop lowly expressed genes
#> 2024-04-09 14:21:21.102902 normalize expression
#> 2024-04-09 14:21:21.158332 create model matrix
#> 2024-04-09 14:21:21.168367 run duplicateCorrelation()
#> 2024-04-09 14:21:23.551028 The estimated correlation is: -0.0783081238514532
#> 2024-04-09 14:21:23.553259 computing enrichment statistics
#> 2024-04-09 14:21:23.702768 extract and reformat enrichment results
#> 2024-04-09 14:21:23.728258 running the baseline pairwise model
#> 2024-04-09 14:21:23.746003 computing pairwise statistics
#> 2024-04-09 14:21:23.818679 computing F-statistics