Deconvolution Benchmark in Human DLPFC

Introduction

What is Deconvolution?

Inferring the composition of different cell types in bulk RNA-seq data

Deconvolution is an analysis that aims to estimate the proportions of the different cell types that make up a bulk RNA-seq sample, based on cell type gene expression profiles from a single cell or single nucleus RNA-seq (sc/snRNA-seq) dataset.

Figure 1. Overview on how to use single cell data to infer the composition of bulk RNA-seq samples
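
As a rough illustration of the idea, the sketch below simulates one bulk sample as a weighted sum of cell type expression profiles and recovers the weights with ordinary least squares. It is not one of the methods benchmarked in this study, and the signature matrix, proportions, and noise level are all made-up values chosen only for the example.

```r
## Toy illustration of reference-based deconvolution.
## All values are simulated; this is not one of the benchmarked methods.
set.seed(1)
n_genes <- 50
cell_types <- c("Astro", "Excit", "Inhib")

## Hypothetical cell type expression profiles (genes x cell types)
signature <- matrix(
    rexp(n_genes * length(cell_types), rate = 0.1),
    nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)), cell_types)
)

## True proportions of one bulk sample (unknown in a real analysis)
true_prop <- c(Astro = 0.2, Excit = 0.5, Inhib = 0.3)

## Bulk expression modeled as a weighted sum of the profiles plus noise
bulk <- as.vector(signature %*% true_prop) + rnorm(n_genes, sd = 0.5)

## Estimate the weights with least squares, clip negatives, rescale to sum to 1
fit <- lm.fit(x = signature, y = bulk)
est_toy <- pmax(fit$coefficients, 0)
est_toy <- est_toy / sum(est_toy)
round(est_toy, 2)
```

Published methods differ mainly in how they weight genes, correct for technical differences between the bulk and single cell assays, and constrain the estimates.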

Deconvolution Methods

There are more than 20 published reference-based deconvolution methods. Below is a selection of six methods that we tested in our deconvolution benchmark study.

Approach	Method	Citation	Availability
Weighted least squares	DWLS	Tsoucas et al., Nature Communications, 2019	R package (CRAN)
Bias correction: assay	Bisque	Jew et al., Nature Communications, 2020	R package (GitHub)
Bias correction: source	MuSiC	Wang et al., Nature Communications, 2019	R package (GitHub)
Machine learning	CIBERSORTx	Newman et al., Nature Biotechnology, 2019	Webtool
Bayesian	BayesPrism	Chu et al., Nature Cancer, 2022	Webtool / R package
Linear	hspe	Hunt et al., Annals of Applied Statistics, 2021	R package (GitHub)

Goals of this Vignette

We will be demonstrating how to use DeconvoBuddies tools when applying deconvolution with the Bisque package.

Install and load required packages
Download DLPFC RNA-seq data, and reference snRNA-seq data
Find marker genes with DeconvoBuddies tools
Run deconvolution with BisqueRNA
Explore deconvolution output and create composition plots with DeconvoBuddies tools
Check proportion against RNAScope estimated proportions

Video Tutorial

Linked is a video from a presentation of an earlier version of this tutorial from our LIBD Rstats club.

Basics

1. Install and load required packages

R is an open-source statistical environment which can be easily modified to enhance its functionality via packages. DeconvoBuddies is an R package available via the Bioconductor repository for packages. R can be installed on any operating system from CRAN, after which you can install DeconvoBuddies by using the following commands in your R session:

Install `DeconvoBuddies`

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("DeconvoBuddies")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

Load Other Packages

Let’s load the packages we’ll be using in this vignette.

## Packages for different types of RNA-seq data structures in R
library("SummarizedExperiment")
library("SingleCellExperiment")
library("Biobase")

## For downloading data
library("spatialLIBD")

## Other helper packages for this vignette
library("dplyr")
library("tidyr")
library("tibble")
library("ggplot2")

## Our main package
library("DeconvoBuddies")

2. Download DLPFC RNA-seq data, and reference snRNA-seq data.

Bulk RNA-seq data

Access the 110-sample human DLPFC bulk RNA-seq dataset from LIBD, described in more detail here. These samples are from 19 tissue blocks and 10 neurotypical adult donors. Samples were sequenced with two library types (library_type: polyA and RiboZeroGold) and three RNA extractions (Cyto, Total, and Nuc; stored in library_prep, where the total RNA extraction is labeled Bulk). After quality control there are n=110 samples in total.

## use fetch deconvo data to load rse_gene
rse_gene <- fetch_deconvo_data("rse_gene")
#> 2025-07-28 19:55:13.312778 Access ExperimentHub EH9625
#> see ?DeconvoBuddies and browseVignettes('DeconvoBuddies') for documentation
#> loading from cache
rse_gene
#> class: RangedSummarizedExperiment 
#> dim: 21745 110 
#> metadata(1): SPEAQeasy_settings
#> assays(2): counts logcounts
#> rownames(21745): ENSG00000227232.5 ENSG00000278267.1 ...
#>   ENSG00000210195.2 ENSG00000210196.2
#> rowData names(11): Length gencodeID ... gencodeTx passExprsCut
#> colnames(110): 2107UNHS-0291_Br2720_Mid_Bulk
#>   2107UNHS-0291_Br2720_Mid_Cyto ... AN00000906_Br8667_Mid_Cyto
#>   AN00000906_Br8667_Mid_Nuc
#> colData names(80): SAMPLE_ID Sample ... diagnosis qc_class
# lobstr::obj_size(rse_gene)
# 41.16 MB

## Use gene "Symbol" as identifiers for the genes in rownames(rse_gene)
rownames(rse_gene) <- rowData(rse_gene)$Symbol

## bulk RNA seq samples were sequenced with different library types,
## and RNA extractions
table(rse_gene$library_type, rse_gene$library_prep)
#>               
#>                Bulk Cyto Nuc
#>   polyA          19   18  18
#>   RiboZeroGold   19   19  17
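
As a quick sanity check, the counts printed above can be re-entered by hand to confirm that they add up to the n=110 samples mentioned earlier. The `design` object below is a hand-typed copy of that table, not something returned by the package.

```r
## Hand-typed copy of the library_type x library_prep table shown above
design <- matrix(
    c(19, 19, 18, 19, 18, 17),
    nrow = 2,
    dimnames = list(
        library_type = c("polyA", "RiboZeroGold"),
        library_prep = c("Bulk", "Cyto", "Nuc")
    )
)

## Samples per library type, per RNA extraction, and overall
rowSums(design)
colSums(design)
sum(design) # 110
```

The design is close to balanced: a complete design of 19 tissue blocks by 6 combinations would have 114 samples, so four combinations are absent after quality control.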

Reference snRNA-seq data

The bulk data is paired with a single nucleus RNA-seq (snRNA-seq) dataset from spatialLIBD. This dataset can be accessed with spatialLIBD::fetch_data(); in this vignette we download it with fetch_deconvo_data("sce").

## Use fetch_deconvo_data() to download the snRNA-seq dataset used in this project
sce_path_zip <- fetch_deconvo_data("sce")
#> 2025-07-28 19:55:17.447848 loading file /github/home/.cache/R/BiocFileCache/20110218742_sce_DLPFC_annotated.zip%3Fdl%3D1

## unzip and load the data
sce_path <- unzip(sce_path_zip, exdir = tempdir())
sce <- HDF5Array::loadHDF5SummarizedExperiment(
    file.path(tempdir(), "sce_DLPFC_annotated")
)

# lobstr::obj_size(sce)
# 172.28 MB

## exclude Ambiguous cell type
sce <- sce[, sce$cellType_broad_hc != "Ambiguous"]
sce$cellType_broad_hc <- droplevels(sce$cellType_broad_hc)

## Check the number of genes by the number of nuclei that we
## have to work with:
dim(sce)
#> [1] 36601 56447

## Check the broad cell type distribution
table(sce$cellType_broad_hc)
#> 
#>     Astro EndoMural     Micro     Oligo       OPC     Excit     Inhib 
#>      3979      2157      1601     10894      1940     24809     11067

## We're going to subset to the first 5k genes to save memory
## just for this example. You wouldn't do this on a full analysis.
sce <- sce[seq_len(5000), ]

Orthogonal Cell Type Proportion from RNAScope/IF

An alternative method for calculating cell type proportions is to image a slice of tissue labeled with cell type probes, using single molecule fluorescent in situ hybridization (smFISH) experiments performed with RNAScope/IF (immunofluorescence). The images are then analyzed to count cells and annotate them by cell type. In this study we used HALO from Indica Labs for this step.

The cell type proportions from the RNAScope/IF experiment will be used to evaluate the accuracy of cell type proportion estimates.

Figure 2. RNAScope/IF measures the cell type proportions through imaging

The RNAScope/IF proportion data is stored as a data.frame object in DeconvoBuddies::RNAScope_prop.

Key columns in RNAScope_prop:

SAMPLE_ID: DLPFC tissue block + RNAScope combination.
Sample: DLPFC tissue block (donor BrNum + DLPFC position).
cell_type: the cell type measured.
n_cell: the number of cells counted for the Sample and cell type.
prop: the cell type proportion calculated from n_cell.

# Access the RNAScope proportion data.frame
data("RNAScope_prop")
head(RNAScope_prop)
#>        SAMPLE_ID      Sample  Combo cell_type Confidence n_cell       prop
#> 1 Br2720P_CIRCLE Br2720_post Circle     Astro       High   3385 0.09357808
#> 2 Br2720P_CIRCLE Br2720_post Circle EndoMural       High   1680 0.04644348
#> 3 Br2720P_CIRCLE Br2720_post Circle     Inhib       High   3092 0.08547812
#> 4 Br2720P_CIRCLE Br2720_post Circle     Other       High  28016 0.77450032
#> 5   Br2720P_STAR Br2720_post   Star     Excit       High   8687 0.23132639
#> 6   Br2720P_STAR Br2720_post   Star     Micro       High   1416 0.03770671
#>   n_cell_sn    prop_sn
#> 1        81 0.04287983
#> 2        66 0.03493912
#> 3        94 0.04976178
#> 4        NA         NA
#> 5      1198 0.63419799
#> 6        47 0.02488089

## plot the RNAScope compositions
plot_composition_bar(
    prop_long = RNAScope_prop,
    sample_col = "SAMPLE_ID",
    x_col = "SAMPLE_ID",
    add_text = FALSE
) +
    facet_wrap(~Combo, nrow = 2, scales = "free_x")

Above we can see the cell type proportions for each of the samples in either the Circle or the Star combination of RNAScope/IF probes. Each combination of probes was able to assess a different set of cell types. Two combinations were needed to cover all the cell types studied, because there is a limit on how many unique probes can be measured in a single RNAScope/IF experiment. For more details about the RNAScope/IF combinations, check the paper describing this study.
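
The prop column can be checked by hand. Assuming prop is n_cell divided by the total number of cells counted for that SAMPLE_ID, the sketch below reproduces the four Br2720P_CIRCLE rows printed by head(RNAScope_prop). The `circle` data.frame is a hand-typed copy of those rows, used here so the example runs without the package.

```r
## Hand-typed copy of the four Br2720P_CIRCLE rows shown in head(RNAScope_prop)
circle <- data.frame(
    cell_type = c("Astro", "EndoMural", "Inhib", "Other"),
    n_cell = c(3385, 1680, 3092, 28016),
    prop = c(0.09357808, 0.04644348, 0.08547812, 0.77450032)
)

## Assumed definition: cells of one type / all cells counted in the sample
circle$prop_check <- circle$n_cell / sum(circle$n_cell)

## The recomputed values agree with the stored ones up to rounding
all.equal(circle$prop, circle$prop_check, tolerance = 1e-6)
```

Note that the Other category (cells not labeled by any probe in the combination) is part of the denominator, so the proportions within one SAMPLE_ID sum to 1.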

3. Select Marker Genes

Marker genes are genes with high expression in one cell type and low expression in other cell types, or “cell-type specific” expression. These genes can be used to learn more about the identity and function of cell types, but here we are interested in using sets of cell type specific marker genes to reduce noise in deconvolution and increase accuracy.

We have developed a method for finding marker genes called the “Mean Ratio”. For each gene, we calculate the MeanRatio for a target cell type by dividing the mean expression in the target cell type by the mean expression in the next highest non-target cell type. Genes with the highest MeanRatio values are selected as marker genes.
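
To make the calculation concrete, here is a minimal base R sketch of the ratio on a made-up table of mean expression values. The helper `toy_mean_ratio()` and the gene names are hypothetical and exist only for this example; it mirrors the description above but is not the implementation used by DeconvoBuddies.

```r
## Made-up mean expression per cell type for three hypothetical genes
mean_expr <- rbind(
    geneA = c(Astro = 4.0, Excit = 0.5, Inhib = 0.4),
    geneB = c(Astro = 1.0, Excit = 3.0, Inhib = 2.5),
    geneC = c(Astro = 0.2, Excit = 0.1, Inhib = 2.0)
)

## Hypothetical helper: target mean / highest mean among non-target cell types
toy_mean_ratio <- function(mean_expr, target) {
    others <- mean_expr[, colnames(mean_expr) != target, drop = FALSE]
    second <- apply(others, 1, max)
    mean_expr[, target] / second
}

## geneA is a good Astro marker (4.0 / 0.5 = 8); geneB and geneC are not
mr_astro <- toy_mean_ratio(mean_expr, "Astro")
sort(mr_astro, decreasing = TRUE)
```

Comparing against the single highest non-target cell type, rather than the average of all other cells, penalizes genes that are also highly expressed in just one other cell type.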

For a tutorial on marker gene selection check out Vignette: Finding Marker Genes with Deconvo Buddies.

Figure 3. Mean Ratio calculation process compared to 1vALL Marker Gene selection

Use `get_mean_ratio()` to find marker genes.

The function DeconvoBuddies::get_mean_ratio() calculates the MeanRatio and the rank of genes for a specified cell type annotation in a SingleCellExperiment object.

# calculate the Mean Ratio of genes for each cell type
marker_stats <- get_mean_ratio(sce,
    cellType_col = "cellType_broad_hc",
    gene_ensembl = "gene_id",
    gene_name = "gene_name"
)

# check the top ranked gene for each cell type
marker_stats |>
    group_by(cellType.target) |>
    slice(1)
#> # A tibble: 7 × 10
#> # Groups:   cellType.target [7]
#>   gene       cellType.target mean.target cellType.2nd mean.2nd MeanRatio
#>   <chr>      <fct>                 <dbl> <fct>           <dbl>     <dbl>
#> 1 PRDM16     Astro                  1.97 EndoMural      0.142      13.9 
#> 2 SLC2A1     EndoMural              1.49 Astro          0.146      10.2 
#> 3 LINC01141  Micro                  1.57 Excit          0.0640     24.5 
#> 4 AC012494.1 Oligo                  2.37 OPC            0.147      16.1 
#> 5 MIR3681HG  OPC                    1.58 Excit          0.205       7.69
#> 6 AC011995.2 Excit                  1.01 Inhib          0.135       7.51
#> 7 LYPD6B     Inhib                  1.20 Excit          0.0967     12.4 
#> # ℹ 4 more variables: MeanRatio.rank <int>, MeanRatio.anno <chr>,
#> #   gene_ensembl <chr>, gene_name <chr>

The columns of this table are documented in detail at DeconvoBuddies::get_mean_ratio(). In brief, cellType.target lists the target cell type for which the MeanRatio is calculated; its mean expression (mean.target) is the numerator of the MeanRatio. cellType.2nd lists the non-target cell type that the target is compared against; its mean expression (mean.2nd) is the denominator of the MeanRatio.

Plot the top marker genes

Use DeconvoBuddies plotting tools to quickly plot the gene expression of the top 4 Excitatory neuron marker genes across the cellType_broad_hc cell type annotations.

# plot the expression of the top 4 Excit marker genes across cell types
plot_marker_express(sce,
    stats = marker_stats,
    cell_type = "Excit",
    cellType_col = "cellType_broad_hc",
    rank_col = "MeanRatio.rank",
    anno_col = "MeanRatio.anno",
    gene_col = "gene"
)

Looks nice and cell type specific!

Note how plot_marker_express() lists the 2 cell types that are being compared that result in the specific numerical MeanRatio value being displayed. As we specified that our target cell type is "Excit" in our call to plot_marker_express(), the numerator is Excit in all the panels shown above.

Create a List of Marker Genes

With the MeanRatio calculated, we will select the 25 genes with the highest MeanRatio for each cell type that also exist in the bulk data rse_gene.

# select top 25 marker genes for each cell type, that are also in rse_gene
marker_genes <- marker_stats |>
    filter(MeanRatio.rank <= 25 & gene %in% rownames(rse_gene))

# check how many genes for each cell type (some genes are not in both datasets)
marker_genes |> count(cellType.target)
#> # A tibble: 7 × 2
#>   cellType.target     n
#>   <fct>           <int>
#> 1 Astro              22
#> 2 EndoMural          25
#> 3 Micro              24
#> 4 Oligo              23
#> 5 OPC                22
#> 6 Excit              21
#> 7 Inhib              24

# create a vector of marker genes to subset data before deconvolution
marker_genes <- marker_genes |> pull(gene)

4. Prep Data and Run Bisque

Bisque is an R package for cell type deconvolution. In our deconvolution benchmark, we found it was a top performing method. Below we will briefly show how to run Bisque’s “reference based decomposition (deconvolution)”.

Prepare data

To run Bisque the snRNA-seq and bulk data must first be converted to Biobase::ExpressionSet() format. We will subset our data to our selected MeanRatio marker genes.

The snRNA-seq data must also be filtered to remove nuclei with no counts across the marker genes.

# NOT RUN - no longer running Bisque - see next section for details

if(FALSE){
## convert bulk data to Expression set, sub-setting to marker genes
## include sample ID
exp_set_bulk <- Biobase::ExpressionSet(
    assayData = assays(rse_gene[marker_genes, ])$counts,
    phenoData = AnnotatedDataFrame(
        as.data.frame(colData(rse_gene))[c("SAMPLE_ID")]
    )
)

## convert snRNA-seq data to Expression set, sub-setting to marker genes
## include cell type and donor information
exp_set_sce <- Biobase::ExpressionSet(
    assayData = as.matrix(assays(sce[marker_genes, ])$counts),
    phenoData = AnnotatedDataFrame(
        as.data.frame(colData(sce))[, c("cellType_broad_hc", "BrNum")]
    )
)

## check for nuclei with 0 marker expression
zero_cell_filter <- colSums(exprs(exp_set_sce)) != 0
message("Exclude ", sum(!zero_cell_filter), " cells")

exp_set_sce <- exp_set_sce[, zero_cell_filter]
}

Run Bisque

Bisque needs the bulk and single cell ExpressionSet objects we prepared above, plus the names of the columns in the single cell data that hold the cell type annotation (cellType_broad_hc) and the donor ID (BrNum in this data).

NOTE: Bisque is no longer available on CRAN (as of July 2025), so it cannot be run in this vignette. We’ll use pre-computed estimates instead (loaded in the next section).

## NOT RUN - Bisque unavailable on CRAN

## The development version can be installed from GitHub via:
# devtools::install_github("cozygene/bisque")

if (FALSE) {
    ## Run Bisque with bulk and single cell ExpressionSet inputs.
    ## The function is called with its namespace, so library("BisqueRNA")
    ## is not required.
    est_prop <- BisqueRNA::ReferenceBasedDecomposition(
        bulk.eset = exp_set_bulk,
        sc.eset = exp_set_sce,
        cell.types = "cellType_broad_hc",
        subject.names = "BrNum",
        use.overlap = FALSE
    )

    ## Examine the output from Bisque, transpose to make it easier to work with
    est_prop <- t(est_prop$bulk.props)
}

Explore Output

Bisque predicts the proportion of the cell types in cellType_broad_hc for each sample in the bulk data.

## Load the pre-computed estimated proportions
data("est_prop")

## sample x cell type matrix
head(est_prop)
#>                                    Astro  EndoMural      Micro     Oligo
#> 2107UNHS-0291_Br2720_Mid_Bulk 0.03620838 0.01450229 0.00822044 0.1898306
#> 2107UNHS-0291_Br2720_Mid_Cyto 0.04270700 0.01480779 0.02118528 0.1624367
#> 2107UNHS-0291_Br2720_Mid_Nuc  0.04792806 0.03613114 0.02192201 0.1584692
#> 2107UNHS-0291_Br6432_Ant_Bulk 0.05450912 0.02715661 0.01473154 0.1668393
#> 2107UNHS-0291_Br6432_Ant_Cyto 0.05722515 0.02115855 0.03127993 0.1432068
#> 2107UNHS-0291_Br6432_Ant_Nuc  0.05762778 0.03489159 0.02267823 0.1543922
#>                                      OPC     Excit     Inhib
#> 2107UNHS-0291_Br2720_Mid_Bulk 0.06093697 0.4041093 0.2861921
#> 2107UNHS-0291_Br2720_Mid_Cyto 0.02924472 0.4800840 0.2495345
#> 2107UNHS-0291_Br2720_Mid_Nuc  0.03148864 0.4617759 0.2422850
#> 2107UNHS-0291_Br6432_Ant_Bulk 0.11035566 0.3911678 0.2352399
#> 2107UNHS-0291_Br6432_Ant_Cyto 0.05399435 0.5010104 0.1921248
#> 2107UNHS-0291_Br6432_Ant_Nuc  0.08059862 0.4445150 0.2052966
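
A useful first check on any deconvolution output is that the estimated proportions for each sample are non-negative and sum to 1. The sketch below does this for the first two rows printed above; `est_prop_head` is a hand-typed copy of those values so that the example runs without loading the package data.

```r
## Hand-typed copy of the first two rows of head(est_prop)
est_prop_head <- rbind(
    "2107UNHS-0291_Br2720_Mid_Bulk" = c(
        Astro = 0.03620838, EndoMural = 0.01450229, Micro = 0.00822044,
        Oligo = 0.1898306, OPC = 0.06093697, Excit = 0.4041093,
        Inhib = 0.2861921
    ),
    "2107UNHS-0291_Br2720_Mid_Cyto" = c(
        Astro = 0.04270700, EndoMural = 0.01480779, Micro = 0.02118528,
        Oligo = 0.1624367, OPC = 0.02924472, Excit = 0.4800840,
        Inhib = 0.2495345
    )
)

## Each row should sum to 1 (up to the rounding of the printed values)
rowSums(est_prop_head)

## The cell type with the largest estimated proportion in each sample
colnames(est_prop_head)[apply(est_prop_head, 1, which.max)]
```

With the full matrix, the same check is `summary(rowSums(est_prop))`.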

5. Explore deconvolution output and create composition plots with `DeconvoBuddies` tools

To visualize the cell type proportion predictions, we can plot cell type composition bar plots with DeconvoBuddies::plot_composition_bar(), either the prediction for each sample, or the average proportion over a group of samples.

## add Phenotype data to proportion estimates
pd <- colData(rse_gene) |>
    as.data.frame() |>
    select(SAMPLE_ID, Sample, library_combo)

## make proportion estimates long so they are ggplot2 friendly
prop_long <- est_prop |>
    as.data.frame() |>
    tibble::rownames_to_column("SAMPLE_ID") |>
    tidyr::pivot_longer(!SAMPLE_ID, names_to = "cell_type", values_to = "prop") |>
    left_join(pd)
#> Joining with `by = join_by(SAMPLE_ID)`

## create composition bar plots

## for all library preparations by sample n=110
## Remove the SAMPLE_ID names since they are very long using ggplot2::theme()
plot_composition_bar(
    prop_long = prop_long,
    sample_col = "SAMPLE_ID",
    x_col = "SAMPLE_ID",
    add_text = FALSE
) +
    theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())


## Average by tissue block (Sample)
plot_composition_bar(
    prop_long = prop_long,
    sample_col = "SAMPLE_ID",
    x_col = "Sample",
    add_text = FALSE
)


## Each tissue block has up to 6 unique RNA library type and RNA extraction
## combinations
table(prop_long$Sample) / length(unique(prop_long$cell_type))
#> 
#>  Br2720_mid Br2720_post  Br2743_ant  Br3942_ant  Br3942_mid  Br6423_ant 
#>           5           6           5           6           6           6 
#> Br6423_post  Br6432_ant  Br6432_mid  Br6471_ant  Br6471_mid  Br6522_mid 
#>           6           6           6           6           6           6 
#> Br6522_post  Br8325_ant  Br8325_mid  Br8492_mid Br8492_post  Br8667_ant 
#>           6           6           5           5           6           6 
#>  Br8667_mid 
#>           6

## Here are the 6 "SAMPLE_ID" values for the tissue block with ID "Br8667_mid"
unique(prop_long$SAMPLE_ID[prop_long$Sample == "Br8667_mid"])
#> [1] "AN00000904_Br8667_Mid_Bulk" "AN00000904_Br8667_Mid_Cyto"
#> [3] "AN00000904_Br8667_Mid_Nuc"  "AN00000906_Br8667_Mid_Bulk"
#> [5] "AN00000906_Br8667_Mid_Cyto" "AN00000906_Br8667_Mid_Nuc"

This is a more complex scenario than the one from the introductory vignette, where we used random data. The first plot shows each of the n=110 bulk RNA-seq samples. The second plot shows, for each tissue block (Sample), the average composition across its up to 6 combinations of RNA library type and RNA extraction.
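
The grouped plot can be thought of as plotting a per-group summary of the long table. The base R sketch below computes such a summary on a made-up long table, assuming that "average" means the arithmetic mean of prop for each x_col group and cell type. It illustrates the idea only and does not reproduce the internals of plot_composition_bar().

```r
## Made-up long table: three samples from two tissue blocks, two cell types
toy_long <- data.frame(
    SAMPLE_ID = rep(c("s1", "s2", "s3"), each = 2),
    Sample = rep(c("blockA", "blockA", "blockB"), each = 2),
    cell_type = rep(c("Excit", "Inhib"), times = 3),
    prop = c(0.6, 0.4, 0.8, 0.2, 0.5, 0.5)
)

## Mean proportion for each tissue block and cell type
toy_avg <- aggregate(prop ~ Sample + cell_type, data = toy_long, FUN = mean)
toy_avg

## The averaged proportions still sum to 1 within each tissue block
tapply(toy_avg$prop, toy_avg$Sample, sum)
```

Because every sample contributes a full set of proportions, the per-group means still sum to 1, which is why the averaged bars keep the same height.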

6. Check proportion against RNAScope/IF estimated proportions

Note that to compare the deconvolution results to the RNAScope/IF proportions, the Oligo and OPC proportions need to be added together.

## Combine Oligo and OPC into OligoOPC
prop_long_opc <- prop_long |>
    mutate(cell_type = gsub("Oligo|OPC", "OligoOPC", cell_type)) |>
    group_by(SAMPLE_ID, Sample, library_combo, cell_type) |>
    summarize(prop = sum(prop)) |>
    ungroup()
#> `summarise()` has grouped output by 'SAMPLE_ID', 'Sample', 'library_combo'. You
#> can override using the `.groups` argument.

prop_long_opc |> count(cell_type)
#> # A tibble: 6 × 2
#>   cell_type     n
#>   <chr>     <int>
#> 1 Astro       110
#> 2 EndoMural   110
#> 3 Excit       110
#> 4 Inhib       110
#> 5 Micro       110
#> 6 OligoOPC    110

## Join RNAScope/IF and Bisque cell type proportions
prop_compare <- prop_long_opc |>
    inner_join(
        RNAScope_prop |>
            select(Sample, cell_type, prop_RNAScope = prop, prop_sn),
        by = c("Sample", "cell_type")
    )

We can now calculate the correlation and draw a scatter plot of the proportions. Note that you can change the type of correlation computed with the method argument of the cor() function. The default method is "pearson".
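
For example, the method argument of stats::cor() also accepts "spearman" and "kendall", which are rank-based and only measure whether two sets of values are ordered the same way. The made-up vectors below show how the choice can change the result.

```r
## Made-up proportions with a monotonic but non-linear relationship
x <- c(0.05, 0.10, 0.20, 0.40, 0.80)
y <- x^2

## Pearson measures linear association, so it is below 1 here
cor_pearson <- cor(x, y)

## Rank-based correlations equal 1 because the ordering is identical
cor_spearman <- cor(x, y, method = "spearman")
cor_kendall <- cor(x, y, method = "kendall")

c(pearson = cor_pearson, spearman = cor_spearman, kendall = cor_kendall)
```

A high rank correlation therefore does not imply that the estimated proportions are close to the reference values, only that they are ordered similarly.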

## compute correlation with RNAScope/IF proportions
cor(prop_compare$prop, prop_compare$prop_RNAScope)
#> [1] 0.5339794

## Scatter plot with RNAScope/IF proportions
prop_compare |>
    ggplot(aes(x = prop_RNAScope, y = prop, color = cell_type, shape = library_combo)) +
    geom_point() +
    geom_abline()



## correlation with snRNA-seq proportion
cor(prop_compare$prop, prop_compare$prop_sn)
#> [1] 0.8212221

## Scatter plot with snRNA-seq proportions
prop_compare |>
    ggplot(aes(x = prop_sn, y = prop, color = cell_type, shape = library_combo)) +
    geom_point() +
    geom_abline()

In the first plot we can see how Bisque overestimates the proportion of excitatory neurons (Excit) when compared against RNAScope/IF as all Excit points are higher than the diagonal line shown in black.

In the second plot we can see how Bisque proportions are closer to the snRNA-seq proportions. After all, given the challenges in generating orthogonal cell type proportion data, Bisque and other deconvolution methods were developed by comparing against simulated data, sometimes using pseudo-bulk sc/snRNA-seq data.
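
Correlation alone does not show the direction of the error. One way to quantify what the scatter plots show is to compute, for each cell type, the mean difference (bias) and the root mean squared error (RMSE) between estimated and reference proportions. The sketch below uses made-up numbers in a data.frame with the same column names as prop_compare; it is an illustration, not a result from this study.

```r
## Made-up values laid out like prop_compare (not real results)
toy_compare <- data.frame(
    cell_type = rep(c("Excit", "Inhib"), each = 3),
    prop = c(0.45, 0.50, 0.40, 0.20, 0.22, 0.25),
    prop_RNAScope = c(0.25, 0.28, 0.22, 0.22, 0.20, 0.25)
)

## Signed error: positive values mean the method overestimates
toy_compare$error <- toy_compare$prop - toy_compare$prop_RNAScope

## Bias and RMSE per cell type
toy_bias <- tapply(toy_compare$error, toy_compare$cell_type, mean)
toy_rmse <- tapply(
    toy_compare$error, toy_compare$cell_type,
    function(e) sqrt(mean(e^2))
)

round(cbind(bias = toy_bias, rmse = toy_rmse), 3)
```

In this toy table Excit has a large positive bias, the pattern of points sitting above the diagonal, while Inhib has errors that cancel out on average but a non-zero RMSE.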

7. How to run deconvolution with `hspe`

hspe (formerly called dtangle) is another R package for deconvolution. It was also a top performing method in our deconvolution benchmark.

hspe can be downloaded from GitHub, but it can’t be run in this vignette because Bioconductor packages cannot depend on packages that are only available on GitHub.

Below we show some example code to prepare input data and run hspe (not run here):

if (FALSE) {
    ## Install hspe
    # if (!requireNamespace("hspe", quietly = TRUE)) {
    #     ## Install version 0.1 which is the one listed on the main documentation
    #     ## at https://github.com/gjhunt/hspe/tree/main?tab=readme-ov-file#software
    #     remotes::install_url("https://github.com/gjhunt/hspe/raw/main/hspe_0.1.tar.gz")
    #     ## Alternatively, install from the latest version on GitHub with:
    #     # remotes::install_github("gjhunt/hspe", subdir = "lib_hspe")
    #
    #     ## As of 2024-08-23, it's been 3 years since files were last modified
    #     ## at https://github.com/gjhunt/hspe/tree/main/lib_hspe.
    # }
    # library("hspe")

    # pseudobulk the sce data by sample + cell type
    sce_pb <- spatialLIBD::registration_pseudobulk(sce,
        var_registration = "cellType_broad_hc",
        var_sample_id = "Sample"
    )

    ## extract the gene expression from the bulk rse_gene
    mixture_samples <- t(assays(rse_gene)$logcounts)
    mixture_samples[1:5, 1:5]

    ## create a vector of indexes of the different cell types
    pure_samples <- rafalib::splitit(sce_pb$cellType_broad_hc)

    ## extract the pseudobulked logcounts
    reference_samples <- t(assays(sce_pb)$logcounts)
    reference_samples[1:5, 1:5]

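    ## keep only the genes present in both datasets, in the same order
    ## (this subsetting step is an assumption added so that the check below passes)
    common_genes <- intersect(colnames(mixture_samples), colnames(reference_samples))
    mixture_samples <- mixture_samples[, common_genes]
    reference_samples <- reference_samples[, common_genes]

    ## check that the number of genes matches in the bulk (mixture) and single cell (reference)
    ncol(mixture_samples) == ncol(reference_samples)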

    ## run hspe
    est_prop_hspe <- hspe(
        Y = mixture_samples,
        reference = reference_samples,
        pure_samples = pure_samples,
        markers = marker_genes,
        seed = 10524
    )
}

Conclusion

In this vignette we have demonstrated some of the functions and data in the DeconvoBuddies package, and how to use them in a deconvolution workflow to predict cell type proportions. We used real data from a study of the human brain.

Reproducibility

The DeconvoBuddies package (Huuki-Myers, Maynard, Hicks, Zandi, Kleinman, Hyde, Goes, and Collado-Torres, 2025) was made possible thanks to:

R (R Core Team, 2025)
BiocStyle (Oleś, 2025)
knitr (Xie, 2025)
RefManageR (McLean, 2017)
rmarkdown (Allaire, Xie, Dervieux, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2024)
sessioninfo (Wickham, Chang, Flight, Müller, and Hester, 2025)
testthat (Wickham, 2011)

This package was developed using biocthis.

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.1 (2025-06-13)
#>  os       Ubuntu 24.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       UTC
#>  date     2025-07-28
#>  pandoc   3.7.0.2 @ /usr/bin/ (via rmarkdown)
#>  quarto   1.6.42 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package              * version   date (UTC) lib source
#>  abind                  1.4-8     2024-09-12 [1] RSPM (R 4.5.0)
#>  AnnotationDbi          1.70.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  AnnotationHub          3.16.1    2025-07-23 [1] Bioconductor 3.21 (R 4.5.1)
#>  attempt                0.3.1     2020-05-03 [1] RSPM (R 4.5.0)
#>  backports              1.5.0     2024-05-23 [1] RSPM (R 4.5.0)
#>  beachmat               2.24.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  beeswarm               0.4.0     2021-06-01 [1] RSPM (R 4.5.0)
#>  benchmarkme            1.0.8     2022-06-12 [1] RSPM (R 4.5.0)
#>  benchmarkmeData        1.0.4     2020-04-23 [1] RSPM (R 4.5.0)
#>  bibtex                 0.5.1     2023-01-26 [1] RSPM (R 4.5.0)
#>  Biobase              * 2.68.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  BiocFileCache          2.16.1    2025-07-23 [1] Bioconductor 3.21 (R 4.5.1)
#>  BiocGenerics         * 0.54.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  BiocIO                 1.18.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  BiocManager            1.30.26   2025-06-05 [2] CRAN (R 4.5.1)
#>  BiocNeighbors          2.2.0     2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  BiocParallel           1.42.1    2025-06-01 [1] Bioconductor 3.21 (R 4.5.1)
#>  BiocSingular           1.24.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  BiocStyle            * 2.36.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  BiocVersion            3.21.1    2024-10-29 [2] Bioconductor 3.21 (R 4.5.1)
#>  Biostrings             2.76.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  bit                    4.6.0     2025-03-06 [1] RSPM (R 4.5.0)
#>  bit64                  4.6.0-1   2025-01-16 [1] RSPM (R 4.5.0)
#>  bitops                 1.0-9     2024-10-03 [1] RSPM (R 4.5.0)
#>  blob                   1.2.4     2023-03-17 [1] RSPM (R 4.5.0)
#>  bluster                1.18.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  bookdown               0.43      2025-04-15 [1] RSPM (R 4.5.0)
#>  bslib                  0.9.0     2025-01-30 [2] RSPM (R 4.5.0)
#>  cachem                 1.1.0     2024-05-16 [2] RSPM (R 4.5.0)
#>  circlize               0.4.16    2024-02-20 [1] RSPM (R 4.5.0)
#>  cli                    3.6.5     2025-04-23 [2] RSPM (R 4.5.0)
#>  clue                   0.3-66    2024-11-13 [1] RSPM (R 4.5.0)
#>  cluster                2.1.8.1   2025-03-12 [3] CRAN (R 4.5.1)
#>  codetools              0.2-20    2024-03-31 [3] CRAN (R 4.5.1)
#>  colorspace             2.1-1     2024-07-26 [1] RSPM (R 4.5.0)
#>  ComplexHeatmap         2.24.1    2025-06-25 [1] Bioconductor 3.21 (R 4.5.1)
#>  config                 0.3.2     2023-08-30 [1] RSPM (R 4.5.0)
#>  cowplot                1.2.0     2025-07-07 [1] RSPM (R 4.5.0)
#>  crayon                 1.5.3     2024-06-20 [2] RSPM (R 4.5.0)
#>  curl                   6.4.0     2025-06-22 [2] RSPM (R 4.5.0)
#>  data.table             1.17.8    2025-07-10 [1] RSPM (R 4.5.0)
#>  DBI                    1.2.3     2024-06-02 [1] RSPM (R 4.5.0)
#>  dbplyr                 2.5.0     2024-03-19 [1] RSPM (R 4.5.0)
#>  DeconvoBuddies       * 1.1.5     2025-07-28 [1] Bioconductor
#>  DelayedArray           0.34.1    2025-04-17 [1] Bioconductor 3.21 (R 4.5.0)
#>  DelayedMatrixStats     1.30.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  desc                   1.4.3     2023-12-10 [2] RSPM (R 4.5.0)
#>  digest                 0.6.37    2024-08-19 [2] RSPM (R 4.5.0)
#>  doParallel             1.0.17    2022-02-07 [1] RSPM (R 4.5.0)
#>  dplyr                * 1.1.4     2023-11-17 [1] RSPM (R 4.5.0)
#>  dqrng                  0.4.1     2024-05-28 [1] RSPM (R 4.5.0)
#>  DT                     0.33      2024-04-04 [1] RSPM (R 4.5.0)
#>  edgeR                  4.6.3     2025-07-09 [1] Bioconductor 3.21 (R 4.5.1)
#>  evaluate               1.0.4     2025-06-18 [2] RSPM (R 4.5.0)
#>  ExperimentHub          2.16.1    2025-07-23 [1] Bioconductor 3.21 (R 4.5.1)
#>  farver                 2.1.2     2024-05-13 [1] RSPM (R 4.5.0)
#>  fastmap                1.2.0     2024-05-15 [2] RSPM (R 4.5.0)
#>  filelock               1.0.3     2023-12-11 [1] RSPM (R 4.5.0)
#>  foreach                1.5.2     2022-02-02 [1] RSPM (R 4.5.0)
#>  fs                     1.6.6     2025-04-12 [2] RSPM (R 4.5.0)
#>  generics             * 0.1.4     2025-05-09 [1] RSPM (R 4.5.0)
#>  GenomeInfoDb         * 1.44.1    2025-07-23 [1] Bioconductor 3.21 (R 4.5.1)
#>  GenomeInfoDbData       1.2.14    2025-05-24 [1] Bioconductor
#>  GenomicAlignments      1.44.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  GenomicRanges        * 1.60.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  GetoptLong             1.0.5     2020-12-15 [1] RSPM (R 4.5.0)
#>  ggbeeswarm             0.7.2     2023-04-29 [1] RSPM (R 4.5.0)
#>  ggplot2              * 3.5.2     2025-04-09 [1] RSPM (R 4.5.0)
#>  ggrepel                0.9.6     2024-09-07 [1] RSPM (R 4.5.0)
#>  GlobalOptions          0.1.2     2020-06-10 [1] RSPM (R 4.5.0)
#>  glue                   1.8.0     2024-09-30 [2] RSPM (R 4.5.0)
#>  golem                  0.5.1     2024-08-27 [1] RSPM (R 4.5.0)
#>  gridExtra              2.3       2017-09-09 [1] RSPM (R 4.5.0)
#>  gtable                 0.3.6     2024-10-25 [1] RSPM (R 4.5.0)
#>  h5mread                1.0.1     2025-05-21 [1] Bioconductor 3.21 (R 4.5.0)
#>  HDF5Array              1.36.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  htmltools              0.5.8.1   2024-04-04 [2] RSPM (R 4.5.0)
#>  htmlwidgets            1.6.4     2023-12-06 [2] RSPM (R 4.5.0)
#>  httpuv                 1.6.16    2025-04-16 [2] RSPM (R 4.5.0)
#>  httr                   1.4.7     2023-08-15 [1] RSPM (R 4.5.0)
#>  igraph                 2.1.4     2025-01-23 [1] RSPM (R 4.5.0)
#>  IRanges              * 2.42.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  irlba                  2.3.5.1   2022-10-03 [1] RSPM (R 4.5.0)
#>  iterators              1.0.14    2022-02-05 [1] RSPM (R 4.5.0)
#>  jquerylib              0.1.4     2021-04-26 [2] RSPM (R 4.5.0)
#>  jsonlite               2.0.0     2025-03-27 [2] RSPM (R 4.5.0)
#>  KEGGREST               1.48.1    2025-06-22 [1] Bioconductor 3.21 (R 4.5.1)
#>  knitr                  1.50      2025-03-16 [2] RSPM (R 4.5.0)
#>  labeling               0.4.3     2023-08-29 [1] RSPM (R 4.5.0)
#>  later                  1.4.2     2025-04-08 [2] RSPM (R 4.5.0)
#>  lattice                0.22-7    2025-04-02 [3] CRAN (R 4.5.1)
#>  lazyeval               0.2.2     2019-03-15 [1] RSPM (R 4.5.0)
#>  lifecycle              1.0.4     2023-11-07 [2] RSPM (R 4.5.0)
#>  limma                  3.64.1    2025-05-25 [1] Bioconductor 3.21 (R 4.5.1)
#>  locfit                 1.5-9.12  2025-03-05 [1] RSPM (R 4.5.0)
#>  lubridate              1.9.4     2024-12-08 [1] RSPM (R 4.5.0)
#>  magick                 2.8.7     2025-06-06 [1] RSPM (R 4.5.0)
#>  magrittr               2.0.3     2022-03-30 [2] RSPM (R 4.5.0)
#>  Matrix                 1.7-3     2025-03-11 [3] CRAN (R 4.5.1)
#>  MatrixGenerics       * 1.20.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  matrixStats          * 1.5.0     2025-01-07 [1] RSPM (R 4.5.0)
#>  memoise                2.0.1     2021-11-26 [2] RSPM (R 4.5.0)
#>  metapod                1.16.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  mime                   0.13      2025-03-17 [2] RSPM (R 4.5.0)
#>  paletteer              1.6.0     2024-01-21 [1] RSPM (R 4.5.0)
#>  pillar                 1.11.0    2025-07-04 [2] RSPM (R 4.5.0)
#>  pkgconfig              2.0.3     2019-09-22 [2] RSPM (R 4.5.0)
#>  pkgdown                2.1.3     2025-05-25 [2] RSPM (R 4.5.0)
#>  plotly                 4.11.0    2025-06-19 [1] RSPM (R 4.5.0)
#>  plyr                   1.8.9     2023-10-02 [1] RSPM (R 4.5.0)
#>  png                    0.1-8     2022-11-29 [1] RSPM (R 4.5.0)
#>  promises               1.3.3     2025-05-29 [2] RSPM (R 4.5.0)
#>  purrr                  1.1.0     2025-07-10 [2] RSPM (R 4.5.0)
#>  R6                     2.6.1     2025-02-15 [2] RSPM (R 4.5.0)
#>  rafalib                1.0.4     2025-04-08 [1] RSPM (R 4.5.0)
#>  ragg                   1.4.0     2025-04-10 [2] RSPM (R 4.5.0)
#>  rappdirs               0.3.3     2021-01-31 [2] RSPM (R 4.5.0)
#>  RColorBrewer           1.1-3     2022-04-03 [1] RSPM (R 4.5.0)
#>  Rcpp                   1.1.0     2025-07-02 [2] RSPM (R 4.5.0)
#>  RCurl                  1.98-1.17 2025-03-22 [1] RSPM (R 4.5.0)
#>  RefManageR           * 1.4.0     2022-09-30 [1] RSPM (R 4.5.0)
#>  rematch2               2.1.2     2020-05-01 [1] RSPM (R 4.5.0)
#>  reshape2               1.4.4     2020-04-09 [1] RSPM (R 4.5.0)
#>  restfulr               0.0.16    2025-06-27 [1] RSPM (R 4.5.1)
#>  rhdf5                  2.52.1    2025-06-08 [1] Bioconductor 3.21 (R 4.5.1)
#>  rhdf5filters           1.20.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  Rhdf5lib               1.30.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  rjson                  0.2.23    2024-09-16 [1] RSPM (R 4.5.0)
#>  rlang                  1.1.6     2025-04-11 [2] RSPM (R 4.5.0)
#>  rmarkdown              2.29      2024-11-04 [2] RSPM (R 4.5.0)
#>  Rsamtools              2.24.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  RSQLite                2.4.2     2025-07-18 [1] RSPM (R 4.5.0)
#>  rsvd                   1.0.5     2021-04-16 [1] RSPM (R 4.5.0)
#>  rtracklayer            1.68.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  S4Arrays               1.8.1     2025-06-01 [1] Bioconductor 3.21 (R 4.5.1)
#>  S4Vectors            * 0.46.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  sass                   0.4.10    2025-04-11 [2] RSPM (R 4.5.0)
#>  ScaledMatrix           1.16.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  scales                 1.4.0     2025-04-24 [1] RSPM (R 4.5.0)
#>  scater                 1.36.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  scran                  1.36.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  scuttle                1.18.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  sessioninfo          * 1.2.3     2025-02-05 [2] RSPM (R 4.5.0)
#>  shape                  1.4.6.1   2024-02-23 [1] RSPM (R 4.5.0)
#>  shiny                  1.11.1    2025-07-03 [2] RSPM (R 4.5.0)
#>  shinyWidgets           0.9.0     2025-02-21 [1] RSPM (R 4.5.0)
#>  SingleCellExperiment * 1.30.1    2025-05-07 [1] Bioconductor 3.21 (R 4.5.0)
#>  SparseArray            1.8.1     2025-07-23 [1] Bioconductor 3.21 (R 4.5.1)
#>  sparseMatrixStats      1.20.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  SpatialExperiment    * 1.18.1    2025-05-11 [1] Bioconductor 3.21 (R 4.5.0)
#>  spatialLIBD          * 1.20.1    2025-05-01 [1] Bioconductor 3.21 (R 4.5.0)
#>  statmod                1.5.0     2023-01-06 [1] RSPM (R 4.5.0)
#>  stringi                1.8.7     2025-03-27 [2] RSPM (R 4.5.0)
#>  stringr                1.5.1     2023-11-14 [2] RSPM (R 4.5.0)
#>  SummarizedExperiment * 1.38.1    2025-04-30 [1] Bioconductor 3.21 (R 4.5.0)
#>  systemfonts            1.2.3     2025-04-30 [2] RSPM (R 4.5.0)
#>  textshaping            1.0.1     2025-05-01 [2] RSPM (R 4.5.0)
#>  tibble               * 3.3.0     2025-06-08 [2] RSPM (R 4.5.0)
#>  tidyr                * 1.3.1     2024-01-24 [1] RSPM (R 4.5.0)
#>  tidyselect             1.2.1     2024-03-11 [1] RSPM (R 4.5.0)
#>  timechange             0.3.0     2024-01-18 [1] RSPM (R 4.5.0)
#>  UCSC.utils             1.4.0     2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  utf8                   1.2.6     2025-06-08 [2] RSPM (R 4.5.0)
#>  vctrs                  0.6.5     2023-12-01 [2] RSPM (R 4.5.0)
#>  vipor                  0.4.7     2023-12-18 [1] RSPM (R 4.5.0)
#>  viridis                0.6.5     2024-01-29 [1] RSPM (R 4.5.0)
#>  viridisLite            0.4.2     2023-05-02 [1] RSPM (R 4.5.0)
#>  withr                  3.0.2     2024-10-28 [2] RSPM (R 4.5.0)
#>  xfun                   0.52      2025-04-02 [2] RSPM (R 4.5.0)
#>  XML                    3.99-0.18 2025-01-01 [1] RSPM (R 4.5.0)
#>  xml2                   1.3.8     2025-03-14 [2] RSPM (R 4.5.0)
#>  xtable                 1.8-4     2019-04-21 [2] RSPM (R 4.5.0)
#>  XVector                0.48.0    2025-04-15 [1] Bioconductor 3.21 (R 4.5.0)
#>  yaml                   2.3.10    2024-07-26 [2] RSPM (R 4.5.0)
#> 
#>  [1] /__w/_temp/Library
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/local/lib/R/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Bibliography

This vignette was generated using BiocStyle (Oleś, 2025) with knitr (Xie, 2025) and rmarkdown (Allaire, Xie, Dervieux et al., 2024) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.29. 2024. URL: https://github.com/rstudio/rmarkdown.

[2] L. A. Huuki-Myers, K. R. Maynard, S. C. Hicks, et al. DeconvoBuddies: a R/Bioconductor package with deconvolution helper functions. R package version 1.1.5. https://github.com/LieberInstitute/DeconvoBuddies. 2025. DOI: 10.18129/B9.bioc.DeconvoBuddies. URL: http://www.bioconductor.org/packages/DeconvoBuddies.

[3] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[4] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.36.0. 2025. DOI: 10.18129/B9.bioc.BiocStyle. URL: https://bioconductor.org/packages/BiocStyle.

[5] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2025. URL: https://www.R-project.org/.

[6] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[7] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.3. 2025. URL: https://github.com/r-lib/sessioninfo#readme.

[8] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50. 2025. URL: https://yihui.org/knitr/.

Louise Huuki-Myers

28 July 2025


