Get Mean Ratio for Each Gene x Cell Type — get_mean_ratio

Calculate the Mean Ratio value and rank for each gene for each cell type in the sce object, to identify effective marker genes for deconvolution.

get_mean_ratio(
  sce,
  cellType_col,
  assay_name = "logcounts",
  gene_ensembl = NULL,
  gene_name = NULL
)

Arguments

sce: SummarizedExperiment-class (or any derivative class) object containing single cell/nucleus gene expression data.
cellType_col: A character(1) name of the column in the colData() of sce that denotes the cell type or group of interest.
assay_name: A character(1) specifying the name of the assay() in the sce object to use to rank expression values. Defaults to logcounts since it typically contains the normalized expression values.
gene_ensembl: An optional character(1) specifying the rowData(sce) column with the ENSEMBL gene IDs. If provided, it is added to the output as gene_ensembl.
gene_name: An optional character(1) specifying the rowData(sce) column with the gene names (symbols). If provided, it is added to the output as gene_name.

Value

A tibble::tibble() with the MeanRatio values for each gene x cell type.

gene is the name of the gene (from rownames(sce)).
cellType.target is the cell type we're finding marker genes for.
mean.target is the mean expression of gene for cellType.target.
cellType.2nd is the non-target cell type with the highest mean expression of gene.
mean.2nd is the mean expression of gene for cellType.2nd.
MeanRatio is the ratio of mean.target/mean.2nd.
MeanRatio.rank is the rank of MeanRatio for the cell type.
MeanRatio.anno is an annotation of the MeanRatio calculation helpful for plotting.
gene_ensembl and gene_name are optional columns, taken from the rowData(sce) columns specified by the user, that add gene information to the output.
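
The column definitions above can be illustrated on a toy matrix. The following is a minimal base R sketch, not the package's implementation: the expression values, gene names, and cell labels are made up, and any filtering or tie handling that get_mean_ratio() applies internally is not reproduced here.

```r
# Toy log-expression matrix: 3 genes x 6 cells (made-up values)
expr <- matrix(
    c(
        4, 5, 1, 1, 0, 0,
        0, 1, 3, 3, 2, 2,
        1, 1, 1, 1, 6, 8
    ),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6))
)
cell_type <- factor(c("Astro", "Astro", "Oligo", "Oligo", "Micro", "Micro"))

# Mean expression of each gene within each cell type (genes x cell types)
type_means <- sapply(levels(cell_type), function(ct) {
    rowMeans(expr[, cell_type == ct, drop = FALSE])
})

target <- "Astro"
others <- setdiff(colnames(type_means), target)

# For each gene, find the non-target cell type with the highest mean
idx_2nd <- apply(type_means[, others, drop = FALSE], 1, which.max)
mean_target <- type_means[, target]
mean_2nd <- type_means[cbind(rownames(type_means), others[idx_2nd])]

res <- data.frame(
    gene = rownames(type_means),
    cellType.target = target,
    mean.target = mean_target,
    cellType.2nd = others[idx_2nd],
    mean.2nd = mean_2nd,
    # Note: a mean.2nd of 0 would give Inf here
    MeanRatio = mean_target / mean_2nd,
    row.names = NULL
)

# Rank 1 is the gene with the largest MeanRatio for this target cell type
res$MeanRatio.rank <- rank(-res$MeanRatio, ties.method = "first")
res[order(res$MeanRatio.rank), ]
```

In this toy example geneA is expressed mostly in Astro, so it receives the largest MeanRatio and rank 1, while geneB and geneC have ratios below 1 because another cell type expresses them more highly.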

Details

Note that if a cell type has fewer than 10 cells, the MeanRatio results may be unstable. See the rationale in OSCA: https://bioconductor.org/books/3.19/OSCA.multisample/multi-sample-comparisons.html#performing-the-de-analysis.
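
One way to act on this note is to count cells per cell type before computing marker statistics. The sketch below uses a made-up label vector; with a real object the labels would come from the cellType_col column of colData(sce). The threshold of 10 is taken from the note above.

```r
# Hypothetical cell type labels (in practice: colData(sce)[[cellType_col]])
cell_type <- c(rep("Astro", 25), rep("Micro", 8), rep("Oligo", 40))

# Number of cells per cell type
n_cells <- table(cell_type)

# Cell types below the threshold mentioned in the note
min_cells <- 10
small_types <- names(n_cells)[n_cells < min_cells]

if (length(small_types) > 0) {
    message(
        "Cell types with < ", min_cells, " cells: ",
        paste(small_types, collapse = ", ")
    )
}
```

Whether to drop, merge, or keep such cell types is an analysis decision that this check does not make.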

Examples

## load example SingleCellExperiment
if (!exists("sce_DLPFC_example")) sce_DLPFC_example <- fetch_deconvo_data("sce_DLPFC_example")
#> 2024-08-25 05:52:24.942148 loading file /github/home/.cache/R/BiocFileCache/29d2f32da4a_sce_DLPFC_example.Rdata%3Frlkey%3Dv3z4u8ru0d2y12zgdl1az07q9%26st%3D1dcfqc1i%26dl%3D1
## Explore properties of the sce object
sce_DLPFC_example
#> class: SingleCellExperiment 
#> dim: 557 10000 
#> metadata(3): Samples cell_type_colors cell_type_colors_broad
#> assays(1): logcounts
#> rownames(557): GABRD PRDM16 ... AFF2 MAMLD1
#> rowData names(7): source type ... gene_type binomial_deviance
#> colnames(10000): 8_AGTGACTGTAGTTACC-1 17_GCAGCCAGTGAGTCAG-1 ...
#>   12_GGACGTCTCTGACAGT-1 1_GGTTAACTCTCTCTAA-1
#> colData names(32): Sample Barcode ... cellType_layer layer_annotation
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

## this data contains logcounts of gene expression
SummarizedExperiment::assays(sce_DLPFC_example)$logcounts[1:5, 1:5]
#>           8_AGTGACTGTAGTTACC-1 17_GCAGCCAGTGAGTCAG-1 3_CTGGACGAGCTTCATG-1
#> GABRD                        0             0.9249246             0.000000
#> PRDM16                       0             0.0000000             0.000000
#> MICOS10                      0             0.0000000             0.000000
#> LINC01141                    0             0.0000000             0.000000
#> ADGRB2                       0             0.9249246             2.253612
#>           13_CCCTCAAAGTCTAGCT-1 11_TGTAAGCCATTCTGTT-1
#> GABRD                  0.000000             0.0000000
#> PRDM16                 0.000000             0.0000000
#> MICOS10                0.000000             0.6528615
#> LINC01141              0.000000             0.0000000
#> ADGRB2                 2.253454             0.0000000

## nuclei are classified into cell types
table(sce_DLPFC_example$cellType_broad_hc)
#> 
#>     Astro EndoMural     Micro     Oligo       OPC     Excit     Inhib 
#>       692       417       316      1970       350      4335      1920 

## Get the mean ratio for each gene for each cell type defined in
## `cellType_broad_hc`
get_mean_ratio(sce_DLPFC_example, cellType_col = "cellType_broad_hc")
#> # A tibble: 762 × 8
#>    gene       cellType.target mean.target cellType.2nd mean.2nd MeanRatio
#>    <chr>      <fct>                 <dbl> <fct>           <dbl>     <dbl>
#>  1 CD22       Oligo                  1.36 OPC            0.0730      18.6
#>  2 LINC01608  Oligo                  2.39 Micro          0.142       16.8
#>  3 FOLH1      Oligo                  1.59 OPC            0.101       15.7
#>  4 SLC5A11    Oligo                  2.14 Micro          0.145       14.7
#>  5 AC012494.1 Oligo                  2.42 OPC            0.169       14.3
#>  6 ST18       Oligo                  4.65 OPC            0.329       14.1
#>  7 MAG        Oligo                  1.44 Astro          0.103       14.0
#>  8 ANLN       Oligo                  1.60 Micro          0.115       13.9
#>  9 CLDN11     Oligo                  1.82 EndoMural      0.146       12.5
#> 10 MOG        Oligo                  2.06 OPC            0.185       11.1
#> # ℹ 752 more rows
#> # ℹ 2 more variables: MeanRatio.rank <int>, MeanRatio.anno <chr>

## Optionally add gene annotation columns from rowData to the output.
## First inspect the available rowData columns
SummarizedExperiment::rowData(sce_DLPFC_example)
#> DataFrame with 557 rows and 7 columns
#>             source     type         gene_id gene_version   gene_name
#>           <factor> <factor>     <character>  <character> <character>
#> GABRD       HAVANA     gene ENSG00000187730            9       GABRD
#> PRDM16      HAVANA     gene ENSG00000142611           17      PRDM16
#> MICOS10     HAVANA     gene ENSG00000173436           15     MICOS10
#> LINC01141   HAVANA     gene ENSG00000236963            7   LINC01141
#> ADGRB2      HAVANA     gene ENSG00000121753           12      ADGRB2
#> ...            ...      ...             ...          ...         ...
#> TRPC5       HAVANA     gene ENSG00000072315            3       TRPC5
#> LAMP2       HAVANA     gene ENSG00000005893           15       LAMP2
#> RTL8C       HAVANA     gene ENSG00000134590           14       RTL8C
#> AFF2        HAVANA     gene ENSG00000155966           14        AFF2
#> MAMLD1      HAVANA     gene ENSG00000013619           14      MAMLD1
#>                gene_type binomial_deviance
#>              <character>         <numeric>
#> GABRD     protein_coding           69168.8
#> PRDM16    protein_coding           81602.5
#> MICOS10   protein_coding           96788.7
#> LINC01141         lncRNA           35228.6
#> ADGRB2    protein_coding           81087.8
#> ...                  ...               ...
#> TRPC5     protein_coding          134934.0
#> LAMP2     protein_coding          132756.1
#> RTL8C     protein_coding           98554.5
#> AFF2      protein_coding          111683.6
#> MAMLD1    protein_coding           75492.1

## specify rowData col names for gene_name and gene_ensembl
get_mean_ratio(sce_DLPFC_example,
    cellType_col = "cellType_broad_hc",
    gene_name = "gene_name",
    gene_ensembl = "gene_id"
)
#> # A tibble: 762 × 10
#>    gene       cellType.target mean.target cellType.2nd mean.2nd MeanRatio
#>    <chr>      <fct>                 <dbl> <fct>           <dbl>     <dbl>
#>  1 CD22       Oligo                  1.36 OPC            0.0730      18.6
#>  2 LINC01608  Oligo                  2.39 Micro          0.142       16.8
#>  3 FOLH1      Oligo                  1.59 OPC            0.101       15.7
#>  4 SLC5A11    Oligo                  2.14 Micro          0.145       14.7
#>  5 AC012494.1 Oligo                  2.42 OPC            0.169       14.3
#>  6 ST18       Oligo                  4.65 OPC            0.329       14.1
#>  7 MAG        Oligo                  1.44 Astro          0.103       14.0
#>  8 ANLN       Oligo                  1.60 Micro          0.115       13.9
#>  9 CLDN11     Oligo                  1.82 EndoMural      0.146       12.5
#> 10 MOG        Oligo                  2.06 OPC            0.185       11.1
#> # ℹ 752 more rows
#> # ℹ 4 more variables: MeanRatio.rank <int>, MeanRatio.anno <chr>,
#> #   gene_ensembl <chr>, gene_name <chr>
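
A common next step is to keep the top-ranked genes for each cell type as a marker set. The following is a hedged sketch on a small made-up data frame that only mimics the column names of the output. The gene names, values, and the choice of n_markers are illustrative assumptions; with real data, the tibble returned by get_mean_ratio() would take the place of marker_stats.

```r
# Made-up stand-in for the output of get_mean_ratio()
marker_stats <- data.frame(
    gene = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF"),
    cellType.target = c("Oligo", "Oligo", "Oligo", "Astro", "Astro", "Astro"),
    MeanRatio = c(12, 8, 3, 9, 5, 2),
    MeanRatio.rank = c(1, 2, 3, 1, 2, 3)
)

# Keep the n best-ranked genes per cell type (n_markers is an arbitrary choice)
n_markers <- 2
top <- marker_stats[marker_stats$MeanRatio.rank <= n_markers, ]

# Named list of marker genes, one element per cell type
marker_list <- split(top$gene, top$cellType.target)
marker_list
```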