Calculate the Mean Ratio value and rank for each gene for each cell type in the sce object, to identify effective marker genes for deconvolution.

get_mean_ratio(
  sce,
  cellType_col,
  assay_name = "logcounts",
  gene_ensembl = NULL,
  gene_name = NULL
)

Arguments

sce

SummarizedExperiment-class (or any derivative class) object containing single cell/nucleus gene expression data.

cellType_col

A character(1) name of the column in the colData() of sce that denotes the cell type or group of interest.

assay_name

A character(1) specifying the name of the assay() in the sce object to use to rank expression values. Defaults to logcounts since it typically contains the normalized expression values.

gene_ensembl

A character(1) specifying the rowData(sce_pseudo) column with the ENSEMBL gene IDs. This will be used by layer_stat_cor().

gene_name

A character(1) specifying the rowData(sce_pseudo) column with the gene names (symbols).

Value

A tibble::tibble() with the MeanRatio values for each gene x cell type.

  • gene is the name of the gene (from rownames(sce)).

  • cellType.target is the cell type we're finding marker genes for.

  • mean.target is the mean expression of gene for cellType.target.

  • cellType.2nd is the second highest non-target cell type.

  • mean.2nd is the mean expression of gene for cellType.2nd.

  • MeanRatio is the ratio of mean.target/mean.2nd.

  • MeanRatio.rank is the rank of MeanRatio for the cell type.

  • MeanRatio.anno is an annotation of the MeanRatio calculation helpful for plotting.

  • gene_ensembl & gene_name optional columns from rowData(sce) specified by the user to add gene information.

Details

Note if a cell type has < 10 cells the MeanRatio results may be unstable. See rational in OSCA: https://bioconductor.org/books/3.19/OSCA.multisample/multi-sample-comparisons.html#performing-the-de-analysis.

Examples

## load example SingleCellExperiment
if (!exists("sce_DLPFC_example")) sce_DLPFC_example <- fetch_deconvo_data("sce_DLPFC_example")
#> 2024-08-25 05:52:24.942148 loading file /github/home/.cache/R/BiocFileCache/29d2f32da4a_sce_DLPFC_example.Rdata%3Frlkey%3Dv3z4u8ru0d2y12zgdl1az07q9%26st%3D1dcfqc1i%26dl%3D1
## Explore properties of the sce object
sce_DLPFC_example
#> class: SingleCellExperiment 
#> dim: 557 10000 
#> metadata(3): Samples cell_type_colors cell_type_colors_broad
#> assays(1): logcounts
#> rownames(557): GABRD PRDM16 ... AFF2 MAMLD1
#> rowData names(7): source type ... gene_type binomial_deviance
#> colnames(10000): 8_AGTGACTGTAGTTACC-1 17_GCAGCCAGTGAGTCAG-1 ...
#>   12_GGACGTCTCTGACAGT-1 1_GGTTAACTCTCTCTAA-1
#> colData names(32): Sample Barcode ... cellType_layer layer_annotation
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):

## this data contains logcounts of gene expression
SummarizedExperiment::assays(sce_DLPFC_example)$logcounts[1:5, 1:5]
#>           8_AGTGACTGTAGTTACC-1 17_GCAGCCAGTGAGTCAG-1 3_CTGGACGAGCTTCATG-1
#> GABRD                        0             0.9249246             0.000000
#> PRDM16                       0             0.0000000             0.000000
#> MICOS10                      0             0.0000000             0.000000
#> LINC01141                    0             0.0000000             0.000000
#> ADGRB2                       0             0.9249246             2.253612
#>           13_CCCTCAAAGTCTAGCT-1 11_TGTAAGCCATTCTGTT-1
#> GABRD                  0.000000             0.0000000
#> PRDM16                 0.000000             0.0000000
#> MICOS10                0.000000             0.6528615
#> LINC01141              0.000000             0.0000000
#> ADGRB2                 2.253454             0.0000000

## nuclei are classified in to cell types
table(sce_DLPFC_example$cellType_broad_hc)
#> 
#>     Astro EndoMural     Micro     Oligo       OPC     Excit     Inhib 
#>       692       417       316      1970       350      4335      1920 

## Get the mean ratio for each gene for each cell type defined in
## `cellType_broad_hc`
get_mean_ratio(sce_DLPFC_example, cellType_col = "cellType_broad_hc")
#> # A tibble: 762 × 8
#>    gene       cellType.target mean.target cellType.2nd mean.2nd MeanRatio
#>    <chr>      <fct>                 <dbl> <fct>           <dbl>     <dbl>
#>  1 CD22       Oligo                  1.36 OPC            0.0730      18.6
#>  2 LINC01608  Oligo                  2.39 Micro          0.142       16.8
#>  3 FOLH1      Oligo                  1.59 OPC            0.101       15.7
#>  4 SLC5A11    Oligo                  2.14 Micro          0.145       14.7
#>  5 AC012494.1 Oligo                  2.42 OPC            0.169       14.3
#>  6 ST18       Oligo                  4.65 OPC            0.329       14.1
#>  7 MAG        Oligo                  1.44 Astro          0.103       14.0
#>  8 ANLN       Oligo                  1.60 Micro          0.115       13.9
#>  9 CLDN11     Oligo                  1.82 EndoMural      0.146       12.5
#> 10 MOG        Oligo                  2.06 OPC            0.185       11.1
#> # ℹ 752 more rows
#> # ℹ 2 more variables: MeanRatio.rank <int>, MeanRatio.anno <chr>

# Option to specify gene_name as the "Symbol" column from rowData
# this will be added to the marker stats output
SummarizedExperiment::rowData(sce_DLPFC_example)
#> DataFrame with 557 rows and 7 columns
#>             source     type         gene_id gene_version   gene_name
#>           <factor> <factor>     <character>  <character> <character>
#> GABRD       HAVANA     gene ENSG00000187730            9       GABRD
#> PRDM16      HAVANA     gene ENSG00000142611           17      PRDM16
#> MICOS10     HAVANA     gene ENSG00000173436           15     MICOS10
#> LINC01141   HAVANA     gene ENSG00000236963            7   LINC01141
#> ADGRB2      HAVANA     gene ENSG00000121753           12      ADGRB2
#> ...            ...      ...             ...          ...         ...
#> TRPC5       HAVANA     gene ENSG00000072315            3       TRPC5
#> LAMP2       HAVANA     gene ENSG00000005893           15       LAMP2
#> RTL8C       HAVANA     gene ENSG00000134590           14       RTL8C
#> AFF2        HAVANA     gene ENSG00000155966           14        AFF2
#> MAMLD1      HAVANA     gene ENSG00000013619           14      MAMLD1
#>                gene_type binomial_deviance
#>              <character>         <numeric>
#> GABRD     protein_coding           69168.8
#> PRDM16    protein_coding           81602.5
#> MICOS10   protein_coding           96788.7
#> LINC01141         lncRNA           35228.6
#> ADGRB2    protein_coding           81087.8
#> ...                  ...               ...
#> TRPC5     protein_coding          134934.0
#> LAMP2     protein_coding          132756.1
#> RTL8C     protein_coding           98554.5
#> AFF2      protein_coding          111683.6
#> MAMLD1    protein_coding           75492.1

## specify rowData col names for gene_name and gene_ensembl
get_mean_ratio(sce_DLPFC_example,
    cellType_col = "cellType_broad_hc",
    gene_name = "gene_name",
    gene_ensembl = "gene_id"
)
#> # A tibble: 762 × 10
#>    gene       cellType.target mean.target cellType.2nd mean.2nd MeanRatio
#>    <chr>      <fct>                 <dbl> <fct>           <dbl>     <dbl>
#>  1 CD22       Oligo                  1.36 OPC            0.0730      18.6
#>  2 LINC01608  Oligo                  2.39 Micro          0.142       16.8
#>  3 FOLH1      Oligo                  1.59 OPC            0.101       15.7
#>  4 SLC5A11    Oligo                  2.14 Micro          0.145       14.7
#>  5 AC012494.1 Oligo                  2.42 OPC            0.169       14.3
#>  6 ST18       Oligo                  4.65 OPC            0.329       14.1
#>  7 MAG        Oligo                  1.44 Astro          0.103       14.0
#>  8 ANLN       Oligo                  1.60 Micro          0.115       13.9
#>  9 CLDN11     Oligo                  1.82 EndoMural      0.146       12.5
#> 10 MOG        Oligo                  2.06 OPC            0.185       11.1
#> # ℹ 752 more rows
#> # ℹ 4 more variables: MeanRatio.rank <int>, MeanRatio.anno <chr>,
#> #   gene_ensembl <chr>, gene_name <chr>