Calculate the Mean Ratio value and rank for each gene for each cell type in
the sce
object, to identify effective marker genes for deconvolution.
get_mean_ratio(
sce,
cellType_col,
assay_name = "logcounts",
gene_ensembl = NULL,
gene_name = NULL
)
SummarizedExperiment-class (or any derivative class) object containing single cell/nucleus gene expression data.
A character(1)
name of the column in the
colData() of sce
that
denotes the cell type or group of interest.
A character(1)
specifying the name of the
assay() in the
sce
object to use to rank expression values. Defaults to logcounts
since
it typically contains the normalized expression values.
A character(1)
specifying the rowData(sce_pseudo)
column with the ENSEMBL gene IDs. This will be used by layer_stat_cor()
.
A character(1)
specifying the rowData(sce_pseudo)
column with the gene names (symbols).
A tibble::tibble()
with the MeanRatio
values for each gene x cell
type.
gene
is the name of the gene (from rownames(sce
)).
cellType.target
is the cell type we're finding marker genes for.
mean.target
is the mean expression of gene
for cellType.target
.
cellType.2nd
is the second highest non-target cell type.
mean.2nd
is the mean expression of gene
for cellType.2nd
.
MeanRatio
is the ratio of mean.target/mean.2nd
.
MeanRatio.rank
is the rank of MeanRatio
for the cell type.
MeanRatio.anno
is an annotation of the MeanRatio
calculation helpful
for plotting.
gene_ensembl
& gene_name
optional columns from rowData(sce)
specified
by the user to add gene information.
Note if a cell type has < 10 cells the MeanRatio results may be unstable. See rational in OSCA: https://bioconductor.org/books/3.19/OSCA.multisample/multi-sample-comparisons.html#performing-the-de-analysis.
## load example SingleCellExperiment
if (!exists("sce_DLPFC_example")) sce_DLPFC_example <- fetch_deconvo_data("sce_DLPFC_example")
#> 2024-10-31 13:27:05.794058 loading file /github/home/.cache/R/BiocFileCache/66141272054e_sce_DLPFC_example.Rdata%3Frlkey%3Dv3z4u8ru0d2y12zgdl1az07q9%26st%3D1dcfqc1i%26dl%3D1
## Explore properties of the sce object
sce_DLPFC_example
#> class: SingleCellExperiment
#> dim: 557 10000
#> metadata(3): Samples cell_type_colors cell_type_colors_broad
#> assays(1): logcounts
#> rownames(557): GABRD PRDM16 ... AFF2 MAMLD1
#> rowData names(7): source type ... gene_type binomial_deviance
#> colnames(10000): 8_AGTGACTGTAGTTACC-1 17_GCAGCCAGTGAGTCAG-1 ...
#> 12_GGACGTCTCTGACAGT-1 1_GGTTAACTCTCTCTAA-1
#> colData names(32): Sample Barcode ... cellType_layer layer_annotation
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
## this data contains logcounts of gene expression
SummarizedExperiment::assays(sce_DLPFC_example)$logcounts[1:5, 1:5]
#> 8_AGTGACTGTAGTTACC-1 17_GCAGCCAGTGAGTCAG-1 3_CTGGACGAGCTTCATG-1
#> GABRD 0 0.9249246 0.000000
#> PRDM16 0 0.0000000 0.000000
#> MICOS10 0 0.0000000 0.000000
#> LINC01141 0 0.0000000 0.000000
#> ADGRB2 0 0.9249246 2.253612
#> 13_CCCTCAAAGTCTAGCT-1 11_TGTAAGCCATTCTGTT-1
#> GABRD 0.000000 0.0000000
#> PRDM16 0.000000 0.0000000
#> MICOS10 0.000000 0.6528615
#> LINC01141 0.000000 0.0000000
#> ADGRB2 2.253454 0.0000000
## nuclei are classified in to cell types
table(sce_DLPFC_example$cellType_broad_hc)
#>
#> Astro EndoMural Micro Oligo OPC Excit Inhib
#> 692 417 316 1970 350 4335 1920
## Get the mean ratio for each gene for each cell type defined in
## `cellType_broad_hc`
get_mean_ratio(sce_DLPFC_example, cellType_col = "cellType_broad_hc")
#> # A tibble: 762 × 8
#> gene cellType.target mean.target cellType.2nd mean.2nd MeanRatio
#> <chr> <fct> <dbl> <fct> <dbl> <dbl>
#> 1 CD22 Oligo 1.36 OPC 0.0730 18.6
#> 2 LINC01608 Oligo 2.39 Micro 0.142 16.8
#> 3 FOLH1 Oligo 1.59 OPC 0.101 15.7
#> 4 SLC5A11 Oligo 2.14 Micro 0.145 14.7
#> 5 AC012494.1 Oligo 2.42 OPC 0.169 14.3
#> 6 ST18 Oligo 4.65 OPC 0.329 14.1
#> 7 MAG Oligo 1.44 Astro 0.103 14.0
#> 8 ANLN Oligo 1.60 Micro 0.115 13.9
#> 9 CLDN11 Oligo 1.82 EndoMural 0.146 12.5
#> 10 MOG Oligo 2.06 OPC 0.185 11.1
#> # ℹ 752 more rows
#> # ℹ 2 more variables: MeanRatio.rank <int>, MeanRatio.anno <chr>
# Option to specify gene_name as the "Symbol" column from rowData
# this will be added to the marker stats output
SummarizedExperiment::rowData(sce_DLPFC_example)
#> DataFrame with 557 rows and 7 columns
#> source type gene_id gene_version gene_name
#> <factor> <factor> <character> <character> <character>
#> GABRD HAVANA gene ENSG00000187730 9 GABRD
#> PRDM16 HAVANA gene ENSG00000142611 17 PRDM16
#> MICOS10 HAVANA gene ENSG00000173436 15 MICOS10
#> LINC01141 HAVANA gene ENSG00000236963 7 LINC01141
#> ADGRB2 HAVANA gene ENSG00000121753 12 ADGRB2
#> ... ... ... ... ... ...
#> TRPC5 HAVANA gene ENSG00000072315 3 TRPC5
#> LAMP2 HAVANA gene ENSG00000005893 15 LAMP2
#> RTL8C HAVANA gene ENSG00000134590 14 RTL8C
#> AFF2 HAVANA gene ENSG00000155966 14 AFF2
#> MAMLD1 HAVANA gene ENSG00000013619 14 MAMLD1
#> gene_type binomial_deviance
#> <character> <numeric>
#> GABRD protein_coding 69168.8
#> PRDM16 protein_coding 81602.5
#> MICOS10 protein_coding 96788.7
#> LINC01141 lncRNA 35228.6
#> ADGRB2 protein_coding 81087.8
#> ... ... ...
#> TRPC5 protein_coding 134934.0
#> LAMP2 protein_coding 132756.1
#> RTL8C protein_coding 98554.5
#> AFF2 protein_coding 111683.6
#> MAMLD1 protein_coding 75492.1
## specify rowData col names for gene_name and gene_ensembl
get_mean_ratio(sce_DLPFC_example,
cellType_col = "cellType_broad_hc",
gene_name = "gene_name",
gene_ensembl = "gene_id"
)
#> # A tibble: 762 × 10
#> gene cellType.target mean.target cellType.2nd mean.2nd MeanRatio
#> <chr> <fct> <dbl> <fct> <dbl> <dbl>
#> 1 CD22 Oligo 1.36 OPC 0.0730 18.6
#> 2 LINC01608 Oligo 2.39 Micro 0.142 16.8
#> 3 FOLH1 Oligo 1.59 OPC 0.101 15.7
#> 4 SLC5A11 Oligo 2.14 Micro 0.145 14.7
#> 5 AC012494.1 Oligo 2.42 OPC 0.169 14.3
#> 6 ST18 Oligo 4.65 OPC 0.329 14.1
#> 7 MAG Oligo 1.44 Astro 0.103 14.0
#> 8 ANLN Oligo 1.60 Micro 0.115 13.9
#> 9 CLDN11 Oligo 1.82 EndoMural 0.146 12.5
#> 10 MOG Oligo 2.06 OPC 0.185 11.1
#> # ℹ 752 more rows
#> # ℹ 4 more variables: MeanRatio.rank <int>, MeanRatio.anno <chr>,
#> # gene_ensembl <chr>, gene_name <chr>