R/gene_set_enrichment.R
gene_set_enrichment.Rd
Using the layer-level (group-level) data, this function evaluates whether
each gene set in a list (Ensembl gene IDs) is enriched among the significant
genes (FDR < 0.1 by default) for a given model type result. It tests the
alternative hypothesis that OR > 1, i.e. that the gene set is over-represented
in the set of enriched genes. To check depleted genes instead, set reverse
to TRUE.
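The one-sided test described above can be illustrated with a single 2x2 table. This is a minimal sketch, not the package's internal code: the counts and variable names are hypothetical, and it only assumes that the test is a Fisher's exact test with the alternative OR > 1.

```r
## Hypothetical counts for one gene set and one layer (illustration only)
n_sig_in_set <- 30 # significant genes that belong to the gene set
n_sig_not_in_set <- 170 # significant genes outside the gene set
n_nonsig_in_set <- 70 # non-significant genes in the gene set
n_nonsig_not_in_set <- 1730 # non-significant genes outside the gene set

## Rows: gene set membership. Columns: significance status.
tab <- matrix(
    c(n_sig_in_set, n_sig_not_in_set, n_nonsig_in_set, n_nonsig_not_in_set),
    nrow = 2,
    dimnames = list(
        gene_set = c("in_set", "not_in_set"),
        status = c("significant", "not_significant")
    )
)

## One-sided test: is the gene set over-represented among significant genes?
res <- stats::fisher.test(tab, alternative = "greater")
res$estimate # conditional maximum likelihood estimate of the odds ratio
res$p.value
```

With alternative = "greater", a small p-value supports over-representation only. Depletion would not be detected by this one-sided test.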
gene_set_enrichment(
    gene_list,
    fdr_cut = 0.1,
    modeling_results = fetch_data(type = "modeling_results"),
    model_type = names(modeling_results)[1],
    reverse = FALSE
)
gene_list
A named list object (could be a data.frame) where each
element of the list is a character vector of Ensembl gene IDs.
fdr_cut
A numeric(1) specifying the FDR cutoff to use for
determining significance among the genes in the modeling results.
modeling_results
Defaults to the output of fetch_data(type = 'modeling_results'). This is a
list of tables with the columns f_stat_* or t_stat_* as well as p_value_*
and fdr_*, plus ensembl. The column names are used to extract the statistic
results, the p-values, and the FDR-adjusted p-values. The ensembl column is
then used for matching in some cases. See fetch_data() for more details.
Typically this is the set of reference statistics used in layer_stat_cor().
model_type
A named element of the modeling_results list. By default that is one of:
enrichment, for the model that tests one human brain layer against the rest
(one group vs the rest); pairwise, which compares two layers (groups) denoted
by layerA-layerB such that layerA is greater than layerB; or anova, which
determines if any layer (group) is different from the rest adjusting for the
mean expression level. The statistics for enrichment and pairwise are
t-statistics while the anova model ones are F-statistics.
reverse
A logical(1) indicating whether to multiply the input statistics by -1 and
reverse the layerA-layerB column names (using the -) into layerB-layerA.
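The column layout expected in modeling_results and the effect described for reverse can be sketched with a toy pairwise table. This is an assumption-laden illustration rather than the package's implementation: the table, its values, and the placeholder gene names are made up, and only the column naming pattern (t_stat_*, fdr_*, ensembl) comes from the argument descriptions above.

```r
## Toy pairwise table that mimics the documented column layout (made-up values)
toy_pairwise <- data.frame(
    "t_stat_Layer1-Layer2" = c(1.5, -2.0, 0.3),
    "t_stat_Layer2-WM" = c(-0.4, 3.2, 1.1),
    "fdr_Layer1-Layer2" = c(0.04, 0.02, 0.70),
    "fdr_Layer2-WM" = c(0.60, 0.001, 0.20),
    ensembl = c("geneA", "geneB", "geneC"), # placeholders, not real Ensembl IDs
    check.names = FALSE
)

## Use the column name prefix to pull out the t-statistics
tstats <- toy_pairwise[, grep("^t_stat_", colnames(toy_pairwise)), drop = FALSE]
colnames(tstats) <- sub("^t_stat_", "", colnames(tstats))

## Sketch of reversing: flip the sign and swap layerA-layerB to layerB-layerA
reversed <- tstats * -1
colnames(reversed) <- vapply(
    strsplit(colnames(tstats), "-", fixed = TRUE),
    function(x) paste(rev(x), collapse = "-"),
    character(1)
)
reversed
```

After this step a positive statistic in the Layer2-Layer1 column means higher expression in Layer2 than in Layer1, which is the same information as a negative statistic in the original Layer1-Layer2 column.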
A table in long format with the enrichment results computed using
stats::fisher.test().
OR
Odds ratio.
Pval
p-value from fisher.test().
test
Group or layer in the modeling_results.
NumSig
Number of genes from the gene set that are present in modeling_results
with fdr < fdr_cut and t_stat > 0 (unless reverse = TRUE) for test.
SetSize
Number of genes from modeling_results present in the gene set.
ID
Name of the gene set.
model_type
Record of the input model type from the modeling results.
fdr_cut
Record of the input fdr_cut.
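The NumSig and SetSize definitions above can be made concrete with a toy example. This is a minimal sketch under stated assumptions: the table, the gene names, and the gene set are hypothetical, and the counting logic follows the column descriptions rather than the package source.

```r
## Toy statistics for one test (WM); all values are made up
toy_results <- data.frame(
    ensembl = paste0("gene", 1:6),
    t_stat_WM = c(3.0, 2.1, -2.5, 0.4, 1.8, -0.2),
    fdr_WM = c(0.01, 0.04, 0.02, 0.60, 0.08, 0.90)
)

## Hypothetical gene set; "gene9" is absent from the toy results
toy_gene_set <- c("gene1", "gene3", "gene5", "gene9")
fdr_cut <- 0.1

## Genes from the toy results that are in the gene set
in_set <- toy_results$ensembl %in% toy_gene_set

## Significant genes with a positive statistic (the reverse = FALSE case)
is_sig <- toy_results$fdr_WM < fdr_cut & toy_results$t_stat_WM > 0

set_size <- sum(in_set) # analogous to SetSize
num_sig <- sum(in_set & is_sig) # analogous to NumSig
c(SetSize = set_size, NumSig = num_sig)
```

Here gene3 passes the FDR cutoff but has a negative statistic, so it counts toward SetSize and not toward NumSig.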
Check https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/Layer_Guesses/check_clinical_gene_sets.R to see the full script from which this family of functions is derived.
Other Gene set enrichment functions:
gene_set_enrichment_plot()
## Read in the SFARI gene sets included in the package
asd_sfari <- utils::read.csv(
    system.file(
        "extdata",
        "SFARI-Gene_genes_01-03-2020release_02-04-2020export.csv",
        package = "spatialLIBD"
    ),
    as.is = TRUE
)
## Format them appropriately
asd_sfari_geneList <- list(
    Gene_SFARI_all = asd_sfari$ensembl.id,
    Gene_SFARI_high = asd_sfari$ensembl.id[asd_sfari$gene.score < 3],
    Gene_SFARI_syndromic = asd_sfari$ensembl.id[asd_sfari$syndromic == 1]
)
## Obtain the necessary data
if (!exists("modeling_results")) {
    modeling_results <- fetch_data(type = "modeling_results")
}
#> 2024-12-16 21:50:56.291183 loading file /github/home/.cache/R/BiocFileCache/5db39a09009_Human_DLPFC_Visium_modeling_results.Rdata%3Fdl%3D1
## Compute the gene set enrichment results
asd_sfari_enrichment <- gene_set_enrichment(
    gene_list = asd_sfari_geneList,
    modeling_results = modeling_results,
    model_type = "enrichment"
)
## Explore the results
asd_sfari_enrichment
#>           OR         Pval   test NumSig SetSize                   ID model_type
#> 1  1.2659915 1.761332e-03     WM    231     869       Gene_SFARI_all enrichment
#> 2  1.1819109 9.895949e-02     WM     90     355      Gene_SFARI_high enrichment
#> 3  1.2333378 1.853021e-01     WM     31     118 Gene_SFARI_syndromic enrichment
#> 4  0.9702022 6.130806e-01 Layer1     71     869       Gene_SFARI_all enrichment
#> 5  0.7192630 9.493328e-01 Layer1     22     355      Gene_SFARI_high enrichment
#> 6  1.1216176 4.054532e-01 Layer1     11     118 Gene_SFARI_syndromic enrichment
#> 7  2.7377140 5.096514e-21 Layer2    137     869       Gene_SFARI_all enrichment
#> 8  2.7066379 8.845390e-10 Layer2     57     355      Gene_SFARI_high enrichment
#> 9  2.6632367 3.564638e-04 Layer2     19     118 Gene_SFARI_syndromic enrichment
#> 10 1.3579958 1.687561e-01 Layer3     14     869       Gene_SFARI_all enrichment
#> 11 1.1738012 4.264658e-01 Layer3      5     355      Gene_SFARI_high enrichment
#> 12 2.8947133 5.518757e-02 Layer3      4     118 Gene_SFARI_syndromic enrichment
#> 13 1.2423009 1.544115e-01 Layer4     29     869       Gene_SFARI_all enrichment
#> 14 1.1445522 3.748009e-01 Layer4     11     355      Gene_SFARI_high enrichment
#> 15 2.6106289 1.575232e-02 Layer4      8     118 Gene_SFARI_syndromic enrichment
#> 16 2.0969125 7.366596e-07 Layer5     60     869       Gene_SFARI_all enrichment
#> 17 2.0956628 9.450654e-04 Layer5     25     355      Gene_SFARI_high enrichment
#> 18 0.7064982 7.951889e-01 Layer5      3     118 Gene_SFARI_syndromic enrichment
#> 19 2.6716353 1.472539e-07 Layer6     41     869       Gene_SFARI_all enrichment
#> 20 2.6206690 5.845493e-04 Layer6     17     355      Gene_SFARI_high enrichment
#> 21 2.2573853 7.927915e-02 Layer6      5     118 Gene_SFARI_syndromic enrichment
#>    fdr_cut
#> 1      0.1
#> 2      0.1
#> 3      0.1
#> 4      0.1
#> 5      0.1
#> 6      0.1
#> 7      0.1
#> 8      0.1
#> 9      0.1
#> 10     0.1
#> 11     0.1
#> 12     0.1
#> 13     0.1
#> 14     0.1
#> 15     0.1
#> 16     0.1
#> 17     0.1
#> 18     0.1
#> 19     0.1
#> 20     0.1
#> 21     0.1
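A long-format result like the one printed above is often filtered and sorted before reporting. The sketch below is one possible follow-up, not part of the package examples: it copies three rows from the output above into a small data.frame so that it runs without downloading data, and the 0.05 threshold is an arbitrary assumption.

```r
## Three rows copied from the printed output above (Gene_SFARI_all only)
example_rows <- data.frame(
    OR = c(2.7377140, 0.9702022, 2.0969125),
    Pval = c(5.096514e-21, 6.130806e-01, 7.366596e-07),
    test = c("Layer2", "Layer1", "Layer5"),
    ID = "Gene_SFARI_all"
)

## Keep rows below an assumed p-value threshold, most significant first
sig_rows <- example_rows[example_rows$Pval < 0.05, ]
sig_rows <- sig_rows[order(sig_rows$Pval), ]
sig_rows
```

The same two lines could be applied to the full asd_sfari_enrichment table, since it has the same OR, Pval, test, and ID columns.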