Evaluate the enrichment for a list of gene sets

Using the layer-level (group-level) data, this function evaluates whether a list of gene sets (Ensembl gene IDs) is enriched among the significant genes (FDR < 0.1 by default) for a given model type result. It tests the alternative hypothesis that OR > 1, i.e. that the gene set is over-represented among the enriched genes. To check depleted genes instead, set reverse = TRUE.
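As a rough illustration of this test, the sketch below builds the 2x2 table of gene-set membership versus significance for a single test group and runs a one-sided stats::fisher.test(). The gene IDs, statistics, FDR values, and variable names are made-up toy data; this is a sketch of the idea, not the package's internal code.

```r
## Toy data (made up): 100 genes, the first 20 are significant and up in the test group
fdr_cut <- 0.1
ensembl <- sprintf("ENSG%011d", 1:100)
t_stat <- c(rep(3, 20), rep(-1, 80))
fdr <- c(rep(0.01, 20), rep(0.5, 80))

## Hypothetical gene set: 10 significant genes plus 10 non-significant ones
gene_set <- ensembl[c(1:10, 21:30)]

## "Significant" means FDR below the cutoff and a positive statistic
sig <- t_stat > 0 & fdr < fdr_cut
in_set <- ensembl %in% gene_set

## 2x2 contingency table; fixed factor levels keep empty cells in the table
tab <- table(
    Set = factor(in_set, levels = c(FALSE, TRUE)),
    Sig = factor(sig, levels = c(FALSE, TRUE))
)

## One-sided test of OR > 1 (over-representation of the set among significant genes)
res <- stats::fisher.test(tab, alternative = "greater")
tab["TRUE", "TRUE"]  # overlap between gene set and significant genes (10 here)
res$estimate         # conditional maximum likelihood estimate of the odds ratio
res$p.value
```

Multiplying t_stat by -1 before defining sig would instead count genes that are significant in the negative direction.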

gene_set_enrichment(
  gene_list,
  fdr_cut = 0.1,
  modeling_results = fetch_data(type = "modeling_results"),
  model_type = names(modeling_results)[1],
  reverse = FALSE
)
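In the signature above, the default for model_type refers to the modeling_results argument. R evaluates default arguments lazily, so this default is computed from whatever modeling_results you supply, and a default such as fetch_data() only runs when modeling_results is missing. The toy functions below (slow_default() and pick_model() are hypothetical, not part of spatialLIBD) demonstrate this general R behavior.

```r
## Counts how often the (hypothetical) expensive default is evaluated
n_calls <- 0
slow_default <- function() {
    n_calls <<- n_calls + 1
    list(anova = "F", enrichment = "t", pairwise = "t")
}

## Mirrors the shape of the gene_set_enrichment() defaults
pick_model <- function(modeling_results = slow_default(),
                       model_type = names(modeling_results)[1]) {
    model_type
}

pick_model()                                           # "anova"; slow_default() ran once
pick_model(modeling_results = list(enrichment = "t"))  # "enrichment"; default not evaluated
n_calls                                                # 1
```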

Arguments

gene_list: A named list object (could be a data.frame) where each element of the list is a character vector of Ensembl gene IDs.
fdr_cut: A numeric(1) specifying the FDR cutoff used to determine which genes in modeling_results are significant.
modeling_results: Defaults to the output of fetch_data(type = 'modeling_results'). This is a list of tables with the columns f_stat_* or t_stat_*, as well as p_value_* and fdr_*, plus ensembl. The column names are used to extract the statistics, the p-values, and the FDR-adjusted p-values. The ensembl column is then used for matching in some cases. See fetch_data() for more details. Typically this is the set of reference statistics used in layer_stat_cor().
model_type: A named element of the modeling_results list. By default this is one of: enrichment, the model that tests one human brain layer against the rest (one group vs the rest); pairwise, which compares two layers (groups), denoted layerA-layerB, such that layerA is greater than layerB; and anova, which determines whether any layer (group) differs from the rest, adjusting for the mean expression level. The statistics for enrichment and pairwise are t-statistics, while those for anova are F-statistics.
reverse: A logical(1) indicating whether to multiply the input statistics by -1 and to reverse the layerA-layerB column names (split at the -) into layerB-layerA.
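The sign flip and column-name reversal described for reverse can be sketched in base R as follows. The pairwise statistics are made up and only the column names follow the layerA-layerB convention described above; this illustrates the described behavior and is not the package's code.

```r
## Made-up pairwise t-statistics, columns named "layerA-layerB"
tstats <- data.frame(
    "Layer1-Layer2" = c(2.5, -1.0),
    "WM-Layer6" = c(-0.3, 4.1),
    check.names = FALSE
)

## Flip the sign so that a positive value now means layerB > layerA
reversed <- tstats * -1

## Swap the two layer names around the "-" separator
colnames(reversed) <- vapply(
    strsplit(colnames(tstats), "-", fixed = TRUE),
    function(x) paste(rev(x), collapse = "-"),
    character(1)
)
reversed
```

A name without a "-" (such as a single group name from the enrichment model) is left unchanged by this swap, since reversing a one-element vector returns the same element.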

Value

A table in long format with the enrichment results using stats::fisher.test().

OR: odds ratio.
Pval: p-value from fisher.test().
test: group or layer in the modeling_results.
NumSig: number of genes from the gene set present in modeling_results with fdr < fdr_cut and t_stat > 0 (unless reverse = TRUE) for test in modeling_results.
SetSize: number of genes from modeling_results present in the gene set.
ID: name of the gene set.
model_type: record of the input model type from modeling_results.
fdr_cut: record of the input fdr_cut.
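To connect these columns to the underlying test, the sketch below assembles a long-format table with the same column names from a toy stand-in for one modeling_results element. The data, gene sets, and helper variables are hypothetical, the sketch assumes the t_stat_<group> / fdr_<group> naming described for modeling_results, and it is not the package's internal implementation.

```r
## Toy stand-in for one element of modeling_results (all values made up)
ensembl <- sprintf("ENSG%011d", 1:100)
model_results <- data.frame(
    t_stat_WM = c(rep(3, 20), rep(-1, 80)),
    t_stat_Layer1 = c(rep(-1, 80), rep(3, 20)),
    fdr_WM = c(rep(0.01, 20), rep(0.5, 80)),
    fdr_Layer1 = c(rep(0.5, 80), rep(0.01, 20)),
    ensembl = ensembl,
    stringsAsFactors = FALSE
)

## Hypothetical gene sets; IDs missing from model_results are dropped
gene_list <- list(
    set_a = ensembl[c(1:10, 41:45)],
    set_b = c(ensembl[c(85:90, 50:54)], "ENSG_not_measured")
)
fdr_cut <- 0.1
model_type <- "enrichment"

## Test groups come from the statistic column names
groups <- sub("^t_stat_", "", grep("^t_stat_", colnames(model_results), value = TRUE))
present <- lapply(gene_list, function(g) g[g %in% model_results$ensembl])

## One row per (test group, gene set) pair
enrich_tab <- do.call(rbind, lapply(groups, function(grp) {
    sig <- model_results[[paste0("t_stat_", grp)]] > 0 &
        model_results[[paste0("fdr_", grp)]] < fdr_cut
    do.call(rbind, lapply(names(present), function(id) {
        tab <- table(
            Set = factor(model_results$ensembl %in% present[[id]], c(FALSE, TRUE)),
            Sig = factor(sig, c(FALSE, TRUE))
        )
        ft <- stats::fisher.test(tab, alternative = "greater")
        data.frame(
            OR = unname(ft$estimate),
            Pval = ft$p.value,
            test = grp,
            NumSig = tab["TRUE", "TRUE"],
            SetSize = length(present[[id]]),
            ID = id,
            stringsAsFactors = FALSE
        )
    }))
}))
enrich_tab$model_type <- model_type
enrich_tab$fdr_cut <- fdr_cut
enrich_tab
```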

Details

See https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/Layer_Guesses/check_clinical_gene_sets.R for the full script from which this family of functions is derived.

Author

Andrew E Jaffe, Leonardo Collado-Torres

Examples


## Read in the SFARI gene sets included in the package
asd_sfari <- utils::read.csv(
    system.file(
        "extdata",
        "SFARI-Gene_genes_01-03-2020release_02-04-2020export.csv",
        package = "spatialLIBD"
    ),
    as.is = TRUE
)

## Format them appropriately
asd_sfari_geneList <- list(
    Gene_SFARI_all = asd_sfari$ensembl.id,
    Gene_SFARI_high = asd_sfari$ensembl.id[asd_sfari$gene.score < 3],
    Gene_SFARI_syndromic = asd_sfari$ensembl.id[asd_sfari$syndromic == 1]
)

## Obtain the necessary data
if (!exists("modeling_results")) {
    modeling_results <- fetch_data(type = "modeling_results")
}
#> 2025-05-09 16:30:08.190226 loading file /github/home/.cache/R/BiocFileCache/65ec6f775f8c_Human_DLPFC_Visium_modeling_results.Rdata%3Fdl%3D1

## Compute the gene set enrichment results
asd_sfari_enrichment <- gene_set_enrichment(
    gene_list = asd_sfari_geneList,
    modeling_results = modeling_results,
    model_type = "enrichment"
)

## Explore the results
asd_sfari_enrichment
#>           OR         Pval   test NumSig SetSize                   ID model_type
#> 1  1.2659915 1.761332e-03     WM    231     869       Gene_SFARI_all enrichment
#> 2  1.1819109 9.895949e-02     WM     90     355      Gene_SFARI_high enrichment
#> 3  1.2333378 1.853021e-01     WM     31     118 Gene_SFARI_syndromic enrichment
#> 4  0.9702022 6.130806e-01 Layer1     71     869       Gene_SFARI_all enrichment
#> 5  0.7192630 9.493328e-01 Layer1     22     355      Gene_SFARI_high enrichment
#> 6  1.1216176 4.054532e-01 Layer1     11     118 Gene_SFARI_syndromic enrichment
#> 7  2.7377140 5.096514e-21 Layer2    137     869       Gene_SFARI_all enrichment
#> 8  2.7066379 8.845390e-10 Layer2     57     355      Gene_SFARI_high enrichment
#> 9  2.6632367 3.564638e-04 Layer2     19     118 Gene_SFARI_syndromic enrichment
#> 10 1.3579958 1.687561e-01 Layer3     14     869       Gene_SFARI_all enrichment
#> 11 1.1738012 4.264658e-01 Layer3      5     355      Gene_SFARI_high enrichment
#> 12 2.8947133 5.518757e-02 Layer3      4     118 Gene_SFARI_syndromic enrichment
#> 13 1.2423009 1.544115e-01 Layer4     29     869       Gene_SFARI_all enrichment
#> 14 1.1445522 3.748009e-01 Layer4     11     355      Gene_SFARI_high enrichment
#> 15 2.6106289 1.575232e-02 Layer4      8     118 Gene_SFARI_syndromic enrichment
#> 16 2.0969125 7.366596e-07 Layer5     60     869       Gene_SFARI_all enrichment
#> 17 2.0956628 9.450654e-04 Layer5     25     355      Gene_SFARI_high enrichment
#> 18 0.7064982 7.951889e-01 Layer5      3     118 Gene_SFARI_syndromic enrichment
#> 19 2.6716353 1.472539e-07 Layer6     41     869       Gene_SFARI_all enrichment
#> 20 2.6206690 5.845493e-04 Layer6     17     355      Gene_SFARI_high enrichment
#> 21 2.2573853 7.927915e-02 Layer6      5     118 Gene_SFARI_syndromic enrichment
#>    fdr_cut
#> 1      0.1
#> 2      0.1
#> 3      0.1
#> 4      0.1
#> 5      0.1
#> 6      0.1
#> 7      0.1
#> 8      0.1
#> 9      0.1
#> 10     0.1
#> 11     0.1
#> 12     0.1
#> 13     0.1
#> 14     0.1
#> 15     0.1
#> 16     0.1
#> 17     0.1
#> 18     0.1
#> 19     0.1
#> 20     0.1
#> 21     0.1
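To compare test groups at a glance, the long table can be reshaped into a gene set by test group matrix. The sketch below uses six rows copied from the output above and base R tapply(); it is one possible way to explore the results, not a function provided by the package.

```r
## Six rows copied from the asd_sfari_enrichment output above (Layer1 and Layer2)
res <- data.frame(
    Pval = c(6.130806e-01, 9.493328e-01, 4.054532e-01,
             5.096514e-21, 8.845390e-10, 3.564638e-04),
    test = rep(c("Layer1", "Layer2"), each = 3),
    ID = rep(c("Gene_SFARI_all", "Gene_SFARI_high", "Gene_SFARI_syndromic"), 2),
    stringsAsFactors = FALSE
)

## Gene sets as rows, test groups as columns, -log10(p-value) as entries
wide <- tapply(-log10(res$Pval), list(res$ID, res$test), identity)
round(wide, 2)
```

Larger values indicate stronger evidence of enrichment; for example, Gene_SFARI_all reaches about 20.3 in Layer2.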
