Welcome to the smokingMouse project! Here you’ll be able to access the mouse expression data used for the analysis of the smoking-nicotine-mouse LIBD project.
This bulk RNA-sequencing project consisted of a differential expression analysis (DEA) involving 4 data types: genes, transcripts, exons, and exon-exon junctions. The main goal of this study was to explore the effects of prenatal exposure to smoking and nicotine on the developing mouse brain. As secondary objectives, this work evaluated: 1) the genes affected by each exposure in the adult female brain, in order to compare offspring and adult results, and 2) the effects of smoking on adult blood and brain, to search for overlapping biomarkers in both tissues. Finally, differentially expressed genes (DEGs) identified in mouse were compared against previously published results in human (Semick et al., 2020 and Toikumo et al., 2023).
The next table summarizes the analyses done at each level.
All R scripts created to perform these analyses can be found in code on GitHub.
The mouse datasets contain the following data in a single R RangedSummarizedExperiment* object for each feature (genes, transcripts, exons, and exon-exon junctions):
rse_gene: the gene RSE object contains the raw and log-normalized expression data of 55,401 mouse genes across the 208 samples from brain and blood of control and nicotine/smoking-exposed pup and adult mice.
rse_tx: the tx RSE object contains the raw and log-scaled expression data of 142,604 mouse transcripts across the 208 samples from brain and blood of control and nicotine/smoking-exposed pup and adult mice.
rse_exon: the exon RSE object contains the raw and log-normalized expression data of 447,670 mouse exons across the 208 samples from brain and blood of control and nicotine/smoking-exposed pup and adult mice.
rse_jx: the jx RSE object contains the raw and log-normalized expression data of 1,436,068 mouse exon-exon junctions across the 208 samples from brain and blood of control and nicotine/smoking-exposed pup and adult mice.
All the above datasets contain the sample and feature metadata, plus additional data on the results obtained in the filtering steps and the DEA.
Moreover, you can find human data generated in Semick et al. (2020) in Mol Psychiatry (DOI: https://doi.org/10.1038/s41380-018-0223-1) that contain the results of a DEA for cigarette smoke exposure in adult and prenatal human brain:
de_genes_prenatal_human_brain_smoking: data frame with DE statistics of 18,067 human genes for cigarette smoke exposure in prenatal human cortical tissue.
de_genes_adult_human_brain_smoking: data frame with DE statistics of 18,067 human genes for cigarette smoke exposure in adult human cortical tissue.
*For more details, check the documentation for RangedSummarizedExperiment objects.
Get the latest stable R release from CRAN. Then install smokingMouse from Bioconductor using the following code:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("smokingMouse")
And the development version from GitHub with:
BiocManager::install("LieberInstitute/smokingMouse")
Below is example code showing how to access the mouse and human gene data; you can do the same for any of the datasets described above. The datasets are retrieved from Bioconductor ExperimentHub.
## Connect to ExperimentHub
library(ExperimentHub)
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
eh <- ExperimentHub::ExperimentHub()
## Load the datasets of the package
myfiles <- query(eh, "smokingMouse")
########################
# Mouse data
########################
## Download the mouse gene data
rse_gene <- myfiles[['EH8313']]
#> Warning: package 'GenomicRanges' was built under R version 4.4.1
## This is a RangedSummarizedExperiment object
rse_gene
#> class: RangedSummarizedExperiment
#> dim: 55401 208
#> metadata(1): Obtained_from
#> assays(2): counts logcounts
#> rownames(55401): ENSMUSG00000102693.1 ENSMUSG00000064842.1 ...
#> ENSMUSG00000064371.1 ENSMUSG00000064372.1
#> rowData names(13): Length gencodeID ... DE_in_pup_brain_nicotine
#> DE_in_pup_brain_smoking
#> colnames: NULL
#> colData names(71): SAMPLE_ID FQCbasicStats ...
#> retained_after_QC_sample_filtering
#> retained_after_manual_sample_filtering
## Check sample info
colData(rse_gene)[1:5, 1:5]
#> DataFrame with 5 rows and 5 columns
#>     SAMPLE_ID FQCbasicStats perBaseQual perTileQual  perSeqQual
#>   <character>   <character> <character> <character> <character>
#> 1 Sample_2914          PASS        PASS        PASS        PASS
#> 2 Sample_4042          PASS        PASS        PASS        PASS
#> 3 Sample_4043          PASS        PASS        PASS        PASS
#> 4 Sample_4044          PASS        PASS        PASS        PASS
#> 5 Sample_4045          PASS        PASS        PASS        PASS
## Check gene info
rowData(rse_gene)[1:5, 1:5]
#> DataFrame with 5 rows and 5 columns
#>                         Length            gencodeID          ensemblID
#>                      <integer>          <character>        <character>
#> ENSMUSG00000102693.1      1070 ENSMUSG00000102693.1 ENSMUSG00000102693
#> ENSMUSG00000064842.1       110 ENSMUSG00000064842.1 ENSMUSG00000064842
#> ENSMUSG00000051951.5      6094 ENSMUSG00000051951.5 ENSMUSG00000051951
#> ENSMUSG00000102851.1       480 ENSMUSG00000102851.1 ENSMUSG00000102851
#> ENSMUSG00000103377.1      2819 ENSMUSG00000103377.1 ENSMUSG00000103377
#>                                 gene_type    EntrezID
#>                               <character> <character>
#> ENSMUSG00000102693.1                  TEC       71042
#> ENSMUSG00000064842.1                snRNA          NA
#> ENSMUSG00000051951.5       protein_coding      497097
#> ENSMUSG00000102851.1 processed_pseudogene   100418032
#> ENSMUSG00000103377.1                  TEC          NA
## Access the original counts
original_counts <- assays(rse_gene)$counts
## Access the log-normalized counts
logcounts <- assays(rse_gene)$logcounts
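Once extracted, the count matrix can be handled with ordinary matrix operations. The following is only a minimal sketch using base R: `toy_counts` and `toy_samples` are hypothetical stand-ins for `assays(rse_gene)$counts` and `colData(rse_gene)`, and the `Tissue` column, the toy values, and the TRUE/FALSE encoding of `retained_after_QC_sample_filtering` are assumptions made for illustration rather than the contents of the real object.

```r
## Toy stand-ins (hypothetical values): 3 genes x 4 samples,
## filled column by column (one column per sample)
toy_counts <- matrix(
    c(10, 0, 5,
      20, 3, 0,
      0, 8, 12,
      7, 7, 7),
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("S", 1:4))
)
toy_samples <- data.frame(
    SAMPLE_ID = paste0("S", 1:4),
    Tissue = c("Brain", "Brain", "Blood", "Brain"), # assumed column
    retained_after_QC_sample_filtering = c(TRUE, TRUE, TRUE, FALSE), # assumed encoding
    stringsAsFactors = FALSE
)

## Keep brain samples that passed the QC filter
keep <- toy_samples$Tissue == "Brain" &
    toy_samples$retained_after_QC_sample_filtering
brain_counts <- toy_counts[, keep, drop = FALSE]

## Library size (total counts) per retained sample
lib_sizes <- colSums(brain_counts)

## Genes with a non-zero count in every retained sample
detected <- rownames(brain_counts)[
    rowSums(brain_counts > 0) == ncol(brain_counts)
]
```

With the real data, it is worth checking how the filtering columns are actually encoded (for example with `table()`) before subsetting on them.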
########################
# Human data
########################
## Download the human gene data
de_genes_prenatal_human_brain_smoking <- myfiles[['EH8317']]
## This is a GRanges object; its metadata columns hold the DE statistics
de_genes_prenatal_human_brain_smoking[1:5, ]
#> GRanges object with 5 ranges and 9 metadata columns:
#>                   seqnames              ranges strand |    Length      Symbol
#>                      <Rle>           <IRanges>  <Rle> | <integer> <character>
#>   ENSG00000080709     chr5 113696642-113832337      + |      3995       KCNN2
#>   ENSG00000070886     chr1   22890057-22930087      + |      5358       EPHA8
#>   ENSG00000218336     chr4 183065140-183724177      + |     11983       TENM3
#>   ENSG00000189108     chrX 103810996-105011822      + |      3146    IL1RAPL2
#>   ENSG00000186732    chr22   43807202-43903728      + |      5821      MPPED1
#>                    EntrezID     logFC   AveExpr         t     P.Value adj.P.Val
#>                   <integer> <numeric> <numeric> <numeric>   <numeric> <numeric>
#>   ENSG00000080709      3781 -0.694069   2.86444  -6.09779 2.59861e-06 0.0469491
#>   ENSG00000070886      2046  1.545861   1.58351   5.67106 7.49034e-06 0.0477263
#>   ENSG00000218336     55714  0.804367   6.31125   5.55661 9.97733e-06 0.0477263
#>   ENSG00000189108     26280 -1.035988   1.62624  -5.53375 1.05665e-05 0.0477263
#>   ENSG00000186732       758  0.384536   9.34706   5.41518 1.42396e-05 0.0514535
#>                           B
#>                   <numeric>
#>   ENSG00000080709   4.44638
#>   ENSG00000070886   3.18385
#>   ENSG00000218336   3.53830
#>   ENSG00000189108   2.83949
#>   ENSG00000186732   3.19865
#>   -------
#>   seqinfo: 25 sequences from an unspecified genome; no seqlengths
## Access the human gene data much as you would a data frame, or convert it with as.data.frame()
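As a small illustration of working with these DE statistics, the sketch below filters and orders genes in base R. To stay self-contained it rebuilds a toy data frame from the five rows printed above (only three of the columns) instead of downloading the real object; `de_human`, `fdr_cutoff`, and the 0.05 threshold are hypothetical names and values chosen for the example, not settings taken from the study.

```r
## Toy stand-in for the human DE table, rebuilt from the five rows
## printed above (only a subset of the columns)
de_human <- data.frame(
    Symbol = c("KCNN2", "EPHA8", "TENM3", "IL1RAPL2", "MPPED1"),
    logFC = c(-0.694069, 1.545861, 0.804367, -1.035988, 0.384536),
    adj.P.Val = c(0.0469491, 0.0477263, 0.0477263, 0.0477263, 0.0514535),
    row.names = c(
        "ENSG00000080709", "ENSG00000070886", "ENSG00000218336",
        "ENSG00000189108", "ENSG00000186732"
    ),
    stringsAsFactors = FALSE
)

## Hypothetical cutoff, for illustration only
fdr_cutoff <- 0.05

## Keep genes below the cutoff and label the direction of change
de_sig <- de_human[de_human$adj.P.Val < fdr_cutoff, ]
de_sig$direction <- ifelse(de_sig$logFC > 0, "up", "down")

## Order by absolute log fold change, largest first
de_sig <- de_sig[order(abs(de_sig$logFC), decreasing = TRUE), ]
de_sig
```

The same subsetting should carry over to the downloaded object after converting it with `as.data.frame()`, assuming the column names match those shown in the printed output.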
Below is the citation output from using citation('smokingMouse') in R. Please run this yourself to check for any updates on how to cite smokingMouse.
print(citation('smokingMouse'), bibtex = TRUE)
#> To cite package 'smokingMouse' in publications use:
#>
#> Gonzalez-Padilla D, Collado-Torres L (2024). _Provides access to
#> smokingMouse project data_. doi:10.18129/B9.bioc.smokingMouse
#> <https://doi.org/10.18129/B9.bioc.smokingMouse>,
#> https://github.com/LieberInstitute/smokingMouse/smokingMouse - R
#> package version 1.3.0,
#> <http://www.bioconductor.org/packages/smokingMouse>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {Provides access to smokingMouse project data},
#> author = {Daianna Gonzalez-Padilla and Leonardo Collado-Torres},
#> year = {2024},
#> url = {http://www.bioconductor.org/packages/smokingMouse},
#> note = {https://github.com/LieberInstitute/smokingMouse/smokingMouse - R package version 1.3.0},
#> doi = {10.18129/B9.bioc.smokingMouse},
#> }
#>
#>
#> To cite the original smoking-nicotine mouse work please use:
#>
#> (TODO)
#>
#>
#> To cite the original work from which human data come please use the following citation:
#>
#> Semick, S. A., Collado-Torres, L., Markunas, C. A., Shin, J. H., Deep-Soboslay, A., Tao, R., ...
#> & Jaffe, A. E. (2020). Developmental effects of maternal smoking during pregnancy on the human
#> frontal cortex transcriptome. Molecular psychiatry, 25(12), 3267-3277.
#>
Please note that the smokingMouse package and the study analyses were only made possible thanks to many other R and bioinformatics software authors, who are cited in the vignette and/or the paper describing this study.
Please note that the smokingMouse project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
For more details, check the dev directory.
This package was developed using biocthis.