Code and results for the recount-brain project, which enhances the recount2 project. The recount_brain table can be accessed via the recount (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017) Bioconductor package using recount::add_metadata(source = 'recount_brain_v2').
Candidate studies for recount_brain are selected from the Sequence Read Archive (SRA): projects that have at least 4 samples and where over 70% of the samples are from the brain. This step creates the list of candidate projects saved in projects_lists.txt.
Note that not all candidate studies were brain studies, so the final number of projects considered for recount_brain is 62.
The recount_brain table can be easily accessed via recount (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017) using the add_metadata() function. The document merging_data describes how recount_brain was created using the files from SRA_metadata and includes some brief examples on how to explore the recount_brain table. You can access this initial version of recount_brain using recount::add_metadata(source = 'recount_brain_v1').
The cross_studies_metadata directory contains the code for extending recount_brain with data from other sources and any manipulations required to do so. It also contains a second document, recount_brain_ontologies, with the code used for adding Brodmann area, disease and tissue ontology information to recount_brain. This final table is the one you can access using recount::add_metadata(source = 'recount_brain_v2').
recount_brain_v2 is compared with MetaSRA (Bernstein, Doan, and Dewey, 2017) in the metasra_comp html document.
recount_brain can be used for a gene differential expression analysis. See the full example for more information: example_SRP027383. You can also access the pdf version if you prefer it over the HTML version.
Data from multiple studies in recount_brain can also be combined with specific tissue data from The Cancer Genome Atlas (TCGA). See the full example for more information: example_multistudy. You can also access the pdf version if you prefer it over the HTML version.
This information is also available as a csv file at SupplementaryTable1.csv.
age: Age of donor
age_units: Units of age - (Years / Months / Post Conception Weeks)
assay_type_s: Sequencing technique - (RNA-Seq)
avgspotlen_l: Average length of sequenced read
bioproject_s: NCBI BioProject ID
biosample_s: NCBI BioSample ID
brain_bank: Brain tissue repository source
brodmann_area: Brodmann area for tissue from cerebral cortex - (1-52)
cell_line: Cell line description
center_name_s: Project center
clinical_stage_1: Clinically relevant tissue sample information
clinical_stage_2: Clinically relevant tissue sample information
consent_s: Data availability - (Public)
development: Stage of human development - (Fetus / Infant / Child / Adolescent / Adult)
disease: Disease description
disease_status: Nature of tissue - (Disease / Control)
experiment_s: NCBI Experiment ID
hemisphere: Cerebral hemisphere - (Left / Right)
insertsize_l: Length of sequence between adaptors
instrument_s: High throughput sequencing system
library_name_s: Internal sample ID used by original study
librarylayout_s: Sequencing layout - (Single / Paired)
libraryselection_s: Sequencing library - (cDNA)
librarysource_s: Sequencing source - (Transcriptomic)
loaddate_s: Sequencing load date
mbases_l: Megabases
mbytes_l: Megabytes
organism_s: Organism - (Homo sapiens)
pathology: Tissue pathology
platform_s: Sequencing platform - (Illumina)
pmi: Postmortem interval
pmi_units: Units of postmortem interval - (Hours)
preparation: Specimen preparation - (Frozen)
present_in_recount: Expression data present in recount2
race: Race of donor - (Asian / Black / Hispanic / White)
releasedate_s: Sequencing release date
rin: RNA integrity number
run_s: NCBI Run ID
sample_name_s: GEO Accession ID
sample_origin: Tissue origin - (Brain / iPSC)
sex: Sex of donor - (Female / Male)
sra_sample_s: NCBI SRA Sample ID
sra_study_s: NCBI SRA Study ID
tissue_site_1: Anatomic site of tissue
tissue_site_2: Anatomic site of tissue, further specified
tissue_site_3: Anatomic site of tissue, further specified
tumor_type: Type of tumor - (Glioblastoma / Astrocytoma / Ependymoma / Oligodendroglioma)
viability: Tissue viability - (Postmortem / Biopsy)
You can access this initial version with recount::add_metadata(source = 'recount_brain_v1').
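As a minimal sketch of how the variables listed above might be used once the table is in hand, the following base R code filters and tabulates a small made-up data frame. Only the column names and the category labels come from the list above; the rows, the run identifiers and the object names (toy_brain, adult_controls, status_by_stage) are invented for illustration, and the real table may encode values differently.

```r
# Toy stand-in for a few recount_brain columns.
# All rows and identifiers are invented for illustration only.
toy_brain <- data.frame(
    run_s = c('RUN1', 'RUN2', 'RUN3', 'RUN4'),
    disease_status = c('Control', 'Disease', 'Control', 'Control'),
    development = c('Adult', 'Adult', 'Fetus', 'Adult'),
    age = c(45, 60, 20, 33),
    age_units = c('Years', 'Years', 'Post Conception Weeks', 'Years'),
    stringsAsFactors = FALSE
)

# Keep adult controls whose age is recorded in years.
# Checking age_units avoids mixing years with post conception weeks.
adult_controls <- subset(
    toy_brain,
    disease_status == 'Control' & development == 'Adult' & age_units == 'Years'
)

# Cross-tabulate disease status against developmental stage
status_by_stage <- table(toy_brain$disease_status, toy_brain$development)

print(adult_controls$run_s)
print(status_by_stage)
```

Checking age_units before comparing ages matters because the same age column holds values in different units.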
List of variables present in recount_brain_v2.
Study_full: either the SRA study accession, GTEX or TCGA.
drugName_full: the drug name for TCGA samples.
drug_info_full: logical, whether the sample has drug information; only for TCGA.
drug_type_full: the drug classification (chemotherapy, immunotherapy, …); only for TCGA.
full_260_280: the 260 to 280 ratio; only for TCGA.
count_file_identifier: the SRA run accession or the TCGA run (sample) identifier. Useful for merging with the rest of recount2 metadata.
Dataset: either SRA, GTEX or TCGA.
brodmann_ontology: URL for the Brodmann region ontology. See the recount_brain_ontologies file for how this information was added.
brodmann_synonyms: synonyms used for the Brodmann regions. These facilitate text based searches. Separated by |.
brodmann_parents: URLs for the Brodmann ontology parents. Separated by |.
brodmann_parents_label: Brodmann ontology parent text preferred labels. Separated by |.
disease_ontology: URL for the disease ontology.
tissue: tissue as prioritized by tissue_site_3 over tissue_site_2 over tissue_site_1.
tissue_ontology: URL for the tissue ontology.
tissue_synonyms: tissue synonyms which facilitate text based searches. Separated by |.
tissue_parents: URLs for the tissue ontology parents. Separated by |.
tissue_parents_label: tissue ontology parent text preferred labels. Separated by |.
You can access this version with recount::add_metadata(source = 'recount_brain_v2').
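The tissue prioritization rule and the |-separated fields described above can be sketched in base R. This is an illustration under stated assumptions, not the code from recount_brain_ontologies: the toy values (including the synonyms) are invented, missing sites are assumed to be encoded as NA, and the object names (toy, synonym_list, has_dlpfc) are hypothetical.

```r
# Toy rows; all values, including the synonyms, are invented for illustration.
toy <- data.frame(
    tissue_site_1 = c('Cerebral cortex', 'Cerebellum', NA),
    tissue_site_2 = c('Frontal lobe', NA, NA),
    tissue_site_3 = c('Dorsolateral prefrontal cortex', NA, NA),
    tissue_synonyms = c(
        'DLPFC|dorsolateral prefrontal cortex',
        'cerebellum|example synonym',
        NA
    ),
    stringsAsFactors = FALSE
)

# The most specific non-missing site wins:
# tissue_site_3 over tissue_site_2 over tissue_site_1.
# Assumption: missing sites are stored as NA.
toy$tissue <- ifelse(
    !is.na(toy$tissue_site_3), toy$tissue_site_3,
    ifelse(!is.na(toy$tissue_site_2), toy$tissue_site_2, toy$tissue_site_1)
)

# Split the |-separated synonyms; fixed = TRUE treats | as a literal
# character instead of the regular expression alternation operator.
synonym_list <- strsplit(toy$tissue_synonyms, '|', fixed = TRUE)

# Case-insensitive exact match against any synonym of each sample;
# samples without synonyms (NA) are reported as FALSE.
has_dlpfc <- vapply(
    synonym_list,
    function(x) any(tolower(x) == 'dlpfc', na.rm = TRUE),
    logical(1)
)

print(toy$tissue)
print(has_dlpfc)
```

Splitting on a literal | before matching avoids partial hits that a plain substring search across the whole field could produce.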
We recommend opening the interactive recount_brain exploration in another window.
This application is a custom version of shinycsv (Collado-Torres, Semick, and Jaffe, 2018). The code for making this application is available in the shinytable directory.
If you have any questions about recount_brain please post them as an issue at LieberInstitute/recount-brain and include the relevant session information using the following code. Thank you!
library('sessioninfo')
options(width = 120)
session_info()
The analyses were made possible thanks to BioPortal (Whetzel, Noy, Shah, Alexander, et al., 2011), MetaSRA (Bernstein, Doan, and Dewey, 2017), and:
[1] Z. Bao, H. Chen, M. Yang, C. Zhang, et al. “RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas”. In: Genome Research 24.11 (Aug. 2014), pp. 1765–1773. DOI: 10.1101/gr.165126.113. URL: https://doi.org/10.1101/gr.165126.113.
[2] M. N. Bernstein, A. Doan, and C. N. Dewey. “MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive”. In: Bioinformatics 33.18 (May 2017). Ed. by J. Wren, pp. 2914–2923. DOI: 10.1093/bioinformatics/btx334. URL: https://doi.org/10.1093/bioinformatics/btx334.
[3] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[4] W. Chang. downloader: Download Files over HTTP and HTTPS. R package version 0.4. 2015. URL: https://CRAN.R-project.org/package=downloader.
[5] L. Collado-Torres, A. Nellore, K. Kammers, S. E. Ellis, et al. “Reproducible RNA-seq analysis using recount2”. In: Nature Biotechnology (2017). DOI: 10.1038/nbt.3838. URL: http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html.
[6] L. Collado-Torres, S. Semick, and A. E. Jaffe. shinycsv: Explore a table interactively in a shiny application. R package version 0.99.8. 2018. URL: https://github.com/LieberInstitute/shinycsv.
[7] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.
[8] S. E. Ellis, L. Collado-Torres, A. E. Jaffe, and J. T. Leek. “Improving the value of public RNA-seq expression data by phenotype prediction”. In: Nucl. Acids Res. (2018). DOI: 10.1093/nar/gky102. URL: https://doi.org/10.1093/nar/gky102.
[9] P. G. Ferreira, M. Muñoz-Aguirre, F. Reverter, C. P. S. Godinho, et al. “The effects of death and post-mortem cold ischemia on human tissue transcriptomes”. In: Nature Communications 9.1 (Feb. 2018). DOI: 10.1038/s41467-017-02772-x. URL: https://doi.org/10.1038/s41467-017-02772-x.
[10] A. Oleś, M. Morgan, and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.10.0. 2018. URL: https://github.com/Bioconductor/BiocStyle.
[11] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2019. URL: https://www.R-project.org/.
[12] P. L. Whetzel, N. F. Noy, N. H. Shah, P. R. Alexander, et al. “BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications”. In: Nucleic Acids Research 39.suppl (Jun. 2011), pp. W541–W545. DOI: 10.1093/nar/gkr469. URL: https://doi.org/10.1093/nar/gkr469.
[13] H. Wickham. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. 2017. URL: https://CRAN.R-project.org/package=tidyverse.
[14] H. Wickham, J. Hester, and W. Chang. devtools: Tools to Make Developing R Packages Easier. R package version 2.0.2. 2019. URL: https://CRAN.R-project.org/package=devtools.
[15] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.
[16] Y. Xie, J. Cheng, and X. Tan. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.5. 2018. URL: https://CRAN.R-project.org/package=DT.