Code and results for the recount-brain project, which enhances the recount2 project. The recount_brain table can be accessed via the recount (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017) Bioconductor package using recount::add_metadata(source = 'recount_brain_v2').
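As a minimal sketch of the access pattern described above: when add_metadata() is called without an rse argument, recount returns the metadata table itself. The two helper functions below (recount_brain_source() and load_recount_brain()) are hypothetical wrappers written for this illustration and are not part of recount. The download itself requires the recount package and an internet connection, so it is only shown as commented usage.

```r
# Hypothetical helper (not part of recount): build the `source` string
# for a given recount_brain version.
recount_brain_source <- function(version = c('v2', 'v1')) {
    version <- match.arg(version)
    paste0('recount_brain_', version)
}

# Hypothetical wrapper: fetch the recount_brain table for one version.
# Calling add_metadata() without an `rse` returns the table itself.
load_recount_brain <- function(version = 'v2') {
    recount::add_metadata(source = recount_brain_source(version))
}

# Usage (needs the recount package and network access):
# recount_brain <- load_recount_brain('v2')
# dim(recount_brain)
```

To attach the table to a study downloaded from recount2, pass the RangedSummarizedExperiment object as the first argument of add_metadata() instead.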
[...] recount_brain from the Sequence Read Archive (SRA) that have at least 4 samples and in which over 70% of the samples are from the brain. It creates the list of candidate projects saved in projects_lists.txt.

[...] recount_brain. Note that not all candidate studies were brain studies, so the final number of projects considered is 62.

[...] recount_brain table that can be easily accessed via recount (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017) using the add_metadata() function. The document merging_data describes how recount_brain was created using the files from SRA_metadata and includes some brief examples of how to explore the recount_brain table. You can access this initial version of recount_brain using recount::add_metadata(source = 'recount_brain_v1').

[...] recount_brain and any manipulations required to do so. The cross_studies_metadata directory also contains a second document, recount_brain_ontologies, with the code used for adding Brodmann area, disease and tissue ontology information to recount_brain. This final table is the one you can access using recount::add_metadata(source = 'recount_brain_v2').

[...] recount_brain_v2 and MetaSRA (Bernstein, Doan, and Dewey, 2017) as described in the metasra_comp HTML document.

[...] recount_brain can be used for a gene differential expression analysis. See the full example for more information: example_SRP027383. You can also access the PDF version if you prefer it over the HTML version.

[...] recount_brain and combining them with specific tissue data from The Cancer Genome Atlas (TCGA). See the full example for more information: example_multistudy. You can also access the PDF version if you prefer it over the HTML version.

This information is also available as a CSV file at SupplementaryTable1.csv.
age: Age of donor
age_units: Units of age - (Years / Months / Post Conception Weeks)
assay_type_s: Sequencing technique - (RNA-Seq)
avgspotlen_l: Average length of sequenced read
bioproject_s: NCBI BioProject ID
biosample_s: NCBI BioSample ID
brain_bank: Brain tissue repository source
brodmann_area: Brodmann area for tissue from cerebral cortex - (1-52)
cell_line: Cell line description
center_name_s: Project center
clinical_stage_1: Clinically relevant tissue sample information
clinical_stage_2: Clinically relevant tissue sample information
consent_s: Data availability - (Public)
development: Stage of human development - (Fetus / Infant / Child / Adolescent / Adult)
disease: Disease description
disease_status: Nature of tissue - (Disease / Control)
experiment_s: NCBI Experiment ID
hemisphere: Cerebral hemisphere - (Left / Right)
insertsize_l: Length of sequence between adaptors
instrument_s: High throughput sequencing system
library_name_s: Internal sample ID used by original study
librarylayout_s: Sequencing layout - (Single / Paired)
libraryselection_s: Sequencing library - (cDNA)
librarysource_s: Sequencing source - (Transcriptomic)
loaddate_s: Sequencing load date
mbases_l: Megabases
mbytes_l: Megabytes
organism_s: Organism - (Homo sapiens)
pathology: Tissue pathology
platform_s: Sequencing platform - (Illumina)
pmi: Postmortem interval
pmi_units: Units of postmortem interval - (Hours)
preparation: Specimen preparation - (Frozen)
present_in_recount: Expression data present in recount2
race: Race of donor - (Asian / Black / Hispanic / White)
releasedate_s: Sequencing release date
rin: RNA integrity number
run_s: NCBI Run ID
sample_name_s: GEO Accession ID
sample_origin: Tissue origin - (Brain / iPSC)
sex: Sex of donor - (Female / Male)
sra_sample_s: NCBI SRA Sample ID
sra_study_s: NCBI SRA Study ID
tissue_site_1: Anatomic site of tissue
tissue_site_2: Anatomic site of tissue, further specified
tissue_site_3: Anatomic site of tissue, further specified
tumor_type: Type of tumor - (Glioblastoma / Astrocytoma / Ependymoma / Oligodendroglioma)
viability: Tissue viability - (Postmortem / Biopsy)

You can access this initial version with recount::add_metadata(source = 'recount_brain_v1').
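The variables listed above can be used to subset samples with ordinary data frame operations. The sketch below uses a small made-up data frame that only borrows the column names and category labels from the list. The run identifiers are placeholders, and treating present_in_recount as a logical column is an assumption. With the real table, the same subset() call would be applied to the object returned by add_metadata().

```r
# Toy stand-in for a few recount_brain columns; all rows are invented.
toy <- data.frame(
    run_s = c('RUN1', 'RUN2', 'RUN3', 'RUN4'),
    development = c('Adult', 'Fetus', 'Adult', 'Adult'),
    disease_status = c('Control', 'Control', 'Disease', 'Control'),
    # Assumption: this column is stored as TRUE / FALSE.
    present_in_recount = c(TRUE, TRUE, TRUE, FALSE),
    stringsAsFactors = FALSE
)

# Keep adult control samples that have expression data in recount2.
adult_controls <- subset(
    toy,
    development == 'Adult' &
        disease_status == 'Control' &
        present_in_recount
)

# Cross-tabulate two categorical variables to get an overview.
overview <- table(toy$disease_status, toy$development)
print(overview)
```

subset() drops rows where the condition evaluates to NA, which may be convenient here because many variables are likely to be missing for some studies.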
List of variables present in recount_brain_v2.
Study_full: either the SRA study accession, GTEX or TCGA.
drugName_full: the drug name for TCGA samples.
drug_info_full: logical, whether the sample has drug information; only for TCGA.
drug_type_full: the drug classification (chemotherapy, immunotherapy, …); only for TCGA.
full_260_280: the 260 to 280 ratio; only for TCGA.
count_file_identifier: the SRA run accession or the TCGA run (sample) identifier. Useful for merging with the rest of recount2 metadata.
Dataset: either SRA, GTEX or TCGA.
brodmann_ontology: URL for the Brodmann region ontology. See the recount_brain_ontologies file for how this information was added.
brodmann_synonyms: synonyms used for the Brodmann regions. These facilitate text-based searches. Separated by |.
brodmann_parents: URLs for the Brodmann ontology parents. Separated by |.
brodmann_parents_label: Brodmann ontology parent text preferred labels. Separated by |.
disease_ontology: URL for the disease ontology.
tissue: tissue as prioritized by tissue_site_3 over tissue_site_2 over tissue_site_1.
tissue_ontology: URL for the tissue ontology.
tissue_synonyms: tissue synonyms, which facilitate text-based searches. Separated by |.
tissue_parents: URLs for the tissue ontology parents. Separated by |.
tissue_parents_label: tissue ontology parent text preferred labels. Separated by |.

You can access this version with recount::add_metadata(source = 'recount_brain_v2').
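Two of the conventions in this list lend themselves to a short illustration: the priority rule behind the tissue variable, and the |-separated fields. The sketch below runs on an invented data frame (the tissue names and synonyms are placeholders, not values taken from recount_brain) and is only one possible way to reproduce the rule. The actual code used by the authors is in the recount_brain_ontologies document.

```r
# Toy rows; values are invented for illustration only.
toy <- data.frame(
    tissue_site_1 = c('Cerebral cortex', 'Cerebellum', NA),
    tissue_site_2 = c('Frontal lobe', NA, NA),
    tissue_site_3 = c('Dorsolateral prefrontal cortex', NA, NA),
    tissue_synonyms = c('DLPFC|dorsolateral prefrontal cortex', 'cerebellum', NA),
    stringsAsFactors = FALSE
)

# Priority rule: tissue_site_3 over tissue_site_2 over tissue_site_1.
tissue <- with(
    toy,
    ifelse(
        !is.na(tissue_site_3), tissue_site_3,
        ifelse(!is.na(tissue_site_2), tissue_site_2, tissue_site_1)
    )
)

# Split a |-separated field; fixed = TRUE treats '|' literally
# instead of as the regular expression alternation operator.
synonyms <- strsplit(toy$tissue_synonyms, '|', fixed = TRUE)

# Flag samples whose synonym list contains an exact term.
has_dlpfc <- vapply(synonyms, function(x) 'DLPFC' %in% x, logical(1))
```

Splitting before matching avoids partial matches that a plain grepl() on the unsplit string could produce.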
recount_brain
We recommend opening the interactive recount_brain exploration in another window.
This application is a custom version of shinycsv (Collado-Torres, Semick, and Jaffe, 2018). The code for making this application is available in the shinytable directory.
If you have any questions about recount_brain, please post them as an issue at LieberInstitute/recount-brain and include the relevant session information using the following code. Thank you!
library('sessioninfo')
options(width = 120)
session_info()
The analyses were made possible thanks to BioPortal (Whetzel, Noy, Shah, Alexander, et al., 2011), MetaSRA (Bernstein, Doan, and Dewey, 2017), and:
[1] Z. Bao, H. Chen, M. Yang, C. Zhang, et al. “RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas”. In: Genome Research 24.11 (Aug. 2014), pp. 1765–1773. DOI: 10.1101/gr.165126.113. URL: https://doi.org/10.1101/gr.165126.113.
[2] M. N. Bernstein, A. Doan, and C. N. Dewey. “MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive”. In: Bioinformatics 33.18 (May 2017). Ed. by J. Wren, pp. 2914–2923. DOI: 10.1093/bioinformatics/btx334. URL: https://doi.org/10.1093/bioinformatics/btx334.
[3] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[4] W. Chang. downloader: Download Files over HTTP and HTTPS. R package version 0.4. 2015. URL: https://CRAN.R-project.org/package=downloader.
[5] L. Collado-Torres, A. Nellore, K. Kammers, S. E. Ellis, et al. “Reproducible RNA-seq analysis using recount2”. In: Nature Biotechnology (2017). DOI: 10.1038/nbt.3838. URL: http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html.
[6] L. Collado-Torres, S. Semick, and A. E. Jaffe. shinycsv: Explore a table interactively in a shiny application. R package version 0.99.8. 2018. URL: https://github.com/LieberInstitute/shinycsv.
[7] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.
[8] S. E. Ellis, L. Collado-Torres, A. E. Jaffe, and J. T. Leek. “Improving the value of public RNA-seq expression data by phenotype prediction”. In: Nucl. Acids Res. (2018). DOI: 10.1093/nar/gky102. URL: https://doi.org/10.1093/nar/gky102.
[9] P. G. Ferreira, M. Muñoz-Aguirre, F. Reverter, C. P. S. Godinho, et al. “The effects of death and post-mortem cold ischemia on human tissue transcriptomes”. In: Nature Communications 9.1 (Feb. 2018). DOI: 10.1038/s41467-017-02772-x. URL: https://doi.org/10.1038/s41467-017-02772-x.
[10] A. Oleś, M. Morgan, and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.10.0. 2018. URL: https://github.com/Bioconductor/BiocStyle.
[11] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2019. URL: https://www.R-project.org/.
[12] P. L. Whetzel, N. F. Noy, N. H. Shah, P. R. Alexander, et al. “BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications”. In: Nucleic Acids Research 39.suppl (Jun. 2011), pp. W541–W545. DOI: 10.1093/nar/gkr469. URL: https://doi.org/10.1093/nar/gkr469.
[13] H. Wickham. tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. 2017. URL: https://CRAN.R-project.org/package=tidyverse.
[14] H. Wickham, J. Hester, and W. Chang. devtools: Tools to Make Developing R Packages Easier. R package version 2.0.2. 2019. URL: https://CRAN.R-project.org/package=devtools.
[15] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.
[16] Y. Xie, J. Cheng, and X. Tan. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.5. 2018. URL: https://CRAN.R-project.org/package=DT.