Merging recount_brain metadata with GTEx and TCGA metadata

Dustin Sokolowski1,2*, Leonardo Collado-Torres3,4** and Michael D Wilson1,2,5

1Program in Genetics and Genome Biology, Hospital for Sick Children, Toronto M5G 0A4, Canada
2Department of Molecular Genetics, University of Toronto, Toronto M5S 1A8, Canada
3Lieber Institute for Brain Development, Johns Hopkins Medical Campus
4Center for Computational Biology, Johns Hopkins University
5Heart and Stroke Richard Lewar Centre of Excellence in Cardiovascular Research, Toronto M5S 3H2, Canada

*djsokolowski95@gmail.com
**lcolladotor@gmail.com

13 November 2020

This document contains the code that creates the recount_brain version 2 table by merging recount-brain version 1 with the metadata in GTEx and TCGA. Dustin Sokolowski created the recount_brain_v2 analysis with supervision from Michael D Wilson. Leonardo Collado-Torres edited this document.

1 Load packages and files.

Two categories of files are loaded here. First, the recount_brain metadata is downloaded with add_metadata(), and the TCGA and GTEx metadata from recount are downloaded with all_metadata(). Second, some additional information about the GTEx samples is added. Specifically, sample age, sex, and Hardy scale death classification are taken from gtex_pheno.csv, and information on sample fixation and freezing is found in gtex_sampinfo.csv. These two files can be downloaded from https://github.com/LieberInstitute/recount-brain/tree/master/cross_studies_metadata/GTEx_extra. They are csv files adapted from the links documented in the code.

library('recount')

# below are the files required to combine datasets

#GTEx & TCGA metadata from recount
recount_brain <- add_metadata(source = "recount_brain_v1")

## 2020-11-13 16:24:16 downloading the recount_brain metadata to /tmp/RtmpK9pZcs/recount_brain_v1.Rdata

## Loading objects:
##   recount_brain

GTEx <-  recount::all_metadata("gtex")

## 2020-11-13 16:24:17 downloading the metadata to /tmp/RtmpK9pZcs/metadata_clean_gtex.Rdata

tcga <- recount::all_metadata("tcga")

## 2020-11-13 16:24:18 downloading the metadata to /tmp/RtmpK9pZcs/metadata_clean_tcga.Rdata

# Read txt file downloaded from:
# "https://storage.googleapis.com/gtex_analysis_v7/annotations/GTEx_v7_Annotations_SubjectPhenotypesDS.txt"
# this dataset was also converted into a csv before being loaded into R

gtex_pheno <- read.csv("https://raw.githubusercontent.com/LieberInstitute/recount-brain/master/cross_studies_metadata/GTEx_extra/gtex_pheno.csv",
    header = T, as.is = T)

# Remaining phenotype information that may be useful for GTEx metadata
# https://storage.googleapis.com/gtex_analysis_v7/annotations/GTEx_v7_Annotations_SampleAttributesDS.txt
# This dataset was converted into a csv before being loaded into R
gtexSampinfo <- read.csv("https://raw.githubusercontent.com/LieberInstitute/recount-brain/master/cross_studies_metadata/GTEx_extra/gtex_sampinfo.csv",
    header = T, as.is = T)

# Supplementary table 1 to re-order columns
notes <- read.csv("https://raw.githubusercontent.com/LieberInstitute/recount-brain/master/SupplementaryTable1.csv",
    header= T, as.is = T)

# Generic function to convert a factor into a character vector
tochr <- function(x) return(as.character(levels(x))[x])

2 Process GTEx metadata

The code chunk below first extracts brain samples from GTEx using the smts column. Second, the sample IDs are shortened to subject IDs so that the GTEx metadata from recount can be easily merged with the GTEx metadata from the phenotype and sample files. Finally, these files are merged.

GTEX_brain <- GTEx[GTEx$smts == "Brain",] # brain samples in gtex

# Change the sample id of the "sampid" column to the first 9 or 10 characters so that the GTEX_brain and gtex_pheno columns can be merged
s <- substr(GTEX_brain$sampid, 1,10) 
s1 <- c()
for(i in s) {
  last <- substr(i, nchar(i), nchar(i))
  if(last == "-") {
    s1 <- c(s1, substr(i, 1, nchar(i)-1))
  } else {
    s1 <- c(s1, i)
  }
}
GTEX_brain$SUBJID <- s1

# merge GTEX_brain and gtex_pheno
GTEx_brain_merge <- merge(GTEX_brain, gtex_pheno, by = "SUBJID")

#GTEx brain samples
gtexSampinfo_brain <- intersect(GTEx_brain_merge$sampid, gtexSampinfo$SAMPID)
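
The subject ID derivation above can also be written without a loop. The following is a minimal sketch on made-up sample IDs (the IDs and the toy phenotype table are hypothetical, not taken from GTEx); it keeps the first 10 characters and drops a trailing "-", which is what the loop does.

```r
# Hypothetical GTEx-style sample IDs, made up for illustration
sampid <- c("GTEX-N7MS-0011-R10A-SM-2HMJK", "GTEX-1117F-0226-SM-5GZZ7")

# Keep the first 10 characters, then drop a trailing "-" if present
subjid <- sub("-$", "", substr(sampid, 1, 10))

# Toy phenotype table keyed by subject ID (values are invented)
toy_pheno <- data.frame(SUBJID = c("GTEX-N7MS", "GTEX-1117F"),
    SEX = c(1, 2), stringsAsFactors = FALSE)

# Merge on the derived subject ID, as done for GTEX_brain and gtex_pheno
toy_merge <- merge(data.frame(sampid, SUBJID = subjid, stringsAsFactors = FALSE),
    toy_pheno, by = "SUBJID")
toy_merge
```

Because sub() and substr() are vectorized, this avoids growing a vector inside a for loop.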

2.1 Tissue location

The code below processes the columns related to tissue location. Tissue location and Brodmann area are both extracted from the smtsd column. Finally, GTEx putamen samples are assigned to the right hemisphere, while all other sampled regions are recorded as bilateral.

#Tissue location
tissue_1_gtex <- substr(GTEx_brain_merge$smtsd, 9,
    nchar(GTEx_brain_merge$smtsd))

# Brodmann areas
broadman_gtex <- c()
for(i in 1:nrow(GTEx_brain_merge)) {
  if(GTEx_brain_merge$smtsd[i] %in% "Brain - Anterior cingulate cortex (BA24)") {
    broadman_gtex[i] <- 24
    next
  }
  if(GTEx_brain_merge$smtsd[i] %in% "Brain - Frontal Cortex (BA9)") {
    broadman_gtex[i] <- 9
    next
  } 
  broadman_gtex[i] <- NA
}

# mapping putamen to right hemisphere
hemisphere_gtex <- c()
for(i in GTEx_brain_merge$smtsd) {
  if(i == "Brain - Putamen (basal ganglia)") {
    hemisphere_gtex <- c(hemisphere_gtex, "right")
  } else {
    hemisphere_gtex <- c(hemisphere_gtex, "bilateral")
  }
}
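
As an illustration of the three steps above, here is a vectorized sketch on a few smtsd-style labels (the input vector is a small hand-written example). Note one assumption: the regular expression picks up any "(BA<number>)" suffix, whereas the loop above only handles BA24 and BA9 explicitly.

```r
smtsd <- c("Brain - Frontal Cortex (BA9)",
    "Brain - Anterior cingulate cortex (BA24)",
    "Brain - Putamen (basal ganglia)",
    "Brain - Hippocampus")

# Strip the 8-character "Brain - " prefix (same result as substr(x, 9, nchar(x)))
tissue <- sub("^Brain - ", "", smtsd)

# Brodmann area: parse the number only where a "(BA<number>)" suffix exists
has_ba <- grepl("\\(BA[0-9]+\\)", smtsd)
brodmann <- rep(NA_integer_, length(smtsd))
brodmann[has_ba] <- as.integer(sub(".*\\(BA([0-9]+)\\).*", "\\1", smtsd[has_ba]))

# Putamen is labelled "right"; everything else "bilateral"
hemisphere <- ifelse(smtsd == "Brain - Putamen (basal ganglia)", "right", "bilateral")

data.frame(tissue, brodmann, hemisphere)
```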

2.2 Age, sex, disease

The code below handles age, sex, and disease. Every GTEx brain sample is from a donor older than 20, so the developmental stage is always adult. Disease is organized by the Hardy scale: ventilator deaths and fast deaths from natural causes make it difficult to determine disease status (coded “either”), violent and fast deaths are treated as controls, and individuals who were previously ill are coded as “Disease”. Finally, recount_brain_v2 only uses public data; more in-depth information is private.

#developmental stage
development <- "adult"

# DTHHRDY explanation
stat <- c("ventilator", "violent_fast", "fast_natural", "ill_unexpected", "ill_expected")

#GTEx disease mapping
gtex_disease <- c()
gtex_disease_status <- c()
for(i in GTEx_brain_merge$DTHHRDY) {
  gtex_disease <- c(gtex_disease, stat[i+1])
  if(i == 1) {
    gtex_disease_status <- c(gtex_disease_status, "Control")
  }
  if(i == 2 | i == 0) {
    gtex_disease_status <- c(gtex_disease_status, "either")
  }
  if(i == 3 | i == 4) {
    gtex_disease_status <- c(gtex_disease_status, "Disease")
  }
} 

# mapping sex to character
sex_character <- c("male", "female")
sex <- c()
for(i in GTEx_brain_merge$SEX) sex <- c(sex, sex_character[i])

#RNA isolation type
technique <- paste0("RNA Seq, ", GTEx_brain_merge$smnabtcht)

gtex_sampleInfo_brain_merged <- merge(GTEx_brain_merge, gtexSampinfo,
    by.x = "sampid", by.y = "SAMPID", all = F)
rownames(gtex_sampleInfo_brain_merged) <- gtex_sampleInfo_brain_merged$sampid
rownames(gtexSampinfo) <- gtexSampinfo$SAMPID
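
The Hardy scale and sex mappings above are positional lookups, so they can be sketched with plain vector indexing. The input codes below are made up; the labels are the ones used in the loop above. One difference worth noting: in this sketch a missing code simply propagates as NA.

```r
# Made-up DTHHRDY codes (0 to 4) plus one missing value
dthhrdy <- c(0, 1, 2, 3, 4, NA)

# Labels and status, indexed by code + 1
hardy_label  <- c("ventilator", "violent_fast", "fast_natural",
    "ill_unexpected", "ill_expected")
hardy_status <- c("either", "Control", "either", "Disease", "Disease")

disease        <- hardy_label[dthhrdy + 1]
disease_status <- hardy_status[dthhrdy + 1]

# Sex is coded 1 = male, 2 = female
sex_code <- c(1, 2, 2)
sex <- c("male", "female")[sex_code]

data.frame(dthhrdy, disease, disease_status)
```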

2.3 Sample isolation

Sample isolation, fixed and frozen samples. These data were originally acquired from the pheno_sampid column. Time after isolation was taken from the SMTSISCH column and is currently used as a rough proxy for post-mortem interval. Samples with SMTSPAX > 0 were fixed and samples with SMTSISCH < 0 were frozen; otherwise the preparation is unclear.

gtexSampinfo$SMTSPAX[is.na(gtexSampinfo$SMTSPAX)] <- 0
gtexSampinfo$SMTSISCH[is.na(gtexSampinfo$SMTSISCH)] <- 0

isoTime <- rep(NA, nrow(GTEx_brain_merge))
names(isoTime) <- GTEx_brain_merge$sampid
count <- 1
for(i in names(isoTime)) {
  #print(which(rownames(gtexSampinfo) == i))
  isoTime[i] <- gtexSampinfo[i, "SMTSISCH"]
  
}

prep <- c()
for(i in 1:nrow(gtexSampinfo)) {
  if(gtexSampinfo$SMTSPAX[i] > 0) {
    prep[i] <- "fixed"
    next
  }
  if(gtexSampinfo$SMTSISCH[i] < 0) {
    prep[i] <- "frozen"
    next
  }
  prep[i] <- "unclear"
}
names(prep) <- rownames(gtexSampinfo)

prep_use <- rep(0, nrow(GTEx_brain_merge)) 
names(prep_use) <- GTEx_brain_merge$sampid


for(i in names(prep_use)){
  prep_use[i] <- prep[i]
}
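
The classification rule and the per-sample lookup above can be sketched as follows on a toy table. The sample IDs and values are invented; only the column names SAMPID, SMTSPAX and SMTSISCH come from the code above.

```r
# Toy sample-attribute table (values invented for illustration)
sampinfo <- data.frame(
    SAMPID   = c("S1", "S2", "S3", "S4"),
    SMTSPAX  = c(120, NA, 0, NA),
    SMTSISCH = c(300, -15, 45, NA),
    stringsAsFactors = FALSE)

# Missing values are treated as 0, as in the code above
pax  <- ifelse(is.na(sampinfo$SMTSPAX), 0, sampinfo$SMTSPAX)
isch <- ifelse(is.na(sampinfo$SMTSISCH), 0, sampinfo$SMTSISCH)

# Fixed if SMTSPAX > 0, else frozen if SMTSISCH < 0, else unclear
prep <- ifelse(pax > 0, "fixed", ifelse(isch < 0, "frozen", "unclear"))

# Look up the samples of interest in one step with match()
wanted   <- c("S3", "S1", "S2")
idx      <- match(wanted, sampinfo$SAMPID)
prep_use <- prep[idx]
iso_time <- isch[idx]

data.frame(wanted, prep_use, iso_time)
```

Using match() keeps the result in the order of the requested sample IDs, which is what the name-based loops above achieve.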

2.4 Combine

Combine metadata in the order of recount_brain_v1. Some columns (e.g. age units, sample location, public availability…) are the same for all samples, so these columns are filled with a single constant value.

GTEX_combn <- cbind(GTEx_brain_merge$AGE, "Years", technique, GTEx_brain_merge$avg_read_length,
    NA, NA, "Laboratory, Data Analysis and Coordinating Center (LDACC)",
    broadman_gtex, NA, 
    "Laboratory, Data Analysis and Coordinating Center (LDACC)", NA,
    NA, "Public", "Adult", gtex_disease, gtex_disease_status, 
    GTEx_brain_merge$experiment,
    hemisphere_gtex, NA, "Illumina TruSeq RNA sequencing", NA, "paired",
    "cDNA", paste0("transcriptomic - ", GTEx_brain_merge$smcenter),
    GTEx_brain_merge$smnabtchd, NA,  NA, "Homo sapiens", NA, "Illumina",
    unlist(isoTime),
    "mins", unlist(prep_use), TRUE, "not public", GTEx_brain_merge$smnabtch,
    GTEx_brain_merge$smrin,
    GTEx_brain_merge$run, NA, GTEx_brain_merge$smts, sex,
    GTEx_brain_merge$sample, GTEx_brain_merge$project, tissue_1_gtex,
    NA, NA, NA, "Postmortem")
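
A small sketch of what cbind() does with a mix of vectors and constants may help when reading the call above. The values are invented. The point is that length-one arguments are recycled to every row and that the result is a character matrix, so numeric columns need to be converted back later.

```r
# Invented per-sample values
age <- c(61, 47, 55)
rin <- c(7.1, 6.8, 8.0)

# Constants ("Years", NA) are recycled to all three rows
toy <- cbind(age, "Years", NA, rin)

dim(toy)          # 3 rows, 4 columns
is.character(toy) # TRUE: mixing numbers and strings gives a character matrix

# Numeric columns therefore have to be re-cast after combining
as.numeric(toy[, "rin"])
```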

3 TCGA

The code below makes the adjustments to TCGA. This code chunk extracts the brain (i.e. Lower Grade Glioma and Glioblastoma) samples from TCGA.

# filter for brain samples 
tcga_brain_nums <- which(tcga$gdc_cases.project.project_id %in%
    c("TCGA-LGG", "TCGA-GBM"))
cd_brain.ol <- tcga[tcga_brain_nums,]

RNA-seq file information. The value stored as average read length is calculated with the formula below, where numberEnds is 2 for paired-end and 1 for single-end libraries:

\[ avgReadLength = (auc / mappedReadCount) \times numberEnds \]

I.e. if RNA-seq was paired end, the stored value is twice the per-read average

\[ auc / mappedReadCount \]

File size in megabytes is the file size in bytes divided by one million:

\[ mbytes = bytes / 10^{6} \]

#Avg read length
TCGA_readlength <- cd_brain.ol$auc / cd_brain.ol$mapped_read_count * ifelse(cd_brain.ol$paired_end, 2, 1)
#file size in megabytes
mb_tcga <- cd_brain.ol$gdc_file_size / 1e6
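
As a worked illustration of the two code lines above, here is the same arithmetic on made-up numbers (one paired-end and one single-end sample; none of these values come from TCGA).

```r
# Invented values for two samples
auc               <- c(7.6e9, 5.0e9)  # total coverage (area under coverage)
mapped_read_count <- c(1e8, 1e8)
paired_end        <- c(TRUE, FALSE)

# Same expression as in the code above: per-read average times number of ends
per_read    <- auc / mapped_read_count
read_length <- per_read * ifelse(paired_end, 2, 1)
read_length  # 152 for the paired-end sample, 50 for the single-end sample

# File size in megabytes
gdc_file_size <- c(2.5e9, 8.0e8)
mbytes <- gdc_file_size / 1e6
mbytes       # 2500 and 800
```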

3.1 Age and disease information

Age at diagnosis is used for age instead of age at treatment or death. That information can still be acquired from the TCGA metadata by merging with toupper(cd_brain.ol$gdc_file_id), which is the identifier that maps to the row names of the TCGA count data. The youngest individual in the TCGA brain dataset is 14, so for development samples are split into adolescent and adult.

Disease is taken from gdc_cases.samples.sample_type. Disease status is binary, depending on whether the tissue was diseased or normal (5 samples are normal tissue). Library selection is split the same way, into tumour cDNA (ctDNA) or cDNA.

#age at diagnosis
age_at_diag <- cd_brain.ol$cgc_case_age_at_diagnosis

#age normalized for developmental stage
development_tcga <- c()
for(i in 1:length(cd_brain.ol$cgc_case_age_at_diagnosis)) {
  if(is.na(age_at_diag[i])) {
    development_tcga[i] <- NA
    next
  } 
  if(age_at_diag[i] < 20) {
    development_tcga[i] <- "Adolescent"
    next
  }
  development_tcga[i] <- "Adult"
}

#cDNA type and solid tissue normal
disease_status_tcga <- c()
selection_tcga <- c()
for(i in 1:length(cd_brain.ol$gdc_cases.samples.sample_type)) {
  if(cd_brain.ol$gdc_cases.samples.sample_type[i] ==  "Solid Tissue Normal") {
    disease_status_tcga[i] <- "Control"
    selection_tcga[i] <- "cDNA"
  } else {
    disease_status_tcga[i] <- "Disease"
    selection_tcga[i] <- "ctDNA"
  }
}

# Histological grade, data is changed to match recount brain
neoP <- tochr(cd_brain.ol$xml_neoplasm_histologic_grade)
neoP[is.na(neoP)] <- "0"
grade_adjust <- c()
for(i in 1:length(neoP)) {
  if(neoP[i] ==  "G2") {
    grade_adjust[i] <- "Grade II"
    next
  }
  if(neoP[i] == "G3") {
    grade_adjust[i] <- "Grade III"
    next
  }
  grade_adjust[i] <- NA
}

path <- cd_brain.ol$xml_ldh1_mutation_found
pathology_comp <- c()
for(i in 1:length(path)) {
  if(is.na(path[i])) {
    pathology_comp[i] <- NA
    next
  }
  if(path[i] == "YES") {
    pathology_comp[i] <- "+ IDH1 Mutation"
    next
  }
  if(path[i] == "NO") {
    pathology_comp[i] <- "- IDH1 Mutation"
    next
  }
  pathology_comp[i] <- path[i]
}
table(pathology_comp)

## pathology_comp
## - IDH1 Mutation + IDH1 Mutation 
##              34              91

# LGG or GBM
cancer_type <- substr(x = cd_brain.ol$gdc_cases.project.project_id, 6,
    nchar(cd_brain.ol$gdc_cases.project.project_id))
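
The age and sample-type rules above can be sketched in vectorized form. The inputs are invented; the cut-off of 20 years and the labels are the ones used in the loops above. Here cut() is used as one possible way to bin ages, with right = FALSE so that exactly 20 counts as adult.

```r
# Invented ages at diagnosis, including a missing value
age_at_diag <- c(14, 19, 20, 45, NA)

# [-Inf, 20) -> Adolescent, [20, Inf) -> Adult; NA stays NA
development <- as.character(cut(age_at_diag, breaks = c(-Inf, 20, Inf),
    labels = c("Adolescent", "Adult"), right = FALSE))

# Invented sample types
sample_type <- c("Primary Tumor", "Solid Tissue Normal", "Recurrent Tumor")
is_normal      <- sample_type == "Solid Tissue Normal"
disease_status <- ifelse(is_normal, "Control", "Disease")
selection      <- ifelse(is_normal, "cDNA", "ctDNA")

# Project ID to cancer type: drop the "TCGA-" prefix
cancer_type <- sub("^TCGA-", "", c("TCGA-LGG", "TCGA-GBM"))
```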

3.2 Combine

When combining TCGA data into the recount_brain_v1 format, some columns are constant (e.g. all sequencing data was paired end), so the library layout column is set to “paired” for every sample.

TCGA_combn <- cbind(cd_brain.ol$cgc_case_age_at_diagnosis, "Years", "RNA_seq", TCGA_readlength,
    cd_brain.ol$cgc_case_id, cd_brain.ol$xml_patient_id,
    cd_brain.ol$gdc_cases.tissue_source_site.name,
    NA, NA, cd_brain.ol$gdc_center.name, grade_adjust,
    cd_brain.ol$gdc_cases.samples.sample_type,
    cd_brain.ol$gdc_metadata_files.access.analysis, development_tcga,
    cd_brain.ol$gdc_cases.samples.sample_type, disease_status_tcga,
    cd_brain.ol$gdc_metadata_files.file_id.experiment, NA, NA,
    cd_brain.ol$gdc_platform, toupper(cd_brain.ol$gdc_file_id), "paired", 
    selection_tcga, "TRANSCRIPTOMIC", cd_brain.ol$cgc_file_upload_date, NA,
    cd_brain.ol$gdc_file_size / 1e6, "Homo sapiens", pathology_comp,
    "Illumina", NA, NA, "frozen soon after surgery", "TRUE",
    cd_brain.ol$gdc_cases.demographic.race,
    cd_brain.ol$cgc_file_published_date, NA, NA, NA,"Brain",
    cd_brain.ol$gdc_cases.demographic.gender, NA,NA,cancer_type,NA,NA,
    tochr(cd_brain.ol$xml_histological_type), "Biopsy")
                    
rownames(cd_brain.ol) <- toupper(cd_brain.ol$gdc_file_id)

3.3 Drug information

The cgc_drug_therapy_drug_name column contains multiple typos and ambiguous drug names. The script below adjusts these drug names for consistency. drug_info_T indicates whether drug information is present. drug_therapy_type distinguishes between chemotherapy, radiation, etc. Finally, the 260/280 ratio is the TCGA proxy for RNA quality. Some older cancers (i.e. OV) have RIN, but LGG and GBM moved over to 260/280.

dN <- toupper(cd_brain.ol$cgc_drug_therapy_drug_name)
drugName <- c()
# fixed typos in TCGA drugs
for(i in 1:length(dN)) { 
  if(is.na(dN[i])) {
    drugName[i] <- NA
    next
  }
  if(dN[i] %in% c("TEMOZOLAMIDE", "TEMOZOLOMIDE")) {
    drugName[i] <- "TEMOZOLOMIDE"
    next
  }
  
  if(dN[i] %in% c("TEMADOR","TEMODAR", "TEMODAR (ESCALATION)", "METRONOMIC TEMODAR")) {
    drugName[i] <- "TEMODAR"
    next
  }
  
  if(dN[i] %in% c("LOMUSTINE (CCNU)","LOMUSTINE", "LOMUSTIN")) {
    drugName[i] <- "LOMUSTINE"
    next
  }
  
  if(dN[i] %in% c("ISOTRETINOIN","ISOTRECTINOIN (ACCCUTANE)")) {
    drugName[i] <- "ISOTRETINOIN"
    next
  }
  
  if(dN[i] %in% c("I 131 81C6","I131-81C6")) {
    drugName[i] <- "I-131-81C6"
    next
  }
  
  if(dN[i] %in% c("HYDROXYUREA","HYDROYUREA")) {
    drugName[i] <- "HYDROXYUREA"
    next
  }
  
  if(dN[i] %in% c("GLIADEL WAFER","GLIADEL WAFER (BCNU)", "GLIADEL")) {
    drugName[i] <- "GLIADEL"
    next
  }
  
  if(dN[i] %in% c("DEXAMETHASONE","DEXMETHASONE")) {
    drugName[i] <- "DEXAMETHASONE"
    next
  }
  
  if(dN[i] %in% c("CPT11","CPT-11")) {
    drugName[i] <- "CPT11"
    next
  }
  
  if(dN[i] %in% c("CARMUSTINE", "CARMUSTIN", "CARMUSTINE (BCNU)", "CARMUSTINE BCNU")) {
    drugName[i] <- "CARMUSTINE"
    next
  }
  
  if(dN[i] %in% c("BEVACIZUMAB","BEVACIZUMAB OR PLACEBO RTOG 0825")) {
    drugName[i] <- "BEVACIZUMAB"
    next
  }
  
  if(dN[i] %in% c("BCNU","BCNU (CARMUSTINE)")) {
    drugName[i] <- "BCNU"
    next
  }
  drugName[i] <- dN[i]
}

drug_info_T <- cd_brain.ol$xml_has_drugs_information
drug_therapy_type <- cd_brain.ol$cgc_drug_therapy_pharmaceutical_therapy_type

T_260_280 <- cd_brain.ol$gdc_cases.samples.portions.analytes.a260_a280_ratio
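
The chain of if statements above is effectively a synonym table. A compact way to express the same idea is a named character vector used as a lookup, sketched below with only a few of the spellings from the loop. The raw input names are invented, and any name not in the table is kept as is.

```r
# Invented raw drug names, including a missing value and an unlisted drug
raw_names <- c("Temozolamide", "CPT-11", NA, "Avastin")
dN <- toupper(raw_names)

# Synonym table: names are observed spellings, values are the canonical name
synonyms <- c(
    "TEMOZOLAMIDE"     = "TEMOZOLOMIDE",
    "CPT-11"           = "CPT11",
    "HYDROYUREA"       = "HYDROXYUREA",
    "LOMUSTINE (CCNU)" = "LOMUSTINE"
)

# Look up each name; unmatched names (and NA) return NA from the table
hit <- unname(synonyms[dN])

# Fall back to the upper-cased input when there is no entry in the table
drug_clean <- ifelse(is.na(hit), dN, hit)
drug_clean
```

Adding a newly discovered misspelling then only requires one more entry in the table.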

4 Combining all and cleaning

The code below reorders the columns of recount_brain_v1, ensuring that the order of columns in recount_brain matches TCGA, GTEx, and the recount website. All of the column names are then matched.

recount_brain_reorder = recount_brain[,gsub(' ','', notes$Variable[1:48])]

colnames(TCGA_combn) <- colnames(GTEX_combn) <- colnames(recount_brain_reorder)

The code below cleans up the colData related to combining the three datasets and creates a consistent identifier. Study is the name of the SRA study, TCGA, or GTEX. The _full columns are TCGA columns padded with NA so that they have the correct number of rows for recount_brain and GTEx. Finally, these columns are combined and the dataset is saved.

Study <- sub("\\..*","", rownames(recount_brain) )
Study_full <- c(Study, rep("TCGA", nrow(TCGA_combn)),
    rep("GTEX", nrow(GTEX_combn)))
Dataset <- c(rep("recount_brain_v1",length(Study)),
    rep("TCGA", nrow(TCGA_combn)), rep("GTEX", nrow(GTEX_combn)))
drugName_full <- c(rep(NA, length(Study)), drugName, rep(NA, nrow(GTEX_combn) ))
drug_info_full <- c(rep(NA, length(Study)), drug_info_T,
    rep(NA, nrow(GTEX_combn) ))
drug_type_full <- c(rep(NA, length(Study)), drug_therapy_type,
    rep(NA, nrow(GTEX_combn) ))
full_260_280<- c(rep(NA, length(Study)), T_260_280, rep(NA, nrow(GTEX_combn) ))
count_file_identifier <- c(recount_brain$run_s, rownames(cd_brain.ol),
    GTEx_brain_merge$run)
brain_meta <- rbind(recount_brain_reorder, TCGA_combn, GTEX_combn )
metadata_complete <- cbind(brain_meta, Study_full, drugName_full,
    drug_info_full, drug_type_full, full_260_280, count_file_identifier, Dataset)
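
The padding logic above can be sketched on three tiny tables. All values and run names are invented; the sketch only shows how a TCGA-only column is padded with NA for the other two blocks, in the same row order used by rbind().

```r
# Three toy blocks with identical columns (values invented)
v1_part   <- data.frame(run_s = c("SRR1", "SRR2", "SRR3"),
    sex = c("male", "female", "male"), stringsAsFactors = FALSE)
tcga_part <- data.frame(run_s = c("T1", "T2"),
    sex = c("female", "male"), stringsAsFactors = FALSE)
gtex_part <- data.frame(run_s = c("G1", "G2"),
    sex = c("male", "male"), stringsAsFactors = FALSE)

# Row order: recount_brain_v1, then TCGA, then GTEx
meta <- rbind(v1_part, tcga_part, gtex_part)

# Dataset label, one entry per row in the same block order
n <- c(nrow(v1_part), nrow(tcga_part), nrow(gtex_part))
meta$Dataset <- rep(c("recount_brain_v1", "TCGA", "GTEX"), times = n)

# A TCGA-only variable, padded with NA for the other two blocks
tcga_drug <- c("TEMODAR", NA)
meta$drugName_full <- c(rep(NA, n[1]), tcga_drug, rep(NA, n[3]))
meta
```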

4.1 Consistent names

The code below adjusts some of the major columns within the dataset to account for different datasets using slightly different names. For example, if you filter for “Primary”, you get all primary tumors instead of just the recount_brain_v1 primary tumors.

#Tissue site 1 adjust

tsite1 <- c()
ts <- metadata_complete$tissue_site_1
for(i in 1:nrow(metadata_complete)) {
  
  if(ts[i] %in% c("Caudate (basal ganglia)", "Caudate")) {
    tsite1[i] <- "Caudate"
    next
  }
  
  if(ts[i] %in% c("Frontal Cortex", "Frontal Cortex (BA9)")) {
    tsite1[i] <- "Frontal Cortex"
    next
  }
  
  if(ts[i] %in% c("Nucleus accumbens", "Nucleus accumbens (basal ganglia)")) {
    tsite1[i] <- "Nucleus accumbens"
    next
  }
  
  if(ts[i] %in% c("Putamen", "Putamen (basal ganglia)")) {
    tsite1[i] <- "Putamen"
    next
  }
 tsite1[i] <- ts[i] 
}

metadata_complete$tissue_site_1 <- tsite1


# Adjusting the disease information so that tumour information is consistent 

# Note: in "Alzheimer's disease" and "Parkinson's disease" there was a minor error with
# the encoding of the apostrophe. You will likely need to adjust these entries manually.
dis <- c()
for(i in 1:length(metadata_complete$disease)) {
  if(metadata_complete$disease[i] %in% c("Brain tumor", "Tumor")) {
    dis[i] <- "brain tumor unspecified"
    next
  }
  dis[i] <- metadata_complete$disease[i]
}

metadata_complete$disease <- dis

clinStage2 <- c()
for(i in 1:length(metadata_complete$clinical_stage_2)) {
  if(is.na(metadata_complete$clinical_stage_2[i])) {
    clinStage2[i] <- NA
    next
  }
  if(metadata_complete$clinical_stage_2[i] %in% c("Primary Tumor")) {
    clinStage2[i] <- "Primary"
    next
  }
  if(metadata_complete$clinical_stage_2[i] %in% c("Recurrent Tumor")) {
    clinStage2[i] <- "Recurrent"
    next
  }
  clinStage2[i] <- metadata_complete$clinical_stage_2[i]
}

metadata_complete$clinical_stage_2 <- clinStage2

# Fixing capitalization in consent
metadata_complete$consent_s <- toupper(metadata_complete$consent_s)

race_adjusted <- toupper(metadata_complete$race)
for(i in 1:length(race_adjusted)) {
  if(race_adjusted[i] %in% "BLACK OR AFRICAN AMERICAN") {
    race_adjusted[i] <- "BLACK"
  }  
  
}
metadata_complete$race <- race_adjusted


# Information on sample origin: iPSC consistency
origin <- metadata_complete$sample_origin
for(i in 1:length(origin)) {
    if(origin[i] %in% "iPSCs") {
    origin[i] <- "iPSC"
  }  
}
metadata_complete$sample_origin <- origin

# making sure that oligodendroglioma/oligodendrogliomas are treated as the same tumor type
t_type <- metadata_complete$tumor_type
for(i in 1:length(t_type)) {
  if(t_type[i] %in% "Anaplastic Oligodendrogliomas") {
    t_type[i] <- "Anaplastic Oligodendroglioma"
    next
  }  
  

}
metadata_complete$sample_origin <- origin
metadata_complete$tumor_type <- t_type

# Converting run_s to also contain the identifier. This allows recount_brain_v2 to be accessed via the "add_metadata()" function

metadata_complete$run_s <-  metadata_complete$count_file_identifier
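
To illustrate why harmonized labels matter for filtering, here is a minimal sketch on invented values. It uses a regular expression instead of the explicit matches above, under the assumption that only a trailing " Tumor" or " (basal ganglia)" needs to be removed.

```r
# Invented clinical stage labels mixing both naming styles
clinical_stage_2 <- c("Primary", "Primary Tumor", "Recurrent Tumor",
    "Solid Tissue Normal", NA)

# Before harmonizing, filtering for "Primary" finds only one sample
n_before <- sum(clinical_stage_2 == "Primary", na.rm = TRUE)

# Drop a trailing " Tumor"; other labels and NA are left untouched
harmonized <- sub(" Tumor$", "", clinical_stage_2)
n_after <- sum(harmonized == "Primary", na.rm = TRUE)

# Same idea for tissue sites with a "(basal ganglia)" suffix
tissue <- sub(" \\(basal ganglia\\)$", "", c("Putamen (basal ganglia)", "Putamen"))

c(before = n_before, after = n_after)
```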

4.2 Create `recount_brain_v2`

The final code chunk checks the dimensions of recount_brain_v2, saves it as an Rdata object and a csv file, checks the md5sum of the saved files, and lists the variables.

# Completed metadata is then combined and saved
recount_brain <- metadata_complete
dim(recount_brain)

## [1] 6547   55

## For compatibility with add_metadata()
recount_brain$run_s <- as.character(recount_brain$run_s)

## Re-cast some vars
recount_brain$count_file_identifier <- as.character(recount_brain$count_file_identifier)
recount_brain$drug_info_full <- recount_brain$drug_info_full == 'YES'
recount_brain$rin <- as.numeric(recount_brain$rin)
recount_brain$pmi <- as.numeric(recount_brain$pmi)
recount_brain$avgspotlen_l <- as.numeric(recount_brain$avgspotlen_l)
recount_brain$insertsize_l <- as.numeric(recount_brain$insertsize_l)
recount_brain$mbases_l <- as.integer(recount_brain$mbases_l)
recount_brain$mbytes_l <- as.numeric(recount_brain$mbytes_l)
recount_brain$brodmann_area <- as.integer(recount_brain$brodmann_area)
recount_brain$present_in_recount <- as.logical(recount_brain$present_in_recount)

## Simplify age by turning ranges such as 20-29 to mean(c(20, 29))
mean_age <- function(x) {
    mean(as.integer(strsplit(x, '-')[[1]]))
}
age <- as.numeric(recount_brain$age)

## Warning: NAs introduced by coercion

age[grepl('-', recount_brain$age)] <- sapply(
    recount_brain$age[grepl('-', recount_brain$age)], mean_age)
recount_brain$age <- age
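
As a worked example of the age simplification above, the sketch below applies the same mean_age() helper to a few invented age strings. Plain numbers are converted directly, ranges are replaced by their midpoint, and missing values stay missing.

```r
mean_age <- function(x) {
    mean(as.integer(strsplit(x, "-")[[1]]))
}

# Invented ages: a plain number, two ranges and a missing value
age_raw <- c("34", "20-29", NA, "60-69")

# Ranges cannot be coerced directly, hence the warning seen above
age <- suppressWarnings(as.numeric(age_raw))

# Replace each range by the mean of its two end points
is_range <- grepl("-", age_raw)
age[is_range] <- vapply(age_raw[is_range], mean_age, numeric(1),
    USE.NAMES = FALSE)
age  # 34.0 24.5 NA 64.5
```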

## Between version 1 and 2, these are the columns that change types
r <- add_metadata(source = 'recount_brain_v1')

## 2020-11-13 16:24:28 downloading the recount_brain metadata to /tmp/RtmpK9pZcs/recount_brain_v1.Rdata

## Loading objects:
##   recount_brain

x <- sapply(r, class) == sapply(recount_brain[, colnames(r)], class)
sapply(recount_brain[, colnames(r)], class)[!x]

## avgspotlen_l insertsize_l     mbytes_l 
##    "numeric"    "numeric"    "numeric"

sapply(r, class)[!x]

## avgspotlen_l insertsize_l     mbytes_l 
##    "integer"    "integer"    "integer"
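
The comparison above generalizes to any two data frames that share columns. The following sketch wraps it in a small helper (first_class() is a hypothetical name introduced here) and runs it on two invented tables, taking only the first class of each column so that multi-class columns do not break the comparison.

```r
# Two invented versions of the same table
old <- data.frame(len = c(100L, 150L), id = c("a", "b"), stringsAsFactors = FALSE)
new <- data.frame(len = c(100.5, 150), id = c("a", "b"), stringsAsFactors = FALSE)

# Hypothetical helper: first class of every column, as a named character vector
first_class <- function(df) {
    vapply(df, function(col) class(col)[1], character(1))
}

# Columns whose type differs between the two versions
changed <- first_class(old) != first_class(new[, colnames(old)])
rbind(old = first_class(old)[changed], new = first_class(new)[changed])
```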

## Save the data
save(recount_brain, file = 'recount_brain_v2_noOntology.Rdata')
write.csv(recount_brain, file = 'recount_brain_v2_noOntology.csv', quote = TRUE,
    row.names = FALSE)

## Check md5sum for the resulting files
sapply(dir(pattern = 'recount_brain_v2'), tools::md5sum)

##     recount_brain_v2_noOntology.csv.recount_brain_v2_noOntology.csv 
##                                  "e7855403fac9dc4d6345908c1e5da5a7" 
## recount_brain_v2_noOntology.Rdata.recount_brain_v2_noOntology.Rdata 
##                                  "aa95cc6a34b77b9062e2f77da0cac286" 
##                           recount_brain_v2.csv.recount_brain_v2.csv 
##                                  "2ab643a4ce55d731c637456ff50ef36b" 
##                       recount_brain_v2.Rdata.recount_brain_v2.Rdata 
##                                  "0cc562916ced9f2bf4fb2b9a1a446121"
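
For reference, a checksum can be verified on any file in the same way. The sketch below writes a throwaway csv to a temporary file (the file and its contents are invented) and computes its md5sum. tools::md5sum() accepts a character vector of paths, so it can also be called directly on the result of dir().

```r
# Write a small throwaway csv to a temporary location
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3), file = tmp, row.names = FALSE)

# Named character vector: the name is the path, the value is the checksum
checksum <- tools::md5sum(tmp)
unname(checksum)

# Writing the same content again gives the same checksum
tmp2 <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3), file = tmp2, row.names = FALSE)
same <- unname(tools::md5sum(tmp2)) == unname(checksum)
```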

## List of all variables
colnames(recount_brain)

##  [1] "age"                   "age_units"             "assay_type_s"          "avgspotlen_l"         
##  [5] "bioproject_s"          "biosample_s"           "brain_bank"            "brodmann_area"        
##  [9] "cell_line"             "center_name_s"         "clinical_stage_1"      "clinical_stage_2"     
## [13] "consent_s"             "development"           "disease"               "disease_status"       
## [17] "experiment_s"          "hemisphere"            "insertsize_l"          "instrument_s"         
## [21] "library_name_s"        "librarylayout_s"       "libraryselection_s"    "librarysource_s"      
## [25] "loaddate_s"            "mbases_l"              "mbytes_l"              "organism_s"           
## [29] "pathology"             "platform_s"            "pmi"                   "pmi_units"            
## [33] "preparation"           "present_in_recount"    "race"                  "releasedate_s"        
## [37] "rin"                   "run_s"                 "sample_name_s"         "sample_origin"        
## [41] "sex"                   "sra_sample_s"          "sra_study_s"           "tissue_site_1"        
## [45] "tissue_site_2"         "tissue_site_3"         "tumor_type"            "viability"            
## [49] "Study_full"            "drugName_full"         "drug_info_full"        "drug_type_full"       
## [53] "full_260_280"          "count_file_identifier" "Dataset"

5 Explore `recount_brain_v2`

The tables below provide some summary statistics on the merged dataset, with selected columns cross-tabulated by the major dataset.

#Sex
table(recount_brain$sex, recount_brain$Dataset)

##         
##          GTEX recount_brain_v1 TCGA
##   female  442              259  298
##   male    967              695  402
##   pooled    0             2938    0

#Development 
table(recount_brain$development, recount_brain$Dataset)

##             
##              GTEX recount_brain_v1 TCGA
##   Adolescent    0               35    4
##   Adult      1409              963  696
##   Child         0               58    0
##   Fetus         0               38    0
##   Infant        0               47    0

#Tumor type
table(recount_brain$tumor_type, recount_brain$Dataset)

##                                     
##                                      GTEX recount_brain_v1 TCGA
##   Anaplastic Astrocytomas               0               24    0
##   Anaplastic Oligodendroastrocytomas    0               36    0
##   Anaplastic Oligodendroglioma          0               19    0
##   Astrocytoma                           0               63  196
##   Glioblastoma                          0              206    0
##   Glioblastoma Multiforme (GBM)         0                0    1
##   normal                                0                8    0
##   Oligoastrocytoma                      0                9  135
##   Oligodendroastrocytoma                0               37    0
##   Oligodendroglioma                     0               49  200
##   Treated primary GBM                   0                0    1
##   Untreated primary (de novo) GBM       0                0  167

# Clinical stage 2
table(recount_brain$clinical_stage_2, recount_brain$Dataset)

##                      
##                       GTEX recount_brain_v1 TCGA
##   Familial               0               16    0
##   Grade IV               0                7    0
##   Primary                0               64  671
##   Recurrent              0               61   31
##   Secondary              0               21    0
##   Solid Tissue Normal    0                0    5
##   Sporadic               0               20    0

# tissue_site 1
table(recount_brain$tissue_site_1, recount_brain$Dataset)

##                                   
##                                    GTEX recount_brain_v1 TCGA
##   Amygdala                           81                0    0
##   Anterior cingulate cortex (BA24)   99                0    0
##   Brainstem                           0                2    0
##   Caudate                           134                5    0
##   Cerebellar Hemisphere             118                0    0
##   Cerebellum                        145               29    0
##   Cerebral cortex                     0              638    0
##   Corpus callosum                     0               13    0
##   Cortex                            132                0    0
##   Dura mater                          0                1    0
##   Frontal Cortex                    120               27    0
##   GBM                                 0                0  175
##   Hippocampus                       103               25    0
##   Hypothalamus                      104                0    0
##   LGG                                 0                0  532
##   Lumbar spinal cord                  0               41    0
##   Mixed                               0                6    0
##   Nucleus accumbens                 123                1    0
##   Putamen                           103                6    0
##   Spinal cord (cervical c-1)         76                0    0
##   Substantia nigra                   71                1    0
##   Whole brain                         0                2    0

# present in recount

table(recount_brain$present_in_recount, recount_brain$Dataset)

##        
##         GTEX recount_brain_v1 TCGA
##   FALSE    0             1217    0
##   TRUE  1409             3214  707

Full summary:

summary(recount_brain)

##       age          age_units         assay_type_s        avgspotlen_l     bioproject_s       biosample_s       
##  Min.   :  1.00   Length:6547        Length:6547        Min.   :  27.00   Length:6547        Length:6547       
##  1st Qu.: 40.00   Class :character   Class :character   1st Qu.:  95.57   Class :character   Class :character  
##  Median : 54.50   Mode  :character   Mode  :character   Median : 152.00   Mode  :character   Mode  :character  
##  Mean   : 50.77                                         Mean   : 157.56                                        
##  3rd Qu.: 64.50                                         3rd Qu.: 200.00                                        
##  Max.   :106.00                                         Max.   :2017.00                                        
##  NA's   :3446                                                                                                  
##   brain_bank        brodmann_area    cell_line         center_name_s      clinical_stage_1   clinical_stage_2  
##  Length:6547        Min.   : 4.00   Length:6547        Length:6547        Length:6547        Length:6547       
##  Class :character   1st Qu.: 9.00   Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Median : 9.00   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                     Mean   :14.61                                                                              
##                     3rd Qu.:24.00                                                                              
##                     Max.   :46.00                                                                              
##                     NA's   :6003                                                                               
##   consent_s         development          disease          disease_status     experiment_s        hemisphere       
##  Length:6547        Length:6547        Length:6547        Length:6547        Length:6547        Length:6547       
##  Class :character   Class :character   Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                                                                   
##                                                                                                                   
##                                                                                                                   
##                                                                                                                   
##   insertsize_l     instrument_s       library_name_s     librarylayout_s    libraryselection_s librarysource_s   
##  Min.   :  0.000   Length:6547        Length:6547        Length:6547        Length:6547        Length:6547       
##  1st Qu.:  0.000   Class :character   Class :character   Class :character   Class :character   Class :character  
##  Median :  0.000   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :  3.124                                                                                                 
##  3rd Qu.:  0.000                                                                                                 
##  Max.   :245.000                                                                                                 
##  NA's   :2116                                                                                                    
##   loaddate_s           mbases_l          mbytes_l      organism_s         pathology          platform_s       
##  Length:6547        Min.   :    0.0   Min.   :    0   Length:6547        Length:6547        Length:6547       
##  Class :character   1st Qu.:  787.5   1st Qu.:  640   Class :character   Class :character   Class :character  
##  Mode  :character   Median : 1542.0   Median : 1272   Mode  :character   Mode  :character   Mode  :character  
##                     Mean   : 2872.6   Mean   : 2488                                                           
##                     3rd Qu.: 2660.0   3rd Qu.: 3226                                                           
##                     Max.   :52310.0   Max.   :35161                                                           
##                     NA's   :2116      NA's   :1409                                                            
##       pmi          pmi_units         preparation        present_in_recount     race           releasedate_s     
##  Min.   :   0.0   Length:6547        Length:6547        Mode :logical      Length:6547        Length:6547       
##  1st Qu.:   0.0   Class :character   Class :character   FALSE:1217         Class :character   Class :character  
##  Median :   6.0   Mode  :character   Mode  :character   TRUE :5330         Mode  :character   Mode  :character  
##  Mean   : 152.1                                                                                                 
##  3rd Qu.:  21.0                                                                                                 
##  Max.   :1442.0                                                                                                 
##  NA's   :5988                                                                                                   
##       rin           run_s           sample_name_s      sample_origin          sex            sra_sample_s      
##  Min.   :1.500   Length:6547        Length:6547        Length:6547        Length:6547        Length:6547       
##  1st Qu.:6.500   Class :character   Class :character   Class :character   Class :character   Class :character  
##  Median :7.100   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :7.209                                                                                                 
##  3rd Qu.:7.900                                                                                                 
##  Max.   :9.800                                                                                                 
##  NA's   :4828                                                                                                  
##  sra_study_s        tissue_site_1      tissue_site_2      tissue_site_3       tumor_type         viability        
##  Length:6547        Length:6547        Length:6547        Length:6547        Length:6547        Length:6547       
##  Class :character   Class :character   Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                                                                   
##                                                                                                                   
##                                                                                                                   
##                                                                                                                   
##   Study_full        drugName_full      drug_info_full  drug_type_full      full_260_280   count_file_identifier
##  Length:6547        Length:6547        Mode :logical   Length:6547        Min.   :1.500   Length:6547          
##  Class :character   Class :character   FALSE:278       Class :character   1st Qu.:1.800   Class :character     
##  Mode  :character   Mode  :character   TRUE :422       Mode  :character   Median :1.810   Mode  :character     
##                                        NA's :5847                         Mean   :1.835                        
##                                                                           3rd Qu.:1.880                        
##                                                                           Max.   :2.270                        
##                                                                           NA's   :5972                         
##    Dataset         
##  Length:6547       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
##
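
The NA's rows in the summary above show that several numeric fields are missing for most samples; for example, pmi is NA in 5988 of the 6547 rows. summary() reports only Length, Class and Mode for character columns, so missing values in those columns are not visible in this output. The following is a minimal sketch of how missingness could be tabulated for every column at once. It uses a small made-up data frame (toy_meta, a hypothetical name with invented values) rather than the merged table itself.

```r
# Toy metadata table (hypothetical values, not taken from recount_brain)
toy_meta <- data.frame(
    pmi = c(6, NA, 21, NA),
    rin = c(7.1, 6.5, NA, 7.9),
    sex = c("female", "male", NA, "male"),
    stringsAsFactors = FALSE
)

# Number and percentage of missing values per column
na_count <- colSums(is.na(toy_meta))
na_percent <- round(100 * na_count / nrow(toy_meta), 1)
missing_summary <- data.frame(
    column = names(na_count),
    na_count = unname(na_count),
    na_percent = unname(na_percent)
)

# Most incomplete columns first
missing_summary <- missing_summary[order(-missing_summary$na_count), ]
print(missing_summary)
```

The same calls should work on any data frame, so replacing toy_meta with the merged metadata table would give one row per metadata column.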

6 Reproducibility

This document was made possible thanks to:

R (R Core Team, 2020)
BiocStyle (Oleś, Morgan, and Huber, 2020)
devtools (Wickham, Hester, and Chang, 2020)
knitcitations (Boettiger, 2019)
knitr (Xie, 2014)
recount (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017)
rmarkdown (Allaire, Xie, McPherson, Luraschi, et al., 2020)

Code for creating this document

## Create the vignette
library('rmarkdown')
system.time(render('cross_studies_metadata.Rmd', 'BiocStyle::html_document'))

Reproducibility information for this document.

## Reproducibility info
proc.time()

##    user  system elapsed 
##  33.407   3.364  51.681

message(Sys.time())

## 2020-11-13 16:24:28

options(width = 120)
session_info()

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                                      
##  version  R version 4.0.2 Patched (2020-06-24 r78746)
##  os       CentOS Linux 7 (Core)                      
##  system   x86_64, linux-gnu                          
##  ui       X11                                        
##  language (EN)                                       
##  collate  en_US.UTF-8                                
##  ctype    en_US.UTF-8                                
##  tz       US/Eastern                                 
##  date     2020-11-13                                 
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package              * version  date       lib source        
##  AnnotationDbi          1.50.3   2020-07-25 [2] Bioconductor  
##  askpass                1.1      2019-01-13 [2] CRAN (R 4.0.0)
##  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.0.0)
##  backports              1.2.0    2020-11-02 [1] CRAN (R 4.0.2)
##  base64enc              0.1-3    2015-07-28 [2] CRAN (R 4.0.0)
##  bibtex                 0.4.2.3  2020-09-19 [2] CRAN (R 4.0.2)
##  Biobase              * 2.48.0   2020-04-27 [2] Bioconductor  
##  BiocFileCache          1.12.1   2020-08-04 [2] Bioconductor  
##  BiocGenerics         * 0.34.0   2020-04-27 [2] Bioconductor  
##  BiocManager            1.30.10  2019-11-16 [2] CRAN (R 4.0.0)
##  BiocParallel           1.22.0   2020-04-27 [2] Bioconductor  
##  BiocStyle            * 2.16.1   2020-09-25 [1] Bioconductor  
##  biomaRt                2.44.4   2020-10-13 [2] Bioconductor  
##  Biostrings             2.56.0   2020-04-27 [2] Bioconductor  
##  bit                    4.0.4    2020-08-04 [2] CRAN (R 4.0.2)
##  bit64                  4.0.5    2020-08-30 [2] CRAN (R 4.0.2)
##  bitops                 1.0-6    2013-08-17 [2] CRAN (R 4.0.0)
##  blob                   1.2.1    2020-01-20 [2] CRAN (R 4.0.0)
##  bookdown               0.21     2020-10-13 [1] CRAN (R 4.0.2)
##  BSgenome               1.56.0   2020-04-27 [2] Bioconductor  
##  bumphunter             1.30.0   2020-04-27 [2] Bioconductor  
##  callr                  3.5.1    2020-10-13 [2] CRAN (R 4.0.2)
##  checkmate              2.0.0    2020-02-06 [2] CRAN (R 4.0.0)
##  cli                    2.1.0    2020-10-12 [2] CRAN (R 4.0.2)
##  cluster                2.1.0    2019-06-19 [3] CRAN (R 4.0.2)
##  codetools              0.2-16   2018-12-24 [3] CRAN (R 4.0.2)
##  colorspace             1.4-1    2019-03-18 [2] CRAN (R 4.0.0)
##  crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.0.0)
##  curl                   4.3      2019-12-02 [2] CRAN (R 4.0.0)
##  data.table             1.13.2   2020-10-19 [2] CRAN (R 4.0.2)
##  DBI                    1.1.0    2019-12-15 [2] CRAN (R 4.0.0)
##  dbplyr                 2.0.0    2020-11-03 [1] CRAN (R 4.0.2)
##  DelayedArray         * 0.14.1   2020-07-14 [2] Bioconductor  
##  derfinder              1.22.0   2020-04-27 [2] Bioconductor  
##  derfinderHelper        1.22.0   2020-04-27 [2] Bioconductor  
##  desc                   1.2.0    2018-05-01 [2] CRAN (R 4.0.0)
##  devtools             * 2.3.2    2020-09-18 [2] CRAN (R 4.0.2)
##  digest                 0.6.27   2020-10-24 [1] CRAN (R 4.0.2)
##  doRNG                  1.8.2    2020-01-27 [2] CRAN (R 4.0.0)
##  downloader             0.4      2015-07-09 [2] CRAN (R 4.0.0)
##  dplyr                  1.0.2    2020-08-18 [2] CRAN (R 4.0.2)
##  ellipsis               0.3.1    2020-05-15 [2] CRAN (R 4.0.0)
##  evaluate               0.14     2019-05-28 [2] CRAN (R 4.0.0)
##  fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.0.0)
##  foreach                1.5.1    2020-10-15 [2] CRAN (R 4.0.2)
##  foreign                0.8-80   2020-05-24 [3] CRAN (R 4.0.2)
##  Formula                1.2-4    2020-10-16 [2] CRAN (R 4.0.2)
##  fs                     1.5.0    2020-07-31 [1] CRAN (R 4.0.2)
##  generics               0.1.0    2020-10-31 [1] CRAN (R 4.0.2)
##  GenomeInfoDb         * 1.24.2   2020-06-15 [2] Bioconductor  
##  GenomeInfoDbData       1.2.3    2020-05-18 [2] Bioconductor  
##  GenomicAlignments      1.24.0   2020-04-27 [2] Bioconductor  
##  GenomicFeatures        1.40.1   2020-07-08 [2] Bioconductor  
##  GenomicFiles           1.24.0   2020-04-27 [2] Bioconductor  
##  GenomicRanges        * 1.40.0   2020-04-27 [2] Bioconductor  
##  GEOquery               2.56.0   2020-04-27 [2] Bioconductor  
##  ggplot2                3.3.2    2020-06-19 [2] CRAN (R 4.0.2)
##  glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
##  gridExtra              2.3      2017-09-09 [2] CRAN (R 4.0.0)
##  gtable                 0.3.0    2019-03-25 [2] CRAN (R 4.0.0)
##  Hmisc                  4.4-1    2020-08-10 [2] CRAN (R 4.0.2)
##  hms                    0.5.3    2020-01-08 [2] CRAN (R 4.0.0)
##  htmlTable              2.1.0    2020-09-16 [2] CRAN (R 4.0.2)
##  htmltools              0.5.0    2020-06-16 [2] CRAN (R 4.0.2)
##  htmlwidgets            1.5.2    2020-10-03 [2] CRAN (R 4.0.2)
##  httr                   1.4.2    2020-07-20 [2] CRAN (R 4.0.2)
##  IRanges              * 2.22.2   2020-05-21 [2] Bioconductor  
##  iterators              1.0.13   2020-10-15 [2] CRAN (R 4.0.2)
##  jpeg                   0.1-8.1  2019-10-24 [2] CRAN (R 4.0.0)
##  jsonlite               1.7.1    2020-09-07 [2] CRAN (R 4.0.2)
##  knitcitations        * 1.0.10   2019-09-15 [1] CRAN (R 4.0.2)
##  knitr                  1.30     2020-09-22 [1] CRAN (R 4.0.2)
##  lattice                0.20-41  2020-04-02 [3] CRAN (R 4.0.2)
##  latticeExtra           0.6-29   2019-12-19 [2] CRAN (R 4.0.0)
##  lifecycle              0.2.0    2020-03-06 [2] CRAN (R 4.0.0)
##  limma                  3.44.3   2020-06-12 [2] Bioconductor  
##  locfit                 1.5-9.4  2020-03-25 [2] CRAN (R 4.0.0)
##  lubridate              1.7.9    2020-06-08 [1] CRAN (R 4.0.0)
##  magick                 2.5.2    2020-11-10 [1] CRAN (R 4.0.2)
##  magrittr               1.5      2014-11-22 [2] CRAN (R 4.0.0)
##  Matrix                 1.2-18   2019-11-27 [3] CRAN (R 4.0.2)
##  matrixStats          * 0.57.0   2020-09-25 [2] CRAN (R 4.0.2)
##  memoise                1.1.0    2017-04-21 [2] CRAN (R 4.0.0)
##  munsell                0.5.0    2018-06-12 [2] CRAN (R 4.0.0)
##  nnet                   7.3-14   2020-04-26 [3] CRAN (R 4.0.2)
##  openssl                1.4.3    2020-09-18 [2] CRAN (R 4.0.2)
##  pillar                 1.4.6    2020-07-10 [2] CRAN (R 4.0.2)
##  pkgbuild               1.1.0    2020-07-13 [2] CRAN (R 4.0.2)
##  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 4.0.0)
##  pkgload                1.1.0    2020-05-29 [2] CRAN (R 4.0.2)
##  plyr                   1.8.6    2020-03-03 [2] CRAN (R 4.0.0)
##  png                    0.1-7    2013-12-03 [2] CRAN (R 4.0.0)
##  prettyunits            1.1.1    2020-01-24 [2] CRAN (R 4.0.0)
##  processx               3.4.4    2020-09-03 [2] CRAN (R 4.0.2)
##  progress               1.2.2    2019-05-16 [2] CRAN (R 4.0.0)
##  ps                     1.4.0    2020-10-07 [2] CRAN (R 4.0.2)
##  purrr                  0.3.4    2020-04-17 [2] CRAN (R 4.0.0)
##  qvalue                 2.20.0   2020-04-27 [2] Bioconductor  
##  R6                     2.5.0    2020-10-28 [1] CRAN (R 4.0.2)
##  rappdirs               0.3.1    2016-03-28 [2] CRAN (R 4.0.0)
##  RColorBrewer           1.1-2    2014-12-07 [2] CRAN (R 4.0.0)
##  Rcpp                   1.0.5    2020-07-06 [2] CRAN (R 4.0.2)
##  RCurl                  1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
##  readr                  1.4.0    2020-10-05 [2] CRAN (R 4.0.2)
##  recount              * 1.14.0   2020-04-27 [2] Bioconductor  
##  RefManageR             1.2.12   2019-04-03 [1] CRAN (R 4.0.2)
##  remotes                2.2.0    2020-07-21 [2] CRAN (R 4.0.2)
##  rentrez                1.2.2    2019-05-02 [2] CRAN (R 4.0.0)
##  reshape2               1.4.4    2020-04-09 [2] CRAN (R 4.0.0)
##  rlang                  0.4.8    2020-10-08 [1] CRAN (R 4.0.2)
##  rmarkdown            * 2.5      2020-10-21 [1] CRAN (R 4.0.2)
##  rngtools               1.5      2020-01-23 [2] CRAN (R 4.0.0)
##  rpart                  4.1-15   2019-04-12 [3] CRAN (R 4.0.2)
##  rprojroot              1.3-2    2018-01-03 [2] CRAN (R 4.0.0)
##  Rsamtools              2.4.0    2020-04-27 [2] Bioconductor  
##  RSQLite                2.2.1    2020-09-30 [2] CRAN (R 4.0.2)
##  rstudioapi             0.11     2020-02-07 [2] CRAN (R 4.0.0)
##  rtracklayer            1.48.0   2020-04-27 [2] Bioconductor  
##  S4Vectors            * 0.26.1   2020-05-16 [2] Bioconductor  
##  scales                 1.1.1    2020-05-11 [2] CRAN (R 4.0.0)
##  sessioninfo            1.1.1    2018-11-05 [2] CRAN (R 4.0.0)
##  stringi                1.5.3    2020-09-09 [2] CRAN (R 4.0.2)
##  stringr                1.4.0    2019-02-10 [2] CRAN (R 4.0.0)
##  SummarizedExperiment * 1.18.2   2020-07-09 [2] Bioconductor  
##  survival               3.2-3    2020-06-13 [3] CRAN (R 4.0.2)
##  testthat               3.0.0    2020-10-31 [1] CRAN (R 4.0.2)
##  tibble                 3.0.4    2020-10-12 [2] CRAN (R 4.0.2)
##  tidyr                  1.1.2    2020-08-27 [2] CRAN (R 4.0.2)
##  tidyselect             1.1.0    2020-05-11 [2] CRAN (R 4.0.0)
##  usethis              * 1.6.3    2020-09-17 [2] CRAN (R 4.0.2)
##  VariantAnnotation      1.34.0   2020-04-27 [2] Bioconductor  
##  vctrs                  0.3.4    2020-08-29 [1] CRAN (R 4.0.2)
##  withr                  2.3.0    2020-09-22 [2] CRAN (R 4.0.2)
##  xfun                   0.19     2020-10-30 [1] CRAN (R 4.0.2)
##  XML                    3.99-0.5 2020-07-23 [2] CRAN (R 4.0.2)
##  xml2                   1.3.2    2020-04-23 [2] CRAN (R 4.0.0)
##  XVector                0.28.0   2020-04-27 [2] Bioconductor  
##  yaml                   2.2.1    2020-02-01 [2] CRAN (R 4.0.0)
##  zlibbioc               1.34.0   2020-04-27 [2] Bioconductor  
## 
## [1] /users/neagles/R/4.0
## [2] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/site-library
## [3] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/library
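
The session information above is printed as part of the rendered document. If a standalone copy is wanted next to the analysis outputs, it could be written to a text file. The sketch below is one possible way to do that and is not part of the original analysis: it uses base R's sessionInfo() instead of the session_info() call shown above so that it runs without extra packages, and the file name session_info.txt is a hypothetical choice.

```r
# Write the session information to a text file.
# The file name and location are hypothetical; tempdir() is used only
# so that this example does not write into the working directory.
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)

# Read the file back and show its first lines as a quick check
session_lines <- readLines(session_file)
print(head(session_lines, 3))
```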

7 Bibliography

This document was generated using BiocStyle (Oleś, Morgan, and Huber, 2020) with knitr (Xie, 2014) and rmarkdown (Allaire, Xie, McPherson, Luraschi, et al., 2020) running behind the scenes.

Citations were made with knitcitations (Boettiger, 2019), and the bibliography file is available here.

[1] J. Allaire, Y. Xie, J. McPherson, J. Luraschi, et al. rmarkdown: Dynamic Documents for R. R package version 2.5. 2020. <URL: https://github.com/rstudio/rmarkdown>.

[2] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.10. 2019. <URL: https://CRAN.R-project.org/package=knitcitations>.

[3] L. Collado-Torres, A. Nellore, K. Kammers, S. E. Ellis, et al. “Reproducible RNA-seq analysis using recount2”. In: Nature Biotechnology (2017). DOI: 10.1038/nbt.3838. <URL: http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html>.

[4] A. Oleś, M. Morgan, and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.16.1. 2020. <URL: https://github.com/Bioconductor/BiocStyle>.

[5] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2020. <URL: https://www.R-project.org/>.

[6] H. Wickham, J. Hester, and W. Chang. devtools: Tools to Make Developing R Packages Easier. R package version 2.3.2. 2020. <URL: https://CRAN.R-project.org/package=devtools>.

[7] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. <URL: http://www.crcpress.com/product/isbn/9781466561595>.