MetaSRA (Bernstein, Doan, and Dewey, 2017) contains “normalized metadata for the Sequence Read Archive”, which is constructed using the SRA Run Info tables. The MetaSRA authors provide a website where you can query the samples by term, such as brain, which leads to metasra.biostat.wisc.edu/?and=UBERON:0000955. As of April 15th, 2019 they list 17,890 brain samples from 342 studies.
We can download the data with the following command:
## April 15, 2019
wget 'http://metasra.biostat.wisc.edu/api/v01/samples.csv?and=UBERON:0000955'
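The same download can be sketched from within R. This is a minimal sketch rather than the authors' workflow: `metasra_samples_url()` and `download_metasra()` are hypothetical helpers introduced here, and only the API URL itself comes from the command above. Saving to a plain `samples.csv` file name is also an assumption; the rest of this document reads the file under the name that wget assigns.

```r
## Hypothetical helper: build the MetaSRA samples URL for an ontology term.
## The base URL is the one used in the wget command above.
metasra_samples_url <- function(term) {
    paste0('http://metasra.biostat.wisc.edu/api/v01/samples.csv?and=', term)
}

## Hypothetical helper: download the CSV only if it is not already on disk.
## Returns the path of the local file.
download_metasra <- function(term, destfile = 'samples.csv') {
    if (!file.exists(destfile)) {
        download.file(metasra_samples_url(term), destfile = destfile)
    }
    destfile
}

## URL for the brain term (UBERON:0000955)
brain_url <- metasra_samples_url('UBERON:0000955')
print(brain_url)
```

Calling `download_metasra('UBERON:0000955')` would fetch the file (network access required) and return the local path, which could then be passed to `read.csv()`.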
Next, we load the required R packages.
library('recount')
library('tidyverse')
Now we can get all the required data.
## Read the MetaSRA data
metasra <- read.csv('samples.csv?and=UBERON:0000955')
head(metasra)
## study_id study_title sample_id sample_name
## 1 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341305
## 2 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341312
## 3 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341782
## 4 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341315
## 5 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341318
## 6 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341322
## sample_type sample_type_confidence mapped_ontology_ids
## 1 primary cells 0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 2 primary cells 0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 3 primary cells 0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 4 primary cells 0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 5 primary cells 0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 6 primary cells 0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## mapped_ontology_terms
## 1 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 2 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 3 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 4 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 5 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 6 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## raw_SRA_metadata
## 1 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 2 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 3 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 4 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 5 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 6 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## Get the unique 342 studies
metasra_study <- unique(metasra$study_id)
stopifnot(length(metasra_study) == 342)
## Get the recount2 metadata
meta <- all_metadata()
## 2020-11-13 16:25:45 downloading the metadata to /tmp/RtmpK9pZcs/metadata_clean_sra.Rdata
## Load the predictions
PredictedPhenotypes <- add_predictions(version = '0.0.03')
## 2020-11-13 16:25:47 downloading the predictions to /tmp/RtmpK9pZcs/PredictedPhenotypes_v0.0.03.rda
## Loading objects:
## PredictedPhenotypes
PredictedPhenotypes_latest <- add_predictions(version = '0.0.06')
## 2020-11-13 16:25:48 downloading the predictions to /tmp/RtmpK9pZcs/PredictedPhenotypes_v0.0.06.rda
## Loading objects:
## PredictedPhenotypes
## Get recount-brain using the recount Bioconductor package
recount_brain <- add_metadata(source = 'recount_brain_v2')
## 2020-11-13 16:25:48 downloading the recount_brain metadata to /tmp/RtmpK9pZcs/recount_brain_v2.Rdata
## Loading objects:
## recount_brain
MetaSRA to recount_brain
First, we can check how many studies with at least one brain sample, as detected by MetaSRA, are in either recount2 or recount_brain.
## using tolower() doesn't change any of these numbers
addmargins(table(
'In recount2' = metasra_study %in% recount_abstract$project,
'In recount-brain' = metasra_study %in% unique(recount_brain$sra_study_s)
))
## In recount-brain
## In recount2 FALSE TRUE Sum
## FALSE 195 0 195
## TRUE 100 47 147
## Sum 295 47 342
## In percent
addmargins(table(
'In recount2' = metasra_study %in% recount_abstract$project,
'In recount-brain' = metasra_study %in% unique(recount_brain$sra_study_s)
)) / length(metasra_study) * 100
## In recount-brain
## In recount2 FALSE TRUE Sum
## FALSE 57.01754 0.00000 57.01754
## TRUE 29.23977 13.74269 42.98246
## Sum 86.25731 13.74269 100.00000
## Studies in MetaSRA and recount2 but not in recount_brain
studies_to_check <- metasra_study[
metasra_study %in% recount_abstract$project &
!metasra_study %in% unique(recount_brain$sra_study_s)
]
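To make the cross-tabulation pattern used above concrete, here is a minimal sketch on made-up study IDs (the toy vectors `toy_metasra`, `toy_recount2`, and `toy_brain` are assumptions for illustration, not real data). It shows how two `%in%` membership vectors feed `table()`, and how `prop.table()` offers an alternative way to express the counts as percentages.

```r
## Toy study IDs (hypothetical, for illustration only)
toy_metasra <- c('S1', 'S2', 'S3', 'S4', 'S5')
toy_recount2 <- c('S2', 'S3', 'S4', 'S9')
toy_brain <- c('S3', 'S9')

## Logical membership vectors, one entry per MetaSRA study
in_recount2 <- toy_metasra %in% toy_recount2
in_brain <- toy_metasra %in% toy_brain

## Counts with row and column totals
toy_tab <- table('In recount2' = in_recount2, 'In recount-brain' = in_brain)
print(addmargins(toy_tab))

## Same information as percentages of all MetaSRA studies
print(addmargins(prop.table(toy_tab)) * 100)

## Studies in MetaSRA and recount2 but not in the brain subset
toy_to_check <- toy_metasra[in_recount2 & !in_brain]
print(toy_to_check)
```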
As a check, anything in recount_brain has to be in recount2 by construction. We’ll later take a deeper look at the 100 studies present in MetaSRA and recount2 yet absent from recount_brain (excluding TCGA).
At the sample level, we find samples that are present in recount_brain yet absent from recount2, which is not unexpected (recount2 was built to contain only human RNA-seq samples). All the samples present in MetaSRA and recount2 yet absent from recount_brain are from the studies we wanted to check.
## using tolower() doesn't change any of these numbers
addmargins(table(
'In recount2' = metasra$sample_id %in% meta$sample,
'In recount-brain' = metasra$sample_id %in% recount_brain$sra_sample_s
))
## In recount-brain
## In recount2 FALSE TRUE Sum
## FALSE 13291 1411 14702
## TRUE 2026 1162 3188
## Sum 15317 2573 17890
## in percent
addmargins(table(
'In recount2' = metasra$sample_id %in% meta$sample,
'In recount-brain' = metasra$sample_id %in% recount_brain$sra_sample_s
)) / nrow(metasra) * 100
## In recount-brain
## In recount2 FALSE TRUE Sum
## FALSE 74.292901 7.887088 82.179989
## TRUE 11.324762 6.495249 17.820011
## Sum 85.617663 14.382337 100.000000
## Samples in MetaSRA and recount2 but not in recount_brain
samples_to_check <- metasra$sample_id[
metasra$sample_id %in% meta$sample &
!metasra$sample_id %in% recount_brain$sra_sample_s
]
## All of them are from the studies we need to check
table(unique(meta$project[meta$sample %in% samples_to_check]) %in%
studies_to_check)
##
## TRUE
## 97
Note that these results exclude TCGA since the TCGA samples don’t have SRA sample IDs.
table('Has SRA sample id' = !is.na(recount_brain$sra_sample_s), recount_brain$Dataset)
##
## Has SRA sample id GTEX recount_brain_v1 TCGA
## FALSE 0 0 707
## TRUE 1409 4431 0
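One way to see why samples without an SRA sample ID cannot match is that `%in%` treats a missing ID as a non-match when the lookup vector contains no `NA`. A minimal sketch with made-up IDs (all values below are hypothetical):

```r
## Hypothetical sample IDs; NA stands in for a sample without an SRA sample ID
brain_ids <- c('SRS0001', 'SRS0002', NA, NA)
metasra_ids <- c('SRS0001', 'SRS0003')

## NA is never found in metasra_ids, so those samples count as 'not in MetaSRA'
in_metasra <- brain_ids %in% metasra_ids
print(in_metasra)

## Tabulating against is.na() makes the effect of missing IDs explicit
print(table('In MetaSRA' = in_metasra, 'Missing ID' = is.na(brain_ids)))
```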
recount_brain to MetaSRA
We can also do the reverse check and ask which studies or samples present in recount_brain are also present in MetaSRA.
## At the study level
addmargins(table(
'In MetaSRA (project)' = unique(recount_brain$sra_study_s) %in%
metasra_study
))
## In MetaSRA (project)
## FALSE TRUE Sum
## 17 47 64
## in percent
addmargins(table(
'In MetaSRA (project)' = unique(recount_brain$sra_study_s) %in% metasra_study
)) / length(unique(recount_brain$sra_study_s)) * 100
## In MetaSRA (project)
## FALSE TRUE Sum
## 26.5625 73.4375 100.0000
## At the sample level
## Check whether it's all the large study SRP025982
addmargins(table(
'In MetaSRA (sample)' = recount_brain$sra_sample_s %in% metasra$sample_id,
'SRP025982' = recount_brain$sra_study_s == 'SRP025982',
'Dataset' = recount_brain$Dataset, useNA = 'ifany'
))
## , , Dataset = GTEX
##
## SRP025982
## In MetaSRA (sample) FALSE TRUE <NA> Sum
## FALSE 0 0 0 0
## TRUE 1409 0 0 1409
## Sum 1409 0 0 1409
##
## , , Dataset = recount_brain_v1
##
## SRP025982
## In MetaSRA (sample) FALSE TRUE <NA> Sum
## FALSE 659 2475 0 3134
## TRUE 874 423 0 1297
## Sum 1533 2898 0 4431
##
## , , Dataset = TCGA
##
## SRP025982
## In MetaSRA (sample) FALSE TRUE <NA> Sum
## FALSE 0 0 707 707
## TRUE 0 0 0 0
## Sum 0 0 707 707
##
## , , Dataset = Sum
##
## SRP025982
## In MetaSRA (sample) FALSE TRUE <NA> Sum
## FALSE 659 2475 707 3841
## TRUE 2283 423 0 2706
## Sum 2942 2898 707 6547
## Ok, it's not all SRP025982 so we can drop that comparison
## and show the table in percent
addmargins(table(
'In MetaSRA (sample)' = recount_brain$sra_sample_s %in% metasra$sample_id,
'Dataset' = recount_brain$Dataset, useNA = 'ifany'
)) / nrow(recount_brain) * 100
## Dataset
## In MetaSRA (sample) GTEX recount_brain_v1 TCGA Sum
## FALSE 0.00000 47.86925 10.79884 58.66809
## TRUE 21.52131 19.81060 0.00000 41.33191
## Sum 21.52131 67.67985 10.79884 100.00000
From these checks, we can see that 26.6% of the recount_brain studies and 58.7% of the recount_brain samples are missing from MetaSRA.
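The two percentages can be recomputed directly from the counts printed above (17 of 64 studies and 3,841 of 6,547 samples absent from MetaSRA). The variable names in this small check are introduced here for illustration.

```r
## Counts taken from the tables above
studies_missing <- 17
studies_total <- 64
samples_missing <- 3841
samples_total <- 6547

## Percent of recount_brain studies and samples missing from MetaSRA
pct_studies <- studies_missing / studies_total * 100
pct_samples <- samples_missing / samples_total * 100
print(round(c(studies = pct_studies, samples = pct_samples), 1))
```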
Let’s take a deeper look at the 100 studies present in MetaSRA and recount2 yet absent from recount_brain. The recount package already has the study abstract and the number of samples for each study. We can then construct the URL needed to explore these discrepant studies manually. Next, we can look at the phenopredict (Ellis, Collado-Torres, Jaffe, and Leek, 2018) predictions we used (version 0.0.03) for selecting the studies, as well as the latest (0.0.06) predictions. The prediction table also includes a reported_tissue column. Along with the predictions and the reported tissue, we can also look at MetaSRA to identify the number of brain samples according to each source and the percent of brain samples per study. We can then evaluate whether or not each study passes our selection criteria of at least 4 samples with more than 70 percent of the study samples coming from the brain.
## Let's get the study-level information already present in the recount package
discrepant <- subset(recount_abstract, project %in% studies_to_check)
## Does the abstract mention the word brain?
discrepant$mentions_brain <- grepl('brain', tolower(discrepant$abstract))
## Next, the url
discrepant$url <- paste0(
'https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=',
discrepant$project
)
## Order by decreasing number of samples
discrepant <- discrepant[order(discrepant$number_samples, decreasing = TRUE), ]
## Get information at the sample level for each project
discrepant_studies_samples <- map(discrepant$project, function(x) {
y <- meta$run[meta$project == x]
m <- match(y, PredictedPhenotypes$sample_id)
m2 <- match(y, PredictedPhenotypes_latest$sample_id)
data.frame(
prediction_original = PredictedPhenotypes$predicted_tissue[m],
prediction_latest = PredictedPhenotypes_latest$predicted_tissue[m2],
sharq = PredictedPhenotypes_latest$reported_tissue[m2],
project = x,
sample_id = y,
stringsAsFactors = FALSE
)
})
## Summarize the information found for each study
discrepant <- cbind(discrepant, map_dfr(discrepant_studies_samples, function(x) {
data.frame(
brain_n_original = sum(x$prediction_original == 'Brain', na.rm = TRUE),
brain_n_latest = sum(x$prediction_latest == 'Brain', na.rm = TRUE),
brain_n_sharq = sum(x$sharq == 'Brain', na.rm = TRUE),
brain_n_metasra = sum(metasra$study_id == unique(x$project)),
brain_percent_original = sum(x$prediction_original == 'Brain',
na.rm = TRUE) / nrow(x) * 100,
brain_percent_latest = sum(x$prediction_latest == 'Brain',
na.rm = TRUE) / nrow(x) * 100,
brain_percent_sharq = sum(x$sharq == 'Brain',
na.rm = TRUE) / nrow(x) * 100,
brain_percent_metasra = sum(metasra$study_id == unique(x$project)) /
nrow(x) * 100,
stringsAsFactors = FALSE
)
}))
## Does it match the original criteria of at least 4 samples and greater than
## 70% brain samples in the study?
discrepant$criteria_original <- discrepant$number_samples >= 4 &
discrepant$brain_percent_original > 70
discrepant$criteria_latest <- discrepant$number_samples >= 4 &
discrepant$brain_percent_latest > 70
discrepant$criteria_sharq <- discrepant$number_samples >= 4 &
discrepant$brain_percent_sharq > 70
discrepant$criteria_metasra <- discrepant$number_samples >= 4 &
discrepant$brain_percent_metasra > 70
## Check the original criteria is all FALSE since they are absent from recount_brain
stopifnot(all(!discrepant$criteria_original))
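The selection rule applied above can be expressed as a small function. This is a sketch of the rule as coded in this document (at least 4 samples and more than 70% of them labeled as brain); `passes_criteria()` is a hypothetical helper introduced here, not part of the recount package.

```r
## Hypothetical helper mirroring the rule coded above:
## a study passes when it has at least `min_samples` samples and strictly
## more than `min_percent` percent of them are labeled as brain.
passes_criteria <- function(number_samples, brain_percent,
    min_samples = 4, min_percent = 70) {
    number_samples >= min_samples & brain_percent > min_percent
}

## Made-up examples: (10 samples, 80% brain), (10 samples, exactly 70%),
## (3 samples, 100% brain)
toy_pass <- passes_criteria(
    number_samples = c(10, 10, 3),
    brain_percent = c(80, 70, 100)
)
print(toy_pass)
```

Because the percent threshold is strict, a study with exactly 70% brain samples does not pass, and a study with fewer than 4 samples fails regardless of its brain percentage.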
Now that we have our detailed table for these 100 studies, we can look into them in more detail.
addmargins(with(discrepant,
table(criteria_latest, criteria_sharq, criteria_metasra)))
## , , criteria_metasra = FALSE
##
## criteria_sharq
## criteria_latest FALSE TRUE Sum
## FALSE 68 4 72
## TRUE 0 0 0
## Sum 68 4 72
##
## , , criteria_metasra = TRUE
##
## criteria_sharq
## criteria_latest FALSE TRUE Sum
## FALSE 19 4 23
## TRUE 2 3 5
## Sum 21 7 28
##
## , , criteria_metasra = Sum
##
## criteria_sharq
## criteria_latest FALSE TRUE Sum
## FALSE 87 8 95
## TRUE 2 3 5
## Sum 89 11 100
From the above output we can see that 28 of the 100 studies would match our study criteria had we used MetaSRA. These include 5 studies that now also match our criteria using the version 0.0.06 predictions.
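These counts can also be read off the three-way table programmatically. The sketch below re-enters the printed counts by hand into an array (the object `counts` is introduced here for illustration; in the actual analysis the table is computed from `discrepant`) and then sums over the relevant margins.

```r
## Counts copied from the three-way table above, entered in column-major
## order: criteria_latest varies fastest, then criteria_sharq, then
## criteria_metasra
counts <- array(
    c(68, 0, 4, 0, 19, 2, 4, 3),
    dim = c(2, 2, 2),
    dimnames = list(
        criteria_latest = c('FALSE', 'TRUE'),
        criteria_sharq = c('FALSE', 'TRUE'),
        criteria_metasra = c('FALSE', 'TRUE')
    )
)

## Number of studies by MetaSRA criteria (third dimension)
by_metasra <- margin.table(counts, 3)
print(by_metasra)

## Studies passing with both MetaSRA and the latest predictions
both_metasra_latest <- sum(counts['TRUE', , 'TRUE'])
print(both_metasra_latest)

## Studies passing only with the reported tissue
only_sharq <- counts['FALSE', 'TRUE', 'FALSE']
print(only_sharq)
```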
## all
ggplot(discrepant,
aes(x = brain_percent_original, y = brain_percent_latest,
color = criteria_latest, size = number_samples,
shape = criteria_metasra)) +
geom_point() +
facet_grid( ~ criteria_sharq) +
geom_abline(linetype = 3, color = 'purple') +
labs(caption = 'Panels by criteria_sharq')
## just those with some TRUE criteria
ggplot(subset(discrepant,
criteria_sharq | criteria_metasra | criteria_latest),
aes(x = brain_percent_original, y = brain_percent_latest,
color = criteria_latest, size = number_samples,
shape = criteria_metasra)) +
geom_point() +
facet_grid( ~ criteria_sharq) +
geom_abline(linetype = 3, color = 'purple') +
labs(caption = 'Panels by criteria_sharq')
There are 4 studies that only pass the selection criteria based on the reported_tissue information present in the predictions table. The reported_tissue was extracted from the SHARQ prototype as described in the phenopredict manuscript (Ellis, Collado-Torres, Jaffe, and Leek, 2018).
subset(discrepant, criteria_sharq & !(criteria_metasra | criteria_latest))
## number_samples species
## 847 24 human
## 789 16 human
## 268 4 human
## 341 4 human
## abstract
## 847 Purpose: The purpose of this experiment is to identify a C9-ALS/FTD specific genomic profile in fibroblast lines that is distinct from sporadic ALS without C9orf72 expansion and non-neurologic control cells. The study will then evaluate the effect on this identified profile of ASO treatment targeting the sense strand RNA transcript of the C9orf72 gene. Methods: Expression profiling was performed on RNAs from fibroblasts of four C9orf72 patients, four control individuals and four sporadic ALS patients using Multiplex Analysis of PolyA-linked Sequences method. Results: Hierarchical clustering of expression values for all genes showed that the four C9orf72 patient lines had an expression profile distinct from control and sporadic ALS lines. Statistical comparison of expression values between the four C9orf72 lines and the four control lines revealed that 122 genes were upregulated (defined by a False Discovery Rate FDR<0.05) and 34 genes were downregulated (defined by a False Discovery Rate FDR <0.05) in C9orf72 patient fibroblasts. Conclusions: A genome wide RNA signature can be defined in fibroblasts with C9orf72 expansion. ASO-mediated reduction of C9orf72 RNA levels in fibroblasts with the hexanucleotide expansion efficiently reduced accumulation of GGGGCC RNA foci. This did not, however, generate a reversal of the C9orf72 RNA profile. Overall design: Use of Multiplex Analysis of PolyA-linked Sequences to identify expression changes in fibroblasts from amyotrophic lateral sclerosis and frontotemporal dementia patients harboring an hexanucleotide expansion in the C9orf72 gene.
## 789 MiRNAs are important negative regulators of protein coding gene expression, and have been studied intensively over the last few years. Several measurement platforms, designed to determine their relative RNA abundance levels in biological samples, have been developed. In this study, we systematically compared 12 commercially available miRNA expression platforms by measuring an identical set of 20 standardized positive and negative control samples, including human universal reference RNA, human brain RNA and titrations thereof, human serum samples, and synthetic spikes from miRNA family members with varying homology. We developed novel and robust quality metrics to objectively assess platform performance of very different technologies such as small RNA sequencing, RT-qPCR and (microarray) hybridization. We assessed reproducibility, sensitivity, accuracy, specificity, and concordance of differential expression. The results indicate that each method has its strengths and weaknesses, which helps to guide informed selection of a quantitative miRNA gene expression platform in function of particular study goals. Overall design: Sequencing of 20 miRQC samples on Illumina Genome Analyzer IIx System
## 268 Establishing the functional roles of genetic variants remains a significant challenge in the post-genomic era. Here, we present a method, allele-specific alternative mRNA processing (ASARP), to identify genetically influenced mRNA processing events using transcriptome sequencing (RNA-Seq) data. The method examines RNA-Seq data at both single nucleotide and whole-gene/isoform levels to identify allele-specific expression (ASE) and existence of allele-specific regulation of mRNA processing. We applied the methods to data obtained from the human glioblastoma cell line U87MG and primary breast cancer tissues and found that 26–45% of all genes with sufficient read coverage demonstrated ASE, with significant overlap between the two cell types. Our methods predicted potential mechanisms underlying ASE due to regulations affecting either whole-gene-level expression or alternative mRNA processing, including alternative splicing, alternative polyadenylation and alternative transcriptional initiation. Allele-specific alternative splicing and alternative polyadenylation may explain ASE in hundreds of genes in each cell type. Reporter studies following these predictions identified the causal single nucleotide variants (SNVs) for several allele-specific alternative splicing events. Finally, many genes identified in our study were also reported as disease/phenotype-associated genes in genome-wide association studies. Future applications of our approach may provide ample insights for a better understanding of the genetic basis of gene regulation underlying phenotypic diversity and disease mechanisms. Overall design: Examine allele-specific gene expression and alternative RNA processing in U87MG cell line
## 341 RNA editing enhances the diversity of gene products at the post-transcriptional level. Approaches for genome-wide identification of RNA editing face two main challenges: separating true editing sites from false discoveries and accurate estimation of editing levels. We developed an approach to analyze transcriptome sequencing data (RNA-Seq) for global identification of RNA editing in cells for which whole-genome sequencing data are available. We applied the method to analyze RNA-Seq data of a human glioblastoma cell line, U87MG. Around 10,000 DNA-RNA differences were identified, the majority being putative A-to-I editing sites. These predicted A-to-I events were associated with a low false discovery rate (~5%). Moreover, the estimated editing levels from RNA-Seq correlated well with those based on traditional clonal sequencing. Our results further facilitated unbiased characterization of the sequence and evolutionary features flanking predicted A-to-I editing sites and discovery of a conserved RNA structural motif that may be functionally relevant to editing. Genes with predicted A-to-I editing were significantly enriched with those known to be involved in cancer, supporting the potential importance of cancer-specific RNA editing. A similar profile of DNA-RNA differences as in U87MG was predicted for another RNA-Seq data set obtained from primary breast cancer samples. Remarkably, significant overlap exists between the putative editing sites of the two transcriptomes despite their difference in cell type, cancer type and genomic backgrounds. Our approach enabled de novo identification of the RNA editome, which sets the stage for further mechanistic studies of this important step of post-transcriptional regulation. Overall design: Examine mRNA expression in U87MG cells following ADAR1 or control siRNA knockdown
## project mentions_brain url brain_n_original brain_n_latest
## 847 SRP032165 FALSE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP032165 5 3
## 789 SRP028738 TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP028738 2 3
## 268 SRP006970 FALSE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP006970 0 0
## 341 SRP009659 FALSE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP009659 0 0
## brain_n_sharq brain_n_metasra brain_percent_original brain_percent_latest brain_percent_sharq brain_percent_metasra
## 847 24 6 20.83333 12.50 100 25.0
## 789 12 2 12.50000 18.75 75 12.5
## 268 4 1 0.00000 0.00 100 25.0
## 341 4 2 0.00000 0.00 100 50.0
## criteria_original criteria_latest criteria_sharq criteria_metasra
## 847 FALSE FALSE TRUE FALSE
## 789 FALSE FALSE TRUE FALSE
## 268 FALSE FALSE TRUE FALSE
## 341 FALSE FALSE TRUE FALSE
These are the 5 studies that would pass the selection criteria with the latest predictions; all of them would also pass with the MetaSRA data.
subset(discrepant, criteria_latest)
## number_samples species
## 1669 466 human
## 438 24 human
## 665 20 human
## 1862 7 human
## 567 5 human
## abstract
## 1669 We used single cell RNA sequencing on 466 cells to capture the cellular complexity of the adult and fetal human brain at a whole transcriptome level. Healthy adult temporal lobe tissue was obtained from epileptic patients during temporal lobectomy for medically refractory seizures. We were able to classify individual cells into all of the major neuronal, glial, and vascular cell types in the brain. Overall design: Examination of cell types in healthy human brain samples.
## 438 The expansion of the neocortex during mammalian brain evolution results primarily from an increase in neural progenitor cell divisions in its two principal germinal zones during development, the ventricular zone (VZ) and the subventricular zone (SVZ). Using mRNA sequencing, we analyzed the transcriptomes of fetal human and embryonic mouse VZ, SVZ and cortical plate (CP). We describe sets of genes that are up- or down-regulated in each germinal zone. These data suggest that cell adhesion and cell-extracellular matrix (ECM) interactions promote the proliferation and self-renewal of neural progenitors in the developing human neocortex. Notably, relevant ECM-associated genes include distinct sets of collagens, laminins, proteoglycans and integrins, along with specific sets of growth factors and morphogens. Our data establish a basis for identifying novel cell-type markers and open up avenues to unravel the molecular basis of neocortex expansion during evolution. Overall design: Total RNA was isolated from the VZ, inner SVZ (ISVZ), outer SVZ (OSVZ) and CP of six 13-16 weeks post-conception (w.p.c.) human fetuses and from the VZ, SVZ and CP of five E14.5 mouse embryos using laser capture microdissection of Nissl-stained cryosections of dorsolateral telencephalon. Poly A+ RNA was used as template for the preparation of cDNA which were then subjected to single-end 76-bp RNA-Seq.
## 665 MicroRNAs (miRNAs) are small (20-22 nucleotides) regulatory non-coding RNAs that strongly influence gene expression. Most prior studies addressing the role of miRNAs in neurodegenerative diseases (NDs) have focused on individual controls (n = 2), AD (n = 5), dementia with Lewy bodies (n = 4), hippocampal sclerosis of aging (n = 4), and frontotemporal lobar dementia (FTLD) (n = 5) cases, together accounting for the most prevalent ND subtypes. All cases had short postmortem intervals, relatively high-quality RNA, and state-of-the-art neuropathological diagnoses. The resulting data (over 113 million reads in total, averaging 5.6 million reads per sample) and secondary expression analyses constitute an unprecedented look into the human cerebral cortical miRNome at single nucleotide resolution. While we find no apparent changes in isomiR or miRNA editing patterns in correlation with ND pathology, our results validate and extend previous miRNA profiling studies with regard to quantitative changes in NDs. In agreement with this idea, we provide independent cohort validation for changes in miR-132 expression levels in AD (n = 8) and FTLD (n = 14) cases when compared to controls (n = 8). The identification of common and ND-specific putative novel brain miRNAs and/or short-hairpin molecules is also presented. The challenge now is to better understand the impact of these and other alterations on neuronal gene expression networks and neuropathologies. Overall design: Using RNA deep sequencing, we sought to analyze in detail the small RNAs (including miRNAs) in the temporal neocortex gray matter from non-demented controls (n = 2), AD (n = 5), dementia with Lewy bodies (n = 4), hippocampal sclerosis of aging (n = 4), and frontotemporal lobar dementia (FTLD) (n = 5) cases, together accounting for the most prevalent ND subtypes.
## 1862 Neuronal migration defects (NMDs) are among the most common and severe brain abnormalities in humans. Lack of disease models in mice or in human cells has hampered the identification of underlying mechanisms. From patients with severe NMDs we generated iPSCs then differentiated neural progenitor cells (NPCs). On artificial extracellular matrix, patient-derived neuronal cells showed defective migration and impaired neurite outgrowth. From a cohort of 107 families with NMDs, sequencing identified two homozygous C-terminal truncating mutations in CTNNA2, encoding aN-catenin, one of three paralogues of the a-catenin family, involved in epithelial integrity and cell polarity. Patient-derived or CRISPR-targeted CTNNA2- mutant neuronal cells showed defective migration and neurite stability. Recombinant aN-catenin was sufficient to bundle purified actin and to suppress the actin-branching activity of ARP2/3. Small molecule inhibitors of ARP2/3 rescued the CTNNA2 neurite defect. Thus, disease modeling in human cells could be used to understand NMD pathogenesis and develop treatments for associated disorders. Overall design: 2 biological replicates per individual (2 iPSC clone differentiations), excluding 1263A, which has one sample
## 567 TAF15, an RNA binding protein was recently implicated in Amyotrophic Lateral Sclerosis (ALS). ALS is a fatal neurodegenerative disease. We report the identification of the conserved neuronal RNA targets of TAF15 and the assessment of the impact of TAF15 depletion on the neuronal transcriptome. Our study uncovers regulation of splicing of sets of neuronal RNAs encoding proteins with essential roles in synaptic activities including glutamergic receptors such as zeta-1 subunit of the glutamate N-methyl-D-aspartate (NMDA) receptor (Grin1). Overall design: Identification of TAF15 neuronal targets using normal human brain samples and mouse neurons. Mouse background: E14Tg2a.4 wildtype cells derived from 129P2/OlaHsd.
## project mentions_brain url brain_n_original
## 1669 SRP057196 TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP057196 201
## 438 SRP013825 TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP013825 14
## 665 SRP021130 TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP021130 13
## 1862 SRP063669 TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP063669 0
## 567 SRP017777 TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP017777 3
## brain_n_latest brain_n_sharq brain_n_metasra brain_percent_original brain_percent_latest brain_percent_sharq
## 1669 350 0 466 43.13305 75.10730 0.00000
## 438 23 23 24 58.33333 95.83333 95.83333
## 665 18 19 20 65.00000 90.00000 95.00000
## 1862 6 0 7 0.00000 85.71429 0.00000
## 567 4 5 5 60.00000 80.00000 100.00000
## brain_percent_metasra criteria_original criteria_latest criteria_sharq criteria_metasra
## 1669 100 FALSE TRUE FALSE TRUE
## 438 100 FALSE TRUE TRUE TRUE
## 665 100 FALSE TRUE TRUE TRUE
## 1862 100 FALSE TRUE FALSE TRUE
## 567 100 FALSE TRUE TRUE TRUE
To explore the table in more detail, open the discrepant_studies.csv file.
write.csv(discrepant, file = 'discrepant_studies.csv')
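Since `write.csv()` stores the row names as an unnamed first column by default, the table can be read back with `row.names = 1`. Below is a minimal round-trip sketch on a toy data frame; the values are copied from two rows shown earlier, the names `toy`, `toy_file`, and `toy_back` are introduced here, and a temporary file is used so that nothing is overwritten.

```r
## Toy stand-in for a few columns of the discrepant table
toy <- data.frame(
    project = c('SRP032165', 'SRP028738'),
    number_samples = c(24L, 16L),
    criteria_sharq = c(TRUE, TRUE),
    row.names = c('847', '789'),
    stringsAsFactors = FALSE
)

## Write to a temporary CSV; row names go into the first column
toy_file <- tempfile(fileext = '.csv')
write.csv(toy, file = toy_file)

## Read it back, using the first column as row names
toy_back <- read.csv(toy_file, row.names = 1, stringsAsFactors = FALSE)
print(toy_back)

## Clean up the temporary file
unlink(toy_file)
```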
This document was made possible thanks to MetaSRA (Bernstein, Doan, and Dewey, 2017) and:
Code for creating this document
## Create the vignette
library('rmarkdown')
system.time(render('metasra_comp.Rmd', 'BiocStyle::html_document'))
Reproducibility information for this document.
## Reproducibility info
proc.time()
## user system elapsed
## 75.865 5.957 135.840
message(Sys.time())
## 2020-11-13 16:25:52
options(width = 120)
library('sessioninfo')
session_info()
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.2 Patched (2020-06-24 r78746)
## os CentOS Linux 7 (Core)
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz US/Eastern
## date 2020-11-13
##
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
## package * version date lib source
## AnnotationDbi 1.50.3 2020-07-25 [2] Bioconductor
## askpass 1.1 2019-01-13 [2] CRAN (R 4.0.0)
## assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.0.0)
## backports 1.2.0 2020-11-02 [1] CRAN (R 4.0.2)
## base64enc 0.1-3 2015-07-28 [2] CRAN (R 4.0.0)
## bibtex 0.4.2.3 2020-09-19 [2] CRAN (R 4.0.2)
## Biobase * 2.48.0 2020-04-27 [2] Bioconductor
## BiocFileCache 1.12.1 2020-08-04 [2] Bioconductor
## BiocGenerics * 0.34.0 2020-04-27 [2] Bioconductor
## BiocManager 1.30.10 2019-11-16 [2] CRAN (R 4.0.0)
## BiocParallel 1.22.0 2020-04-27 [2] Bioconductor
## BiocStyle * 2.16.1 2020-09-25 [1] Bioconductor
## biomaRt 2.44.4 2020-10-13 [2] Bioconductor
## Biostrings 2.56.0 2020-04-27 [2] Bioconductor
## bit 4.0.4 2020-08-04 [2] CRAN (R 4.0.2)
## bit64 4.0.5 2020-08-30 [2] CRAN (R 4.0.2)
## bitops 1.0-6 2013-08-17 [2] CRAN (R 4.0.0)
## blob 1.2.1 2020-01-20 [2] CRAN (R 4.0.0)
## bookdown 0.21 2020-10-13 [1] CRAN (R 4.0.2)
## broom 0.7.2 2020-10-20 [2] CRAN (R 4.0.2)
## BSgenome 1.56.0 2020-04-27 [2] Bioconductor
## bumphunter 1.30.0 2020-04-27 [2] Bioconductor
## callr 3.5.1 2020-10-13 [2] CRAN (R 4.0.2)
## cellranger 1.1.0 2016-07-27 [2] CRAN (R 4.0.0)
## checkmate 2.0.0 2020-02-06 [2] CRAN (R 4.0.0)
## cli 2.1.0 2020-10-12 [2] CRAN (R 4.0.2)
## cluster 2.1.0 2019-06-19 [3] CRAN (R 4.0.2)
## codetools 0.2-16 2018-12-24 [3] CRAN (R 4.0.2)
## colorspace 1.4-1 2019-03-18 [2] CRAN (R 4.0.0)
## crayon 1.3.4 2017-09-16 [2] CRAN (R 4.0.0)
## curl 4.3 2019-12-02 [2] CRAN (R 4.0.0)
## data.table 1.13.2 2020-10-19 [2] CRAN (R 4.0.2)
## DBI 1.1.0 2019-12-15 [2] CRAN (R 4.0.0)
## dbplyr 2.0.0 2020-11-03 [1] CRAN (R 4.0.2)
## DelayedArray * 0.14.1 2020-07-14 [2] Bioconductor
## derfinder 1.22.0 2020-04-27 [2] Bioconductor
## derfinderHelper 1.22.0 2020-04-27 [2] Bioconductor
## desc 1.2.0 2018-05-01 [2] CRAN (R 4.0.0)
## devtools * 2.3.2 2020-09-18 [2] CRAN (R 4.0.2)
## digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
## doRNG 1.8.2 2020-01-27 [2] CRAN (R 4.0.0)
## downloader 0.4 2015-07-09 [2] CRAN (R 4.0.0)
## dplyr * 1.0.2 2020-08-18 [2] CRAN (R 4.0.2)
## ellipsis 0.3.1 2020-05-15 [2] CRAN (R 4.0.0)
## evaluate 0.14 2019-05-28 [2] CRAN (R 4.0.0)
## fansi 0.4.1 2020-01-08 [2] CRAN (R 4.0.0)
## farver 2.0.3 2020-01-16 [2] CRAN (R 4.0.0)
## forcats * 0.5.0 2020-03-01 [2] CRAN (R 4.0.0)
## foreach 1.5.1 2020-10-15 [2] CRAN (R 4.0.2)
## foreign 0.8-80 2020-05-24 [3] CRAN (R 4.0.2)
## Formula 1.2-4 2020-10-16 [2] CRAN (R 4.0.2)
## fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
## generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
## GenomeInfoDb * 1.24.2 2020-06-15 [2] Bioconductor
## GenomeInfoDbData 1.2.3 2020-05-18 [2] Bioconductor
## GenomicAlignments 1.24.0 2020-04-27 [2] Bioconductor
## GenomicFeatures 1.40.1 2020-07-08 [2] Bioconductor
## GenomicFiles 1.24.0 2020-04-27 [2] Bioconductor
## GenomicRanges * 1.40.0 2020-04-27 [2] Bioconductor
## GEOquery 2.56.0 2020-04-27 [2] Bioconductor
## ggplot2 * 3.3.2 2020-06-19 [2] CRAN (R 4.0.2)
## glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
## gridExtra 2.3 2017-09-09 [2] CRAN (R 4.0.0)
## gtable 0.3.0 2019-03-25 [2] CRAN (R 4.0.0)
## haven 2.3.1 2020-06-01 [2] CRAN (R 4.0.2)
## Hmisc 4.4-1 2020-08-10 [2] CRAN (R 4.0.2)
## hms 0.5.3 2020-01-08 [2] CRAN (R 4.0.0)
## htmlTable 2.1.0 2020-09-16 [2] CRAN (R 4.0.2)
## htmltools 0.5.0 2020-06-16 [2] CRAN (R 4.0.2)
## htmlwidgets 1.5.2 2020-10-03 [2] CRAN (R 4.0.2)
## httr 1.4.2 2020-07-20 [2] CRAN (R 4.0.2)
## IRanges * 2.22.2 2020-05-21 [2] Bioconductor
## iterators 1.0.13 2020-10-15 [2] CRAN (R 4.0.2)
## jpeg 0.1-8.1 2019-10-24 [2] CRAN (R 4.0.0)
## jsonlite 1.7.1 2020-09-07 [2] CRAN (R 4.0.2)
## knitcitations * 1.0.10 2019-09-15 [1] CRAN (R 4.0.2)
## knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2)
## labeling 0.4.2 2020-10-20 [2] CRAN (R 4.0.2)
## lattice 0.20-41 2020-04-02 [3] CRAN (R 4.0.2)
## latticeExtra 0.6-29 2019-12-19 [2] CRAN (R 4.0.0)
## lifecycle 0.2.0 2020-03-06 [2] CRAN (R 4.0.0)
## limma 3.44.3 2020-06-12 [2] Bioconductor
## locfit 1.5-9.4 2020-03-25 [2] CRAN (R 4.0.0)
## lubridate 1.7.9 2020-06-08 [1] CRAN (R 4.0.0)
## magick 2.5.2 2020-11-10 [1] CRAN (R 4.0.2)
## magrittr 1.5 2014-11-22 [2] CRAN (R 4.0.0)
## Matrix 1.2-18 2019-11-27 [3] CRAN (R 4.0.2)
## matrixStats * 0.57.0 2020-09-25 [2] CRAN (R 4.0.2)
## memoise 1.1.0 2017-04-21 [2] CRAN (R 4.0.0)
## modelr 0.1.8 2020-05-19 [1] CRAN (R 4.0.0)
## munsell 0.5.0 2018-06-12 [2] CRAN (R 4.0.0)
## nnet 7.3-14 2020-04-26 [3] CRAN (R 4.0.2)
## openssl 1.4.3 2020-09-18 [2] CRAN (R 4.0.2)
## pillar 1.4.6 2020-07-10 [2] CRAN (R 4.0.2)
## pkgbuild 1.1.0 2020-07-13 [2] CRAN (R 4.0.2)
## pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.0)
## pkgload 1.1.0 2020-05-29 [2] CRAN (R 4.0.2)
## plyr 1.8.6 2020-03-03 [2] CRAN (R 4.0.0)
## png 0.1-7 2013-12-03 [2] CRAN (R 4.0.0)
## prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.0.0)
## processx 3.4.4 2020-09-03 [2] CRAN (R 4.0.2)
## progress 1.2.2 2019-05-16 [2] CRAN (R 4.0.0)
## ps 1.4.0 2020-10-07 [2] CRAN (R 4.0.2)
## purrr * 0.3.4 2020-04-17 [2] CRAN (R 4.0.0)
## qvalue 2.20.0 2020-04-27 [2] Bioconductor
## R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
## rappdirs 0.3.1 2016-03-28 [2] CRAN (R 4.0.0)
## RColorBrewer 1.1-2 2014-12-07 [2] CRAN (R 4.0.0)
## Rcpp 1.0.5 2020-07-06 [2] CRAN (R 4.0.2)
## RCurl 1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
## readr * 1.4.0 2020-10-05 [2] CRAN (R 4.0.2)
## readxl 1.3.1 2019-03-13 [2] CRAN (R 4.0.0)
## recount * 1.14.0 2020-04-27 [2] Bioconductor
## RefManageR 1.2.12 2019-04-03 [1] CRAN (R 4.0.2)
## remotes 2.2.0 2020-07-21 [2] CRAN (R 4.0.2)
## rentrez 1.2.2 2019-05-02 [2] CRAN (R 4.0.0)
## reprex 0.3.0 2019-05-16 [1] CRAN (R 4.0.0)
## reshape2 1.4.4 2020-04-09 [2] CRAN (R 4.0.0)
## rlang 0.4.8 2020-10-08 [1] CRAN (R 4.0.2)
## rmarkdown * 2.5 2020-10-21 [1] CRAN (R 4.0.2)
## rngtools 1.5 2020-01-23 [2] CRAN (R 4.0.0)
## rpart 4.1-15 2019-04-12 [3] CRAN (R 4.0.2)
## rprojroot 1.3-2 2018-01-03 [2] CRAN (R 4.0.0)
## Rsamtools 2.4.0 2020-04-27 [2] Bioconductor
## RSQLite 2.2.1 2020-09-30 [2] CRAN (R 4.0.2)
## rstudioapi 0.11 2020-02-07 [2] CRAN (R 4.0.0)
## rtracklayer 1.48.0 2020-04-27 [2] Bioconductor
## rvest 0.3.6 2020-07-25 [2] CRAN (R 4.0.2)
## S4Vectors * 0.26.1 2020-05-16 [2] Bioconductor
## scales 1.1.1 2020-05-11 [2] CRAN (R 4.0.0)
## sessioninfo * 1.1.1 2018-11-05 [2] CRAN (R 4.0.0)
## stringi 1.5.3 2020-09-09 [2] CRAN (R 4.0.2)
## stringr * 1.4.0 2019-02-10 [2] CRAN (R 4.0.0)
## SummarizedExperiment * 1.18.2 2020-07-09 [2] Bioconductor
## survival 3.2-3 2020-06-13 [3] CRAN (R 4.0.2)
## testthat 3.0.0 2020-10-31 [1] CRAN (R 4.0.2)
## tibble * 3.0.4 2020-10-12 [2] CRAN (R 4.0.2)
## tidyr * 1.1.2 2020-08-27 [2] CRAN (R 4.0.2)
## tidyselect 1.1.0 2020-05-11 [2] CRAN (R 4.0.0)
## tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 4.0.0)
## usethis * 1.6.3 2020-09-17 [2] CRAN (R 4.0.2)
## VariantAnnotation 1.34.0 2020-04-27 [2] Bioconductor
## vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
## withr 2.3.0 2020-09-22 [2] CRAN (R 4.0.2)
## xfun 0.19 2020-10-30 [1] CRAN (R 4.0.2)
## XML 3.99-0.5 2020-07-23 [2] CRAN (R 4.0.2)
## xml2 1.3.2 2020-04-23 [2] CRAN (R 4.0.0)
## XVector 0.28.0 2020-04-27 [2] Bioconductor
## yaml 2.2.1 2020-02-01 [2] CRAN (R 4.0.0)
## zlibbioc 1.34.0 2020-04-27 [2] Bioconductor
##
## [1] /users/neagles/R/4.0
## [2] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/site-library
## [3] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/library
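The session information above is printed into the rendered document. If a standalone copy is also wanted, for example to keep next to discrepant_studies.csv, a minimal sketch using only base R could look like the following. The file name `session_info.txt` and the use of `tempdir()` are assumptions made for this illustration. It uses `utils::sessionInfo()` rather than the `sessioninfo` package shown above, so the layout of the saved text will differ from the output in this document.

```r
## Hypothetical output location, chosen for this sketch only
session_file <- file.path(tempdir(), 'session_info.txt')

## capture.output() returns the printed lines as a character vector
session_lines <- utils::capture.output(print(utils::sessionInfo()))
writeLines(session_lines, session_file)

## Individual versions can also be queried directly
r_version <- as.character(getRversion())
stats_version <- as.character(utils::packageVersion('stats'))
message('R ', r_version, ' with stats ', stats_version)
```

Writing to a plain text file keeps the record readable without R, at the cost of losing the structured data frame that `sessioninfo::session_info()` returns.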
This document was generated using BiocStyle (Oleś, Morgan, and Huber, 2020) with knitr (Xie, 2014) and rmarkdown (Allaire, Xie, McPherson, Luraschi, et al., 2020) running behind the scenes.
Citations were made with knitcitations (Boettiger, 2019); the bibliography file is available here.
[1] J. Allaire, Y. Xie, J. McPherson, J. Luraschi, et al. rmarkdown: Dynamic Documents for R. R package version 2.5. 2020. <URL: https://github.com/rstudio/rmarkdown>.
[2] M. N. Bernstein, A. Doan, and C. N. Dewey. “MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive”. In: Bioinformatics 33.18 (May 2017). Ed. by J. Wren, pp. 2914-2923. DOI: 10.1093/bioinformatics/btx334. <URL: https://doi.org/10.1093/bioinformatics/btx334>.
[3] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.10. 2019. <URL: https://CRAN.R-project.org/package=knitcitations>.
[4] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. <URL: https://CRAN.R-project.org/package=sessioninfo>.
[5] S. E. Ellis, L. Collado-Torres, A. E. Jaffe, and J. T. Leek. “Improving the value of public RNA-seq expression data by phenotype prediction”. In: Nucl. Acids Res. (2018). DOI: 10.1093/nar/gky102. <URL: https://doi.org/10.1093/nar/gky102>.
[6] A. Oleś, M. Morgan, and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.16.1. 2020. <URL: https://github.com/Bioconductor/BiocStyle>.
[7] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2020. <URL: https://www.R-project.org/>.
[8] H. Wickham, M. Averick, J. Bryan, W. Chang, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.
[9] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. <URL: http://www.crcpress.com/product/isbn/9781466561595>.