R
is an open-source statistical environment which can be easily modified to enhance its functionality via packages. libdRSE is a R
package available via GitHub. R
can be installed on any operating system from CRAN after which you can install libdRSE by using the following commands in your R
session:
## If necessary
install.packages('remotes')
## Install libdRSE
remotes::install_github('LieberInstitute/libdRSE')
## Note: this package is currently private, so you'll need to follow these steps:
## Add the GITHUB_PAT environment variable
## The process is described in the help page for remotes::install_github
## Once you have GITHUB_PAT configured, run:
remotes::install_github('LieberInstitute/libdRSE')
libdRSE is based on many other packages and in particular in those that have implemented the infrastructure needed for dealing with RNA-seq data. That is, packages like rtracklayer that allow you to import the data. A libdRSE user is not expected to deal with those packages directly but will need to be familiar with SummarizedExperiment to understand the results libdRSE generates. It might also prove to be highly beneficial to check the
If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.
As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R
and Bioconductor
have a steep learning curve so it is critical to learn where to ask for help. The blog post quoted above mentions some but we would like to highlight the Bioconductor support site as the main resource for getting help regarding Bioconductor. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.
We hope that libdRSE will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
## Citation info
citation('libdRSE')
#>
#> Straub A, Collado-Torres L (2019). _Extract coverage from LIBD
#> BigWig files_. https://github.com/LieberInstitute/libdRSE - R
#> package version 0.99.0, <URL:
#> https://github.com/LieberInstitute/libdRSE>.
#>
#> Jaffe AE, and Richard E. Straub, Shin JH, Tao R, Gao Y,
#> Collado-Torres L, Kam-Thong T, Xi HS, Quan J, Chen Q, Colantuoni
#> C, Ulrich WS, Maher BJ, Deep-Soboslay A, Cross AJ, Brandon NJ,
#> Leek JT, Hyde TM, Kleinman JE, Weinberger DR (2018).
#> "Developmental and genetic regulation of the human cortex
#> transcriptome illuminate schizophrenia pathogenesis." _Nature
#> Neuroscience_, *21*(8), 1117-1125. doi: 10.1038/s41593-018-0197-y
#> (URL: http://doi.org/10.1038/s41593-018-0197-y), <URL:
#> https://doi.org/10.1038/s41593-018-0197-y>.
#>
#> Collado-Torres L, Burke EE, Peterson A, Shin J, Straub RE,
#> Rajpurohit A, Semick SA, Ulrich WS, Price AJ, Valencia C, Tao R,
#> Deep-Soboslay A, Hyde TM, Kleinman JE, Weinberger DR, Jaffe AE
#> (2019). "Regional Heterogeneity in Gene Expression, Regulation,
#> and Coherence in the Frontal Cortex and Hippocampus across
#> Development and Schizophrenia." _Neuron_, *103*(2), 203-216.e8.
#> doi: 10.1016/j.neuron.2019.05.013 (URL:
#> http://doi.org/10.1016/j.neuron.2019.05.013), <URL:
#> https://doi.org/10.1016/j.neuron.2019.05.013>.
#>
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.
The libdRSE package was developed as an interface for extracting base-pair coverage from scaled BigWig files for a set of LIBD projects. libdRSE makes it easy to obtain base-pair coverage information such as read counts in easy to use R objects powered by SummarizedExperiment. The quantification provided by libdRSE is similar to the one from the recountWorkflow (Collado-Torres, Nellore, and Jaffe, 2017) and recount (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017). That is, base-pair coverage data is summed over for a set of input genomic regions and scaled to a library size of 40 million 100 base-pair reads, which is what you would get with recount::coverage_matrix()
for the over 70,000 human RNA-seq samples uniformly-processed by the recount2 project (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017). With libdRSE however, you get access to similar data for the LIBD public RNA-seq projects, particularly BrainSEQ Phase I (Jaffe, and Richard E. Straub, Shin, Tao, et al., 2018) and BrainSEQ Phase II (Collado-Torres, Burke, Peterson, Shin, et al., 2019).
Through the phenotype tables you can also find the specific BigWig coverage files for the LIBD projects that you can access with other tools. More information on the BigWig file format is available at the UCSC Genome Browser website. NOTE: the data is private at this moment. We are figuring out the best way to host the BigWig files.
Finally, some LIBD projects have stranded information, that is, one BigWig file for each the forward and reverse strand. Stranded data is not present in recount2, so the libdRSE::sense_antisense()
function can be useful if you want to explore the expression for the sense and antisense directions on your stranded genomic regions.
To get started, load the libdRSE package.
library(libdRSE)
BrainSeq aims to characterize the genetic and epigenetic regulation of multiple facets of transcription in distinct brain regions across the human lifespan in samples of major neuropsychiatric disorders and controls.
BrainSEQ Consortium (Schubert, O’Donnell, Quan, Wendland, et al., 2015)
BrainSEQ Phase 1: Data including DLPFC polyA+ RNA-seq on 738 subjects spanning the human lifespan and four main psychiatric diagnostic groups: schizophrenia disorder (Szhizo), major depression disorder (MDD), bipolar disorder (Bipolar) and neurotypical controls (Control).
dim(find_samples('bsp1'))
#> [1] 732 67
table(find_samples('bsp1')$Dx)
#>
#> Bipolar Control MDD Schizo
#> 67 336 150 179
If you use this data, please cite BrainSEQ Phase I (Jaffe, and Richard E. Straub, Shin, Tao, et al., 2018).
BrainSEQ Phase 2: Data including dorsolateral prefrontal cortex (DLPFC) and hippocampus (HIPPO) RiboZero RNA-seq subjects spanning the lifespan and adult patients with schizophrenia disorder (SCZD) or neurotypical controls (Control).
dim(find_samples('bsp2'))
#> [1] 900 79
with(find_samples('bsp2'), addmargins(table(Dx, Region)))
#> Region
#> Dx DLPFC HIPPO Sum
#> Control 300 314 614
#> SCZD 153 133 286
#> Sum 453 447 900
If you use this data, please cite BrainSEQ Phase II (Collado-Torres, Burke, Peterson, Shin, et al., 2019).
Details about the main sample columns is available here.
find_samples()
very simply obtains the phenotype table for the given LIBD project. Inputs to find_samples()
include project
and sample_ids
. Given these inputs, the function returns a DataFrame object (from S4Vectors) with the phenotype table for the given project
, and if the user supplies sample_ids
, then the phenotype table will be subsetted to those ids.
example('find_samples')
#>
#> fnd_sm> ## Subset the first two samples from BrainSEQ Phase I
#> fnd_sm> find_samples('bsp1', sample_ids = c('R2799', 'R2809'))
#> DataFrame with 2 rows and 67 columns
#> RNum BrNum Region RIN Age Sex
#> <character> <character> <character> <numeric> <numeric> <character>
#> R2799 R2799 Br1197 DLPFC 8.4 21.840433 M
#> R2809 R2809 Br1245 DLPFC 7.9 42.624028 M
#> Race Dx SAMPLE_ID FQCbasicStats
#> <character> <character> <CharacterList> <CharacterList>
#> R2799 CAUC Bipolar R2799_C0JYLACXX PASS
#> R2809 CAUC Schizo R2809_D2A01ACXX PASS
#> perBaseQual perTileQual perSeqQual perBaseContent
#> <CharacterList> <CharacterList> <CharacterList> <CharacterList>
#> R2799 FAIL WARN/FAIL PASS FAIL
#> R2809 PASS PASS PASS WARN
#> GCcontent Ncontent SeqLengthDist SeqDuplication
#> <CharacterList> <CharacterList> <CharacterList> <CharacterList>
#> R2799 PASS PASS PASS FAIL/WARN
#> R2809 WARN PASS/WARN PASS FAIL
#> OverrepSeqs AdapterContent KmerContent SeqLength_R1
#> <CharacterList> <CharacterList> <CharacterList> <IntegerList>
#> R2799 PASS PASS FAIL 100
#> R2809 PASS PASS FAIL 100
#> percentGC_R1 phred20.21_R1 phred48.49_R1 phred76.77_R1
#> <IntegerList> <NumericList> <NumericList> <NumericList>
#> R2799 49 38 36 31
#> R2809 47 40 40 34.5
#> phred100_R1 phredGT30_R1 phredGT35_R1 Adapter50.51_R1
#> <NumericList> <NumericList> <NumericList> <NumericList>
#> R2799 24 0.656484218539493 0.402606866892709 0.274446735976933
#> R2809 32 0.926266447639032 0.761714416330939 0.0562325324565277
#> Adapter70.71_R1 Adapter88_R1 SeqLength_R2 percentGC_R2
#> <NumericList> <NumericList> <IntegerList> <IntegerList>
#> R2799 0.34763515474898 0.465147635643079 100 49
#> R2809 0.0731546924482436 0.118039040198923 100 47
#> phred20.21_R2 phred48.49_R2 phred76.77_R2 phred100_R2
#> <NumericList> <NumericList> <NumericList> <NumericList>
#> R2799 38 36 33 25
#> R2809 40 39 35 32
#> phredGT30_R2 phredGT35_R2 Adapter50.51_R2
#> <NumericList> <NumericList> <NumericList>
#> R2799 0.61804735116388 0.379082599346615 0.295031868940981
#> R2809 0.856248317693894 0.657797231213979 0.0597478293398095
#> Adapter70.71_R2 Adapter88_R2
#> <NumericList> <NumericList>
#> R2799 0.382246202829382 0.508998289218168
#> R2809 0.0794013186727738 0.118550825070519
#> bamFile
#> <CharacterList>
#> R2799 /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/HISAT2_out/R2799_C0JYLACXX_accepted_hits.sorted.bam
#> R2809 /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/HISAT2_out/R2809_D2A01ACXX_accepted_hits.sorted.bam
#> trimmed numReads numMapped numUnmapped
#> <LogicalList> <IntegerList> <IntegerList> <IntegerList>
#> R2799 FALSE 147441360 126030726 21410634
#> R2809 FALSE 147327528 139736578 7590950
#> overallMapRate concordMapRate totalMapped mitoMapped
#> <NumericList> <NumericList> <IntegerList> <IntegerList>
#> R2799 0.8548 0.7923 109183563 24087218
#> R2809 0.9485 0.9033 118274615 32205618
#> mitoRate totalAssignedGene gene_Assigned
#> <NumericList> <NumericList> <IntegerList>
#> R2799 0.180738927312207 0.786930346008069 55947123
#> R2809 0.214018926990896 0.757340150310627 59327435
#> gene_Unassigned_Ambiguity gene_Unassigned_MultiMapping
#> <IntegerList> <IntegerList>
#> R2799 3707625 6836188
#> R2809 3398051 9383985
#> gene_Unassigned_NoFeatures gene_Unassigned_Unmapped
#> <IntegerList> <IntegerList>
#> R2799 4604458 0
#> R2809 6227106 0
#> gene_Unassigned_MappingQuality gene_Unassigned_FragmentLength
#> <IntegerList> <IntegerList>
#> R2799 0 0
#> R2809 0 0
#> gene_Unassigned_Chimera gene_Unassigned_Secondary
#> <IntegerList> <IntegerList>
#> R2799 0 0
#> R2809 0 0
#> gene_Unassigned_Nonjunction gene_Unassigned_Duplicate
#> <IntegerList> <IntegerList>
#> R2799 0 0
#> R2809 0 0
#> rRNA_rate
#> <NumericList>
#> R2799 5.3979540645906e-06
#> R2809 7.60187929918089e-06
#> bigwig_path
#> <CharacterList>
#> R2799 /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/Coverage/R2799_C0JYLACXX.bw
#> R2809 /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/Coverage/R2809_D2A01ACXX.bw
#>
#> fnd_sm> ## Subset the first two samples from BrainSEQ Phase II
#> fnd_sm> find_samples('bsp2', sample_ids = c('R10424', 'R12195'))
#> DataFrame with 2 rows and 79 columns
#> SAMPLE_ID FQCbasicStats perBaseQual
#> <CharacterList> <CharacterList> <CharacterList>
#> R10424 R10424_C4KC9ACXX_TAATGCGC_L00 PASS PASS
#> R12195 R12195_C4E9TACXX_CGGCTATG_L00 PASS PASS
#> perTileQual perSeqQual perBaseContent GCcontent
#> <CharacterList> <CharacterList> <CharacterList> <CharacterList>
#> R10424 PASS PASS FAIL/WARN WARN/FAIL
#> R12195 WARN PASS FAIL/WARN WARN/FAIL
#> Ncontent SeqLengthDist SeqDuplication OverrepSeqs
#> <CharacterList> <CharacterList> <CharacterList> <CharacterList>
#> R10424 PASS PASS FAIL WARN
#> R12195 PASS PASS WARN WARN
#> AdapterContent KmerContent percentGC_R1 phred100_R1
#> <CharacterList> <CharacterList> <IntegerList> <NumericList>
#> R10424 PASS FAIL 43 33
#> R12195 PASS FAIL 44 34
#> phredGT30_R1 phredGT35_R1 Adapter88_R1 percentGC_R2
#> <NumericList> <NumericList> <NumericList> <IntegerList>
#> R10424 0.909087695639853 0.708970145082465 2.74234442228771 43
#> R12195 0.906698288285609 0.70642074635339 1.528805377787 44
#> phred100_R2 phredGT30_R2 phredGT35_R2 Adapter88_R2
#> <NumericList> <NumericList> <NumericList> <NumericList>
#> R10424 32 0.861428425268134 0.663018859063129 2.75279721539108
#> R12195 34 0.853310879714613 0.659080309341084 1.54911822156688
#> ERCCsumLogErr
#> <NumericList>
#> R10424 -110.365792667035
#> R12195 -18.3143733824569
#> bamFile
#> <CharacterList>
#> R10424 /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/HISAT2_out/R10424_C4KC9ACXX_TAATGCGC_L00_accepted_hits.sorted.bam
#> R12195 /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/HISAT2_out/R12195_C4E9TACXX_CGGCTATG_L00_accepted_hits.sorted.bam
#> trimmed numReads numMapped numUnmapped
#> <LogicalList> <IntegerList> <IntegerList> <IntegerList>
#> R10424 FALSE 137207346 125018261 12189085
#> R12195 FALSE 186256540 173098203 13158337
#> overallMapRate concordMapRate totalMapped mitoMapped
#> <NumericList> <NumericList> <IntegerList> <IntegerList>
#> R10424 0.9112 0.8586 127414655 2250800
#> R12195 0.9294 0.8862 179828658 2929274
#> mitoRate totalAssignedGene rRNA_rate
#> <NumericList> <NumericList> <NumericList>
#> R10424 0.0173585169619773 0.360011691217133 5.26367757990158e-05
#> R12195 0.0160281634178264 0.40055783013167 7.92850345490864e-05
#> RNum BrNum Region RIN Age Sex
#> <character> <character> <factor> <NumericList> <numeric> <factor>
#> R10424 R10424 Br5168 DLPFC 6.7 64.08 M
#> R12195 R12195 Br5073 DLPFC 8.4 62.61 M
#> Race Dx snpPC1 snpPC2 snpPC3 snpPC4
#> <factor> <factor> <numeric> <numeric> <numeric> <numeric>
#> R10424 CAUC Control 0.0541862 0.00141503 -0.0222983 6.7169e-06
#> R12195 AA SCZD -0.0255144 -0.00401754 0.00553881 -0.000578574
#> snpPC5 snpPC6 snpPC7 snpPC8 snpPC9
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> R10424 8.66316e-05 -0.00505508 -0.00105689 0.00511064 -0.00143876
#> R12195 -0.00147552 0.00816687 -0.00528997 0.00590834 -0.00625297
#> snpPC10 DLFPC_HIPPO_correlation_geneRpkm
#> <numeric> <numeric>
#> R10424 0.000946483 NA
#> R12195 0.00801729 NA
#> DLFPC_HIPPO_correlation_exonRpkm DLFPC_HIPPO_correlation_jxnRp10m
#> <numeric> <numeric>
#> R10424 NA NA
#> R12195 NA NA
#> DLFPC_HIPPO_correlation_txTpm mean_mitoRate
#> <numeric> <numeric>
#> R10424 NA 0.0173585169619773
#> R12195 NA 0.0160281634178264
#> mean_totalAssignedGene mean_rRNA_rate mean_RIN
#> <numeric> <numeric> <numeric>
#> R10424 0.360011691217133 5.26367757990158e-05 6.7
#> R12195 0.40055783013167 7.92850345490864e-05 8.4
#> agespline_fetal agespline_birth agespline_infant agespline_child
#> <numeric> <numeric> <numeric> <numeric>
#> R10424 0 64.08 63.08 54.08
#> R12195 0 62.61 61.61 52.61
#> agespline_teen agespline_adult protocol
#> <numeric> <numeric> <character>
#> R10424 44.08 14.08 RiboZeroGold
#> R12195 42.61 12.61 RiboZeroGold
#> analysis_regionspecific_adult analysis_regionspecific_prenatal
#> <logical> <logical>
#> R10424 TRUE FALSE
#> R12195 FALSE FALSE
#> analysis_development analysis_sczd_casecontrol_dlpfc
#> <logical> <logical>
#> R10424 TRUE TRUE
#> R12195 FALSE TRUE
#> analysis_sczd_casecontrol_hippo
#> <logical>
#> R10424 FALSE
#> R12195 FALSE
#> analysis_sczd_casecontrol_interaction analysis_eqtl_dlpfc
#> <logical> <logical>
#> R10424 TRUE TRUE
#> R12195 TRUE TRUE
#> analysis_eqtl_hippo analysis_eqtl_interaction
#> <logical> <logical>
#> R10424 FALSE TRUE
#> R12195 FALSE TRUE
#> bigwig_path
#> <CharacterList>
#> R10424 /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R10424_C4KC9ACXX_TAATGCGC_L00.Forward.bw,/dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R10424_C4KC9ACXX_TAATGCGC_L00.Reverse.bw
#> R12195 /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12195_C4E9TACXX_CGGCTATG_L00.Forward.bw,/dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12195_C4E9TACXX_CGGCTATG_L00.Reverse.bw
The goal of this package is to provide a way for users to query the expression of some genomic regions of interest among the LIBD RNA-seq public projects. The genomic regions input can either be a GRanges object, or a BED
file which will be turned into aGRanges object automatically when running the libdRSE package.
An example of a BED
file region input:
chr11 27721411 27721469 . 0 -
chr11 27721967 27722058 . 0 -
chr11 27720474 27720688 . 0 +
chr11 27721469 27721967 . 0 +
chr11 27722350 27722486 . 0 +
chr11 27725777 27726863 . 0 +
An example of a GRanges object:
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1))
gr0
#> GRanges object with 10 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr2 1-10 *
#> [2] chr2 2-10 *
#> [3] chr2 3-10 *
#> [4] chr2 4-10 *
#> [5] chr1 5-10 *
#> [6] chr1 6-10 *
#> [7] chr3 7-10 *
#> [8] chr3 8-10 *
#> [9] chr3 9-10 *
#> [10] chr3 10 *
#> -------
#> seqinfo: 3 sequences from an unspecified genome; no seqlengths
To make sure that the genomic input regions are valid, use the check_region()
function in the libdRSE package. check_region()
takes the users regions input file and verifies that the regions are compatible with the LIBD project data. If valid, the function will return a GRanges object that can then be used in the libd_rse()
function.
## A test region that overlaps gene SNAP25
snap25 <- GenomicRanges::GRanges('chr20:10286777-10288069:+')
check_region(snap25)
#> 2019-08-16 12:38:56 The 'regions' selected are valid regions.
#> Returning regions as a GRanges object.
#> GRanges object with 1 range and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr20 10286777-10288069 +
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
libd_rse()
is the main function of the libdRSE package. This function extracts the base-pair coverage from the scaled BigWig files for a set of LIBD projects. The output of this is a RangedSummarizedExperiment object which contains a counts
assay with the base-pair coverage data scaled to a library size of 40 million 100 bp reads. The rows represent ranges of interest and columns represent the input samples.
More details on RangedSummarizedExperiment objects can be found in the SummarizedExperiment package vignette.
## Two samples with multiple sequence lanes and one sample that is single
## lane
snap25_rse_mult <- libd_rse(regions = snap25, 'bsp2',
sample_ids = c('R10424', 'R12258', 'R12283'))
#> 2019-08-16 12:38:56 The 'regions' selected are valid regions.
#> Returning regions as a GRanges object.
#> 2019-08-16 12:38:57 processing chromosome chr20
#> 2019-08-16 12:38:57 railMatrix: processing regions 1 to 1
#> 2019-08-16 12:38:57 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R10424_C4KC9ACXX_TAATGCGC_L00.Forward.bw
#> 2019-08-16 12:38:57 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R10424_C4KC9ACXX_TAATGCGC_L00.Reverse.bw
#> 2019-08-16 12:38:57 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12258_C4L03ACXX_TCCGGAGA_L00.Forward.bw
#> 2019-08-16 12:38:58 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12258_C4L03ACXX_TCCGGAGA_L00.Reverse.bw
#> 2019-08-16 12:38:58 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12258_C6185ACXX_TCCGGAGA_L005.Forward.bw
#> 2019-08-16 12:38:59 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12258_C6185ACXX_TCCGGAGA_L005.Reverse.bw
#> 2019-08-16 12:38:59 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12283_C5GKYACXX_AGCGATAG_L00.Forward.bw
#> 2019-08-16 12:39:00 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12283_C5GKYACXX_AGCGATAG_L00.Reverse.bw
#> 2019-08-16 12:39:00 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12283_C603PACXX_AGCGATAG_L001.Forward.bw
#> 2019-08-16 12:39:00 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12283_C603PACXX_AGCGATAG_L001.Reverse.bw
## Explore the resulting RSE object
snap25_rse_mult
#> class: RangedSummarizedExperiment
#> dim: 1 6
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(6): R10424.Forward R10424.Reverse ... R12283.Forward
#> R12283.Reverse
#> colData names(79): SAMPLE_ID FQCbasicStats ...
#> analysis_eqtl_interaction bigwig_path
colnames(snap25_rse_mult)
#> [1] "R10424.Forward" "R10424.Reverse" "R12258.Forward" "R12258.Reverse"
#> [5] "R12283.Forward" "R12283.Reverse"
If the genomic regions are stranded and the LIBD project queried by the user is stranded (such as bsp2
), then libd_rse()
returns two columns for each sample (one for the forward and one for the reverse strand). In such cases it’s possible to compute the sense and antisense expression for the input genomic regions. sense_antisense()
is a function which takes the users RangedSummarizedExperiment object created wtih libd_rse()
, and splits it into two new RSE objects: sense and antisense. Each of these objects now only has one column per sample. This potentially makes the output easier to interpret based on strandness. Note that the counts are scaled for 40 million 100 bp reads mapped to each strand (forward and reverse) at this point, so a total of 80 million 100 bp reads across both strands.
## Separate into sense and antisense transcription
snap25_sense_antisense <- sense_antisense(snap25_rse_mult)
## Explore the resulting object
snap25_sense_antisense
#> $sense
#> class: RangedSummarizedExperiment
#> dim: 1 3
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(3): R10424 R12258 R12283
#> colData names(79): SAMPLE_ID FQCbasicStats ...
#> analysis_eqtl_interaction bigwig_path
#>
#> $antisense
#> class: RangedSummarizedExperiment
#> dim: 1 3
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(3): R10424 R12258 R12283
#> colData names(79): SAMPLE_ID FQCbasicStats ...
#> analysis_eqtl_interaction bigwig_path
colnames(snap25_sense_antisense$sense)
#> [1] "R10424" "R12258" "R12283"
colnames(snap25_sense_antisense$antisense)
#> [1] "R10424" "R12258" "R12283"
The libdRSE package (Straub and Collado-Torres, 2019) was made possible thanks to:
Code for creating the vignette
## Create the vignette
library('rmarkdown')
system.time(render('libdRSE.Rmd'))
## Extract the R code
library('knitr')
knit('libdRSEt.Rmd', tangle = TRUE)
## Clean up
#file.remove('quickstartRef.bib')
Date the vignette was generated.
#> [1] "2019-08-16 12:39:02 EDT"
Wallclock time spent generating the vignette.
#> Time difference of 33.255 secs
R
session information.
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.5.3 Patched (2019-03-11 r76311)
#> os Red Hat Enterprise Linux Server release 6.9 (Santiago)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz US/Eastern
#> date 2019-08-16
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date lib source
#> acepack 1.4.1 2016-10-29 [2] CRAN (R 3.5.0)
#> AnnotationDbi 1.44.0 2018-10-30 [1] Bioconductor
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 3.5.1)
#> backports 1.1.4 2019-04-10 [2] CRAN (R 3.5.3)
#> base64enc 0.1-3 2015-07-28 [2] CRAN (R 3.5.0)
#> bibtex 0.4.2 2017-06-30 [1] CRAN (R 3.5.0)
#> Biobase * 2.42.0 2018-10-30 [2] Bioconductor
#> BiocGenerics * 0.28.0 2018-10-30 [1] Bioconductor
#> BiocManager 1.30.4 2018-11-13 [1] CRAN (R 3.5.1)
#> BiocParallel * 1.16.6 2019-02-10 [1] Bioconductor
#> BiocStyle * 2.10.0 2018-10-30 [1] Bioconductor
#> biomaRt 2.38.0 2018-10-30 [1] Bioconductor
#> Biostrings 2.50.2 2019-01-03 [1] Bioconductor
#> bit 1.1-14 2018-05-29 [2] CRAN (R 3.5.1)
#> bit64 0.9-7 2017-05-08 [2] CRAN (R 3.5.0)
#> bitops 1.0-6 2013-08-17 [2] CRAN (R 3.5.0)
#> blob 1.2.0 2019-07-09 [2] CRAN (R 3.5.3)
#> BSgenome 1.50.0 2018-10-30 [1] Bioconductor
#> bumphunter 1.24.5 2018-12-01 [1] Bioconductor
#> checkmate 1.9.4 2019-07-04 [1] CRAN (R 3.5.3)
#> cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3)
#> cluster 2.0.7-1 2018-04-13 [3] CRAN (R 3.5.3)
#> codetools 0.2-16 2018-12-24 [3] CRAN (R 3.5.3)
#> colorspace 1.4-1 2019-03-18 [2] CRAN (R 3.5.1)
#> commonmark 1.7 2018-12-01 [1] CRAN (R 3.5.3)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
#> curl 4.0 2019-07-22 [1] CRAN (R 3.5.3)
#> data.table 1.12.2 2019-04-07 [1] CRAN (R 3.5.3)
#> DBI 1.0.0 2018-05-02 [2] CRAN (R 3.5.0)
#> DelayedArray * 0.8.0 2018-10-30 [2] Bioconductor
#> derfinder 1.16.1 2018-12-03 [1] Bioconductor
#> derfinderHelper 1.16.1 2018-12-03 [1] Bioconductor
#> desc 1.2.0 2018-05-01 [2] CRAN (R 3.5.1)
#> digest 0.6.20 2019-07-04 [1] CRAN (R 3.5.3)
#> doRNG 1.7.1 2018-06-22 [2] CRAN (R 3.5.1)
#> dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.5.3)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.5.3)
#> foreach 1.4.7 2019-07-27 [2] CRAN (R 3.5.3)
#> foreign 0.8-71 2018-07-20 [3] CRAN (R 3.5.3)
#> Formula 1.2-3 2018-05-03 [2] CRAN (R 3.5.1)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.5.3)
#> GenomeInfoDb * 1.18.2 2019-02-12 [1] Bioconductor
#> GenomeInfoDbData 1.2.0 2018-11-02 [2] Bioconductor
#> GenomicAlignments 1.18.1 2019-01-04 [1] Bioconductor
#> GenomicFeatures 1.34.8 2019-04-10 [1] Bioconductor
#> GenomicFiles 1.18.0 2018-10-30 [1] Bioconductor
#> GenomicRanges * 1.34.0 2018-10-30 [1] Bioconductor
#> ggplot2 3.2.1 2019-08-10 [1] CRAN (R 3.5.3)
#> glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.1)
#> gridExtra 2.3 2017-09-09 [2] CRAN (R 3.5.0)
#> gtable 0.3.0 2019-03-25 [2] CRAN (R 3.5.1)
#> Hmisc 4.2-0 2019-01-26 [1] CRAN (R 3.5.1)
#> hms 0.5.0 2019-07-09 [2] CRAN (R 3.5.3)
#> htmlTable 1.13.1 2019-01-07 [2] CRAN (R 3.5.1)
#> htmltools 0.3.6 2017-04-28 [2] CRAN (R 3.5.0)
#> htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.5.1)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 3.5.3)
#> IRanges * 2.16.0 2018-10-30 [1] Bioconductor
#> iterators 1.0.12 2019-07-26 [2] CRAN (R 3.5.3)
#> jsonlite 1.6 2018-12-07 [2] CRAN (R 3.5.1)
#> knitcitations * 1.0.8 2017-07-04 [1] CRAN (R 3.5.0)
#> knitr 1.24 2019-08-08 [1] CRAN (R 3.5.3)
#> lattice 0.20-38 2018-11-04 [3] CRAN (R 3.5.3)
#> latticeExtra 0.6-28 2016-02-09 [2] CRAN (R 3.5.0)
#> lazyeval 0.2.2 2019-03-15 [2] CRAN (R 3.5.1)
#> libdRSE * 0.99.0 2019-08-16 [1] Github (LieberInstitute/libdRSE@aa9bbe5)
#> locfit 1.5-9.1 2013-04-20 [2] CRAN (R 3.5.0)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.5.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
#> MASS 7.3-51.1 2018-11-01 [3] CRAN (R 3.5.3)
#> Matrix 1.2-15 2018-11-01 [3] CRAN (R 3.5.3)
#> matrixStats * 0.54.0 2018-07-23 [1] CRAN (R 3.5.1)
#> memoise 1.1.0 2017-04-21 [2] CRAN (R 3.5.0)
#> munsell 0.5.0 2018-06-12 [2] CRAN (R 3.5.1)
#> nnet 7.3-12 2016-02-02 [3] CRAN (R 3.5.3)
#> pillar 1.4.2 2019-06-29 [1] CRAN (R 3.5.3)
#> pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1)
#> pkgdown 1.3.0 2018-12-07 [1] CRAN (R 3.5.3)
#> pkgmaker 0.27 2018-05-25 [2] CRAN (R 3.5.1)
#> plyr 1.8.4 2016-06-08 [2] CRAN (R 3.5.0)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0)
#> progress 1.2.2 2019-05-16 [1] CRAN (R 3.5.3)
#> purrr 0.3.2 2019-03-15 [2] CRAN (R 3.5.1)
#> qvalue 2.14.1 2019-01-10 [1] Bioconductor
#> R6 2.4.0 2019-02-14 [2] CRAN (R 3.5.1)
#> RColorBrewer 1.1-2 2014-12-07 [2] CRAN (R 3.5.0)
#> Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.5.3)
#> RCurl 1.95-4.12 2019-03-04 [2] CRAN (R 3.5.1)
#> RefManageR 1.2.12 2019-04-03 [1] CRAN (R 3.5.3)
#> registry 0.5-1 2019-03-05 [2] CRAN (R 3.5.1)
#> reshape2 1.4.3 2017-12-11 [2] CRAN (R 3.5.0)
#> rlang 0.4.0 2019-06-25 [1] CRAN (R 3.5.3)
#> rmarkdown 1.14 2019-07-12 [1] CRAN (R 3.5.3)
#> rngtools 1.4 2019-07-01 [2] CRAN (R 3.5.3)
#> roxygen2 6.1.1 2018-11-07 [1] CRAN (R 3.5.3)
#> rpart 4.1-13 2018-02-23 [3] CRAN (R 3.5.3)
#> rprojroot 1.3-2 2018-01-03 [2] CRAN (R 3.5.0)
#> Rsamtools 1.34.1 2019-01-31 [1] Bioconductor
#> RSQLite 2.1.2 2019-07-24 [2] CRAN (R 3.5.3)
#> rstudioapi 0.10 2019-03-19 [2] CRAN (R 3.5.1)
#> rtracklayer 1.42.2 2019-03-01 [1] Bioconductor
#> S4Vectors * 0.20.1 2018-11-09 [1] Bioconductor
#> scales 1.0.0 2018-08-09 [2] CRAN (R 3.5.1)
#> sessioninfo * 1.1.1 2018-11-05 [1] CRAN (R 3.5.1)
#> stringi 1.4.3 2019-03-12 [2] CRAN (R 3.5.1)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.1)
#> SummarizedExperiment * 1.12.0 2018-10-30 [1] Bioconductor
#> survival 2.43-3 2018-11-26 [3] CRAN (R 3.5.3)
#> tibble 2.1.3 2019-06-06 [1] CRAN (R 3.5.3)
#> tidyselect 0.2.5 2018-10-11 [2] CRAN (R 3.5.1)
#> VariantAnnotation 1.28.13 2019-03-19 [1] Bioconductor
#> vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.5.3)
#> withr 2.1.2 2018-03-15 [2] CRAN (R 3.5.0)
#> xfun 0.8 2019-06-25 [1] CRAN (R 3.5.3)
#> XML 3.98-1.20 2019-06-06 [2] CRAN (R 3.5.3)
#> xml2 1.2.2 2019-08-09 [2] CRAN (R 3.5.3)
#> xtable 1.8-4 2019-04-21 [2] CRAN (R 3.5.3)
#> XVector 0.22.0 2018-10-30 [1] Bioconductor
#> yaml 2.2.0 2018-07-25 [2] CRAN (R 3.5.1)
#> zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.5.3)
#> zlibbioc 1.28.0 2018-10-30 [2] Bioconductor
#>
#> [1] /users/lcollado/R/x86_64-pc-linux-gnu-library/3.5.x
#> [2] /jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-3.5.x/R/3.5.x/lib64/R/site-library
#> [3] /jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-3.5.x/R/3.5.x/lib64/R/library
This vignette was generated using BiocStyle (Oleś, Morgan, and Huber, 2018), knitr (Xie, 2014) and rmarkdown running behind the scenes.
Citations were made with knitcitations (Boettiger, 2017).
[1] S. Arora, M. Morgan, M. Carlson, and H. Pagès. GenomeInfoDb: Utilities for manipulating chromosome and other ‘seqname’ identifiers. 2017. DOI: 10.18129/B9.bioc.GenomeInfoDb.
[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[1] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.
[1] M. Morgan, V. Obenchain, J. Hester, and H. Pagès. SummarizedExperiment: SummarizedExperiment container. 2017. DOI: 10.18129/B9.bioc.SummarizedExperiment.
[1] A. Oleś, M. Morgan, and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.10.0. 2018. URL: https://github.com/Bioconductor/BiocStyle.
[1] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2019. URL: https://www.R-project.org/.
[1] A. Straub and L. Collado-Torres. Extract coverage from LIBD BigWig files. https://github.com/LieberInstitute/libdRSE - R package version 0.99.0. 2019. URL: https://github.com/LieberInstitute/libdRSE.
[1] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.
[1] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.