Basics

Install libdRSE

R is an open-source statistical environment which can be easily modified to enhance its functionality via packages. libdRSE is a R package available via GitHub. R can be installed on any operating system from CRAN after which you can install libdRSE by using the following commands in your R session:

## If necessary
install.packages('remotes')

## Install libdRSE
remotes::install_github('LieberInstitute/libdRSE')

## Note: this package is currently private, so you'll need to follow these steps:

## Add the GITHUB_PAT environment variable
## The process is described in the help page for remotes::install_github

## Once you have GITHUB_PAT configured, run:
remotes::install_github('LieberInstitute/libdRSE')

Required knowledge

libdRSE is based on many other packages and in particular in those that have implemented the infrastructure needed for dealing with RNA-seq data. That is, packages like rtracklayer that allow you to import the data. A libdRSE user is not expected to deal with those packages directly but will need to be familiar with SummarizedExperiment to understand the results libdRSE generates. It might also prove to be highly beneficial to check the

recount package for accessing data from over 70,000 human RNA-seq samples.
GenomicRanges package for dealing with genome coordinates.
rtracklayer package for creating and reading BED files with R.

If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.

Asking for help

As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R and Bioconductor have a steep learning curve so it is critical to learn where to ask for help. The blog post quoted above mentions some but we would like to highlight the Bioconductor support site as the main resource for getting help regarding Bioconductor. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.

Citing libdRSE

We hope that libdRSE will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation('libdRSE')
#> 
#> Straub A, Collado-Torres L (2019). _Extract coverage from LIBD
#> BigWig files_. https://github.com/LieberInstitute/libdRSE - R
#> package version 0.99.0, <URL:
#> https://github.com/LieberInstitute/libdRSE>.
#> 
#> Jaffe AE, and Richard E. Straub, Shin JH, Tao R, Gao Y,
#> Collado-Torres L, Kam-Thong T, Xi HS, Quan J, Chen Q, Colantuoni
#> C, Ulrich WS, Maher BJ, Deep-Soboslay A, Cross AJ, Brandon NJ,
#> Leek JT, Hyde TM, Kleinman JE, Weinberger DR (2018).
#> "Developmental and genetic regulation of the human cortex
#> transcriptome illuminate schizophrenia pathogenesis." _Nature
#> Neuroscience_, *21*(8), 1117-1125. doi: 10.1038/s41593-018-0197-y
#> (URL: http://doi.org/10.1038/s41593-018-0197-y), <URL:
#> https://doi.org/10.1038/s41593-018-0197-y>.
#> 
#> Collado-Torres L, Burke EE, Peterson A, Shin J, Straub RE,
#> Rajpurohit A, Semick SA, Ulrich WS, Price AJ, Valencia C, Tao R,
#> Deep-Soboslay A, Hyde TM, Kleinman JE, Weinberger DR, Jaffe AE
#> (2019). "Regional Heterogeneity in Gene Expression, Regulation,
#> and Coherence in the Frontal Cortex and Hippocampus across
#> Development and Schizophrenia." _Neuron_, *103*(2), 203-216.e8.
#> doi: 10.1016/j.neuron.2019.05.013 (URL:
#> http://doi.org/10.1016/j.neuron.2019.05.013), <URL:
#> https://doi.org/10.1016/j.neuron.2019.05.013>.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

Overview

The libdRSE package was developed as an interface for extracting base-pair coverage from scaled BigWig files for a set of LIBD projects. libdRSE makes it easy to obtain base-pair coverage information such as read counts in easy to use R objects powered by SummarizedExperiment. The quantification provided by libdRSE is similar to the one from the recountWorkflow (Collado-Torres, Nellore, and Jaffe, 2017) and recount (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017). That is, base-pair coverage data is summed over for a set of input genomic regions and scaled to a library size of 40 million 100 base-pair reads, which is what you would get with recount::coverage_matrix() for the over 70,000 human RNA-seq samples uniformly-processed by the recount2 project (Collado-Torres, Nellore, Kammers, Ellis, et al., 2017). With libdRSE however, you get access to similar data for the LIBD public RNA-seq projects, particularly BrainSEQ Phase I (Jaffe, and Richard E. Straub, Shin, Tao, et al., 2018) and BrainSEQ Phase II (Collado-Torres, Burke, Peterson, Shin, et al., 2019).

Through the phenotype tables you can also find the specific BigWig coverage files for the LIBD projects that you can access with other tools. More information on the BigWig file format is available at the UCSC Genome Browser website. NOTE: the data is private at this moment. We are figuring out the best way to host the BigWig files.

Finally, some LIBD projects have stranded information, that is, one BigWig file for each the forward and reverse strand. Stranded data is not present in recount2, so the libdRSE::sense_antisense() function can be useful if you want to explore the expression for the sense and antisense directions on your stranded genomic regions.

To get started, load the libdRSE package.

library(libdRSE)

LIBD Phenotype

BrainSeq aims to characterize the genetic and epigenetic regulation of multiple facets of transcription in distinct brain regions across the human lifespan in samples of major neuropsychiatric disorders and controls.

More BrainSEQ info.

BrainSEQ Consortium (Schubert, O’Donnell, Quan, Wendland, et al., 2015)

BSP1

BrainSEQ Phase 1: Data including DLPFC polyA+ RNA-seq on 738 subjects spanning the human lifespan and four main psychiatric diagnostic groups: schizophrenia disorder (Szhizo), major depression disorder (MDD), bipolar disorder (Bipolar) and neurotypical controls (Control).

dim(find_samples('bsp1'))
#> [1] 732  67
table(find_samples('bsp1')$Dx)
#> 
#> Bipolar Control     MDD  Schizo 
#>      67     336     150     179

If you use this data, please cite BrainSEQ Phase I (Jaffe, and Richard E. Straub, Shin, Tao, et al., 2018).

BSP2

BrainSEQ Phase 2: Data including dorsolateral prefrontal cortex (DLPFC) and hippocampus (HIPPO) RiboZero RNA-seq subjects spanning the lifespan and adult patients with schizophrenia disorder (SCZD) or neurotypical controls (Control).

dim(find_samples('bsp2'))
#> [1] 900  79
with(find_samples('bsp2'), addmargins(table(Dx, Region)))
#>          Region
#> Dx        DLPFC HIPPO Sum
#>   Control   300   314 614
#>   SCZD      153   133 286
#>   Sum       453   447 900

If you use this data, please cite BrainSEQ Phase II (Collado-Torres, Burke, Peterson, Shin, et al., 2019).

Details about the main sample columns is available here.

Obtain phenotype data

find_samples() very simply obtains the phenotype table for the given LIBD project. Inputs to find_samples() include project and sample_ids. Given these inputs, the function returns a DataFrame object (from S4Vectors) with the phenotype table for the given project, and if the user supplies sample_ids, then the phenotype table will be subsetted to those ids.

example('find_samples')
#> 
#> fnd_sm> ## Subset the first two samples from BrainSEQ Phase I
#> fnd_sm> find_samples('bsp1', sample_ids = c('R2799', 'R2809'))
#> DataFrame with 2 rows and 67 columns
#>              RNum       BrNum      Region       RIN       Age         Sex
#>       <character> <character> <character> <numeric> <numeric> <character>
#> R2799       R2799      Br1197       DLPFC       8.4 21.840433           M
#> R2809       R2809      Br1245       DLPFC       7.9 42.624028           M
#>              Race          Dx       SAMPLE_ID   FQCbasicStats
#>       <character> <character> <CharacterList> <CharacterList>
#> R2799        CAUC     Bipolar R2799_C0JYLACXX            PASS
#> R2809        CAUC      Schizo R2809_D2A01ACXX            PASS
#>           perBaseQual     perTileQual      perSeqQual  perBaseContent
#>       <CharacterList> <CharacterList> <CharacterList> <CharacterList>
#> R2799            FAIL       WARN/FAIL            PASS            FAIL
#> R2809            PASS            PASS            PASS            WARN
#>             GCcontent        Ncontent   SeqLengthDist  SeqDuplication
#>       <CharacterList> <CharacterList> <CharacterList> <CharacterList>
#> R2799            PASS            PASS            PASS       FAIL/WARN
#> R2809            WARN       PASS/WARN            PASS            FAIL
#>           OverrepSeqs  AdapterContent     KmerContent  SeqLength_R1
#>       <CharacterList> <CharacterList> <CharacterList> <IntegerList>
#> R2799            PASS            PASS            FAIL           100
#> R2809            PASS            PASS            FAIL           100
#>        percentGC_R1 phred20.21_R1 phred48.49_R1 phred76.77_R1
#>       <IntegerList> <NumericList> <NumericList> <NumericList>
#> R2799            49            38            36            31
#> R2809            47            40            40          34.5
#>         phred100_R1      phredGT30_R1      phredGT35_R1    Adapter50.51_R1
#>       <NumericList>     <NumericList>     <NumericList>      <NumericList>
#> R2799            24 0.656484218539493 0.402606866892709  0.274446735976933
#> R2809            32 0.926266447639032 0.761714416330939 0.0562325324565277
#>          Adapter70.71_R1      Adapter88_R1  SeqLength_R2  percentGC_R2
#>            <NumericList>     <NumericList> <IntegerList> <IntegerList>
#> R2799   0.34763515474898 0.465147635643079           100            49
#> R2809 0.0731546924482436 0.118039040198923           100            47
#>       phred20.21_R2 phred48.49_R2 phred76.77_R2   phred100_R2
#>       <NumericList> <NumericList> <NumericList> <NumericList>
#> R2799            38            36            33            25
#> R2809            40            39            35            32
#>            phredGT30_R2      phredGT35_R2    Adapter50.51_R2
#>           <NumericList>     <NumericList>      <NumericList>
#> R2799  0.61804735116388 0.379082599346615  0.295031868940981
#> R2809 0.856248317693894 0.657797231213979 0.0597478293398095
#>          Adapter70.71_R2      Adapter88_R2
#>            <NumericList>     <NumericList>
#> R2799  0.382246202829382 0.508998289218168
#> R2809 0.0794013186727738 0.118550825070519
#>                                                                                                            bamFile
#>                                                                                                    <CharacterList>
#> R2799 /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/HISAT2_out/R2799_C0JYLACXX_accepted_hits.sorted.bam
#> R2809 /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/HISAT2_out/R2809_D2A01ACXX_accepted_hits.sorted.bam
#>             trimmed      numReads     numMapped   numUnmapped
#>       <LogicalList> <IntegerList> <IntegerList> <IntegerList>
#> R2799         FALSE     147441360     126030726      21410634
#> R2809         FALSE     147327528     139736578       7590950
#>       overallMapRate concordMapRate   totalMapped    mitoMapped
#>        <NumericList>  <NumericList> <IntegerList> <IntegerList>
#> R2799         0.8548         0.7923     109183563      24087218
#> R2809         0.9485         0.9033     118274615      32205618
#>                mitoRate totalAssignedGene gene_Assigned
#>           <NumericList>     <NumericList> <IntegerList>
#> R2799 0.180738927312207 0.786930346008069      55947123
#> R2809 0.214018926990896 0.757340150310627      59327435
#>       gene_Unassigned_Ambiguity gene_Unassigned_MultiMapping
#>                   <IntegerList>                <IntegerList>
#> R2799                   3707625                      6836188
#> R2809                   3398051                      9383985
#>       gene_Unassigned_NoFeatures gene_Unassigned_Unmapped
#>                    <IntegerList>            <IntegerList>
#> R2799                    4604458                        0
#> R2809                    6227106                        0
#>       gene_Unassigned_MappingQuality gene_Unassigned_FragmentLength
#>                        <IntegerList>                  <IntegerList>
#> R2799                              0                              0
#> R2809                              0                              0
#>       gene_Unassigned_Chimera gene_Unassigned_Secondary
#>                 <IntegerList>             <IntegerList>
#> R2799                       0                         0
#> R2809                       0                         0
#>       gene_Unassigned_Nonjunction gene_Unassigned_Duplicate
#>                     <IntegerList>             <IntegerList>
#> R2799                           0                         0
#> R2809                           0                         0
#>                  rRNA_rate
#>              <NumericList>
#> R2799  5.3979540645906e-06
#> R2809 7.60187929918089e-06
#>                                                                                bigwig_path
#>                                                                            <CharacterList>
#> R2799 /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/Coverage/R2799_C0JYLACXX.bw
#> R2809 /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/Coverage/R2809_D2A01ACXX.bw
#> 
#> fnd_sm> ## Subset the first two samples from BrainSEQ Phase II
#> fnd_sm> find_samples('bsp2', sample_ids = c('R10424', 'R12195'))
#> DataFrame with 2 rows and 79 columns
#>                            SAMPLE_ID   FQCbasicStats     perBaseQual
#>                      <CharacterList> <CharacterList> <CharacterList>
#> R10424 R10424_C4KC9ACXX_TAATGCGC_L00            PASS            PASS
#> R12195 R12195_C4E9TACXX_CGGCTATG_L00            PASS            PASS
#>            perTileQual      perSeqQual  perBaseContent       GCcontent
#>        <CharacterList> <CharacterList> <CharacterList> <CharacterList>
#> R10424            PASS            PASS       FAIL/WARN       WARN/FAIL
#> R12195            WARN            PASS       FAIL/WARN       WARN/FAIL
#>               Ncontent   SeqLengthDist  SeqDuplication     OverrepSeqs
#>        <CharacterList> <CharacterList> <CharacterList> <CharacterList>
#> R10424            PASS            PASS            FAIL            WARN
#> R12195            PASS            PASS            WARN            WARN
#>         AdapterContent     KmerContent  percentGC_R1   phred100_R1
#>        <CharacterList> <CharacterList> <IntegerList> <NumericList>
#> R10424            PASS            FAIL            43            33
#> R12195            PASS            FAIL            44            34
#>             phredGT30_R1      phredGT35_R1     Adapter88_R1  percentGC_R2
#>            <NumericList>     <NumericList>    <NumericList> <IntegerList>
#> R10424 0.909087695639853 0.708970145082465 2.74234442228771            43
#> R12195 0.906698288285609  0.70642074635339   1.528805377787            44
#>          phred100_R2      phredGT30_R2      phredGT35_R2     Adapter88_R2
#>        <NumericList>     <NumericList>     <NumericList>    <NumericList>
#> R10424            32 0.861428425268134 0.663018859063129 2.75279721539108
#> R12195            34 0.853310879714613 0.659080309341084 1.54911822156688
#>            ERCCsumLogErr
#>            <NumericList>
#> R10424 -110.365792667035
#> R12195 -18.3143733824569
#>                                                                                                                                            bamFile
#>                                                                                                                                    <CharacterList>
#> R10424 /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/HISAT2_out/R10424_C4KC9ACXX_TAATGCGC_L00_accepted_hits.sorted.bam
#> R12195 /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/HISAT2_out/R12195_C4E9TACXX_CGGCTATG_L00_accepted_hits.sorted.bam
#>              trimmed      numReads     numMapped   numUnmapped
#>        <LogicalList> <IntegerList> <IntegerList> <IntegerList>
#> R10424         FALSE     137207346     125018261      12189085
#> R12195         FALSE     186256540     173098203      13158337
#>        overallMapRate concordMapRate   totalMapped    mitoMapped
#>         <NumericList>  <NumericList> <IntegerList> <IntegerList>
#> R10424         0.9112         0.8586     127414655       2250800
#> R12195         0.9294         0.8862     179828658       2929274
#>                  mitoRate totalAssignedGene            rRNA_rate
#>             <NumericList>     <NumericList>        <NumericList>
#> R10424 0.0173585169619773 0.360011691217133 5.26367757990158e-05
#> R12195 0.0160281634178264  0.40055783013167 7.92850345490864e-05
#>               RNum       BrNum   Region           RIN       Age      Sex
#>        <character> <character> <factor> <NumericList> <numeric> <factor>
#> R10424      R10424      Br5168    DLPFC           6.7     64.08        M
#> R12195      R12195      Br5073    DLPFC           8.4     62.61        M
#>            Race       Dx     snpPC1      snpPC2     snpPC3       snpPC4
#>        <factor> <factor>  <numeric>   <numeric>  <numeric>    <numeric>
#> R10424     CAUC  Control  0.0541862  0.00141503 -0.0222983   6.7169e-06
#> R12195       AA     SCZD -0.0255144 -0.00401754 0.00553881 -0.000578574
#>             snpPC5      snpPC6      snpPC7     snpPC8      snpPC9
#>          <numeric>   <numeric>   <numeric>  <numeric>   <numeric>
#> R10424 8.66316e-05 -0.00505508 -0.00105689 0.00511064 -0.00143876
#> R12195 -0.00147552  0.00816687 -0.00528997 0.00590834 -0.00625297
#>            snpPC10 DLFPC_HIPPO_correlation_geneRpkm
#>          <numeric>                        <numeric>
#> R10424 0.000946483                               NA
#> R12195  0.00801729                               NA
#>        DLFPC_HIPPO_correlation_exonRpkm DLFPC_HIPPO_correlation_jxnRp10m
#>                               <numeric>                        <numeric>
#> R10424                               NA                               NA
#> R12195                               NA                               NA
#>        DLFPC_HIPPO_correlation_txTpm      mean_mitoRate
#>                            <numeric>          <numeric>
#> R10424                            NA 0.0173585169619773
#> R12195                            NA 0.0160281634178264
#>        mean_totalAssignedGene       mean_rRNA_rate  mean_RIN
#>                     <numeric>            <numeric> <numeric>
#> R10424      0.360011691217133 5.26367757990158e-05       6.7
#> R12195       0.40055783013167 7.92850345490864e-05       8.4
#>        agespline_fetal agespline_birth agespline_infant agespline_child
#>              <numeric>       <numeric>        <numeric>       <numeric>
#> R10424               0           64.08            63.08           54.08
#> R12195               0           62.61            61.61           52.61
#>        agespline_teen agespline_adult     protocol
#>             <numeric>       <numeric>  <character>
#> R10424          44.08           14.08 RiboZeroGold
#> R12195          42.61           12.61 RiboZeroGold
#>        analysis_regionspecific_adult analysis_regionspecific_prenatal
#>                            <logical>                        <logical>
#> R10424                          TRUE                            FALSE
#> R12195                         FALSE                            FALSE
#>        analysis_development analysis_sczd_casecontrol_dlpfc
#>                   <logical>                       <logical>
#> R10424                 TRUE                            TRUE
#> R12195                FALSE                            TRUE
#>        analysis_sczd_casecontrol_hippo
#>                              <logical>
#> R10424                           FALSE
#> R12195                           FALSE
#>        analysis_sczd_casecontrol_interaction analysis_eqtl_dlpfc
#>                                    <logical>           <logical>
#> R10424                                  TRUE                TRUE
#> R12195                                  TRUE                TRUE
#>        analysis_eqtl_hippo analysis_eqtl_interaction
#>                  <logical>                 <logical>
#> R10424               FALSE                      TRUE
#> R12195               FALSE                      TRUE
#>                                                                                                                                                                                                                                                    bigwig_path
#>                                                                                                                                                                                                                                                <CharacterList>
#> R10424 /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R10424_C4KC9ACXX_TAATGCGC_L00.Forward.bw,/dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R10424_C4KC9ACXX_TAATGCGC_L00.Reverse.bw
#> R12195 /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12195_C4E9TACXX_CGGCTATG_L00.Forward.bw,/dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12195_C4E9TACXX_CGGCTATG_L00.Reverse.bw

Usage

The goal of this package is to provide a way for users to query the expression of some genomic regions of interest among the LIBD RNA-seq public projects. The genomic regions input can either be a GRanges object, or a BED file which will be turned into aGRanges object automatically when running the libdRSE package.

An example of a BED file region input:

chr11   27721411    27721469    .   0   -
chr11   27721967    27722058    .   0   -
chr11   27720474    27720688    .   0   +
chr11   27721469    27721967    .   0   +
chr11   27722350    27722486    .   0   +
chr11   27725777    27726863    .   0   +

An example of a GRanges object:

gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
               IRanges(1:10, width=10:1))
gr0
#> GRanges object with 10 ranges and 0 metadata columns:
#>        seqnames    ranges strand
#>           <Rle> <IRanges>  <Rle>
#>    [1]     chr2      1-10      *
#>    [2]     chr2      2-10      *
#>    [3]     chr2      3-10      *
#>    [4]     chr2      4-10      *
#>    [5]     chr1      5-10      *
#>    [6]     chr1      6-10      *
#>    [7]     chr3      7-10      *
#>    [8]     chr3      8-10      *
#>    [9]     chr3      9-10      *
#>   [10]     chr3        10      *
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

Check Input Regions

To make sure that the genomic input regions are valid, use the check_region() function in the libdRSE package. check_region() takes the users regions input file and verifies that the regions are compatible with the LIBD project data. If valid, the function will return a GRanges object that can then be used in the libd_rse() function.

## A test region that overlaps gene SNAP25
snap25 <- GenomicRanges::GRanges('chr20:10286777-10288069:+')
check_region(snap25)
#> 2019-08-16 12:38:56 The 'regions' selected are valid regions.
#>  Returning regions as a GRanges object.
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames            ranges strand
#>          <Rle>         <IRanges>  <Rle>
#>   [1]    chr20 10286777-10288069      +
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Create the RangedSummarizedExperiment Object

libd_rse() is the main function of the libdRSE package. This function extracts the base-pair coverage from the scaled BigWig files for a set of LIBD projects. The output of this is a RangedSummarizedExperiment object which contains a counts assay with the base-pair coverage data scaled to a library size of 40 million 100 bp reads. The rows represent ranges of interest and columns represent the input samples.

More details on RangedSummarizedExperiment objects can be found in the SummarizedExperiment package vignette.

## Two samples with multiple sequence lanes and one sample that is single
## lane
snap25_rse_mult <- libd_rse(regions = snap25, 'bsp2',
    sample_ids = c('R10424', 'R12258', 'R12283'))
#> 2019-08-16 12:38:56 The 'regions' selected are valid regions.
#>  Returning regions as a GRanges object.
#> 2019-08-16 12:38:57 processing chromosome chr20
#> 2019-08-16 12:38:57 railMatrix: processing regions 1 to 1
#> 2019-08-16 12:38:57 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R10424_C4KC9ACXX_TAATGCGC_L00.Forward.bw
#> 2019-08-16 12:38:57 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R10424_C4KC9ACXX_TAATGCGC_L00.Reverse.bw
#> 2019-08-16 12:38:57 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12258_C4L03ACXX_TCCGGAGA_L00.Forward.bw
#> 2019-08-16 12:38:58 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12258_C4L03ACXX_TCCGGAGA_L00.Reverse.bw
#> 2019-08-16 12:38:58 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12258_C6185ACXX_TCCGGAGA_L005.Forward.bw
#> 2019-08-16 12:38:59 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12258_C6185ACXX_TCCGGAGA_L005.Reverse.bw
#> 2019-08-16 12:38:59 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12283_C5GKYACXX_AGCGATAG_L00.Forward.bw
#> 2019-08-16 12:39:00 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12283_C5GKYACXX_AGCGATAG_L00.Reverse.bw
#> 2019-08-16 12:39:00 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12283_C603PACXX_AGCGATAG_L001.Forward.bw
#> 2019-08-16 12:39:00 railMatrix: processing file /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/Coverage/R12283_C603PACXX_AGCGATAG_L001.Reverse.bw

## Explore the resulting RSE object
snap25_rse_mult
#> class: RangedSummarizedExperiment 
#> dim: 1 6 
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(6): R10424.Forward R10424.Reverse ... R12283.Forward
#>   R12283.Reverse
#> colData names(79): SAMPLE_ID FQCbasicStats ...
#>   analysis_eqtl_interaction bigwig_path
colnames(snap25_rse_mult)
#> [1] "R10424.Forward" "R10424.Reverse" "R12258.Forward" "R12258.Reverse"
#> [5] "R12283.Forward" "R12283.Reverse"

Stranded data

If the genomic regions are stranded and the LIBD project queried by the user is stranded (such as bsp2), then libd_rse() returns two columns for each sample (one for the forward and one for the reverse strand). In such cases it’s possible to compute the sense and antisense expression for the input genomic regions. sense_antisense() is a function which takes the users RangedSummarizedExperiment object created wtih libd_rse(), and splits it into two new RSE objects: sense and antisense. Each of these objects now only has one column per sample. This potentially makes the output easier to interpret based on strandness. Note that the counts are scaled for 40 million 100 bp reads mapped to each strand (forward and reverse) at this point, so a total of 80 million 100 bp reads across both strands.

## Separate into sense and antisense transcription
snap25_sense_antisense <- sense_antisense(snap25_rse_mult)

## Explore the resulting object
snap25_sense_antisense
#> $sense
#> class: RangedSummarizedExperiment 
#> dim: 1 3 
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(3): R10424 R12258 R12283
#> colData names(79): SAMPLE_ID FQCbasicStats ...
#>   analysis_eqtl_interaction bigwig_path
#> 
#> $antisense
#> class: RangedSummarizedExperiment 
#> dim: 1 3 
#> metadata(0):
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(3): R10424 R12258 R12283
#> colData names(79): SAMPLE_ID FQCbasicStats ...
#>   analysis_eqtl_interaction bigwig_path
colnames(snap25_sense_antisense$sense)
#> [1] "R10424" "R12258" "R12283"
colnames(snap25_sense_antisense$antisense)
#> [1] "R10424" "R12258" "R12283"

Reproducibility

The libdRSE package (Straub and Collado-Torres, 2019) was made possible thanks to:

Code for creating the vignette

## Create the vignette
library('rmarkdown')
system.time(render('libdRSE.Rmd'))

## Extract the R code
library('knitr')
knit('libdRSEt.Rmd', tangle = TRUE)

## Clean up
#file.remove('quickstartRef.bib')

Date the vignette was generated.

#> [1] "2019-08-16 12:39:02 EDT"

Wallclock time spent generating the vignette.

#> Time difference of 33.255 secs

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value                                                 
#>  version  R version 3.5.3 Patched (2019-03-11 r76311)           
#>  os       Red Hat Enterprise Linux Server release 6.9 (Santiago)
#>  system   x86_64, linux-gnu                                     
#>  ui       X11                                                   
#>  language (EN)                                                  
#>  collate  en_US.UTF-8                                           
#>  ctype    en_US.UTF-8                                           
#>  tz       US/Eastern                                            
#>  date     2019-08-16                                            
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package              * version   date       lib source                                  
#>  acepack                1.4.1     2016-10-29 [2] CRAN (R 3.5.0)                          
#>  AnnotationDbi          1.44.0    2018-10-30 [1] Bioconductor                            
#>  assertthat             0.2.1     2019-03-21 [2] CRAN (R 3.5.1)                          
#>  backports              1.1.4     2019-04-10 [2] CRAN (R 3.5.3)                          
#>  base64enc              0.1-3     2015-07-28 [2] CRAN (R 3.5.0)                          
#>  bibtex                 0.4.2     2017-06-30 [1] CRAN (R 3.5.0)                          
#>  Biobase              * 2.42.0    2018-10-30 [2] Bioconductor                            
#>  BiocGenerics         * 0.28.0    2018-10-30 [1] Bioconductor                            
#>  BiocManager            1.30.4    2018-11-13 [1] CRAN (R 3.5.1)                          
#>  BiocParallel         * 1.16.6    2019-02-10 [1] Bioconductor                            
#>  BiocStyle            * 2.10.0    2018-10-30 [1] Bioconductor                            
#>  biomaRt                2.38.0    2018-10-30 [1] Bioconductor                            
#>  Biostrings             2.50.2    2019-01-03 [1] Bioconductor                            
#>  bit                    1.1-14    2018-05-29 [2] CRAN (R 3.5.1)                          
#>  bit64                  0.9-7     2017-05-08 [2] CRAN (R 3.5.0)                          
#>  bitops                 1.0-6     2013-08-17 [2] CRAN (R 3.5.0)                          
#>  blob                   1.2.0     2019-07-09 [2] CRAN (R 3.5.3)                          
#>  BSgenome               1.50.0    2018-10-30 [1] Bioconductor                            
#>  bumphunter             1.24.5    2018-12-01 [1] Bioconductor                            
#>  checkmate              1.9.4     2019-07-04 [1] CRAN (R 3.5.3)                          
#>  cli                    1.1.0     2019-03-19 [1] CRAN (R 3.5.3)                          
#>  cluster                2.0.7-1   2018-04-13 [3] CRAN (R 3.5.3)                          
#>  codetools              0.2-16    2018-12-24 [3] CRAN (R 3.5.3)                          
#>  colorspace             1.4-1     2019-03-18 [2] CRAN (R 3.5.1)                          
#>  commonmark             1.7       2018-12-01 [1] CRAN (R 3.5.3)                          
#>  crayon                 1.3.4     2017-09-16 [1] CRAN (R 3.5.0)                          
#>  curl                   4.0       2019-07-22 [1] CRAN (R 3.5.3)                          
#>  data.table             1.12.2    2019-04-07 [1] CRAN (R 3.5.3)                          
#>  DBI                    1.0.0     2018-05-02 [2] CRAN (R 3.5.0)                          
#>  DelayedArray         * 0.8.0     2018-10-30 [2] Bioconductor                            
#>  derfinder              1.16.1    2018-12-03 [1] Bioconductor                            
#>  derfinderHelper        1.16.1    2018-12-03 [1] Bioconductor                            
#>  desc                   1.2.0     2018-05-01 [2] CRAN (R 3.5.1)                          
#>  digest                 0.6.20    2019-07-04 [1] CRAN (R 3.5.3)                          
#>  doRNG                  1.7.1     2018-06-22 [2] CRAN (R 3.5.1)                          
#>  dplyr                  0.8.3     2019-07-04 [1] CRAN (R 3.5.3)                          
#>  evaluate               0.14      2019-05-28 [1] CRAN (R 3.5.3)                          
#>  foreach                1.4.7     2019-07-27 [2] CRAN (R 3.5.3)                          
#>  foreign                0.8-71    2018-07-20 [3] CRAN (R 3.5.3)                          
#>  Formula                1.2-3     2018-05-03 [2] CRAN (R 3.5.1)                          
#>  fs                     1.3.1     2019-05-06 [1] CRAN (R 3.5.3)                          
#>  GenomeInfoDb         * 1.18.2    2019-02-12 [1] Bioconductor                            
#>  GenomeInfoDbData       1.2.0     2018-11-02 [2] Bioconductor                            
#>  GenomicAlignments      1.18.1    2019-01-04 [1] Bioconductor                            
#>  GenomicFeatures        1.34.8    2019-04-10 [1] Bioconductor                            
#>  GenomicFiles           1.18.0    2018-10-30 [1] Bioconductor                            
#>  GenomicRanges        * 1.34.0    2018-10-30 [1] Bioconductor                            
#>  ggplot2                3.2.1     2019-08-10 [1] CRAN (R 3.5.3)                          
#>  glue                   1.3.1     2019-03-12 [1] CRAN (R 3.5.1)                          
#>  gridExtra              2.3       2017-09-09 [2] CRAN (R 3.5.0)                          
#>  gtable                 0.3.0     2019-03-25 [2] CRAN (R 3.5.1)                          
#>  Hmisc                  4.2-0     2019-01-26 [1] CRAN (R 3.5.1)                          
#>  hms                    0.5.0     2019-07-09 [2] CRAN (R 3.5.3)                          
#>  htmlTable              1.13.1    2019-01-07 [2] CRAN (R 3.5.1)                          
#>  htmltools              0.3.6     2017-04-28 [2] CRAN (R 3.5.0)                          
#>  htmlwidgets            1.3       2018-09-30 [1] CRAN (R 3.5.1)                          
#>  httr                   1.4.1     2019-08-05 [1] CRAN (R 3.5.3)                          
#>  IRanges              * 2.16.0    2018-10-30 [1] Bioconductor                            
#>  iterators              1.0.12    2019-07-26 [2] CRAN (R 3.5.3)                          
#>  jsonlite               1.6       2018-12-07 [2] CRAN (R 3.5.1)                          
#>  knitcitations        * 1.0.8     2017-07-04 [1] CRAN (R 3.5.0)                          
#>  knitr                  1.24      2019-08-08 [1] CRAN (R 3.5.3)                          
#>  lattice                0.20-38   2018-11-04 [3] CRAN (R 3.5.3)                          
#>  latticeExtra           0.6-28    2016-02-09 [2] CRAN (R 3.5.0)                          
#>  lazyeval               0.2.2     2019-03-15 [2] CRAN (R 3.5.1)                          
#>  libdRSE              * 0.99.0    2019-08-16 [1] Github (LieberInstitute/libdRSE@aa9bbe5)
#>  locfit                 1.5-9.1   2013-04-20 [2] CRAN (R 3.5.0)                          
#>  lubridate              1.7.4     2018-04-11 [1] CRAN (R 3.5.0)                          
#>  magrittr               1.5       2014-11-22 [1] CRAN (R 3.5.0)                          
#>  MASS                   7.3-51.1  2018-11-01 [3] CRAN (R 3.5.3)                          
#>  Matrix                 1.2-15    2018-11-01 [3] CRAN (R 3.5.3)                          
#>  matrixStats          * 0.54.0    2018-07-23 [1] CRAN (R 3.5.1)                          
#>  memoise                1.1.0     2017-04-21 [2] CRAN (R 3.5.0)                          
#>  munsell                0.5.0     2018-06-12 [2] CRAN (R 3.5.1)                          
#>  nnet                   7.3-12    2016-02-02 [3] CRAN (R 3.5.3)                          
#>  pillar                 1.4.2     2019-06-29 [1] CRAN (R 3.5.3)                          
#>  pkgconfig              2.0.2     2018-08-16 [1] CRAN (R 3.5.1)                          
#>  pkgdown                1.3.0     2018-12-07 [1] CRAN (R 3.5.3)                          
#>  pkgmaker               0.27      2018-05-25 [2] CRAN (R 3.5.1)                          
#>  plyr                   1.8.4     2016-06-08 [2] CRAN (R 3.5.0)                          
#>  prettyunits            1.0.2     2015-07-13 [1] CRAN (R 3.5.0)                          
#>  progress               1.2.2     2019-05-16 [1] CRAN (R 3.5.3)                          
#>  purrr                  0.3.2     2019-03-15 [2] CRAN (R 3.5.1)                          
#>  qvalue                 2.14.1    2019-01-10 [1] Bioconductor                            
#>  R6                     2.4.0     2019-02-14 [2] CRAN (R 3.5.1)                          
#>  RColorBrewer           1.1-2     2014-12-07 [2] CRAN (R 3.5.0)                          
#>  Rcpp                   1.0.2     2019-07-25 [1] CRAN (R 3.5.3)                          
#>  RCurl                  1.95-4.12 2019-03-04 [2] CRAN (R 3.5.1)                          
#>  RefManageR             1.2.12    2019-04-03 [1] CRAN (R 3.5.3)                          
#>  registry               0.5-1     2019-03-05 [2] CRAN (R 3.5.1)                          
#>  reshape2               1.4.3     2017-12-11 [2] CRAN (R 3.5.0)                          
#>  rlang                  0.4.0     2019-06-25 [1] CRAN (R 3.5.3)                          
#>  rmarkdown              1.14      2019-07-12 [1] CRAN (R 3.5.3)                          
#>  rngtools               1.4       2019-07-01 [2] CRAN (R 3.5.3)                          
#>  roxygen2               6.1.1     2018-11-07 [1] CRAN (R 3.5.3)                          
#>  rpart                  4.1-13    2018-02-23 [3] CRAN (R 3.5.3)                          
#>  rprojroot              1.3-2     2018-01-03 [2] CRAN (R 3.5.0)                          
#>  Rsamtools              1.34.1    2019-01-31 [1] Bioconductor                            
#>  RSQLite                2.1.2     2019-07-24 [2] CRAN (R 3.5.3)                          
#>  rstudioapi             0.10      2019-03-19 [2] CRAN (R 3.5.1)                          
#>  rtracklayer            1.42.2    2019-03-01 [1] Bioconductor                            
#>  S4Vectors            * 0.20.1    2018-11-09 [1] Bioconductor                            
#>  scales                 1.0.0     2018-08-09 [2] CRAN (R 3.5.1)                          
#>  sessioninfo          * 1.1.1     2018-11-05 [1] CRAN (R 3.5.1)                          
#>  stringi                1.4.3     2019-03-12 [2] CRAN (R 3.5.1)                          
#>  stringr                1.4.0     2019-02-10 [1] CRAN (R 3.5.1)                          
#>  SummarizedExperiment * 1.12.0    2018-10-30 [1] Bioconductor                            
#>  survival               2.43-3    2018-11-26 [3] CRAN (R 3.5.3)                          
#>  tibble                 2.1.3     2019-06-06 [1] CRAN (R 3.5.3)                          
#>  tidyselect             0.2.5     2018-10-11 [2] CRAN (R 3.5.1)                          
#>  VariantAnnotation      1.28.13   2019-03-19 [1] Bioconductor                            
#>  vctrs                  0.2.0     2019-07-05 [1] CRAN (R 3.5.3)                          
#>  withr                  2.1.2     2018-03-15 [2] CRAN (R 3.5.0)                          
#>  xfun                   0.8       2019-06-25 [1] CRAN (R 3.5.3)                          
#>  XML                    3.98-1.20 2019-06-06 [2] CRAN (R 3.5.3)                          
#>  xml2                   1.2.2     2019-08-09 [2] CRAN (R 3.5.3)                          
#>  xtable                 1.8-4     2019-04-21 [2] CRAN (R 3.5.3)                          
#>  XVector                0.22.0    2018-10-30 [1] Bioconductor                            
#>  yaml                   2.2.0     2018-07-25 [2] CRAN (R 3.5.1)                          
#>  zeallot                0.1.0     2018-01-28 [1] CRAN (R 3.5.3)                          
#>  zlibbioc               1.28.0    2018-10-30 [2] Bioconductor                            
#> 
#> [1] /users/lcollado/R/x86_64-pc-linux-gnu-library/3.5.x
#> [2] /jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-3.5.x/R/3.5.x/lib64/R/site-library
#> [3] /jhpce/shared/jhpce/core/conda/miniconda-3/envs/svnR-3.5.x/R/3.5.x/lib64/R/library

Bibliography

This vignette was generated using BiocStyle (Oleś, Morgan, and Huber, 2018), knitr (Xie, 2014) and rmarkdown running behind the scenes.

Citations were made with knitcitations (Boettiger, 2017).

[1] S. Arora, M. Morgan, M. Carlson, and H. Pagès. GenomeInfoDb: Utilities for manipulating chromosome and other ‘seqname’ identifiers. 2017. DOI: 10.18129/B9.bioc.GenomeInfoDb.

[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.

[1] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. URL: https://CRAN.R-project.org/package=sessioninfo.

[1] M. Morgan, V. Obenchain, J. Hester, and H. Pagès. SummarizedExperiment: SummarizedExperiment container. 2017. DOI: 10.18129/B9.bioc.SummarizedExperiment.

[1] A. Oleś, M. Morgan, and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.10.0. 2018. URL: https://github.com/Bioconductor/BiocStyle.

[1] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2019. URL: https://www.R-project.org/.

[1] A. Straub and L. Collado-Torres. Extract coverage from LIBD BigWig files. https://github.com/LieberInstitute/libdRSE - R package version 0.99.0. 2019. URL: https://github.com/LieberInstitute/libdRSE.

[1] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[1] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.

libdRSE