Quick overview on the new Bioconductor 3.8 release
Every six months the Bioconductor project releases itβs new version of packages. This allows developers a time window to try out new methods and test them rigorously before releasing them to the community at large. It also means that this is an exciting time π. With every release there are dozens1 of new software packages. Bioconductor version 3.8 was just released on Halloween: October 31st, 2018. Thus, this is the perfect time to browse through their descriptions and find out whatβs new that can be of use to your research.
Thatβs exactly what our post today is about. We looked at the list of new packages as well as updates to find those that we think could be useful for us. That is, packages that we might want to explore further.
AffiXcan
Impute a GReX (Genetically Regulated Expression) for a set of genes in a sample of individuals, using a method based on the Total Binding Affinity (TBA). Statistical models to impute GReX can be trained with a training dataset where the real total expression values are known.
Looks interesting from the name but the description is too vague.
BiocPkgTools
Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.
As Bioconductor developers this package sounds useful. Maybe it can be used to see if any of your packages is broken (errors, warnings) in BioC release or BioC devel.
brainImageR
BrainImageR is a package that provides the user with information of where in the human brain their gene set corresponds to. This is provided both as a continuous variable and as a easily-interpretable image. BrainImageR has additional functionality of identifying approximately when in developmental time that a gene expression dataset corresponds to. Both the spatial gene set enrichment and the developmental time point prediction are assessed in comparison to the Allen Brain Atlas reference data.
Sounds interesting since we work with brain data ourselves. We are curious about where the data comes from!
BUScorrect
High-throughput experimental data are accumulating exponentially in public databases. However, mining valid scientific discoveries from these abundant resources is hampered by technical artifacts and inherent biological heterogeneity. The former are usually termed βbatch effects,β and the latter is often modelled by βsubtypes.β The R package BUScorrect fits a Bayesian hierarchical model, the Batch-effects-correction-with-Unknown-Subtypes model (BUS), to correct batch effects in the presence of unknown subtypes. BUS is capable of (a) correcting batch effects explicitly, (b) grouping samples that share similar characteristics into subtypes, (c) identifying features that distinguish subtypes, and (d) enjoying a linear-order computation complexity.
Hm⦠maybe this can be used with data from recount. We also have to work sometimes with data from multiple labs.
consensusDE
This package allows users to perform DE analysis using multiple algorithms. It seeks consensus from multiple methods. Currently it supports βVoomβ, βEdgeRβ and βDESeqβ, but can be easily extended. It uses RUV-seq (optional) to remove batch effects.
Depending how flexible this new package is, it could be useful for saving time. If itβs not flexible we wonβt really use it.
EnhancedVolcano
Volcano plots represent a useful way to visualise the results of differential expression analyses. Here, we present a highly-configurable function that produces publication-ready volcano plots. EnhancedVolcano will attempt to fit as many transcript names in the plot window as possible, thus avoiding βcloggingβ up the plot with labels that could not otherwise have been read.
Enhanced volcano plots? Cool! We use volcano plots all the time, so they sold us with the name. Plus, who doesnβt want publication ready plots?2
ExCluster
ExCluster flattens Ensembl and GENCODE GTF files into GFF files, which are used to count reads per non-overlapping exon bin from BAM files. This read counting is done using the function featureCounts from the package Rsubread. Library sizes are normalized across all biological replicates, and ExCluster then compares two different conditions to detect signifcantly differentially spliced genes. This process requires at least two independent biological repliates per condition, and ExCluster accepts only exactly two conditions at a time. ExCluster ultimately produces false discovery rates (FDRs) per gene, which are used to detect significance. Exon log2 fold change (log2FC) means and variances may be plotted for each significantly differentially spliced gene, which helps scientists develop hypothesis and target differential splicing events for RT-qPCR validation in the wet lab.
Hm⦠this one could be useful for some future work related to recount. Specially the part about simplifying a GTF file.
maser
This package provides functionalities for analysis, annotation and visualizaton of alternative splicing events.
Visualization of splicing events is something that can be useful. But the description is too vague and will require more investigation.
miRSM
The package aims to identify miRNA sponge modules by integrating expression data and miRNA-target binding information. It provides several functions to study miRNA sponge modules, including popular methods for inferring gene modules (candidate miRNA sponge modules), and a function to identify miRNA sponge modules, as well as a function to conduct functional analysis of miRNA sponge modules.
Iβm still reading the description, hold on! Weβll look into this more!
OUTRIDER
Identification of aberrent gene expression in RNA-seq data. Read count expectations are modeled by an autoencoder to control for confounders in the data. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. Further OUTRIDER provides useful plotting function to analyze and visualize the results.
Weβd be interested in testing the performance of OUTRIDER in our already analyzed datasets.
primirTSS
A fast, convenient tool to identify the TSSs of miRNAs by integrating the data of H3K4me3 and Pol II as well as combining the conservation level and sequence feature, provided within both command-line and graphical interfaces, which achieves a better performance than the previous non-cell-specific methods on miRNA TSSs.
We donβt have this kind of data (Pol II ChIP-seq) but it looks useful if you do have it.
TimeSeriesExperiment
Visualization and analysis toolbox for short time course data which includes dimensionality reduction, clustering, two-sample differential expression testing and gene ranking techniques. The package also provides methods for retrieving enriched pathways.
We have time series data and could maybe try this package out for clustering and pathway analyses. We are not sure why it matters that the time course is short, what does this really mean? Could they mean that the time course is complete (no dropouts) unlike a longitudinal time course project? From the vignette: they are talking about small number of time points: that is, statistical methods for this scenario.
tRNA
The tRNA package allows tRNA sequences and structures to be accessed and used for subsetting. In addition, it provides visualization tools to compare feature parameters of multiple tRNA sets and correlate them to additional data. The tRNA package uses GRanges objects as inputs requiring only few additional column data sets.
Wow! tRNAs!? Letβs look into this!
tximeta
Transcript quantification import from Salmon with automatic population of metadata and transcript ranges. Filtered, combined, or de novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for reproducible analyses.
Just look at this great video tweet by Michael Love!
.@Bioconductor 3.8 is released, which means so is tximeta! this idea came up more than 2 years ago, to auto-populate metadata for Salmon quant directories. the goal is no more guessing for the data you quantified earlier in a project, or from public archive. here's a demo pic.twitter.com/6r1yoNcIyj
— Michael Love (@mikelove) November 1, 2018
Ularcirc
Ularcirc reads in STAR aligned splice junction files and provides visualisation and analysis tools for splicing analysis. Users can assess backsplice junctions and forward canonical junctions.
We are interested in exploring its visualization tools and if they are specific STAR output files or if this package works with a wider set of aligners.
We also appreciate the name of this package!
Wrapping up
We liked how many of the new software packages emphasized visualization! The package with the best name was COCOA hehe.
We hope that you are as excited as we are about trying out the new Bioconductor 3.8 packages! If we implement any of these packages into our analysis routine we want to come back and write a blog post about them3.
Acknowledgments
This blog post was made possible thanks to:
- BiocStyle (OleΕ, Morgan, and Huber, 2018)
- blogdown (Xie, Hill, and Thomas, 2017)
- knitcitations (Boettiger, 2017)
- sessioninfo (CsΓ‘rdi, core, Wickham, Chang, et al., 2018)
and all the Bioconductor package developers and maintainers!
References
[1] C. Boettiger. knitcitations: Citations for βKnitrβ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.
[2] G. CsΓ‘rdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.0.9000. 2018. URL: https://github.com/r-lib/sessioninfo#readme.
[3] A. OleΕ, M. Morgan and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.10.0. 2018. URL: https://github.com/Bioconductor/BiocStyle.
[4] Y. Xie, A. P. Hill and A. Thomas. blogdown: Creating Websites with R Markdown. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: https://github.com/rstudio/blogdown.
Reproducibility
## β Session info βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
## setting value
## version R version 3.5.1 Patched (2018-10-14 r75439)
## os macOS Mojave 10.14.1
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/New_York
## date 2018-11-02
##
## β Packages βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
## package * version date lib source
## assertthat 0.2.0 2017-04-11 [1] CRAN (R 3.5.0)
## backports 1.1.2 2017-12-13 [1] CRAN (R 3.5.0)
## bibtex 0.4.2 2017-06-30 [1] CRAN (R 3.5.0)
## BiocManager 1.30.3 2018-10-10 [1] CRAN (R 3.5.0)
## BiocStyle * 2.10.0 2018-10-30 [1] Bioconductor
## blogdown 0.9 2018-10-23 [1] CRAN (R 3.5.0)
## bookdown 0.7 2018-02-18 [1] CRAN (R 3.5.0)
## cli 1.0.1 2018-09-25 [1] CRAN (R 3.5.0)
## colorout * 1.2-0 2018-05-03 [1] Github (jalvesaq/colorout@c42088d)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
## digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.0)
## evaluate 0.12 2018-10-09 [1] CRAN (R 3.5.0)
## htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.0)
## httr 1.3.1 2017-08-20 [1] CRAN (R 3.5.0)
## jsonlite 1.5 2017-06-01 [1] CRAN (R 3.5.0)
## knitcitations * 1.0.8 2017-07-04 [1] CRAN (R 3.5.0)
## knitr 1.20 2018-02-20 [1] CRAN (R 3.5.0)
## lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.5.0)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
## plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.0)
## R6 2.3.0 2018-10-04 [1] CRAN (R 3.5.0)
## Rcpp 0.12.19 2018-10-01 [1] CRAN (R 3.5.1)
## RefManageR 1.2.0 2018-04-25 [1] CRAN (R 3.5.0)
## rmarkdown 1.10 2018-06-11 [1] CRAN (R 3.5.0)
## rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
## sessioninfo * 1.1.0.9000 2018-10-02 [1] Github (r-lib/sessioninfo@4f91fad)
## stringi 1.2.4 2018-07-20 [1] CRAN (R 3.5.0)
## stringr 1.3.1 2018-05-10 [1] CRAN (R 3.5.0)
## withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
## xfun 0.4 2018-10-23 [1] CRAN (R 3.5.0)
## xml2 1.2.0 2018-01-24 [1] CRAN (R 3.5.0)
## yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.0)
##
## [1] /Library/Frameworks/R.framework/Versions/3.5devel/Resources/library