Quick overview on the new Bioconductor 3.8 release

Nov 2, 2018 9 min read rstats

Every six months the Bioconductor project releases it’s new version of packages. This allows developers a time window to try out new methods and test them rigorously before releasing them to the community at large. It also means that this is an exciting time 🎉. With every release there are dozens¹ of new software packages. Bioconductor version 3.8 was just released on Halloween: October 31st, 2018. Thus, this is the perfect time to browse through their descriptions and find out what’s new that can be of use to your research.

That’s exactly what our post today is about. We looked at the list of new packages as well as updates to find those that we think could be useful for us. That is, packages that we might want to explore further.

AffiXcan

AffiXcan

Impute a GReX (Genetically Regulated Expression) for a set of genes in a sample of individuals, using a method based on the Total Binding Affinity (TBA). Statistical models to impute GReX can be trained with a training dataset where the real total expression values are known.

Looks interesting from the name but the description is too vague.

BiocPkgTools

BiocPkgTools

Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.

As Bioconductor developers this package sounds useful. Maybe it can be used to see if any of your packages is broken (errors, warnings) in BioC release or BioC devel.

brainImageR

brainImageR

BrainImageR is a package that provides the user with information of where in the human brain their gene set corresponds to. This is provided both as a continuous variable and as a easily-interpretable image. BrainImageR has additional functionality of identifying approximately when in developmental time that a gene expression dataset corresponds to. Both the spatial gene set enrichment and the developmental time point prediction are assessed in comparison to the Allen Brain Atlas reference data.

Sounds interesting since we work with brain data ourselves. We are curious about where the data comes from!

BUScorrect

BUScorrect

High-throughput experimental data are accumulating exponentially in public databases. However, mining valid scientific discoveries from these abundant resources is hampered by technical artifacts and inherent biological heterogeneity. The former are usually termed “batch effects,” and the latter is often modelled by “subtypes.” The R package BUScorrect fits a Bayesian hierarchical model, the Batch-effects-correction-with-Unknown-Subtypes model (BUS), to correct batch effects in the presence of unknown subtypes. BUS is capable of (a) correcting batch effects explicitly, (b) grouping samples that share similar characteristics into subtypes, (c) identifying features that distinguish subtypes, and (d) enjoying a linear-order computation complexity.

Hm… maybe this can be used with data from recount. We also have to work sometimes with data from multiple labs.

consensusDE

consensusDE

This package allows users to perform DE analysis using multiple algorithms. It seeks consensus from multiple methods. Currently it supports “Voom”, “EdgeR” and “DESeq”, but can be easily extended. It uses RUV-seq (optional) to remove batch effects.

Depending how flexible this new package is, it could be useful for saving time. If it’s not flexible we won’t really use it.

EnhancedVolcano

EnhancedVolcano

Volcano plots represent a useful way to visualise the results of differential expression analyses. Here, we present a highly-configurable function that produces publication-ready volcano plots. EnhancedVolcano will attempt to fit as many transcript names in the plot window as possible, thus avoiding ‘clogging’ up the plot with labels that could not otherwise have been read.

Enhanced volcano plots? Cool! We use volcano plots all the time, so they sold us with the name. Plus, who doesn’t want publication ready plots?²

ExCluster

ExCluster

ExCluster flattens Ensembl and GENCODE GTF files into GFF files, which are used to count reads per non-overlapping exon bin from BAM files. This read counting is done using the function featureCounts from the package Rsubread. Library sizes are normalized across all biological replicates, and ExCluster then compares two different conditions to detect signifcantly differentially spliced genes. This process requires at least two independent biological repliates per condition, and ExCluster accepts only exactly two conditions at a time. ExCluster ultimately produces false discovery rates (FDRs) per gene, which are used to detect significance. Exon log2 fold change (log2FC) means and variances may be plotted for each significantly differentially spliced gene, which helps scientists develop hypothesis and target differential splicing events for RT-qPCR validation in the wet lab.

Hm… this one could be useful for some future work related to recount. Specially the part about simplifying a GTF file.

maser

maser

This package provides functionalities for analysis, annotation and visualizaton of alternative splicing events.

Visualization of splicing events is something that can be useful. But the description is too vague and will require more investigation.

miRSM

miRSM

The package aims to identify miRNA sponge modules by integrating expression data and miRNA-target binding information. It provides several functions to study miRNA sponge modules, including popular methods for inferring gene modules (candidate miRNA sponge modules), and a function to identify miRNA sponge modules, as well as a function to conduct functional analysis of miRNA sponge modules.

I’m still reading the description, hold on! We’ll look into this more!

OUTRIDER

OUTRIDER

Identification of aberrent gene expression in RNA-seq data. Read count expectations are modeled by an autoencoder to control for confounders in the data. Given these expectations, the RNA-seq read counts are assumed to follow a negative binomial distribution with a gene-specific dispersion. Outliers are then identified as read counts that significantly deviate from this distribution. Further OUTRIDER provides useful plotting function to analyze and visualize the results.

We’d be interested in testing the performance of OUTRIDER in our already analyzed datasets.

primirTSS

primirTSS

A fast, convenient tool to identify the TSSs of miRNAs by integrating the data of H3K4me3 and Pol II as well as combining the conservation level and sequence feature, provided within both command-line and graphical interfaces, which achieves a better performance than the previous non-cell-specific methods on miRNA TSSs.

We don’t have this kind of data (Pol II ChIP-seq) but it looks useful if you do have it.

TimeSeriesExperiment

TimeSeriesExperiment

Visualization and analysis toolbox for short time course data which includes dimensionality reduction, clustering, two-sample differential expression testing and gene ranking techniques. The package also provides methods for retrieving enriched pathways.

We have time series data and could maybe try this package out for clustering and pathway analyses. We are not sure why it matters that the time course is short, what does this really mean? Could they mean that the time course is complete (no dropouts) unlike a longitudinal time course project? From the vignette: they are talking about small number of time points: that is, statistical methods for this scenario.

tRNA

tRNA

The tRNA package allows tRNA sequences and structures to be accessed and used for subsetting. In addition, it provides visualization tools to compare feature parameters of multiple tRNA sets and correlate them to additional data. The tRNA package uses GRanges objects as inputs requiring only few additional column data sets.

Wow! tRNAs!? Let’s look into this!

tximeta

tximeta

Transcript quantification import from Salmon with automatic population of metadata and transcript ranges. Filtered, combined, or de novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for reproducible analyses.

Just look at this great video tweet by Michael Love!

.@Bioconductor 3.8 is released, which means so is tximeta! this idea came up more than 2 years ago, to auto-populate metadata for Salmon quant directories. the goal is no more guessing for the data you quantified earlier in a project, or from public archive. here's a demo pic.twitter.com/6r1yoNcIyj
— Michael Love (@mikelove) November 1, 2018

Ularcirc

Ularcirc

Ularcirc reads in STAR aligned splice junction files and provides visualisation and analysis tools for splicing analysis. Users can assess backsplice junctions and forward canonical junctions.

We are interested in exploring its visualization tools and if they are specific STAR output files or if this package works with a wider set of aligners.

We also appreciate the name of this package!

Wrapping up

We liked how many of the new software packages emphasized visualization! The package with the best name was COCOA hehe.

We hope that you are as excited as we are about trying out the new Bioconductor 3.8 packages! If we implement any of these packages into our analysis routine we want to come back and write a blog post about them³.

Acknowledgments

This blog post was made possible thanks to:

and all the Bioconductor package developers and maintainers!

References

[1] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.8. 2017. URL: https://CRAN.R-project.org/package=knitcitations.

[2] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.0.9000. 2018. URL: https://github.com/r-lib/sessioninfo#readme.

[3] A. Oleś, M. Morgan and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.10.0. 2018. URL: https://github.com/Bioconductor/BiocStyle.

[4] Y. Xie, A. P. Hill and A. Thomas. blogdown: Creating Websites with R Markdown. ISBN 978-0815363729. Boca Raton, Florida: Chapman and Hall/CRC, 2017. URL: https://github.com/rstudio/blogdown.

Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                                      
##  version  R version 3.5.1 Patched (2018-10-14 r75439)
##  os       macOS Mojave 10.14.1                       
##  system   x86_64, darwin15.6.0                       
##  ui       X11                                        
##  language (EN)                                       
##  collate  en_US.UTF-8                                
##  ctype    en_US.UTF-8                                
##  tz       America/New_York                           
##  date     2018-11-02                                 
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package       * version    date       lib source                            
##  assertthat      0.2.0      2017-04-11 [1] CRAN (R 3.5.0)                    
##  backports       1.1.2      2017-12-13 [1] CRAN (R 3.5.0)                    
##  bibtex          0.4.2      2017-06-30 [1] CRAN (R 3.5.0)                    
##  BiocManager     1.30.3     2018-10-10 [1] CRAN (R 3.5.0)                    
##  BiocStyle     * 2.10.0     2018-10-30 [1] Bioconductor                      
##  blogdown        0.9        2018-10-23 [1] CRAN (R 3.5.0)                    
##  bookdown        0.7        2018-02-18 [1] CRAN (R 3.5.0)                    
##  cli             1.0.1      2018-09-25 [1] CRAN (R 3.5.0)                    
##  colorout      * 1.2-0      2018-05-03 [1] Github (jalvesaq/colorout@c42088d)
##  crayon          1.3.4      2017-09-16 [1] CRAN (R 3.5.0)                    
##  digest          0.6.18     2018-10-10 [1] CRAN (R 3.5.0)                    
##  evaluate        0.12       2018-10-09 [1] CRAN (R 3.5.0)                    
##  htmltools       0.3.6      2017-04-28 [1] CRAN (R 3.5.0)                    
##  httr            1.3.1      2017-08-20 [1] CRAN (R 3.5.0)                    
##  jsonlite        1.5        2017-06-01 [1] CRAN (R 3.5.0)                    
##  knitcitations * 1.0.8      2017-07-04 [1] CRAN (R 3.5.0)                    
##  knitr           1.20       2018-02-20 [1] CRAN (R 3.5.0)                    
##  lubridate       1.7.4      2018-04-11 [1] CRAN (R 3.5.0)                    
##  magrittr        1.5        2014-11-22 [1] CRAN (R 3.5.0)                    
##  plyr            1.8.4      2016-06-08 [1] CRAN (R 3.5.0)                    
##  R6              2.3.0      2018-10-04 [1] CRAN (R 3.5.0)                    
##  Rcpp            0.12.19    2018-10-01 [1] CRAN (R 3.5.1)                    
##  RefManageR      1.2.0      2018-04-25 [1] CRAN (R 3.5.0)                    
##  rmarkdown       1.10       2018-06-11 [1] CRAN (R 3.5.0)                    
##  rprojroot       1.3-2      2018-01-03 [1] CRAN (R 3.5.0)                    
##  sessioninfo   * 1.1.0.9000 2018-10-02 [1] Github (r-lib/sessioninfo@4f91fad)
##  stringi         1.2.4      2018-07-20 [1] CRAN (R 3.5.0)                    
##  stringr         1.3.1      2018-05-10 [1] CRAN (R 3.5.0)                    
##  withr           2.1.2      2018-03-15 [1] CRAN (R 3.5.0)                    
##  xfun            0.4        2018-10-23 [1] CRAN (R 3.5.0)                    
##  xml2            1.2.0      2018-01-24 [1] CRAN (R 3.5.0)                    
##  yaml            2.2.0      2018-07-25 [1] CRAN (R 3.5.0)                    
## 
## [1] /Library/Frameworks/R.framework/Versions/3.5devel/Resources/library

Soon it’ll be in the hundreds! Wow!↩
We’ll see if that’s true hehe.↩
Time permitting :P↩

bioconductor

Continuous rstats learning

We are researchers at the @LieberInstitute, blogging about R packages, how-to guides & occasionally our own open-source software (opinions r our own) #rstats