SPEAQeasy requires the following to be installed:
- Java 8 or later
- Singularity or Docker (recommended)
- Python 3 and pip (only for local installation, which is not recommended)
If Java is not installed, you can install it on Debian or Ubuntu with `apt install default-jre`, or with a different package manager as appropriate for your distribution. The above installations are typically done by an administrator.
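As a quick sanity check before installing anything, you can verify that Java is already on your `PATH`. This is a minimal sketch (the `default-jre` package name in the hint applies to Debian-based systems):

```bash
#!/bin/sh
# Print the detected Java version, or a hint about installing it.
java_status() {
    if command -v java >/dev/null 2>&1; then
        # 'java -version' writes to stderr, e.g.: openjdk version "11.0.2"
        java -version 2>&1 | head -n 1
    else
        echo "java not found: try 'sudo apt install default-jre' (Debian/Ubuntu)"
    fi
}
java_status
```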
SPEAQeasy has been tested on Linux, but it is designed to run on any POSIX-compliant system, including macOS and FreeBSD.
SPEAQeasy relies on a number of additional software tools. Users are provided three options for automatically managing these dependencies.
Docker: The recommended option is to manage software with Docker, if it is available. From within the repository, perform the one-time setup by running `bash install_software.sh "docker"`. This installs Nextflow, pulls Docker images containing the required software, and sets up some test files. When running SPEAQeasy, components of the pipeline run within the associated containers. A full list of the images that are used is here. If root permissions are needed to run Docker, you can instruct the installation script to place `sudo` in front of any Docker commands by running `bash install_software.sh "docker" "sudo"`.
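Whether the `"sudo"` argument is needed can be checked ahead of time. This sketch (assuming Docker is installed) picks the setup command based on whether the current user can reach the Docker daemon directly:

```bash
#!/bin/sh
# Choose the one-time setup command: plain docker, or docker via sudo.
docker_setup_cmd() {
    if docker info >/dev/null 2>&1; then
        echo 'bash install_software.sh "docker"'
    else
        echo 'bash install_software.sh "docker" "sudo"'
    fi
}
docker_setup_cmd
```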
Singularity: An alternative recommended option is to manage software with Singularity, if it is installed. From within the repository, perform the one-time setup by running `bash install_software.sh "singularity"`. In practice, this configures SPEAQeasy to use Singularity to run the required software as originally packaged into Docker images. A full list of the images that are used is here.
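As with Docker, you can confirm that Singularity is available before running the setup. A small sketch:

```bash
#!/bin/sh
# Report whether Singularity is available before attempting setup.
singularity_status() {
    if command -v singularity >/dev/null 2>&1; then
        singularity --version
    else
        echo "singularity not found on PATH; ask your cluster administrator"
    fi
}
singularity_status
```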
Local install: The final option is to install all dependencies locally. This option is only officially supported on Linux; it is currently experimental, and recommended against, on other platforms. Installation is done by running `bash install_software.sh "local"` from within the repository. This installs Nextflow, several bioinformatics tools, R and its packages, and sets up some test files. A full list of the software used is here. The `install_software.sh` script builds each software tool from source, and hence relies on some common utilities which are often pre-installed on unix-like systems.
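Before attempting a local install, it may help to confirm that such utilities exist on your machine. A minimal sketch (the tool names below are typical examples, not an exhaustive list of what the script requires):

```bash
#!/bin/sh
# List any common build utilities that are missing from the PATH.
missing_tools() {
    missing=""
    for tool in gcc make autoconf automake curl tar; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
    else
        echo "all build utilities found"
    fi
}
missing_tools
```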
Note: users at the JHPCE cluster do not need to worry about managing software via the above methods (required software is automatically available through modules). Simply run `bash install_software.sh "jhpce"` to install any missing R packages and set up some test files.
Some users may encounter errors during the installation process, particularly when installing software locally. Below we list the most common installation-related issues. Please note that "local" installation is only officially supported on Linux, and Mac users should install in "docker" mode! The solutions below for macOS are experimental and not yet complete.
SPEAQeasy has been tested on:
- CentOS 7 (Linux)
- Ubuntu 20.04.2 LTS (Linux)
On some high-performance computing clusters, Singularity tends to use an unexpectedly large amount of memory during installation (when converting from the Docker images we host). This can lead to error messages like the following when performing a Singularity-based installation (i.e. `bash install_software.sh singularity`):
Sometimes, requesting an unusually large amount of memory (e.g. > 64GB) while performing the installation is an effective workaround. If not, it may be appropriate to ask your system administrator about configuring Singularity appropriately for a computing-cluster setting (by editing `singularity.conf`).
This is a particularly common issue for macOS users, or for Linux users trying to get SPEAQeasy running on a local machine (such as a laptop). In either case, we assume the user has root privileges for the solutions suggested below.
macOS users may be missing utilities required for the SPEAQeasy installation. A solution on Mac is to install brew, then install the required utilities through it:

```bash
# Install brew, if you don't already have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install most required utilities
brew install autoconf automake make gcc zlib bzip2 xz pcre openssl texinfo llvm libomp

# Install openJDK8 (Java)
brew install --cask homebrew/cask-versions/adoptopenjdk8
```
It is also recommended that Linux users install some basic dependencies if a local installation fails for any reason:

```bash
# On Debian or Ubuntu:
sudo apt install autoconf automake make gcc zlib1g-dev libbz2-dev liblzma-dev \
    libpcre3-dev libcurl4-openssl-dev texinfo texlive-base default-jre default-jdk

# On RedHat or CentOS:
sudo yum install autoconf automake make gcc zlib-devel bzip2 bzip2-devel xz-devel \
    pcre-devel curl-devel texi2html texinfo java-1.8.0-openjdk java-1.8.0-openjdk-devel
```
Users managing dependencies with Docker might encounter error messages if Docker is not properly configured:

On a computing cluster, a system administrator is responsible for correctly configuring Docker so that such errors do not occur. We provide a brief guide here for users who want to set up Docker on a local machine.
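On a local machine, a quick diagnostic can distinguish "Docker is missing" from "the daemon is unreachable". This is a hedged sketch; adding your user to the `docker` group is the usual way to avoid needing `sudo`:

```bash
#!/bin/sh
# Classify the state of a local Docker setup.
docker_check() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is not installed"
    elif ! docker info >/dev/null 2>&1; then
        echo "docker daemon unreachable: try sudo, or add your user to the"
        echo "docker group ('sudo usermod -aG docker \$USER', then re-login)"
    else
        echo "docker looks correctly configured"
    fi
}
docker_check
```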
The “main” script used to run the pipeline depends on the environment in which you will run it.
- (Optional) Adjust configuration: hardware resource usage, software versioning, and cluster option choices are specified in `conf/slurm.config`.
- Modify the main script and run: the main script is `run_pipeline_slurm.sh`. Submit it as a job to your cluster with `sbatch run_pipeline_slurm.sh`. See the full list of command-line options for other details about modifying the script for your use case.
See here for Nextflow’s documentation regarding SLURM environments.
- (Optional) Adjust configuration: hardware resource usage, software versioning, and cluster option choices are specified in `conf/sge.config`.
- Modify the main script and run: the main script is `run_pipeline_sge.sh`. Submit it as a job to your cluster with `qsub run_pipeline_sge.sh`. See the full list of command-line options for other details about modifying the script for your use case.
See here for additional information on nextflow for SGE environments.
- (Optional) Adjust configuration: hardware resource usage and other configurables are located in `conf/local.config`. Note that the defaults assume access to 8 CPUs and 16GB of RAM.
- Modify the main script and run: the main script is `run_pipeline_local.sh`. After configuring options for your use case (see the full list of command-line options), simply run it on the command line with `bash run_pipeline_local.sh`.
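Since the local defaults assume 8 CPUs and 16GB of RAM, it can be worth checking your machine before a local run. A small sketch (using `nproc` from GNU coreutils, with a fallback when it is unavailable):

```bash
#!/bin/sh
# Warn if fewer CPUs are available than conf/local.config assumes.
cpu_advice() {
    cpus=$(nproc 2>/dev/null || echo 1)
    if [ "$cpus" -lt 8 ]; then
        echo "$cpus CPU(s) detected: lower the defaults in conf/local.config"
    else
        echo "$cpus CPUs detected: the defaults in conf/local.config should be fine"
    fi
}
cpu_advice
```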
2.3.4 Run on the JHPCE cluster
- (Optional) Adjust configuration: a default configuration with thoroughly tested hardware resource specifications is described in `conf/jhpce.config`. Other settings, such as annotation release/version, can also be tweaked via this file.
- Modify the main script and run: the “main” script is `run_pipeline_jhpce.sh`. The pipeline run is submitted as a job to the cluster by executing `sbatch run_pipeline_jhpce.sh`. See the full list of command-line options for other details about modifying the script for your use case.
Below is a full example of a typical main script, modified from the `run_pipeline_jhpce.sh` script. At the top are some cluster-specific options, recognized by SLURM, the job scheduler at the JHPCE cluster. These are optional, and you may consider adding similar options as appropriate if you plan to use SPEAQeasy on a computing cluster.
After the main call, `nextflow $ORIG_DIR/main.nf`, each command option can be described line by line:
- `--sample "paired"`: input samples are paired-end
- `--reference "mm10"`: these are mouse samples, to be aligned to the mm10 genome
- `--strand "reverse"`: the user expects the samples to be reverse-stranded, which SPEAQeasy will verify
- `--ercc`: the samples have ERCC spike-ins, which the pipeline should quantify as a QC measure
- `--trim_mode "skip"`: trimming is not to be performed on any samples
- `--experiment "mouse_brain"`: the main pipeline outputs should be labelled with the experiment name “mouse_brain”
- `--input "/users/neagles/RNA_input"`: this directory contains the `samples.manifest` file, describing the samples
- `-profile jhpce`: configuration of hardware resource usage, and more detailed pipeline settings, is described in `conf/jhpce.config`, since this is a run using the JHPCE cluster
- `-w "/scratch/nextflow_runs"`: this is a Nextflow-specific command option (note the single dash), telling SPEAQeasy that temporary files for the pipeline run can be placed under `/scratch/nextflow_runs`
- `--output "/users/neagles/RNA_output"`: SPEAQeasy output files should be placed under `/users/neagles/RNA_output`
```bash
#!/bin/bash
#SBATCH -q shared
#SBATCH --mem=40G
#SBATCH --job-name=SPEAQeasy
#SBATCH -o ./SPEAQeasy_output.log
#SBATCH -e ./SPEAQeasy_output.log

# After running 'install_software.sh', this should point to the directory
# where SPEAQeasy was installed, and not say "$PWD"
ORIG_DIR=/dcl01/lieber/ajaffe/Nick/SPEAQeasy

module load nextflow/20.01.0
export _JAVA_OPTIONS="-Xms8g -Xmx10g"

nextflow $ORIG_DIR/main.nf \
    --sample "paired" \
    --reference "mm10" \
    --strand "reverse" \
    --ercc \
    --trim_mode "skip" \
    --experiment "mouse_brain" \
    --input "/users/neagles/RNA_input" \
    -profile jhpce \
    -w "/scratch/nextflow_runs" \
    --output "/users/neagles/RNA_output"

# Log successful runs on non-test data in a central location. Please adjust
# the log path here if it is changed at the top!
bash $ORIG_DIR/scripts/track_runs.sh $PWD/SPEAQeasy_output.log

# Produces a report for each sample tracing the pipeline steps
# performed (can be helpful for debugging).
#
# Note that the reports are generated from the output log produced in the above
# section, and so if you rename the log, you must also replace the filename
# in the bash call below.
echo "Generating per-sample logs for debugging..."
bash $ORIG_DIR/scripts/generate_logs.sh $PWD/SPEAQeasy_output.log
```
- If you are installing software to run the pipeline locally, all dependencies are installed into `[repo directory]/Software/`, and `[repo directory]/conf/command_paths_long.config` is configured to show Nextflow the default installation location of each software tool. Thus, this config file can be tweaked to manually point to different paths, if need be (though this shouldn’t be necessary).
- Nextflow supports the use of Lmod modules to conveniently point the pipeline to the bioinformatics software it needs. If you neither wish to use Docker nor to install the many dependencies locally, and you already have Lmod modules on your cluster, this is another option. In the appropriate config file (as determined in step 3 in the section you choose below), you can include a module specification line in the associated process (such as `module = 'hisat2/2.2.1'` for BuildHisatIndex), as configured in `conf/jhpce.config`. In most cases this will be more work to fully configure, so running the pipeline with Docker or locally installing software is generally recommended instead. See Nextflow modules for more information.
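For illustration, such a module line can be scoped to a single process from within a config file. This is a hedged sketch in Nextflow config syntax (the process name and module version follow the example above; adjust them to match the modules actually available on your cluster):

```
process {
    withName: BuildHisatIndex {
        module = 'hisat2/2.2.1'
    }
}
```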