2 Setup Details
SPEAQeasy requires that the following be installed:
- Java 8 or later
- Python 3 (tested with 3.7.3), with pip
If Java is not installed, you can install it on Linux with
apt install default-jre, or with a different package manager you prefer. Python 3 and pip (automatically installed with typical installations of Python) are required as well. These installations are typically done by an administrator (they require root access or use of “sudo”).
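The prerequisites above can be checked before installation. The following is a minimal sketch, not part of SPEAQeasy: the helper name java_major_version is hypothetical, and the parsing assumes the usual "java -version" output, where the version appears as the first quoted string.

```shell
# Hypothetical helper: reduce a Java version string to its major version.
# Handles the legacy scheme ("1.8.0_292" means Java 8) and the modern one ("11.0.11").
java_major_version() {
    local v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;    # drop the legacy "1." prefix
    esac
    echo "${v%%[!0-9]*}"       # keep only the leading digits
}

if command -v java >/dev/null 2>&1; then
    # "java -version" prints to stderr; the version is the first quoted string
    raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major_version "$raw")
    if [ -z "$major" ]; then
        echo "Java found, but its version could not be parsed"
    elif [ "$major" -ge 8 ]; then
        echo "Java $raw found (major version $major)"
    else
        echo "Java $raw is older than the required Java 8"
    fi
else
    echo "Java not found; on Debian/Ubuntu try: apt install default-jre"
fi

# Python 3 and pip are both required
if command -v python3 >/dev/null 2>&1 && python3 -m pip --version >/dev/null 2>&1; then
    echo "Python 3 with pip found: $(python3 --version 2>&1)"
else
    echo "Python 3 and/or pip not found"
fi
```

The check only reports what it finds; installing anything that is missing still requires the administrator steps described above.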
SPEAQeasy has been tested on Linux, but it is designed to run on any of a number of POSIX-compliant systems, including macOS and FreeBSD.
SPEAQeasy makes use of a number of different additional software tools. The user is provided three options to automatically manage these dependencies.
Docker: The recommended option is to manage software with docker, if it is available. From within the repository, perform the one-time setup by running
bash install_software.sh "docker". This installs nextflow, pulls docker images containing required software, and sets up some test files. When running SPEAQeasy, components of the pipeline run within the associated containers. A full list of the images that are used is here. If root permissions are needed to run docker, one can instruct the installation script to use
sudo in front of any docker commands by running
bash install_software.sh "docker" "sudo".
Singularity: An alternative recommended option is to manage software with singularity, if it is installed. From within the repository, perform the one-time setup by running
bash install_software.sh "singularity". In practice this configures SPEAQeasy to use singularity to run the required software as originally packaged into docker images. A full list of the images that are used is here.
Local install: The alternative is to locally install all dependencies. This option is only officially supported on Linux; it is currently experimental and recommended against on other platforms. Installation is done by running
bash install_software.sh "local" from within the repository. This installs nextflow, several bioinformatics tools, R and packages, and sets up some test files. A full list of software used is here. The script
install_software.sh builds each software tool from source, and hence relies on some common utilities that are often pre-installed on Unix-like systems.
Note: users at the JHPCE cluster do not need to worry about managing software via the above methods (required software is automatically available through modules). Simply run
bash install_software.sh "jhpce" to install any missing R packages and set up some test files. Next, make sure you have the following lines added to your
    if [[ $HOSTNAME == compute-* ]]; then
        module use /jhpce/shared/jhpce/modulefiles/libd
    fi
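For users outside JHPCE, the choice among the docker, singularity, and local modes described above could be scripted along these lines. This is only a sketch under the assumption that "available" means the executable is on the PATH; the function name pick_install_mode is hypothetical and not part of SPEAQeasy.

```shell
# Hypothetical helper: pick an installation mode in the order of preference
# given in the text (docker, then singularity, then a local install).
pick_install_mode() {
    if command -v docker >/dev/null 2>&1; then
        echo "docker"
    elif command -v singularity >/dev/null 2>&1; then
        echo "singularity"
    else
        echo "local"
    fi
}

mode=$(pick_install_mode)
# Print the command rather than running it, so the choice can be reviewed first
echo "Suggested one-time setup: bash install_software.sh \"$mode\""
```

Printing the command instead of executing it leaves room to override the choice, for example on platforms where local installation is not officially supported.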
Some users may encounter errors during the installation process, particularly when installing software locally. We provide a list below of the most common installation-related issues. Please note that “local” installation is only officially supported on Linux, and Mac users should install in “docker” mode! The solutions below for macOS are experimental and not yet complete.
SPEAQeasy has been tested on:
- CentOS 7 (Linux)
- Ubuntu 20.04.2 LTS (Linux)
Required utilities are missing
This is a particularly common issue for macOS users, or Linux users trying to get SPEAQeasy running on a local machine (like a laptop). In either case, we will assume the user has root privileges for the solutions suggested below.
macOS users may be missing utilities required for the SPEAQeasy installation. A solution on Mac is to install brew, and then install the required utilities through it:
    # Install brew, if you don't already have it
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

    # Install most required utilities
    brew install autoconf automake make gcc zlib bzip2 xz pcre openssl texinfo llvm libomp

    # Install openJDK8 (Java)
    brew install --cask homebrew/cask-versions/adoptopenjdk8
It is also recommended that Linux users install some basic dependencies if local installation fails for any reason:

    # On Debian or Ubuntu:
    sudo apt install autoconf automake make gcc zlib1g-dev libbz2-dev liblzma-dev libpcre3-dev libcurl4-openssl-dev texinfo texlive-base default-jre default-jdk

    # On RedHat or CentOS:
    sudo yum install autoconf automake make gcc zlib-devel bzip2 bzip2-devel xz-devel pcre-devel curl-devel texi2html texinfo java-1.8.0-openjdk java-1.8.0-openjdk-devel
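Before installing packages, it may help to see which build utilities are actually absent. The sketch below uses a hypothetical helper, missing_utils, and checks only a few of the executables named above. Development libraries such as zlib cannot be detected this way, so an empty result does not guarantee that a local install will succeed.

```shell
# Hypothetical helper: print the name of each given command that is not on the PATH.
missing_utils() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# A subset of the utilities listed above that are installed as executables
missing=$(missing_utils autoconf automake make gcc)
if [ -n "$missing" ]; then
    echo "Missing utilities:" $missing
else
    echo "All checked utilities were found"
fi
```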
Docker permissions issues
Users managing dependencies with docker might encounter error messages if docker is not properly configured:
docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/create: dial unix /var/run/docker.sock: connect: permission denied. See 'docker run --help'.
On a computing cluster, a system administrator is responsible for correctly configuring docker so that such errors do not occur. We provide a brief guide here for users who want to set up docker on a local machine.
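On a local machine, one way to anticipate the error above is to test whether the Docker daemon can be reached without elevated privileges, and to fall back to the "sudo" form of the install command otherwise. This is a rough sketch: the function docker_install_command is hypothetical, and a failing "docker info" can also mean that the daemon is simply not running.

```shell
# Hypothetical helper: print the install command suited to the local docker setup.
# The optional argument names the docker executable (defaults to "docker").
docker_install_command() {
    local docker_bin="${1:-docker}"
    if ! command -v "$docker_bin" >/dev/null 2>&1; then
        echo "docker not found"
        return 1
    fi
    # "docker info" needs to contact the daemon, so it fails on a permission error
    if "$docker_bin" info >/dev/null 2>&1; then
        echo 'bash install_software.sh "docker"'
    else
        echo 'bash install_software.sh "docker" "sudo"'
    fi
}

docker_install_command || true
```

If the second form is suggested on a machine you administer, configuring docker as described in the guide linked above is the more durable fix.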
2.3 Run the Pipeline
The “main” script used to run the pipeline depends on the environment you will run it on.
2.3.1 Run in a SLURM environment/cluster
- (Optional) Adjust configuration: hardware resource usage, software versioning, and cluster option choices are specified in conf/slurm.config.
- Modify the main script and run: the main script is run_pipeline_slurm.sh. Submit as a job to your cluster with
sbatch run_pipeline_slurm.sh. See the full list of command-line options for other details about modifying the script for your use-case.
See here for Nextflow’s documentation regarding SLURM environments.
2.3.2 Run on a Sun Grid Engine (SGE) cluster
- (Optional) Adjust configuration: hardware resource usage, software versioning, and cluster option choices are specified in conf/sge.config.
- Modify the main script and run: the main script is run_pipeline_sge.sh. Submit as a job to your cluster with
qsub run_pipeline_sge.sh. See the full list of command-line options for other details about modifying the script for your use-case.
See here for additional information on nextflow for SGE environments.
2.3.3 Run locally
- (Optional) Adjust configuration: hardware resource usage and other configurables are located in conf/local.config. Note that defaults assume access to 8 CPUs and 16 GB of RAM.
- Modify the main script and run: the main script is run_pipeline_local.sh. After configuring options for your use-case (see the full list of command-line options), run the script from the command line.
2.3.4 Run on the JHPCE cluster
- (Optional) Adjust configuration: the default configuration, with thoroughly tested hardware resource specifications, is described within
conf/jhpce.config. Other settings, such as annotation release/version, can also be tweaked via this file.
- Modify the main script and run: the “main” script is
run_pipeline_jhpce.sh. The pipeline run is submitted as a job to the cluster by executing
qsub run_pipeline_jhpce.sh. See the full list of command-line options for other details about modifying the script for your use-case.
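The four environments above differ mainly in the main script and in how it is launched. The mapping can be summarized in a small sketch. The function submit_command is hypothetical, and the "local" invocation via bash is an assumption, since the text names the script but not the exact command.

```shell
# Hypothetical helper: print the launch command for a given environment.
submit_command() {
    case "$1" in
        slurm) echo "sbatch run_pipeline_slurm.sh" ;;
        sge)   echo "qsub run_pipeline_sge.sh" ;;
        jhpce) echo "qsub run_pipeline_jhpce.sh" ;;
        local) echo "bash run_pipeline_local.sh" ;;   # assumed invocation
        *)     echo "unknown environment: $1" >&2
               return 1 ;;
    esac
}

# Example: show the command for a SLURM cluster
submit_command slurm
```

Each command is meant to be run from within the repository, after editing the corresponding script for your use-case.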
2.3.5 Example main script
Below is a full example of a typical main script, modified from the
run_pipeline_jhpce.sh script. At the top are some cluster-specific options, recognized by SGE, the grid scheduler at the JHPCE cluster. These are optional, and you may consider adding appropriate options similarly, if you plan to use SPEAQeasy on a computing cluster.
After the main call,
nextflow /dcl01/lieber/ajaffe/Nick/SPEAQeasy/main.nf, each command option can be described line by line:
--sample "paired": input samples are paired-end
--reference "mm10": these are mouse samples, to be aligned to the mm10 genome
--strand "reverse": the user expects the samples to be reverse-stranded, which SPEAQeasy will verify
--ercc: the samples have ERCC spike-ins, which the pipeline should quantify as a QC measure.
--trim_mode "skip": trimming is not to be performed on any samples
--experiment "mouse_brain": the main pipeline outputs should be labelled with the experiment name “mouse_brain”
--input "/users/neagles/RNA_input": /users/neagles/RNA_input is a directory that contains the
samples.manifest file, describing the samples.
-profile jhpce: configuration of hardware resource usage, and more detailed pipeline settings, is described at
conf/jhpce.config, since this is a run using the JHPCE cluster
-w "/scratch/nextflow_runs": this is a nextflow-specific command option (note the single dash), telling SPEAQeasy that temporary files for the pipeline run can be placed under /scratch/nextflow_runs
--output "/users/neagles/RNA_output": SPEAQeasy output files should be placed under /users/neagles/RNA_output
    #!/bin/bash
    #$ -l bluejay,mem_free=40G,h_vmem=40G,h_fsize=150G
    #$ -o ./run_brain_subset.log
    #$ -e ./run_brain_subset.log
    #$ -cwd

    module use /jhpce/shared/jhpce/modulefiles/libd
    module load nextflow
    export _JAVA_OPTIONS="-Xms8g -Xmx10g"

    nextflow /dcl01/lieber/ajaffe/Nick/SPEAQeasy/main.nf \
        --sample "paired" \
        --reference "mm10" \
        --strand "reverse" \
        --ercc \
        --trim_mode "skip" \
        --experiment "mouse_brain" \
        --input "/users/neagles/RNA_input" \
        -profile jhpce \
        -w "/scratch/nextflow_runs" \
        --output "/users/neagles/RNA_output"

    # Produces a report for each sample tracing the pipeline steps
    # performed (can be helpful for debugging).
    #
    # Note that the reports are generated from the output log produced in the above
    # section, and so if you rename the log, you must also replace the filename
    # in the bash call below.
    echo "Generating per-sample logs for debugging..."
    bash /dcl01/lieber/ajaffe/Nick/SPEAQeasy/scripts/generate_logs.sh $PWD/run_brain_subset.log
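Because --input must point to a directory containing samples.manifest, a short pre-flight check before submitting the job can catch a wrong path early. The sketch below is not part of SPEAQeasy: the function check_input_dir is hypothetical, and it only confirms that the file exists, without validating its contents.

```shell
# Hypothetical helper: confirm that an input directory holds a samples.manifest file.
check_input_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "Input directory not found: $dir" >&2
        return 1
    fi
    if [ ! -f "$dir/samples.manifest" ]; then
        echo "No samples.manifest in $dir" >&2
        return 1
    fi
    # Count non-empty lines as a rough indication of how much the manifest describes
    echo "Found $(grep -c . "$dir/samples.manifest") non-empty line(s) in $dir/samples.manifest"
}

# Same path as passed to --input in the example script
input_dir="/users/neagles/RNA_input"
if check_input_dir "$input_dir"; then
    echo "Input looks ready; the main script can be submitted"
fi
```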
2.3.6 Advanced info regarding installation
- If you are installing software to run the pipeline locally, all dependencies are installed into
[repo directory]/Software/, and
[repo directory]/conf/command_paths_long.config is configured to show nextflow the default installation locations of each software tool. Thus, this config file can be tweaked to manually point to different paths, if need be (though this shouldn’t be necessary).
- Nextflow supports the use of Lmod modules to conveniently point the pipeline to the bioinformatics software it needs. If you wish neither to use docker nor to install the many dependencies locally, and already have Lmod modules on your cluster, this is another option. In the appropriate config file (the one named in the section above that matches your environment), you can include a module specification line in the associated process (such as
module = 'hisat2/2.2.1' for buildHISATindex) as configured in conf/jhpce.config. In most cases this will be more work to fully configure, and so running the pipeline with docker or locally installing software is generally recommended instead. See nextflow modules for some more information.
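Before adding a module line to a config file, it can be worth confirming that the module exists on your cluster. This is a minimal sketch assuming an Lmod-style "module" command is defined in the current shell, which is not always the case in non-interactive scripts. The helper has_module is hypothetical.

```shell
# Hypothetical helper: succeed if the named module (e.g. "hisat2/2.2.1") is listed
# by the module system.
has_module() {
    # "module" is normally a shell function set up by Lmod; give up if it is absent
    type module >/dev/null 2>&1 || return 1
    # "module avail" writes its listing to stderr, hence the 2>&1;
    # search by tool name, then match the full name/version string
    module avail "${1%%/*}" 2>&1 | grep -q -F "$1"
}

if has_module "hisat2/2.2.1"; then
    echo "hisat2/2.2.1 is available as a module"
else
    echo "hisat2/2.2.1 was not found (or no module system is set up in this shell)"
fi
```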