10 Help
10.1 Common Errors
SPEAQeasy should be configured so that fundamental issues related to pipeline function do not arise. If you encounter an error and believe it to be a design flaw in SPEAQeasy, you can always submit a github issue. However, please take a look at the following common issues:
- A job/ process is not given enough memory or time: pipeline runs on large samples or datasets may require more memory or a higher time limit. When reported correctly, the pipeline will indicate an error status of 140 (for SGE or SLURM environments); however, memory issues can take many forms, and related error messages are not always clear. In this example case, the process PairedEndHISAT failed due to insufficient memory, but indicated a general error status (1):

How Nextflow may report memory-related errors
Attempt to provide the process more memory in your config. In this case the configuration for PairedEndHISAT looks like this (for SGE users):
: PairedEndHISAT {
withName= 4
cpus = '20.GB'
memory = '-l h_fsize=500G -R y'
clusterOptions }
Note that disk space may also be the limitation (particularly for SGE users). See the configuration section for more info.
Strandness for at least one sample disagrees with the asserted value: all samples for a particular pipeline run are expected to have the same “strandness”, specified by the
--strand
command option in your main script. SPEAQeasy infers the strandness for each sample to verify your data has its expected structure; by default, when any sample is determined to have a strandness different than what is specified, the pipeline halts with an error. This is designed to catch faulty data quickly before analysis, but also consider using the--strand_mode "accept"
or--strand_mode "declare"
command options rather than the error-inducing default of--strand_mode "strict"
. In a typical case,--strand_mode "declare"
is preferable to--strand_mode "accept"
when choosing to bypass strandness errors. This is because--strand_mode "declare"
ensures all samples are treated with the same strandness, which will be the case in typical experiments where all samples share the same library preparation kit. However, if samples prepared with different kits are to be processed with SPEAQeasy,--strand_mode "accept"
should be used.A backslash is missing in the main script: in each main script, options passed to nextflow are by default split across several lines for clarity. When changing around options in this script, it is easy to omit a backslash (
\
), which is required for the script to interpret a series of lines of options as a single command. In many cases, the pipeline will run successfully– at least for a large fraction of the workflow– but many desired options may be completely ignored. Here is an example of the mistake in a modified version of therun_pipeline_sge.sh
script:
"--output" line has no "\" at the end, and both the "--input" and
# The "-profile" options will be ignored!
# /Software/nextflow $ORIG_DIR/main.nf \
$ORIG_DIR--sample "single" \
--reference "hg38" \
--strand "unstranded" \
--annotation "$ORIG_DIR/Annotation" \
-with-report execution_reports/pipeline_report.html \
--output "/work/SPEAQeasy/outputs"
--input "/work/SPEAQeasy" \
-profile sge
- Nextflow has trouble communicating with your cluster: a less common issue can occur on slower clusters, related to nextflow failing to poll your grid scheduler (like SGE or SLURM) for information it needs about the jobs that are running. This can show up in an error message like:
Process
PairedEndHISAT (Prefix: Sample_FE2P1_blood)terminated for an unknown reason -- Likely it has been terminated by the external system
. We have found that raising theexitReadTimeout
to a large value (such as 30 minutes) solves this issue, but consider raising it further if needed.
{
executor = 'sge'
name = 40
queueSize = '1 sec'
submitRateLimit = '30 min'
exitReadTimeout }
10.2 Deeper Investigation
If pipeline execution halts with an error, and the cause/solution does not appear to be described in the common issues, what can be done? This section provides information about next steps for debugging a potential problem.
10.2.1 Checking Per-Sample Logs
At the end of each pipeline run (whether complete or after halting due to an error), SPEAQeasy generates logs to trace the exact series of steps each sample underwent. These are designed primarily as a debugging tool, especially to catch issues that may be sample-specific. Identifying the source of an error generally involves the following steps:
- Identify the failing process and sample name: the main output log from nextflow will be called “SPEAQeasy_output.log” by default (it is generally a very long text file). SPEAQeasy automatically compiles the most typically relevant information in this log; however, the error message (if applicable) at the bottom of the log is typically helpful for pinpointing the process and sample where the error occurred. Note: if a sample name is not given with the process name, the process involves all samples. Look for a message similar to the following:

Main error message in SPEAQeasy_output.log
Find the SPEAQeasy-generated log for the failing sample: logs are located under the output folder specified by the
--output
command flag- by default, this will beSPEAQeasy/results/logs/
. Simply find the sample name given by step 1 (or pick any sample if no name was provided).Locate the process from step 1 in the log from step 2: because the processes are in order of reported submission, the failed process will typically be near the bottom of the log. The most immediately informative portion to look for is usually the logged output from executing that process. This is at the bottom between the dashed lines labelled BEGIN LOG and END LOG.

Locating the appropriate process in your log

Locating the output for the process
If the log output does not make it clear what wrong, explore the following:
Check the exit code: this is near the top for a given process. An exit code of 1 is often not very informative by itself (a generic error status), but other values can sometimes inform you of the exact issue. For example, the common exit status of 140 indicates that a process failed due to exceeding memory, disk, CPU, or time constraints (for SGE or SLURM clusters). Most frequently, memory is the limiting factor; consider raising the memory allocated for the process in the relevant configuration file.
Check the content of the working directory: the path to the relevant working directory is given on the first line of the log for a given process. One immediate “red flag” could be a core dump (a file named something like
core.[numbers]
). These files are produced when a process unexpectedly terminates, which often can indicate the process needs more memory or disk space (see configuration). It is also worth checking for other log files (certain processes, like SingleEndHISAT and PairedEndHISAT, have output which is not printed between the BEGIN LOG and END LOG per-sample log sections).
10.3 Resuming
An important feature of SPEAQeasy (because it is based on nextflow) is the ability to resume pipeline execution if an error occurs for any reason. To resume, you must add the -resume
flag to your “main” script, determined here, and rerun. Otherwise, the default is to restart the entire pipeline, regardless of how much progress was made!
10.3.1 Trouble resuming?
In some cases, nextflow will improperly determine which processes have finished, when a pipeline run halts with an error. This can be related to how your filesystem communicates with nextflow. We have found that specifying cache = 'lenient'
in the process section of the config fixes issues with resuming (such as re-running a process which had actually completed). See nextflow’s documentation on this. In your config, this could look like:
{
process
= { 10.hour * task.attempt }
time = { task.exitStatus == 140 ? 'retry' : 'terminate' }
errorStrategy = 1
maxRetries
// this can be placed here or anywhere inside the 'process' brackets starting
// at the top of this code chunk
= 'lenient'
cache
: PullAssemblyFasta {
withName= 1
cpus = '-l mf=2G,h_vmem=2G'
clusterOptions }
This is the default in jhpce.config
, sge.config
, and docker_sge.config
, but may be needed on other cluster types as well.
10.4 No Internet Access?
- For users who do not have internet access when executing pipeline runs, you may first run
bash scripts/manual_annotation.sh
. This script must be run from the repository directory (from a machine with internet access). Modify the four lines in the “user configuration section” at the top of the script for your particular set-up. This sets up everything so that subsequent runs of the pipeline do not need an internet connection to complete.
10.5 Docker help
For those who wish to use docker to manage SPEAQeasy software dependencies, we provide a brief set-up guide.
- Install docker
A set of instructions for different operating systems are available on the Docker site.
- Create a docker group
sudo addgroup docker
- Add user to docker group
sudo usermod -aG docker <your_user>
- Checking installation
Log out and log back in to ensure your user is running with the correct permissions.
Test Docker installation by running:
docker run hello-world