8 Configuration
- Minimum minor allele frequency cutoff used for filtering (the '--maf' option in plink). Default: maf_min = 0.005
- Minimum Hardy-Weinberg equilibrium exact test p-value threshold (the '--hwe' option in plink). Default: hwe_min = 0.00001
- Value for the '--indep-pairwise' argument to plink. Default: indep_pairwise = "50 5 0.2"
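As an illustration, the three defaults could appear in a config file as shown below. This is a sketch only: the source does not show which scope or file these lines belong to, so the placement is an assumption. The comments describe how plink interprets each value.

```groovy
// Sketch only: the enclosing scope (if any) is not shown in this section.

// Passed to plink --maf: variants with a minor allele frequency below 0.005 are removed.
maf_min = 0.005

// Passed to plink --hwe: variants with a Hardy-Weinberg exact test p-value below 1e-5 are removed.
hwe_min = 0.00001

// Passed to plink --indep-pairwise: window of 50 variants, step of 5 variants, r^2 threshold of 0.2.
indep_pairwise = "50 5 0.2"
```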
8.1 Specifying Options for your Cluster
In many cases, a user has access to a computing cluster on which they intend to run PopTop. If your cluster is SLURM- or SGE-based, the pipeline is pre-configured with options you may be used to specifying (such as disk usage and the time allowed for a job to run). These are straightforward to modify if needed. Common settings are described in detail below; a more comprehensive list of settings from Nextflow can be found here.
8.1.1 Time
The maximum allowed run time for a process, or step in the pipeline, can be specified. This may be necessary for users who are charged based on run time for their jobs.
8.1.1.1 Default for all processes
The simplest change you may wish to make is to relax time constraints for all processes. The setting for this is here:
executor {
name = 'sge'
queueSize = 40
submitRateLimit = '1 sec'
exitReadTimeout = '30 min'
}
process {
time = 10.hour // this can be adjusted as needed
errorStrategy = { task.exitStatus == 140 ? 'retry' : 'terminate' }
maxRetries = 1
}
While the syntax is not strict, some examples for properly specifying the option are time = '5m', time = '2h', and time = '1d' (minutes, hours, and days, respectively).
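Because the configuration above retries a task once when it exits with status 140, it may be useful to request more time on the retry. The sketch below uses Nextflow's dynamic directive syntax (a closure that reads task.attempt). It is an optional variation on the default settings, not part of the shipped configuration.

```groovy
process {
    // First attempt requests 10 hours; a retry (attempt 2) requests 20 hours.
    time = { 10.hour * task.attempt }

    // Same retry policy as the default configuration shown above.
    errorStrategy = { task.exitStatus == 140 ? 'retry' : 'terminate' }
    maxRetries = 1
}
```

Whether exit status 140 corresponds to a run-time limit depends on how your cluster is set up, so check your scheduler's behavior before relying on this pattern.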
8.1.1.2 Specify per-process
Time restrictions may also be specified for individual workflow steps, using the same syntax. Suppose you wish to request 30 minutes for the flipit process (note process names here). In the "process" section of your config, find the section labelled "withName: flipit", as in the example here:
withName: flipit {
cpus = 4
memory = 16.GB
time = '30m'
}
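To show how the two levels fit together, here is a minimal sketch of a "process" section that combines the default for all processes with the flipit override. The values are the ones used in this section; any other selectors in your own config would sit alongside the flipit block.

```groovy
process {
    // Default applied to every process that does not override it.
    time = 10.hour

    // Settings in a withName block take precedence over the defaults above
    // for the named process only.
    withName: flipit {
        cpus = 4
        memory = 16.GB
        time = '30m'
    }
}
```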
8.1.2 Cluster-specific options
In some cases, you may find it simpler to directly specify options accepted by your cluster. For example, SGE users with a default limit on the maximum file size they may write might specify the following:
withName: flipit {
cpus = 4
memory = 20.GB
clusterOptions = '-l h_fsize=500G' // This is SGE-specific syntax
}
As with the time option, this can be specified per-process or for all processes. Any option recognized by your cluster may be used.
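For instance, applying a cluster option to every process means placing clusterOptions directly in the "process" section rather than inside a withName block. The sketch below reuses the SGE option from the example above. The commented SLURM line is an assumption for illustration, and my_account is a hypothetical placeholder.

```groovy
process {
    // SGE: raise the maximum file size limit for all processes.
    clusterOptions = '-l h_fsize=500G'

    // SLURM users would pass sbatch options instead, for example
    // (hypothetical account name):
    // clusterOptions = '--account=my_account'
}
```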