Introduction to slurmjobs

Basics

Install `slurmjobs`

R is an open-source statistical environment which can be easily modified to enhance its functionality via packages. slurmjobs is a R package available via the Bioconductor repository for packages. R can be installed on any operating system from CRAN after which you can install slurmjobs by using the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("slurmjobs")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

Required knowledge

slurmjobs is designed for interacting with the SLURM job scheduler, and assumes basic familiarity with terms like “job”, “task”, and “array”, as well as the sbatch command. Background knowledge about memory (such as virtual memory and resident set size (RSS)) is helpful but not critical in using this package.

If you are asking yourself the question “Where do I start using Bioconductor?” you might be interested in this blog post.

Asking for help

As package developers, we try to explain clearly how to use our packages and in which order to use the functions. But R and Bioconductor have a steep learning curve so it is critical to learn where to ask for help. The blog post quoted above mentions some but we would like to highlight the Bioconductor support site as the main resource for getting help: remember to use the slurmjobs tag and check the older posts. Other alternatives are available such as creating GitHub issues and tweeting. However, please note that if you want to receive help you should adhere to the posting guidelines. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error.

Citing `slurmjobs`

We hope that slurmjobs will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation("slurmjobs")
#> To cite package 'slurmjobs' in publications use:
#> 
#>   LieberInstitute (2025). _slurmjobs: Helper Functions for SLURM Jobs_.
#>   doi:10.18129/B9.bioc.slurmjobs
#>   <https://doi.org/10.18129/B9.bioc.slurmjobs>,
#>   https://github.com/LieberInstitute/slurmjobs/slurmjobs - R package
#>   version 1.3.0, <http://www.bioconductor.org/packages/slurmjobs>.
#> 
#>   LieberInstitute (2025). "slurmjobs: Helper Functions for SLURM Jobs."
#>   _bioRxiv_. doi:10.1101/TODO <https://doi.org/10.1101/TODO>,
#>   <https://www.biorxiv.org/content/10.1101/TODO>.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

Overview

slurmjobs provides helper functions for interacting with SLURM-managed high-performance-computing environments from R. It includes functions for creating submittable jobs (including array jobs), monitoring partitions, and extracting info about running or complete jobs. In addition to loading slurmjobs, we’ll be using dplyr to manipulate example data about jobs.

library("slurmjobs")
library("dplyr")

Creating Shell Scripts to `sbatch`

When processing data on a SLURM-managed system, primarily running R code, you’ll likely find yourself writing many “wrapper” shell scripts that can be submitted via sbatch to the job scheduler. This process requires precise SLURM-specific syntax and a large amount of repetition. job_single aims to reduce required configuration from the user to just a handful of options that tend to vary most often between shell scripts (e.g. memory, number of CPUs, time limit), and automate the rest of the shell-script-creation process.

Shell scripts created by job_single log key reproducibility information, such as the user, job ID, job name, node name, and when the job starts and ends.

#   With 'create_shell = FALSE', the contents of the potential shell script are
#   only printed to the screen
job_single(
    name = "my_shell_script", memory = "10G", cores = 2, create_shell = FALSE
)
#> 2025-02-04 17:57:12.289844 creating the logs directory at:  logs
#> #!/bin/bash
#> #SBATCH -p shared
#> #SBATCH --mem=10G
#> #SBATCH --job-name=my_shell_script
#> #SBATCH -c 2
#> #SBATCH -t 1-00:00:00
#> #SBATCH -o logs/my_shell_script.txt
#> #SBATCH -e logs/my_shell_script.txt
#> #SBATCH --mail-type=ALL
#> 
#> set -e
#> 
#> echo "**** Job starts ****"
#> date
#> 
#> echo "**** JHPCE info ****"
#> echo "User: ${USER}"
#> echo "Job id: ${SLURM_JOB_ID}"
#> echo "Job name: ${SLURM_JOB_NAME}"
#> echo "Node name: ${HOSTNAME}"
#> echo "Task id: ${SLURM_ARRAY_TASK_ID}"
#> 
#> ## Load the R module
#> module load conda_R/4.4
#> 
#> ## List current modules for reproducibility
#> module list
#> 
#> ## Edit with your job command
#> Rscript -e "options(width = 120); sessioninfo::session_info()"
#> 
#> echo "**** Job ends ****"
#> date
#> 
#> ## This script was made using slurmjobs version 1.3.0
#> ## available from http://research.libd.org/slurmjobs/

Similarly, we can specify task_num to create an array job– in this case, one with 10 tasks.

job_single(
    name = "my_array_job", memory = "5G", cores = 1, create_shell = FALSE,
    task_num = 10
)
#> 2025-02-04 17:57:12.362359 creating the logs directory at:  logs
#> #!/bin/bash
#> #SBATCH -p shared
#> #SBATCH --mem=5G
#> #SBATCH --job-name=my_array_job
#> #SBATCH -c 1
#> #SBATCH -t 1-00:00:00
#> #SBATCH -o logs/my_array_job.%a.txt
#> #SBATCH -e logs/my_array_job.%a.txt
#> #SBATCH --mail-type=ALL
#> #SBATCH --array=1-10%20
#> 
#> set -e
#> 
#> echo "**** Job starts ****"
#> date
#> 
#> echo "**** JHPCE info ****"
#> echo "User: ${USER}"
#> echo "Job id: ${SLURM_JOB_ID}"
#> echo "Job name: ${SLURM_JOB_NAME}"
#> echo "Node name: ${HOSTNAME}"
#> echo "Task id: ${SLURM_ARRAY_TASK_ID}"
#> 
#> ## Load the R module
#> module load conda_R/4.4
#> 
#> ## List current modules for reproducibility
#> module list
#> 
#> ## Edit with your job command
#> Rscript -e "options(width = 120); sessioninfo::session_info()"
#> 
#> echo "**** Job ends ****"
#> date
#> 
#> ## This script was made using slurmjobs version 1.3.0
#> ## available from http://research.libd.org/slurmjobs/

Another function, job_loop(), can be used to create more complex array jobs as compared with job_single(). It’s useful when looping through one or more variables with pre-defined values, and applying the same processing steps. The key difference is that rather than specifying task_num, you specify loops, a named list of variables to loop through. An array job then gets created that can directly refer to the values of these variables, rather than referring to just the array’s task ID.

job_loop(), unlike job_single(), also creates an R script. The idea is that the shell script invokes the R script internally, with a particular combination of variables. The getopt package is then used to read in this combination from the command line, so that each variable can be accessed by name in R. Let’s make that a bit more concrete.

#   'job_loop' returns a list containing the contents of the to-be-created shell
#   and R scripts. Let's take a look at the shell script first
script_pair <- job_loop(
    loops = list(region = c("DLPFC", "HIPPO"), feature = c("gene", "exon", "tx", "jxn")),
    name = "bsp2_test", create_logdir = FALSE
)
cat(script_pair[["shell"]], sep = "\n")
#> #!/bin/bash
#> #SBATCH -p shared
#> #SBATCH --mem=10G
#> #SBATCH --job-name=bsp2_test
#> #SBATCH -c 1
#> #SBATCH -t 1-00:00:00
#> #SBATCH -o /dev/null
#> #SBATCH -e /dev/null
#> #SBATCH --mail-type=ALL
#> #SBATCH --array=1-8%20
#> 
#> ## Define loops and appropriately subset each variable for the array task ID
#> all_region=(DLPFC HIPPO)
#> region=${all_region[$(( $SLURM_ARRAY_TASK_ID / 4 % 2 ))]}
#> 
#> all_feature=(gene exon tx jxn)
#> feature=${all_feature[$(( $SLURM_ARRAY_TASK_ID / 1 % 4 ))]}
#> 
#> ## Explicitly pipe script output to a log
#> log_path=logs/bsp2_test_${region}_${feature}_${SLURM_ARRAY_TASK_ID}.txt
#> 
#> {
#> set -e
#> 
#> echo "**** Job starts ****"
#> date
#> 
#> echo "**** JHPCE info ****"
#> echo "User: ${USER}"
#> echo "Job id: ${SLURM_JOB_ID}"
#> echo "Job name: ${SLURM_JOB_NAME}"
#> echo "Node name: ${HOSTNAME}"
#> echo "Task id: ${SLURM_ARRAY_TASK_ID}"
#> 
#> ## Load the R module
#> module load conda_R/4.4
#> 
#> ## List current modules for reproducibility
#> module list
#> 
#> ## Edit with your job command
#> Rscript bsp2_test.R --region ${region} --feature ${feature}
#> 
#> echo "**** Job ends ****"
#> date
#> 
#> } > $log_path 2>&1
#> 
#> ## This script was made using slurmjobs version 1.3.0
#> ## available from http://research.libd.org/slurmjobs/

First, note the line Rscript bsp2_test.R --region ${region} --feature ${feature}. Every task of the array job passes a unique combination of ${region} and ${feature} to R.

Notice also that logs from executing this shell script get named with each of the variables’ values in addition to the array task ID. For example, the log for the first task would be logs/DLPFC_gene_1.txt. Also, the array specifies 8 tasks total (the product of the number of regions and features).

Let’s also look at the R script.

cat(script_pair[["R"]], sep = "\n")
#> library(getopt)
#> library(sessioninfo)
#> 
#> # Import command-line parameters
#> spec <- matrix(
#>     c(
#>         c("region", "feature"),
#>         c("r", "f"),
#>         rep("1", 2),
#>         rep("character", 2),
#>         rep("Add variable description here", 2)
#>     ),
#>     ncol = 5
#> )
#> opt <- getopt(spec)
#> 
#> message("Using the following parameters:")
#> print(opt)
#> 
#> message("Memory usage:")
#> gc()
#> 
#> session_info()
#> 
#> ## This script was made using slurmjobs version 1.3.0
#> ## available from http://research.libd.org/slurmjobs/

The code related to getopt at the top of the script reads in the unique combination of variable values into a list called opt here. For example, one task of the array job might yield values for opt$region and opt$feature to be "DLPFC" and "gene", respectively.

Submitting and Resubmitting Jobs

Shell scripts created with job_single() or job_loop() may be submitted as batch jobs with sbatch (e.g. sbatch myscript.sh). Note no additional arguments to sbatch are required since all configuration is specified within the shell script.

The array_submit() helper function was also intended to make job submission easier. In particular, it addresses a common case where after a large array job was run, a handful of tasks fail (such as due to temporary file-system issues). array_submit() helps re-submit failed tasks.

Below we’ll create an example array job with job_single(), then do a dry run of array_submit() to demonstrate its basic usage.

job_single(
    name = "my_array_job", memory = "5G", cores = 1, create_shell = TRUE,
    task_num = 10
)
#> 2025-02-04 17:57:12.563552 creating the logs directory at:  logs
#> 2025-02-04 17:57:12.564984 creating the shell file my_array_job.sh
#> To submit the job use: sbatch my_array_job.sh

#   Suppose that tasks 3, 6, 7, and 8 failed
array_submit(name = "my_array_job", task_ids = c(3, 6:8), submit = FALSE)

While task_ids can be provided explicitly as above, the real convenience comes from the ability to run array_submit() without specifying task_ids. As long as the original array job was created with job_single() or job_loop() and submitted as-is (on the full set of tasks), array_submit() can automatically find the failed tasks by reading the shell script (my_array_job.sh), grabbing the original array job ID from the log, and internally calling job_report()).

#   Not run here, since we aren't on a SLURM cluster
array_submit(name = "my_array_job", submit = FALSE)

Monitoring Running Jobs

The job_info() function provides wrappers around the squeue and sstat utilities SLURM provides for monitoring specific jobs and how busy partitions are. The general idea is to provide the information output from squeue into a tibble, while retrieving memory-utilization information that ordinarily must be retrieved manually on a job-by-job basis with sstat -j [specific job ID].

On a SLURM system, you’d run job_info_df = job_info(user = NULL, partition = "shared") here, to get every user’s jobs running on the “shared” partition. We’ll load an example output directly here.

#   On a real SLURM system
print(job_info_df)
#> # A tibble: 100 × 11
#>    job_id max_rss_gb max_vmem_gb user  array_task_id name     partition  cpus
#>     <dbl>      <dbl>       <dbl> <chr>         <int> <chr>    <fct>     <int>
#>  1 222106         NA          NA user1            69 my_job_1 shared        2
#>  2 271213         NA          NA user1            37 my_job_2 shared        1
#>  3 280839         NA          NA user1            11 my_job_3 shared        2
#>  4 285265         NA          NA user1            31 my_job_3 shared        2
#>  5 285275         NA          NA user1            41 my_job_3 shared        2
#>  6 285276         NA          NA user1            42 my_job_3 shared        2
#>  7 285281         NA          NA user1            47 my_job_3 shared        2
#>  8 285282         NA          NA user1            48 my_job_3 shared        2
#>  9 301953         NA          NA user2           180 my_job_4 shared        2
#> 10 301954         NA          NA user2           440 my_job_5 shared        2
#> # ℹ 90 more rows
#> # ℹ 3 more variables: requested_mem_gb <dbl>, status <fct>,
#> #   wallclock_time <drtn>

The benefit to having this data in R, now, is to be able to trivially ask summarizing questions. First, “how much memory and how many CPUs am I currently using?” Knowing this answer can help ensure fair and civil use of shared computing resources, for example on a computing cluster.

job_info_df |>
    #   Or your username here
    filter(user == "user21") |>
    #   Get the number of CPUs requested and the memory requested in GB
    summarize(
        total_mem_req = sum(requested_mem_gb),
        total_cpus = sum(cpus)
    ) |>
    print()
#> # A tibble: 1 × 2
#>   total_mem_req total_cpus
#>           <dbl>      <int>
#> 1            50          2

Monitoring Partitions

Sometimes, it’s useful to know about the partitions as a whole rather than about specific jobs. partition_info() serves this purpose, and parses sinfo output into a tibble. We’ll load an example of the output from partition_info(partition = NULL, all_nodes = FALSE).

print(partition_df)
#> # A tibble: 5 × 7
#>   partition   free_cpus total_cpus prop_free_cpus free_mem_gb total_mem_gb
#>   <chr>           <int>      <int>          <dbl>       <dbl>        <dbl>
#> 1 partition_1        48         48          1            126.         128.
#> 2 partition_2       324        384          0.844       1050.        1643.
#> 3 partition_3        48         48          1            127.         128.
#> 4 partition_4       412       1024          0.402       2806.        4126.
#> 5 partition_5        76        128          0.594        519.        1000.
#> # ℹ 1 more variable: prop_free_mem_gb <dbl>

Since all_nodes was FALSE, there’s one row per partition, summarizing information across all nodes that compose each partition. Alternatively, set all_nodes to TRUE to yield one row per node.

With partition_df, let’s summarize how busy the cluster is as a whole, then rank partitions by amount of free memory.

#   Print the proportion of CPUs and memory available for the whole cluster
partition_df |>
    summarize(
        prop_free_cpus = sum(free_cpus) / sum(total_cpus),
        prop_free_mem_gb = sum(free_mem_gb) / sum(total_mem_gb)
    ) |>
    print()
#> # A tibble: 1 × 2
#>   prop_free_cpus prop_free_mem_gb
#>            <dbl>            <dbl>
#> 1          0.556            0.659

#   Now let's take the top 3 partitions by memory currently available
partition_df |>
    arrange(desc(free_mem_gb)) |>
    select(partition, free_mem_gb) |>
    slice_head(n = 3)
#> # A tibble: 3 × 2
#>   partition   free_mem_gb
#>   <chr>             <dbl>
#> 1 partition_4       2806.
#> 2 partition_2       1050.
#> 3 partition_5        519.

Analyzing Finished Jobs

The job_report() function returns in-depth information about a single queued, running, or finished job (including a single array job). It combines functionality from SLURM’s sstat and sacct to return a tibble for easy manipulation in R.

Suppose you have a workflow that operates as an array job, and you’d like to profile memory usage across the many tasks. Suppose we’ve done an initial trial, setting memory relatively high just to get the jobs running without issues. One use of job_report could be to determine a better memory request in a data-driven way– the better settings can then be run on the larger dataset after the initial test.

On an actual system with SLURM installed, you’d normally run something like job_df = job_report(slurm_job_id) for the slurm_job_id (character or integer) representing the small test. For convenience, we’ll start from the output of job_report as available in the slurmjobs package.

job_df <- readRDS(
    system.file("extdata", "job_report_df.rds", package = "slurmjobs")
)
print(job_df)
#> # A tibble: 10 × 12
#>    job_id user  name     partition  cpus requested_mem_gb max_rss_gb max_vmem_gb
#>     <int> <chr> <chr>    <fct>     <int>            <dbl>      <dbl>       <dbl>
#>  1 297332 user1 broken_… shared        2                5       0.04        0.04
#>  2 297333 user1 broken_… shared        2                5       0.48        0.48
#>  3 297334 user1 broken_… shared        2                5       0.61        0.61
#>  4 297335 user1 broken_… shared        2                5       0.04        0.04
#>  5 297336 user1 broken_… shared        2                5       1.15        1.15
#>  6 297337 user1 broken_… shared        2                5       1.38        1.38
#>  7 297338 user1 broken_… shared        2                5       0.04        0.04
#>  8 297339 user1 broken_… shared        2                5       0.04        0.04
#>  9 297340 user1 broken_… shared        2                5       0.04        0.04
#> 10 297331 user1 broken_… shared        2                5       1.16        1.16
#> # ℹ 4 more variables: array_task_id <int>, exit_code <int>,
#> #   wallclock_time <drtn>, status <fct>

Now let’s choose a better memory request:

stat_df <- job_df |>
    #   This example includes tasks that fail. We're only interested in memory
    #   for successfully completed tasks
    filter(status != "FAILED") |>
    summarize(
        mean_mem = mean(max_vmem_gb),
        std_mem = sd(max_vmem_gb),
        max_mem = max(max_vmem_gb)
    )

#   We could choose a new memory request as 3 standard deviations above the mean
#   of actual memory usage
new_limit <- stat_df$mean_mem + 3 * stat_df$std_mem

print(
    sprintf(
        "%.02fG is a better memory request than %.02fG, which was used before",
        new_limit,
        job_df$requested_mem_gb[1]
    )
)
#> [1] "2.12G is a better memory request than 5.00G, which was used before"

Re-Ordering Scripts

Background

As a data scientist in the R/Bioconductor-powered Team Data Science, I organize my scripts such that there is an explicit order within a given directory (e.g. 01_first_script.R, 02_second_script.R, etc). This approach helps with readability and clarity, but the reality is that the order of scripts can easily change as new information is gathered from EDA, or analysis goals shift. A particularly annoying case arises when you have a series of say, 7 scripts, and you wish to insert a step between the 3rd and 4th scripts, which would involve renaming 4 scripts to preserve the 01 through 07 prefices. In the Team Data Science, we also typically generate one or more logs associated with each script, and shell scripts generally contain references to the name of the R script to submit with sbatch to the SLURM scheduler. In total, renaming scripts can be quite a process.

The renumber() function was developed to address situations like this. It renames scripts based on a set of prefices, updates references to the old script names with each shell script, and even renames any existing logs associated with each script. To demonstrate its function, we’ll generate some scripts and associated logs, then renumber them.

Create Simulated Data

#   Create a temporary directory that's guaranteed to be empty
base_dir <- file.path(tempdir(), "slurmjobs_scripts")
dir.create(base_dir)

#   Create a shell script that submits a corresponding R script
job_single(
    file.path(base_dir, "01_should_be_second.sh"),
    logdir = file.path(base_dir, "logs"), create_logdir = TRUE,
    create_shell = TRUE, command = "Rscript 01_should_be_second.R"
)
#> 2025-02-04 17:57:13.350336 creating the logs directory at:  /tmp/RtmpwfkuTO/slurmjobs_scripts/logs
#> 2025-02-04 17:57:13.351695 creating the shell file /tmp/RtmpwfkuTO/slurmjobs_scripts/01_should_be_second.sh
#> To submit the job use: sbatch /tmp/RtmpwfkuTO/slurmjobs_scripts/01_should_be_second.sh
writeLines("# some code", con = file.path(base_dir, "01_should_be_second.R"))

#   Create an array originally designed to be submitted second
job_loop(
    file.path(base_dir, "02_should_be_first.sh"),
    create_shell = TRUE, logdir = file.path(base_dir, "logs"),
    create_logdir = FALSE,
    loops = list(
        gene = c("gene_1", "gene_2"), method = c("method_1", "method_2")
    ) 
)
#> 2025-02-04 17:57:13.355825 Creating the shell file 02_should_be_first.sh and corresponding R script 02_should_be_first.R
#> To submit the script pair, use: sbatch /tmp/RtmpwfkuTO/slurmjobs_scripts/02_should_be_first.sh

Next, generate some corresponding logs. Note that renumber() can handle missing logs, so we’ll intentionally generate only some of the logs to demonstrate this.

all_log_base_names <- c(
    "01_should_be_second.txt", "02_should_be_first_gene_1_method_1_1.txt",
    "02_should_be_first_gene_2_method_1_2.txt"
)
for (base_name in all_log_base_names) {
    writeLines(
        "# some log text",
        con = file.path(base_dir, "logs", base_name)
    )
}

In summary, we have two shell scripts and three corresponding logs whose names indicate a confusing ordering.

list.files(base_dir)
#> [1] "01_should_be_second.R"  "01_should_be_second.sh" "02_should_be_first.R"  
#> [4] "02_should_be_first.sh"  "logs"

list.files(file.path(base_dir, 'logs'))
#> [1] "01_should_be_second.txt"                 
#> [2] "02_should_be_first_gene_1_method_1_1.txt"
#> [3] "02_should_be_first_gene_2_method_1_2.txt"

Re-Order

Suppose we wish to simply swap the order of the first and second script. We must provide the directory containing the scripts, the beginning prefices to change, finally the replacement prefices.

#   Re-order scripts (and logs)
renumber(base_dir, c("01", "02"), c("02", "01"))

We can see the names of the scripts and logs have been correctly updated.

list.files(base_dir)
#> [1] "01_should_be_first.R"   "01_should_be_first.sh"  "02_should_be_second.R" 
#> [4] "02_should_be_second.sh" "logs"

list.files(file.path(base_dir, 'logs'))
#> [1] "01_should_be_first_gene_1_method_1_1.txt"
#> [2] "01_should_be_first_gene_2_method_1_2.txt"
#> [3] "02_should_be_second.txt"

Reading the first script in, we can also see that references to the old script name have been updated. For example, the log path and Rscript command were updated.

#   Read in the first shell script
readLines(file.path(base_dir, "01_should_be_first.sh"))
#>  [1] "#!/bin/bash"                                                                                                    
#>  [2] "#SBATCH -p shared"                                                                                              
#>  [3] "#SBATCH --mem=10G"                                                                                              
#>  [4] "#SBATCH --job-name=01_should_be_first"                                                                          
#>  [5] "#SBATCH -c 1"                                                                                                   
#>  [6] "#SBATCH -t 1-00:00:00"                                                                                          
#>  [7] "#SBATCH -o /dev/null"                                                                                           
#>  [8] "#SBATCH -e /dev/null"                                                                                           
#>  [9] "#SBATCH --mail-type=ALL"                                                                                        
#> [10] "#SBATCH --array=1-4%20"                                                                                         
#> [11] ""                                                                                                               
#> [12] "## Define loops and appropriately subset each variable for the array task ID"                                   
#> [13] "all_gene=(gene_1 gene_2)"                                                                                       
#> [14] "gene=${all_gene[$(( $SLURM_ARRAY_TASK_ID / 2 % 2 ))]}"                                                          
#> [15] ""                                                                                                               
#> [16] "all_method=(method_1 method_2)"                                                                                 
#> [17] "method=${all_method[$(( $SLURM_ARRAY_TASK_ID / 1 % 2 ))]}"                                                      
#> [18] ""                                                                                                               
#> [19] "## Explicitly pipe script output to a log"                                                                      
#> [20] "log_path=/tmp/RtmpwfkuTO/slurmjobs_scripts/logs/01_should_be_first_${gene}_${method}_${SLURM_ARRAY_TASK_ID}.txt"
#> [21] ""                                                                                                               
#> [22] "{"                                                                                                              
#> [23] "set -e"                                                                                                         
#> [24] ""                                                                                                               
#> [25] "echo \"**** Job starts ****\""                                                                                  
#> [26] "date"                                                                                                           
#> [27] ""                                                                                                               
#> [28] "echo \"**** JHPCE info ****\""                                                                                  
#> [29] "echo \"User: ${USER}\""                                                                                         
#> [30] "echo \"Job id: ${SLURM_JOB_ID}\""                                                                               
#> [31] "echo \"Job name: ${SLURM_JOB_NAME}\""                                                                           
#> [32] "echo \"Node name: ${HOSTNAME}\""                                                                                
#> [33] "echo \"Task id: ${SLURM_ARRAY_TASK_ID}\""                                                                       
#> [34] ""                                                                                                               
#> [35] "## Load the R module"                                                                                           
#> [36] "module load conda_R/4.4"                                                                                        
#> [37] ""                                                                                                               
#> [38] "## List current modules for reproducibility"                                                                    
#> [39] "module list"                                                                                                    
#> [40] ""                                                                                                               
#> [41] "## Edit with your job command"                                                                                  
#> [42] "Rscript 01_should_be_first.R --gene ${gene} --method ${method}"                                                 
#> [43] ""                                                                                                               
#> [44] "echo \"**** Job ends ****\""                                                                                    
#> [45] "date"                                                                                                           
#> [46] ""                                                                                                               
#> [47] "} > $log_path 2>&1"                                                                                             
#> [48] ""                                                                                                               
#> [49] "## This script was made using slurmjobs version 1.3.0"                                                          
#> [50] "## available from http://research.libd.org/slurmjobs/"                                                          
#> [51] ""

Other Uses

Despite being developed for a somewhat specific use case, renumber() can be applied to other renaming tasks. It can rename a set of directories, and prefices need not be numbers.

Let’s first create some directories as an example.

#   Create a base directory where we could have plots
base_dir = file.path(tempdir(), "plots")
dir.create(base_dir)

#   Create some directories under 'plots'
dir.create(file.path(base_dir, "first_visualization"))
dir.create(file.path(base_dir, "then_QC"))
dir.create(file.path(base_dir, "finally_analysis"))

#   We now have three plotting directories
list.files(base_dir)
#> [1] "finally_analysis"    "first_visualization" "then_QC"

Let’s rename them and check the result.

#   Re-order directories and check their new names
renumber(base_dir, c("first", "then", "finally"), c("3rd", "1st", "2nd"))
list.files(base_dir)
#> [1] "1st_QC"            "2nd_analysis"      "3rd_visualization"

Reproducibility

The slurmjobs package (LieberInstitute, 2025) was made possible thanks to:

R (R Core Team, 2024)
BiocStyle (Oleś, 2024)
dplyr (Wickham, François, Henry, Müller, and Vaughan, 2023)
knitr (Xie, 2024)
RefManageR (McLean, 2017)
rmarkdown (Allaire, Xie, Dervieux, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2024)
sessioninfo (Wickham, Chang, Flight, Müller, and Hester, 2021)
testthat (Wickham, 2011)

This package was developed using biocthis.

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("slurmjobs.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("slurmjobs.Rmd", tangle = TRUE)

Date the vignette was generated.

#> [1] "2025-02-04 17:57:14 UTC"

Wallclock time spent generating the vignette.

#> Time difference of 2.775 secs

R session information.

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31)
#>  os       Ubuntu 24.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       UTC
#>  date     2025-02-04
#>  pandoc   3.6 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  backports       1.5.0   2024-05-23 [1] RSPM (R 4.4.0)
#>  bibtex          0.5.1   2023-01-26 [1] RSPM (R 4.4.0)
#>  BiocManager     1.30.25 2024-08-28 [2] CRAN (R 4.4.2)
#>  BiocStyle     * 2.34.0  2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
#>  bookdown        0.42    2025-01-07 [1] RSPM (R 4.4.0)
#>  bslib           0.9.0   2025-01-30 [2] RSPM (R 4.4.0)
#>  cachem          1.1.0   2024-05-16 [2] RSPM (R 4.4.0)
#>  cli             3.6.3   2024-06-21 [2] RSPM (R 4.4.0)
#>  crayon          1.5.3   2024-06-20 [2] RSPM (R 4.4.0)
#>  desc            1.4.3   2023-12-10 [2] RSPM (R 4.4.0)
#>  digest          0.6.37  2024-08-19 [2] RSPM (R 4.4.0)
#>  dplyr         * 1.1.4   2023-11-17 [1] RSPM (R 4.4.0)
#>  evaluate        1.0.3   2025-01-10 [2] RSPM (R 4.4.0)
#>  fastmap         1.2.0   2024-05-15 [2] RSPM (R 4.4.0)
#>  fs              1.6.5   2024-10-30 [2] RSPM (R 4.4.0)
#>  generics        0.1.3   2022-07-05 [1] RSPM (R 4.4.0)
#>  glue            1.8.0   2024-09-30 [2] RSPM (R 4.4.0)
#>  htmltools       0.5.8.1 2024-04-04 [2] RSPM (R 4.4.0)
#>  htmlwidgets     1.6.4   2023-12-06 [2] RSPM (R 4.4.0)
#>  httr            1.4.7   2023-08-15 [1] RSPM (R 4.4.0)
#>  jquerylib       0.1.4   2021-04-26 [2] RSPM (R 4.4.0)
#>  jsonlite        1.8.9   2024-09-20 [2] RSPM (R 4.4.0)
#>  knitcitations * 1.0.12  2021-01-10 [1] RSPM (R 4.4.0)
#>  knitr           1.49    2024-11-08 [2] RSPM (R 4.4.0)
#>  lifecycle       1.0.4   2023-11-07 [2] RSPM (R 4.4.0)
#>  lubridate       1.9.4   2024-12-08 [1] RSPM (R 4.4.0)
#>  magrittr        2.0.3   2022-03-30 [2] RSPM (R 4.4.0)
#>  pillar          1.10.1  2025-01-07 [2] RSPM (R 4.4.0)
#>  pkgconfig       2.0.3   2019-09-22 [2] RSPM (R 4.4.0)
#>  pkgdown         2.1.1   2024-09-17 [2] RSPM (R 4.4.0)
#>  plyr            1.8.9   2023-10-02 [1] RSPM (R 4.4.0)
#>  purrr           1.0.2   2023-08-10 [2] RSPM (R 4.4.0)
#>  R6              2.5.1   2021-08-19 [2] RSPM (R 4.4.0)
#>  ragg            1.3.3   2024-09-11 [2] RSPM (R 4.4.0)
#>  Rcpp            1.0.14  2025-01-12 [2] RSPM (R 4.4.0)
#>  RefManageR    * 1.4.0   2022-09-30 [1] RSPM (R 4.4.0)
#>  rlang           1.1.5   2025-01-17 [2] RSPM (R 4.4.0)
#>  rmarkdown       2.29    2024-11-04 [2] RSPM (R 4.4.0)
#>  sass            0.4.9   2024-03-15 [2] RSPM (R 4.4.0)
#>  sessioninfo   * 1.2.2   2021-12-06 [2] RSPM (R 4.4.0)
#>  slurmjobs     * 1.3.0   2025-02-04 [1] local
#>  stringi         1.8.4   2024-05-06 [2] RSPM (R 4.4.0)
#>  stringr         1.5.1   2023-11-14 [2] RSPM (R 4.4.0)
#>  systemfonts     1.2.1   2025-01-20 [2] RSPM (R 4.4.0)
#>  textshaping     1.0.0   2025-01-20 [2] RSPM (R 4.4.0)
#>  tibble          3.2.1   2023-03-20 [2] RSPM (R 4.4.0)
#>  tidyselect      1.2.1   2024-03-11 [1] RSPM (R 4.4.0)
#>  timechange      0.3.0   2024-01-18 [1] RSPM (R 4.4.0)
#>  utf8            1.2.4   2023-10-22 [2] RSPM (R 4.4.0)
#>  vctrs           0.6.5   2023-12-01 [2] RSPM (R 4.4.0)
#>  withr           3.0.2   2024-10-28 [2] RSPM (R 4.4.0)
#>  xfun            0.50    2025-01-07 [2] RSPM (R 4.4.0)
#>  xml2            1.3.6   2023-12-04 [2] RSPM (R 4.4.0)
#>  yaml            2.3.10  2024-07-26 [2] RSPM (R 4.4.0)
#> 
#>  [1] /__w/_temp/Library
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/local/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Bibliography

This vignette was generated using BiocStyle (Oleś, 2024) with knitr (Xie, 2024) and rmarkdown (Allaire, Xie, Dervieux et al., 2024) running behind the scenes.

Citations made with RefManageR (McLean, 2017).

[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.29. 2024. URL: https://github.com/rstudio/rmarkdown.

[2] LieberInstitute. slurmjobs: Helper Functions for SLURM Jobs. https://github.com/LieberInstitute/slurmjobs/slurmjobs - R package version 1.3.0. 2025. DOI: 10.18129/B9.bioc.slurmjobs. URL: http://www.bioconductor.org/packages/slurmjobs.

[3] M. W. McLean. “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software (2017). DOI: 10.21105/joss.00338.

[4] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.34.0. 2024. DOI: 10.18129/B9.bioc.BiocStyle. URL: https://bioconductor.org/packages/BiocStyle.

[5] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2024. URL: https://www.R-project.org/.

[6] H. Wickham. “testthat: Get Started with Testing”. In: The R Journal 3 (2011), pp. 5–10. URL: https://journal.r-project.org/archive/2011-1/RJournal_2011-1_Wickham.pdf.

[7] H. Wickham, W. Chang, R. Flight, et al. sessioninfo: R Session Information. R package version 1.2.2, https://r-lib.github.io/sessioninfo/. 2021. URL: https://github.com/r-lib/sessioninfo#readme.

[8] H. Wickham, R. François, L. Henry, et al. dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://github.com/tidyverse/dplyr. 2023. URL: https://dplyr.tidyverse.org.

[9] Y. Xie. knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.49. 2024. URL: https://yihui.org/knitr/.

Nicholas J. Eagles

Leonardo Collado-Torres

4 February 2025

Basics

Install `slurmjobs`

Required knowledge

Asking for help

Citing `slurmjobs`

Overview

Creating Shell Scripts to `sbatch`

Submitting and Resubmitting Jobs

Monitoring Running Jobs

Monitoring Partitions

Analyzing Finished Jobs

Re-Ordering Scripts

Background

Create Simulated Data

Re-Order

Other Uses

Reproducibility

Bibliography

Introduction to slurmjobs

Nicholas J. Eagles

Leonardo Collado-Torres

4 February 2025

Basics

Install slurmjobs

Required knowledge

Asking for help

Citing slurmjobs

Overview

Creating Shell Scripts to sbatch

Submitting and Resubmitting Jobs

Monitoring Running Jobs

Monitoring Partitions

Analyzing Finished Jobs

Re-Ordering Scripts

Background

Create Simulated Data

Re-Order

Other Uses

Reproducibility

Bibliography

Install `slurmjobs`

Citing `slurmjobs`

Creating Shell Scripts to `sbatch`