Internal function for creating a recount3 RangedSummarizedExperiment object

create_rse() uses this function internally to construct a recount3 RangedSummarizedExperiment-class object. The object contains the base-pair coverage counts at the gene or exon feature level for a given annotation, or the exon-exon junction counts when type is "jxn".

create_rse_manual(
  project,
  project_home = project_homes(organism = organism, recount3_url = recount3_url),
  type = c("gene", "exon", "jxn"),
  organism = c("human", "mouse"),
  annotation = annotation_options(organism),
  bfc = recount3_cache(),
  jxn_format = c("ALL", "UNIQUE"),
  recount3_url = getOption("recount3_url", "http://duffel.rail.bio/recount3"),
  verbose = getOption("recount3_verbose", TRUE)
)

Arguments

project: A character(1) with the ID for a given study.
project_home: A character(1) with the home directory for the project. You can find these using project_homes().
type: A character(1) specifying whether you want to access gene, exon, or exon-exon junction counts.
organism: A character(1) specifying which organism you want to download data from. Supported options are "human" or "mouse".
annotation: A character(1) specifying which annotation you want to download. Only used when type is either gene or exon.
bfc: A BiocFileCache-class object where the files will be cached, typically created by recount3_cache().
jxn_format: A character(1) specifying whether the exon-exon junction files are derived from all the reads (ALL) or only from the uniquely mapping reads (UNIQUE). Note that UNIQUE is only available for some projects: GTEx and TCGA for human.
recount3_url: A character(1) specifying the home URL for recount3 or a local directory where you have mirrored recount3. Defaults to the load balancer http://duffel.rail.bio/recount3, but can also be https://recount-opendata.s3.amazonaws.com/recount3/release from https://registry.opendata.aws/recount/ or from IDIES at JHU https://idies.jhu.edu/recount3/data (which redirects to https://data.idies.jhu.edu/recount3/data/). You can set the R option recount3_url (for example in your .Rprofile) if you have a favorite mirror.
verbose: A logical(1) indicating whether to show messages with updates.
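
The recount3_url default is resolved through a standard R option lookup, so a preferred mirror can be set once per session (or in .Rprofile). The following is a minimal sketch using only base R; it uses the load balancer and AWS URLs listed above and does not download anything.

```r
## Unset the option for this demo, keeping the previous value for later
old_opts <- options(recount3_url = NULL)

## With the option unset, getOption() falls back to the supplied default
default_url <- getOption("recount3_url", "http://duffel.rail.bio/recount3")

## Point the session at the AWS Open Data mirror instead
options(
    recount3_url = "https://recount-opendata.s3.amazonaws.com/recount3/release"
)
mirror_url <- getOption("recount3_url", "http://duffel.rail.bio/recount3")

## Restore whatever was set before the demo
options(old_opts)
```

Passing recount3_url explicitly in a call still takes precedence over the option, since the option only supplies the argument's default.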

Value

A RangedSummarizedExperiment-class object.
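
To illustrate the layout of the returned object without downloading anything, here is a small sketch that builds a toy RangedSummarizedExperiment by hand. The feature names, sample names, coordinates, and count values are all hypothetical; only the assay name raw_counts mirrors what recount3 objects use. It assumes the Bioconductor packages SummarizedExperiment and GenomicRanges are installed.

```r
library(SummarizedExperiment)
library(GenomicRanges)

## Toy stand-in for a recount3 object: 3 features by 2 samples.
## All names and values here are hypothetical.
counts <- matrix(
    c(10L, 0L, 25L, 7L, 3L, 40L),
    nrow = 3,
    dimnames = list(
        c("gene1", "gene2", "gene3"),
        c("sampleA", "sampleB")
    )
)

## Genomic coordinates for each feature (made up for illustration)
feature_ranges <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(start = c(1, 101, 201), width = 50),
    strand = c("+", "-", "+")
)
names(feature_ranges) <- rownames(counts)

toy_rse <- SummarizedExperiment(
    assays = list(raw_counts = counts),
    rowRanges = feature_ranges,
    colData = DataFrame(
        external_id = colnames(counts),
        row.names = colnames(counts)
    )
)

## The three main components of a RangedSummarizedExperiment
assay(toy_rse, "raw_counts") # feature-by-sample count matrix
rowRanges(toy_rse) # genomic coordinates of each feature
colData(toy_rse) # per-sample metadata
```

The same three accessors apply to the objects created in the examples below, where the rows are annotation features and the columns are samples.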

References

https://doi.org/10.12688/f1000research.12223.1 for details on the base-pair coverage counts used in recount2 and recount3.

Examples


## Unlike create_rse(), here we create an RSE object by
## fully specifying all the arguments for locating this study
rse_gene_SRP009615_manual <- create_rse_manual(
    "SRP009615",
    "data_sources/sra"
)
#> 2025-09-24 22:48:29.709663 downloading and reading the metadata.
#> 2025-09-24 22:48:30.352762 caching file sra.sra.SRP009615.MD.gz.
#> 2025-09-24 22:48:30.927683 caching file sra.recount_project.SRP009615.MD.gz.
#> 2025-09-24 22:48:31.574576 caching file sra.recount_qc.SRP009615.MD.gz.
#> 2025-09-24 22:48:32.194179 caching file sra.recount_seq_qc.SRP009615.MD.gz.
#> 2025-09-24 22:48:32.895844 caching file sra.recount_pred.SRP009615.MD.gz.
#> 2025-09-24 22:48:32.958578 downloading and reading the feature information.
#> 2025-09-24 22:48:33.824208 caching file human.gene_sums.G026.gtf.gz.
#> 2025-09-24 22:48:34.184957 downloading and reading the counts: 12 samples across 63856 features.
#> 2025-09-24 22:48:34.934369 caching file sra.gene_sums.SRP009615.G026.gz.
#> 2025-09-24 22:48:35.071503 constructing the RangedSummarizedExperiment (rse) object.
rse_gene_SRP009615_manual
#> class: RangedSummarizedExperiment 
#> dim: 63856 12 
#> metadata(8): time_created recount3_version ... annotation recount3_url
#> assays(1): raw_counts
#> rownames(63856): ENSG00000278704.1 ENSG00000277400.1 ...
#>   ENSG00000182484.15_PAR_Y ENSG00000227159.8_PAR_Y
#> rowData names(10): source type ... havana_gene tag
#> colnames(12): SRR387777 SRR387778 ... SRR389077 SRR389078
#> colData names(175): rail_id external_id ...
#>   recount_pred.curated.cell_line BigWigURL

## Check how much memory this RSE object uses
pryr::object_size(rse_gene_SRP009615_manual)
#> 24.81 MB

## Test with a collection that has a single sample
## NOTE: this requires loading the full data for this study when
## creating the RSE object
rse_gene_ERP110066_collection_manual <- create_rse_manual(
    "ERP110066",
    "collections/geuvadis_smartseq",
    recount3_url = "http://snaptron.cs.jhu.edu/data/temp/recount3"
)
#> 2025-09-24 22:48:35.162494 downloading and reading the metadata.
#> 2025-09-24 22:48:35.359777 caching file geuvadis_smartseq.recount_project.gz.
#> 2025-09-24 22:48:35.483551 caching file sra.sra.ERP110066.MD.gz.
#> 2025-09-24 22:48:35.600442 caching file sra.recount_project.ERP110066.MD.gz.
#> 2025-09-24 22:48:35.907604 caching file sra.recount_qc.ERP110066.MD.gz.
#> Warning: The 'url' <http://snaptron.cs.jhu.edu/data/temp/recount3/human/data_sources/sra/metadata/66/ERP110066/sra.recount_seq_qc.ERP110066.MD.gz> does not exist or is not available.
#> 2025-09-24 22:48:36.074936 caching file sra.recount_pred.ERP110066.MD.gz.
#> 2025-09-24 22:48:36.183037 caching file geuvadis_smartseq.custom.gz.
#> 2025-09-24 22:48:36.380186 downloading and reading the feature information.
#> 2025-09-24 22:48:36.443753 caching file human.gene_sums.G026.gtf.gz.
#> 2025-09-24 22:48:36.799678 downloading and reading the counts: 1 sample across 63856 features.
#> 2025-09-24 22:48:36.86341 caching file sra.gene_sums.ERP110066.G026.gz.
#> 2025-09-24 22:48:38.206916 constructing the RangedSummarizedExperiment (rse) object.
rse_gene_ERP110066_collection_manual
#> class: RangedSummarizedExperiment 
#> dim: 63856 1 
#> metadata(8): time_created recount3_version ... annotation recount3_url
#> assays(1): raw_counts
#> rownames(63856): ENSG00000278704.1 ENSG00000277400.1 ...
#>   ENSG00000182484.15_PAR_Y ENSG00000227159.8_PAR_Y
#> rowData names(10): source type ... havana_gene tag
#> colnames(1): ERR2713106
#> colData names(162): rail_id external_id ... custom.sequencing_type
#>   BigWigURL

## Check how much memory this RSE object uses
pryr::object_size(rse_gene_ERP110066_collection_manual)
#> 19.16 MB

## Mouse example
rse_gene_DRP002367_manual <- create_rse_manual(
    "DRP002367",
    "data_sources/sra",
    organism = "mouse"
)
#> 2025-09-24 22:48:38.305464 downloading and reading the metadata.
#> 2025-09-24 22:48:39.351688 caching file sra.sra.DRP002367.MD.gz.
#> 2025-09-24 22:48:40.142531 caching file sra.recount_project.DRP002367.MD.gz.
#> 2025-09-24 22:48:40.774426 caching file sra.recount_qc.DRP002367.MD.gz.
#> 2025-09-24 22:48:41.66678 caching file sra.recount_seq_qc.DRP002367.MD.gz.
#> 2025-09-24 22:48:42.783109 caching file sra.recount_pred.DRP002367.MD.gz.
#> 2025-09-24 22:48:42.849799 downloading and reading the feature information.
#> 2025-09-24 22:48:43.674058 caching file mouse.gene_sums.M023.gtf.gz.
#> 2025-09-24 22:48:44.051297 downloading and reading the counts: 4 samples across 55421 features.
#> 2025-09-24 22:48:44.64444 caching file sra.gene_sums.DRP002367.M023.gz.
#> 2025-09-24 22:48:44.759023 constructing the RangedSummarizedExperiment (rse) object.
rse_gene_DRP002367_manual
#> class: RangedSummarizedExperiment 
#> dim: 55421 4 
#> metadata(8): time_created recount3_version ... annotation recount3_url
#> assays(1): raw_counts
#> rownames(55421): ENSMUSG00000079800.2 ENSMUSG00000095092.1 ...
#>   ENSMUSG00000096850.1 ENSMUSG00000099871.1
#> rowData names(11): source type ... havana_gene tag
#> colnames(4): DRR023307 DRR023308 DRR023309 DRR023310
#> colData names(177): rail_id external_id ...
#>   recount_pred.curated.cell_line BigWigURL

## Information about how this RSE was made
metadata(rse_gene_DRP002367_manual)
#> $time_created
#> [1] "2025-09-24 22:48:44 UTC"
#> 
#> $recount3_version
#>           package ondiskversion loadedversion                        path
#> recount3 recount3        1.19.3        1.19.3 /__w/_temp/Library/recount3
#>                           loadedpath attached is_base       date       source
#> recount3 /__w/_temp/Library/recount3     TRUE   FALSE 2025-09-24 Bioconductor
#>          md5ok            library
#> recount3    NA /__w/_temp/Library
#> 
#> $project
#> [1] "DRP002367"
#> 
#> $project_home
#> [1] "data_sources/sra"
#> 
#> $type
#> [1] "gene"
#> 
#> $organism
#> [1] "mouse"
#> 
#> $annotation
#> [1] "gencode_v23"
#> 
#> $recount3_url
#> [1] "http://duffel.rail.bio/recount3"
#> 

## Test with a collection that has one sample, at the exon level
## NOTE: this requires loading the full data for this study (nearly 6GB!)
if (FALSE) { # \dontrun{
rse_exon_ERP110066_collection_manual <- create_rse_manual(
    "ERP110066",
    "collections/geuvadis_smartseq",
    type = "exon",
    recount3_url = "http://snaptron.cs.jhu.edu/data/temp/recount3"
)
rse_exon_ERP110066_collection_manual


## Check how much memory this RSE object uses
pryr::object_size(rse_exon_ERP110066_collection_manual)
# 409 MB

## Test with a collection that has one sample, at the junction level
## NOTE: this requires loading the full data for this study
system.time(rse_jxn_ERP110066_collection_manual <- create_rse_manual(
    "ERP110066",
    "collections/geuvadis_smartseq",
    type = "jxn",
    recount3_url = "http://snaptron.cs.jhu.edu/data/temp/recount3"
))
rse_jxn_ERP110066_collection_manual

## Check how much memory this RSE object uses
## NOTE: this doesn't run since 2 files are missing on the test site!
pryr::object_size(rse_jxn_ERP110066_collection_manual)
} # }

if (FALSE) { # \dontrun{
## For testing and debugging
project <- "ERP110066"
project_home <- "collections/geuvadis_smartseq"

project <- "SRP009615"
project_home <- "data_sources/sra"
type <- "gene"
organism <- "human"
annotation <- "gencode_v26"
jxn_format <- "ALL"
bfc <- recount3_cache()
recount3_url <- "http://idies.jhu.edu/recount3/data"
verbose <- TRUE
} # }
