5 Annotation
BiocMAP can be run with hg38, hg19, or mm10 references. The pipeline has a default, automated process for pulling and building annotation-related files, but the user can opt to provide their own annotation instead. Both options are documented below. The example annotation files listed here are the ones used with the default configuration when the hg38 reference is selected.
- A genome assembly fasta: the reference genome to align reads to, like the file here (but unzipped).
- Gene annotation gtf: containing transcript data, like the file here (but unzipped).
- The lambda transcriptome: for experiments utilizing spike-ins of the lambda bacteriophage genome, the transcriptome provided here is used (but unzipped).
5.1 Default Annotation
BiocMAP uses annotation files provided by GENCODE.
5.1.1 Choosing a release
With genome annotation constantly being updated, the user may want to use a particular GENCODE release. The configuration variables gencode_version_human and gencode_version_mouse control which GENCODE release is used for the human and mouse genomes, respectively.
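The relationship between a reference and the variable that governs it can be sketched as a small lookup. This is a minimal illustration, not BiocMAP's own code: the dictionary standing in for the configuration file, the helper name gencode_release, and the release values shown are all assumptions made for the example.

```python
# Illustrative only: a plain dict stands in for the pipeline's configuration file.
# The release values are placeholders, not necessarily BiocMAP's defaults.
config = {
    "gencode_version_human": "34",
    "gencode_version_mouse": "M25",
}

# Which configuration variable applies to each supported reference.
VERSION_VARIABLE = {
    "hg38": "gencode_version_human",
    "hg19": "gencode_version_human",
    "mm10": "gencode_version_mouse",
}


def gencode_release(reference, config):
    """Return the GENCODE release that applies to the given reference.

    Hypothetical helper written for this example.
    """
    try:
        variable = VERSION_VARIABLE[reference]
    except KeyError:
        raise ValueError(f"unsupported reference: {reference!r}") from None
    return config[variable]
```

In this sketch, changing gencode_version_human affects both hg38 and hg19 runs, while mm10 runs depend only on gencode_version_mouse.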
5.1.2 Choosing a “build”
Depending on the analysis you are doing, you may wish to only consider the reference chromosomes (for humans, the 25 sequences “chr1” through “chrM”) for alignment and methylation extraction. BiocMAP provides the option to choose from two annotation “builds” for a given release and reference, called “main” and “primary” (following the naming convention from GENCODE databases).
- The “main” build consists of only the canonical “reference” sequences for each species.
- The “primary” build consists of the canonical “reference” sequences and additional scaffolds, as a genome primary assembly fasta from GENCODE would contain.
See the variable annotation_build in your configuration file to make this selection for your pipeline run.
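To make the difference between the two builds concrete, the following sketch selects sequence names the way each build is described above, for the human case only. It is an illustration under stated assumptions, not the pipeline's implementation: the function name select_sequences and the example scaffold names are hypothetical.

```python
import re

# The 25 human reference sequences: chr1 through chr22, chrX, chrY, and chrM.
HUMAN_CANONICAL = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")


def select_sequences(names, build):
    """Return the sequence names kept under the given annotation build.

    Hypothetical helper: "primary" keeps every sequence (canonical plus
    scaffolds); "main" keeps only the canonical reference sequences.
    """
    if build == "primary":
        return list(names)
    if build == "main":
        return [name for name in names if HUMAN_CANONICAL.fullmatch(name)]
    raise ValueError(f"unknown build: {build!r}")


# Example sequence names; the two non-"chr" entries stand in for scaffolds.
names = ["chr1", "chr22", "chrX", "chrM", "GL000008.2", "KI270728.1"]
main_names = select_sequences(names, "main")
primary_names = select_sequences(names, "primary")
```

Here main_names drops the two scaffold entries, while primary_names retains all six sequences.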
5.2 Custom Annotation
You may wish to provide a genome FASTA (the reference genome to align reads to), such as the file here, in place of the automatically managed GENCODE files described in the above section.
You must also add the --custom_anno [label] argument to your execution scripts to specify that you are using custom annotation files. The “label” is a string you want to include in filenames generated from the annotation files you provided. This allows the use of potentially many different custom annotations, each assigned a unique and informative name of your choosing. The label can be anything except an empty string (which internally signifies that custom annotation is not in use).
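The one constraint on the label, that it must not be empty, can be checked before a run is launched. The sketch below is a hypothetical pre-flight helper, not part of BiocMAP; it only assembles the argument pair described above and assumes nothing about the rest of the command line.

```python
def custom_anno_args(label):
    """Build the argument pair for a custom annotation run.

    Hypothetical helper: rejects the empty string, which the pipeline
    reserves to mean "do not use custom annotation".
    """
    if not isinstance(label, str) or label == "":
        raise ValueError("the custom annotation label must be a non-empty string")
    return ["--custom_anno", label]


# Example label chosen for illustration only.
args = custom_anno_args("hg38_custom_v1")
```

The returned list can be appended to whatever command your execution script already builds.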