R/gencode_genomic_state.R
gencode_genomic_state.Rd
Based on a TxDb object built by gencode_txdb(), this function builds a GenomicState object which you can then use with derfinder::annotateRegions(). This information is then used by packages like derfinderPlot.
gencode_genomic_state(txdb)
A GenomicFeatures::TxDb object built with gencode_txdb().
A GenomicState object, built using derfinder::makeGenomicState(), with the gene symbols added.
Note that not all genes have symbols: many will be NA.
Based on code for the brainflowprobes package at: https://github.com/LieberInstitute/brainflowprobes/blob/devel/data-raw/create_sysdata.R
## Start from scratch if you want:
if (FALSE) {
    txdb_v31_hg19_chr21 <- gencode_txdb("31", "hg19", chrs = "chr21")
}
## or read in the txdb object for hg19 chr21 from this package
txdb_v31_hg19_chr21 <- AnnotationDbi::loadDb(
    system.file("extdata", "txdb_v31_hg19_chr21.sqlite",
        package = "GenomicState"
    )
)
## Now build the GenomicState object
gs_v31_hg19_chr21 <- gencode_genomic_state(txdb_v31_hg19_chr21)
#> 2023-05-07 06:38:09.080103 making the GenomicState object
#> extendedMapSeqlevels: sequence names mapped from NCBI to UCSC for species homo_sapiens
#> 'select()' returned 1:1 mapping between keys and columns
#> 2023-05-07 06:38:14.614311 finding gene symbols
#> 'select()' returned 1:many mapping between keys and columns
#> 2023-05-07 06:38:14.989506 adding gene symbols to the GenomicState
## Explore the result
gs_v31_hg19_chr21
#> $fullGenome
#> GRanges object with 7871 ranges and 5 metadata columns:
#>      seqnames            ranges strand |   theRegion         tx_id
#>         <Rle>         <IRanges>  <Rle> | <character> <IntegerList>
#>    1    chr21   9492380-9492817      + |        exon             1
#>    2    chr21   9590294-9590395      + |        exon             2
#>    3    chr21   9647910-9648694      + |        exon           3,4
#>    4    chr21   9648695-9650108      + |      intron           3,4
#>    5    chr21   9650109-9650168      + |        exon           3,4
#>  ...      ...               ...    ... .         ...           ...
#> 7867    chr21 47865683-47878803      * |  intergenic
#> 7868    chr21 47989929-48018516      * |  intergenic
#> 7869    chr21 48025122-48055506      * |  intergenic
#> 7870    chr21 48085037-48110675      * |  intergenic
#> 7871    chr21 48111139-48129895      * |  intergenic
#>                                       tx_name          gene          symbol
#>                               <CharacterList> <IntegerList> <CharacterList>
#>    1                     ENST00000625020.1_1           776            <NA>
#>    2                     ENST00000625098.1_1           754            <NA>
#>    3 ENST00000623794.3_1,ENST00000624813.1_1           766            <NA>
#>    4 ENST00000623794.3_1,ENST00000624813.1_1           766            <NA>
#>    5 ENST00000623794.3_1,ENST00000624813.1_1           766            <NA>
#>  ...                                     ...           ...             ...
#> 7867
#> 7868
#> 7869
#> 7870
#> 7871
#> -------
#> seqinfo: 1 sequence from hg19 genome
#>
#> $codingGenome
#> GRanges object with 10361 ranges and 5 metadata columns:
#>       seqnames            ranges strand |   theRegion         tx_id
#>          <Rle>         <IRanges>  <Rle> | <character> <IntegerList>
#>     1    chr21   9490380-9492379      + |    promoter             1
#>     2    chr21   9492380-9492817      + |        exon             1
#>     3    chr21   9588294-9590293      + |    promoter             2
#>     4    chr21   9590294-9590395      + |        exon             2
#>     5    chr21   9590396-9590493      + |    promoter             2
#>   ...      ...               ...    ... .         ...           ...
#> 10357    chr21 47865683-47876803      * |  intergenic
#> 10358    chr21 47989929-48018516      * |  intergenic
#> 10359    chr21 48027122-48053506      * |  intergenic
#> 10360    chr21 48085037-48108675      * |  intergenic
#> 10361    chr21 48111139-48129895      * |  intergenic
#>                                        tx_name          gene          symbol
#>                                <CharacterList> <IntegerList> <CharacterList>
#>     1                     ENST00000625020.1_1           776            <NA>
#>     2                     ENST00000625020.1_1           776            <NA>
#>     3                     ENST00000625098.1_1           754            <NA>
#>     4                     ENST00000625098.1_1           754            <NA>
#>     5                     ENST00000625098.1_1           754            <NA>
#>   ...                                     ...           ...             ...
#> 10357
#> 10358
#> 10359
#> 10360
#> 10361
#> -------
#> seqinfo: 1 sequence from hg19 genome
#>
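## A possible next step (a sketch, not part of the original example): count
## how many ranges in the fullGenome component lack a gene symbol, then pass
## that component to derfinder::annotateRegions().
## Assumptions: the derfinder, GenomicRanges and IRanges packages are
## installed, and the `regions` coordinates below are hypothetical values
## chosen only for illustration.
symbols <- as.list(gs_v31_hg19_chr21$fullGenome$symbol)
no_symbol <- vapply(
    symbols,
    ## TRUE when a range has no symbol entries or only NA entries
    function(x) length(x) == 0L || all(is.na(x)),
    logical(1)
)
table(no_symbol)

## Two hypothetical regions on chr21 to annotate
regions <- GenomicRanges::GRanges(
    seqnames = "chr21",
    ranges = IRanges::IRanges(
        start = c(9492400, 47900000),
        end = c(9492500, 47900100)
    )
)
annotated <- derfinder::annotateRegions(
    regions = regions,
    genomicState = gs_v31_hg19_chr21$fullGenome
)
## Overlap counts per region, one column per annotation type
annotated$countTable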