Annotate bedfile with Transcript, Exon No & CDS
1
0
Entering edit mode
3.6 years ago
mrxcm3 ▴ 80

I have a BED file containing sequencing gaps

17  41223034    41223081
13  32929236    32929276
17  7574019 7574057

I would like to annotate this BED file with gene symbol, transcript id (RefSeq) and map the genomic coordinates to the CDS.

I have been trying to use GenomicFeatures in R to do this, and have got part way.

# import BED as GRanges object
bedfile <- rtracklayer::import.bed('./data/test.bed')

# import refseq transcripts as TxDB object
refseq_txdb <- makeTxDbFromUCSC(genome="hg19", tablename="ncbiRefSeq")

# split cds by transcript
cds  <- cdsBy(x = refseq_txdb, by = c("tx"), use.names=TRUE)

# map between genomic and cds coordinate system
bed2cds <- mapToTranscripts(x = bedfile, transcripts = cds, intronJunctions=TRUE)

This gives me a GRanges object that has the RefSeq IDs and CDS coordinates. However, I cannot easily find a way to update each entry of this object with Gene ID (Symbol) and Exon Number. And advice greatly appreciated.

GRanges object with 8 ranges and 3 metadata columns:
            seqnames      ranges strand |     xHits transcriptsHits  intronic
               <Rle>   <IRanges>  <Rle> | <integer>       <integer> <logical>
  [1]    NM_007298.3   1538-1584      - |         1           43630     FALSE
  [2]    NM_007294.4   4850-4896      - |         1           43631     FALSE
  [3]    NM_007297.4   4709-4755      - |         1           43632     FALSE
  [4]    NM_007299.4   1538-1584      - |         1           43633     FALSE
  [5]    NM_007300.4   4913-4959      - |         1           43634     FALSE
  [6]    NM_000059.3   7247-7286      + |         2           35159     FALSE
  [7] NM_001276697.2 -1986--1987      - |         3           43034     FALSE
  [8] NM_001126113.2   1042-1041      - |         3           43046      TRUE
  -------
R bioconductor • 840 views
ADD COMMENT
1
Entering edit mode
3.6 years ago

If going this approach via rtracklayer, then you could do an additional overlap by following the code here: A: How to extract the list of genes from TCGA CNV data

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 2788 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6