Is my use of mapfromtranscripts correct in GenomicFeatures package? converting cDNA coordinates to genomic coordinates with GenomicFeatures
0
0
Entering edit mode
3.0 years ago
anon • 0

Hello, I am trying to convert cDNA coordinates to 3'UTR coordinates using the mapFromTranscripts function from the GenomicFeatures package.

I've defined my ranges and transcript ID:

txcoords_test<- IRanges(start=c(2306),end=c(2329), width=c(24), 
        names=c("ENST00000370115"))                                                   
gr_testcoords<-GRanges(seqnames="ENST00000370115", ranges=txcoords_test, strand = "*")

My all_utr3 is a GRanges object from threeUTRsByTranscript (txdb) modified to contain the tx_id, tx_name and gene_id.

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene #hg19 txdb works as well as has apparently been updated to h38

#try on transcripts 
txdb_transcripts<- transcripts(txdb)
#remove the versions and make names 
mcols(txdb_transcripts)$tx_name<- gsub('.{2}$', '', txdb_transcripts$tx_name) 
names(txdb_transcripts)<-mcols(txdb_transcripts)$tx_name
head(txdb_transcripts)

#use mapfrom on transcripts
mapFromTranscripts(gr_testcoords, txdb_transcripts)#we get the genomic coords for the cds

#coords for the 3'UTRs
ucsc_utr3<- threeUTRsByTranscript(txdb)

#remove the versions at the end of tx_id and #add names to all_utr3
all_utr3<- unlist(ucsc_utr3, use.names=FALSE)
head(all_utr3)

mcols(all_utr3)$tx_id<-rep(as.integer(names(ucsc_utr3)), lengths(ucsc_utr3))

#all_utr3 doesnt have the transcript information
#add the tx_id, tx_name metadata column
tx2gene<- mcols(transcripts(txdb, columns=c("tx_id","tx_name","gene_id")))
head(tx2gene)

#add tx_name and to all_utr3 by matching tx_id
m<- match(mcols(all_utr3)$tx_id, tx2gene$tx_id)
mcols(all_utr3)<- cbind(mcols(all_utr3),tx2gene[m,-1L, drop=FALSE])


#modify the transcript versions (no version since my transcript names dont have any)
mcols(all_utr3)$tx_name<- gsub('.{2}$', '', all_utr3$tx_name) 

#set the names to the tx names
names(all_utr3)<-mcols(all_utr3)$tx_name

When I run the mapFromTranscripts, I get back an empty GRanges object.

mapFromTranscripts(gr_testcoords, all_utr3)

However, on subsetting all_utr3 for my transcript ID, the 3'UTR coordinates are present. I'm not sure what I'm missing that will allow my coordinates to map onto the 3'UTR coordinates in all_utr3.

subset(all_utr3, names(all_utr3)=="ENST00000370115")

What I'm expecting for ENST00000370115 is a range within the 3'UTR: 100738869-100738892. (from BioMart, the full 3'UTR is between 100738284-100739038). I'm not sure what I'm missing that will allow my cDNA coordinates to map onto the 3'UTR coordinates in all_utr3.

bioconductor genomicfeatures r rna22 • 676 views
ADD COMMENT

Login before adding your answer.

Traffic: 2049 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6