Hello, I am trying to convert cDNA coordinates to 3'UTR coordinates using the mapFromTranscripts function from the GenomicFeatures package.
I've defined my ranges and transcript ID:
txcoords_test<- IRanges(start=c(2306),end=c(2329), width=c(24),
names=c("ENST00000370115"))
gr_testcoords<-GRanges(seqnames="ENST00000370115", ranges=txcoords_test, strand = "*")
My all_utr3 is a GRanges object from threeUTRsByTranscript (txdb)
modified to contain the tx_id, tx_name and gene_id.
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene #hg19 txdb works as well as has apparently been updated to h38
#try on transcripts
txdb_transcripts<- transcripts(txdb)
#remove the versions and make names
mcols(txdb_transcripts)$tx_name<- gsub('.{2}$', '', txdb_transcripts$tx_name)
names(txdb_transcripts)<-mcols(txdb_transcripts)$tx_name
head(txdb_transcripts)
#use mapfrom on transcripts
mapFromTranscripts(gr_testcoords, txdb_transcripts)#we get the genomic coords for the cds
#coords for the 3'UTRs
ucsc_utr3<- threeUTRsByTranscript(txdb)
#remove the versions at the end of tx_id and #add names to all_utr3
all_utr3<- unlist(ucsc_utr3, use.names=FALSE)
head(all_utr3)
mcols(all_utr3)$tx_id<-rep(as.integer(names(ucsc_utr3)), lengths(ucsc_utr3))
#all_utr3 doesnt have the transcript information
#add the tx_id, tx_name metadata column
tx2gene<- mcols(transcripts(txdb, columns=c("tx_id","tx_name","gene_id")))
head(tx2gene)
#add tx_name and to all_utr3 by matching tx_id
m<- match(mcols(all_utr3)$tx_id, tx2gene$tx_id)
mcols(all_utr3)<- cbind(mcols(all_utr3),tx2gene[m,-1L, drop=FALSE])
#modify the transcript versions (no version since my transcript names dont have any)
mcols(all_utr3)$tx_name<- gsub('.{2}$', '', all_utr3$tx_name)
#set the names to the tx names
names(all_utr3)<-mcols(all_utr3)$tx_name
When I run the mapFromTranscripts, I get back an empty GRanges object.
mapFromTranscripts(gr_testcoords, all_utr3)
However, on subsetting all_utr3 for my transcript ID, the 3'UTR coordinates are present. I'm not sure what I'm missing that will allow my coordinates to map onto the 3'UTR coordinates in all_utr3.
subset(all_utr3, names(all_utr3)=="ENST00000370115")
What I'm expecting for ENST00000370115 is a range within the 3'UTR: 100738869-100738892. (from BioMart, the full 3'UTR is between 100738284-100739038). I'm not sure what I'm missing that will allow my cDNA coordinates to map onto the 3'UTR coordinates in all_utr3.