Get the longest transcript in GenomicRanges
0
2
Entering edit mode
3.9 years ago

I am trying to plot an specific region using ggbio. I am using the below code that produces my desire output, except that it contains several transcript. Is it possible to only plot the longest transcript for each gene? I've not been able to access the genomic ranges object within Homo.sapiens that I assume contains this information.

library(ggbio)
library(Homo.sapiens)
data(genesymbol, package = "biovizBase")

range <- GRanges("chr10"  , IRanges(start = 78000000 , end = 79000000))
p.txdb <- autoplot(Homo.sapiens, which = range)
p.txdb
ggbio r genomicranges • 2.0k views
ADD COMMENT
0
Entering edit mode

I tried this yesterday for quite some time and did not come up with a satisfying and simple solution, maybe some Twitter people have an idea: https://twitter.com/ATpoint90/status/1265946486602444800

ADD REPLY
0
Entering edit mode

Thanks for the help. I've checked which data can be retrieved from the Homo.sapiens object using columns(Homo.sapiens), and there is actually information on the transcript start and transcript end. I think I will need to get the width first, and then modify the Homo.sapiens object somehow?

ADD REPLY
0
Entering edit mode

Looks like this is what you want to (once you get the ID of the longest transcript) : A: Plotting a single transcript with ggbio

ADD REPLY
0
Entering edit mode

I checked that, and a TxDb object does not contain gene names... I would like to show the name of the genes not the transcripts names.

ADD REPLY
0
Entering edit mode

That definitely works but there must be an easier (in-built way, I hope). Anyway, getting the longest transcript is simple:

library(ggbio)
library(Homo.sapiens)

data(genesymbol, package = "biovizBase")

GetLongest <- function(Ranges){

  ## list transcripts
  tx <- transcripts(Homo.sapiens)

  ## list genes
  gns <- genes(Homo.sapiens)

  ## get transcripts overlapping the ranges
  subs <- subsetByOverlaps(tx, Ranges)

  ## overlap with genes to get gene names
  olap <- findOverlaps(subs, gns)

  ## some transcripts might not have a matched gene in the database
  subs <- subs[olap@from]

  ## add gene ID to the metadata:
  elementMetadata(subs) <- data.frame(elementMetadata(subs), 
                                      GENEID = as.character(gns[olap@to]$GENEID))

  ## sort by length and take the unique genes.
  ## that gives you the longest transcript per annotated gene
  subs <- subs[order(width(subs), decreasing = TRUE)]
  longest <- subs[!duplicated(subs$GENEID)]
  return(longest)
}

GetLongest(GRanges("chr10"  , IRanges(start = 78000000 , end = 79000000)))

GRanges object with 2 ranges and 3 metadata columns:
      seqnames            ranges strand |   TXID     TXNAME      GENEID
         <Rle>         <IRanges>  <Rle> | <list>     <list> <character>
  [1]    chr10 77542519-78317126      + |  38363 uc001jxi.3       83938
  [2]    chr10 78629359-79397577      - |  40147 uc001jxj.2        3778
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
ADD REPLY
0
Entering edit mode

Great, thanks! So, then, within the autoplot function, how do I select only those transcripts? I am trying to do that, but it keeps printing all transcripts.

ADD REPLY
0
Entering edit mode

I just found out that I can reduce the transcripts... which is not exactly what I want, as I think this is collapsing all the transcripts... i.e. p.txdb <- autoplot(Homo.sapiens, which = range, stat = "reduce")

ADD REPLY
0
Entering edit mode

I am trying to do that, but it keeps printing all transcripts.

Yeah, that is unfortunately what I did not find out yet.

ADD REPLY

Login before adding your answer.

Traffic: 2469 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6